Any Custom Frontend with Gradio's Backend

Gradio.Server allows developers to build custom frontends using frameworks like React or Svelte while retaining Gradio's robust backend features like queuing and ZeroGPU support.

gr.HTML made it possible to build rich, interactive frontends entirely inside Gradio using custom HTML, CSS, and JavaScript. That unlocked a lot. But what if that's not enough? What if you want to build the entire frontend in your own framework, such as React, Svelte, or even plain HTML/JS, while still benefiting from Gradio's queuing system, API infrastructure, MCP support, and ZeroGPU on Spaces?

That's exactly the problem gradio.Server solves. And it changes what's possible with Gradio and Hugging Face Spaces.

Take Text Behind Image, an editor where you upload a photo, the background is removed using an ML model, and you then place stylized text between the foreground subject and the background. The text appears to sit behind the person or object in the image.

This needs:

A drag-and-drop canvas with layered rendering (background → text → foreground)
A rich control panel with sliders for font size, weight, letter spacing, color, opacity, stroke, shadow, 3D extrusion, perspective transforms, and more
A backend ML endpoint that runs a background-removal model and returns a transparent PNG
Export to PNG on the client side
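The layered rendering in the first item comes down to alpha compositing: the text is drawn over the background, then the cut-out foreground is drawn over both. In the app this happens on an HTML canvas in the browser; the sketch below only illustrates the ordering, using the standard Porter-Duff "over" operator in plain Python on single RGBA pixels with channel values from 0.0 to 1.0. The function names `over` and `composite_pixel` are hypothetical.

```python
from typing import Tuple

# Straight (non-premultiplied) RGBA pixel, every channel in 0.0-1.0.
RGBA = Tuple[float, float, float, float]


def over(src: RGBA, dst: RGBA) -> RGBA:
    """Porter-Duff 'source over destination' for straight-alpha pixels."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # both layers fully transparent

    def mix(s: float, d: float) -> float:
        # Weight each color by its contribution to the final coverage.
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (mix(sr, dr), mix(sg, dg), mix(sb, db), out_a)


def composite_pixel(background: RGBA, text: RGBA, foreground: RGBA) -> RGBA:
    """Stack the layers in Text Behind Image order: background -> text -> foreground."""
    return over(foreground, over(text, background))


bg = (0.0, 0.0, 1.0, 1.0)    # blue sky
txt = (1.0, 1.0, 1.0, 1.0)   # white text
fg = (1.0, 0.0, 0.0, 1.0)    # red, fully opaque subject

# Where the subject covers the text, the subject wins.
print(composite_pixel(bg, txt, fg))                    # (1.0, 0.0, 0.0, 1.0)
# Where the cut-out foreground is transparent, the text shows through.
print(composite_pixel(bg, txt, (0.0, 0.0, 0.0, 0.0)))  # (1.0, 1.0, 1.0, 1.0)
```

Because the foreground is the transparent PNG returned by the backend, the "behind" effect needs no special logic: it falls out of drawing the layers in this order.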

There's no way to express this UI with Gradio components; it's a full web application. But we still wanted the backend power of Gradio: queuing, concurrency management, ZeroGPU support, and hosting on HF Spaces without infrastructure headaches.

gradio.Server extends FastAPI. It gives you the full power of FastAPI (custom routes, middleware, file uploads, any response type) while adding Gradio's API engine on top: queuing, SSE streaming, concurrency control, and gradio_client compatibility.

Here's the entire backend for Text Behind Image:

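```python
import os
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModelForImageSegmentation
from gradio import Server
from gradio.data_classes import FileData
from fastapi.responses import HTMLResponse
import spaces  # ZeroGPU integration on Hugging Face Spaces

# Allow faster (TF32) float32 matrix multiplications on GPUs that support them.
torch.set_float32_matmul_precision("high")

# Load BiRefNet once at startup; trust_remote_code runs the model code shipped in the repo.
birefnet = AutoModelForImageSegmentation.from_pretrained("ZhengPeng7/BiRefNet", trust_remote_code=True)
birefnet.to("cuda")  # on ZeroGPU, a GPU is actually attached only inside @spaces.GPU calls
birefnet.float()

# Preprocessing: resize to 1024x1024, convert to a tensor, normalize with ImageNet mean/std.
transform_image = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# A FastAPI app with Gradio's API engine (queuing, SSE, gradio_client support) on top.
app = Server()

@spaces.GPU
def segment(image: Image.Image) -> Image.Image:
    """Run BiRefNet segmentation to produce a transparency mask."""
    image_size = image.size
    input_images = transform_image(image).unsqueeze(0).to("cuda")  # add a batch dimension
    with torch.no_grad():
        # The last element of the output is the final prediction; sigmoid maps logits to 0-1.
        preds = birefnet(input_images)[-1].sigmoid().cpu()
        pred = preds[0].squeeze()
        # Float mask -> 8-bit grayscale image, resized back to the original resolution.
        mask = transforms.ToPILImage()(pred).resize(image_size)
        image.putalpha(mask)  # use the mask as the alpha channel (RGB -> RGBA)
        return image

# Registered with Gradio's queue, so concurrent calls are managed.
@app.api(name="remove_background")
def remove_background(image_path: FileData) -> FileData:
    """Remove background from an image. Returns transparent PNG."""
    im = Image.open(image_path["path"]).convert("RGB")
    result = segment(im)
    # Save next to the upload with a .png extension so the alpha channel is preserved.
    out_path = os.path.splitext(image_path["path"])[0] + ".png"
    result.save(out_path)
    return FileData(path=out_path)

# Serve the custom frontend (index.html next to this file) at the root URL.
@app.get("/", response_class=HTMLResponse)
async def homepage():
    html_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "index.html")
    with open(html_path, "r", encoding="utf-8") as f:
        return f.read()

app.launch(show_error=True)
```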

That's it. ~50 lines of Python. The model loads at startup, @spaces.GPU handles ZeroGPU allocation, and gradio.Server manages queuing and concurrency.
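The interesting arithmetic in segment() is how the model output becomes an alpha channel: .sigmoid() squashes each raw score (logit) into a 0-to-1 foreground probability, and ToPILImage turns that float map into an 8-bit grayscale image by multiplying by 255 and truncating, which putalpha() then uses as transparency. The pure-Python sketch below walks through that on a tiny, made-up 2×2 grid of logits; the helper name `logits_to_alpha` is hypothetical, and the resize back to the input resolution is left out.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function: maps any real-valued logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logits_to_alpha(logits: List[List[float]]) -> List[List[int]]:
    """Turn a grid of segmentation logits into 8-bit alpha values.

    Mirrors segment(): sigmoid -> scale by 255 -> truncate to an integer byte.
    0 means fully transparent (background), 255 fully opaque (foreground).
    """
    return [[int(sigmoid(v) * 255) for v in row] for row in logits]


# Made-up logits: strongly negative = background, strongly positive = subject.
logits = [[-8.0, 0.0],
          [3.0, 10.0]]
print(logits_to_alpha(logits))  # [[0, 127], [242, 254]]
```

Uncertain pixels near the subject's edge (logits close to 0) end up semi-transparent, which is what gives the cut-out soft edges instead of a hard binary outline.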

If this were a regular FastAPI app, you'd define an @app.post() route for background removal. That works until two users hit it at once. Without concurrency management, both requests compete for the GPU at the same time, which can exhaust GPU memory, crash the app, or produce broken results.
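To make the failure mode concrete, the sketch below simulates two simultaneous requests against one shared "GPU" with asyncio. `FakeGPU`, `handle_request`, and `simulate` are hypothetical stand-ins, and the semaphore is only an analogy for a queue with a concurrency limit of 1, not Gradio's actual implementation. Without a gate, both jobs are on the GPU at once; with the gate, the second request waits for the first.

```python
import asyncio
from typing import Optional


class FakeGPU:
    """Stand-in for a GPU that records how many jobs run on it at the same time."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0

    async def run(self, seconds: float) -> None:
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(seconds)  # pretend to do model inference
        self.active -= 1


async def handle_request(gpu: FakeGPU, gate: Optional[asyncio.Semaphore]) -> None:
    if gate is None:
        await gpu.run(0.05)  # unmanaged: every request hits the GPU immediately
    else:
        async with gate:     # managed: wait for a free slot first
            await gpu.run(0.05)


async def simulate(limit: Optional[int]) -> int:
    """Fire two requests at once and return the peak number of concurrent GPU jobs."""
    gpu = FakeGPU()
    gate = asyncio.Semaphore(limit) if limit is not None else None
    await asyncio.gather(handle_request(gpu, gate), handle_request(gpu, gate))
    return gpu.peak


print(asyncio.run(simulate(None)))  # 2: both jobs on the GPU at once
print(asyncio.run(simulate(1)))     # 1: requests run one after the other
```

With a real model, "two jobs at once" means two copies of the activations in GPU memory, which is where the crashes come from.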

@app.api() solves this. It wraps your function with Gradio's queuing engine: requests wait in a queue instead of all hitting the model at once, concurrency is controlled, and on ZeroGPU Spaces, GPU allocation is handled automatically via @spaces.GPU. As a bonus, any @app.api() endpoint is also callable via gradio_client.
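Calling the endpoint from Python could look like the sketch below. `Client`, `handle_file`, and `predict(..., api_name=...)` are part of gradio_client; the wrapper name `remove_background_remote` is hypothetical, and the api_name "/remove_background" is an assumption based on the name= argument passed to @app.api(). The import is deferred so the helper can be defined even where gradio_client is not installed.

```python
def remove_background_remote(src: str, image_file: str):
    """Call the remove_background endpoint of a running app via gradio_client.

    src: a Space ID ("user/space") or a URL such as "http://127.0.0.1:7860/"
    (placeholders; substitute your own deployment).
    """
    # Deferred import: defining this helper does not require gradio_client.
    from gradio_client import Client, handle_file

    client = Client(src)
    # handle_file() uploads the local image; predict() waits in the same queue as browser users.
    return client.predict(handle_file(image_file), api_name="/remove_background")
```

For example, remove_background_remote("http://127.0.0.1:7860/", "photo.jpg") against a locally launched app. For file outputs, gradio_client typically downloads the result and returns a local file path.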

The frontend talks to the backend through the Gradio JavaScript client (@gradio/client) rather than a raw fetch() call. Because every request goes through Gradio's queue, concurrency is managed, GPU requests don't collide, and you could even show users their queue position or progress.
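The same queue information is available from Python's gradio_client, which is a convenient way to see what a frontend could display. The sketch below submits a job instead of blocking on predict() and polls job.status(), whose rank and queue_size fields describe the job's place in the queue. As before, the src value and the api_name are assumptions about your deployment, and `remove_background_with_status` is a hypothetical helper. In the browser, the JS client delivers comparable status updates that a custom frontend can render as a queue indicator.

```python
import time


def remove_background_with_status(src: str, image_file: str, poll_seconds: float = 0.5):
    """Submit a remove_background job, print queue status until done, return the result."""
    # Deferred import: defining this helper does not require gradio_client.
    from gradio_client import Client, handle_file

    client = Client(src)
    job = client.submit(handle_file(image_file), api_name="/remove_background")
    while not job.done():
        status = job.status()
        # rank: this job's position in the queue; queue_size: queue length (either may be None).
        print(f"{status.code}: position {status.rank} of {status.queue_size}")
        time.sleep(poll_seconds)
    return job.result()
```

Polling keeps the example short; for a UI you would update a progress indicator instead of printing.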

With gradio.Server, Gradio doubles as a backend framework: use its UI system when you want it, bring your own frontend when you don't.

Source: Hugging Face Blog