FastAPI - Deep Dive

ASGI internals, dependency injection mechanics, async pitfalls, deployment topology, and the real cost of Pydantic.

FastAPI is a thin layer of magic over solid building blocks. Understanding the layers is the difference between building a toy and shipping something that does 50 million requests a day.

The stack underneath

FastAPI is built on, and usually served with:

Starlette: the ASGI framework that does routing, middleware, request/response objects, WebSockets, BackgroundTasks.
Pydantic: data validation, serialization, settings.
uvicorn (or hypercorn): the ASGI server that runs the event loop and hands each connection to your app.
uvloop: a drop-in replacement for asyncio's default event loop, built on libuv; its authors report it makes asyncio 2 to 4x faster. uvicorn uses it automatically when it is installed.

ASGI is a calling convention. Your app is an async callable that takes three arguments: scope (a dict describing the connection), receive (an async function that pulls messages from the client), and send (an async function that pushes messages back). HTTP requests arrive as http.request messages; responses go out as http.response.start followed by one or more http.response.body messages. WebSockets, lifespan events, and HTTP/2 server push all ride on the same primitive. FastAPI hides this from you, but when you write pure ASGI middleware (rather than Starlette's BaseHTTPMiddleware) you touch it directly.
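To make the calling convention concrete, here is a minimal sketch of a framework-free ASGI app plus a hand-rolled driver that plays the server's part: it feeds request chunks through receive and records whatever the app passes to send. echo_app and call are hypothetical names, and a real server such as uvicorn builds a much richer scope than the one used here.

```python
import asyncio


async def echo_app(scope, receive, send):
    """A bare ASGI app (hypothetical): echoes the request body back as text/plain."""
    assert scope["type"] == "http"  # this sketch ignores websocket/lifespan scopes

    # The body may arrive split across several http.request messages.
    body = b""
    more_body = True
    while more_body:
        message = await receive()
        body += message.get("body", b"")
        more_body = message.get("more_body", False)

    payload = b"echo: " + body
    # A response is one http.response.start followed by one or more http.response.body.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(payload)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": payload})


async def call(app, chunks):
    """Play the server (hypothetical helper): deliver chunks via receive(), record sends."""
    inbox = [
        {"type": "http.request", "body": chunk, "more_body": i < len(chunks) - 1}
        for i, chunk in enumerate(chunks)
    ]
    outbox = []

    async def receive():
        return inbox.pop(0)

    async def send(message):
        outbox.append(message)

    scope = {"type": "http", "method": "POST", "path": "/", "headers": []}
    await app(scope, receive, send)
    return outbox


sent = asyncio.run(call(echo_app, [b"hel", b"lo"]))
```

The body deliberately arrives in two chunks so the more_body loop actually does work; middleware sits in exactly this position, wrapping receive and send before passing them inward.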

Dependency injection mechanics

Depends(callable) tells FastAPI to call that function and inject the return value. Dependencies can themselves take Depends, so you build trees. Common pattern:

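from collections.abc import AsyncIterator
from fastapi import Depends, Header
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
# Defined elsewhere: app, AsyncSessionLocal, User (ORM model), UserOut (from_attributes=True).

async def get_db() -> AsyncIterator[AsyncSession]:
    async with AsyncSessionLocal() as session:
        yield session  # code after the yield is teardown; async with closes the session

async def get_user(db: AsyncSession = Depends(get_db), token: str = Header(...)) -> User:
    result = await db.execute(select(User).where(User.token == token))  # AsyncSession calls must be awaited
    return result.scalar_one()

@app.get("/me", response_model=UserOut)
async def me(user: User = Depends(get_user)):
    return user  # FastAPI converts the ORM object through UserOut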

Dependencies are cached within a single request by default. If get_db appears in five places, it runs once. You can opt out with Depends(get_db, use_cache=False).
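The caching rule is easy to model. The sketch below is a toy resolver, not FastAPI's actual implementation: Dep is a hypothetical stand-in for Depends, resolve is a hypothetical helper, and the cache is a plain dict created fresh for each request and keyed by the dependency callable.

```python
import asyncio
import inspect


class Dep:
    """Hypothetical stand-in for Depends(): a callable plus a use_cache flag."""

    def __init__(self, fn, use_cache=True):
        self.fn = fn
        self.use_cache = use_cache


async def resolve(fn, cache):
    """Toy resolver: fill every Dep-defaulted parameter, recursively, then call fn."""
    kwargs = {}
    for name, param in inspect.signature(fn).parameters.items():
        dep = param.default
        if not isinstance(dep, Dep):
            continue
        if dep.use_cache and dep.fn in cache:
            kwargs[name] = cache[dep.fn]  # already solved earlier in this request
        else:
            kwargs[name] = await resolve(dep.fn, cache)
            if dep.use_cache:
                cache[dep.fn] = kwargs[name]
    result = fn(**kwargs)
    return await result if inspect.isawaitable(result) else result


calls = {"get_db": 0}


async def get_db():
    calls["get_db"] += 1
    return object()  # stands in for a session


async def get_user(db=Dep(get_db)):
    return {"db": db}


async def get_tenant(db=Dep(get_db)):
    return {"db": db}


async def handler(user=Dep(get_user), tenant=Dep(get_tenant), fresh=Dep(get_db, use_cache=False)):
    return user["db"] is tenant["db"], fresh is user["db"]


shared, fresh_is_shared = asyncio.run(resolve(handler, cache={}))  # one request, one cache
```

get_user and get_tenant receive the same object, so get_db runs once for both; the use_cache=False parameter forces one extra call and gets its own instance.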

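Dependencies with yield get a teardown phase: the code after yield runs once the request is done with the dependency. If the handler raises, the exception is re-raised inside the dependency at the yield, so cleanup belongs in a finally block or a context manager; get_db's async with is what closes the session cleanly even on exceptions. Exactly when teardown runs relative to sending the response has changed between FastAPI releases, so check the notes for the version you pin.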
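FastAPI drives yield dependencies through an exit stack, and the standard library is enough to model the behavior. In this sketch, run_request and the two handlers are hypothetical; asynccontextmanager turns the yield dependency into a context manager, and AsyncExitStack unwinds it when the request finishes, throwing any handler exception back in at the yield.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []


async def get_db():
    """A yield dependency, written the way you would for FastAPI."""
    events.append("open")
    try:
        yield "session"
    except ValueError:
        events.append("rollback")  # the handler's exception arrives here, at the yield
        raise
    finally:
        events.append("close")  # teardown runs on success and on failure


async def run_request(handler):
    """Hypothetical request runner: the exit stack owns every yield dependency."""
    async with AsyncExitStack() as stack:
        session = await stack.enter_async_context(asynccontextmanager(get_db)())
        return await handler(session)


async def ok_handler(session):
    events.append(f"handled with {session}")
    return 200


async def failing_handler(session):
    raise ValueError("bad input")


status = asyncio.run(run_request(ok_handler))
success_events = list(events)

events.clear()
try:
    asyncio.run(run_request(failing_handler))
except ValueError:
    pass  # the exception still propagates after teardown
failure_events = list(events)
```

The failure path shows why the finally block matters: the rollback branch and the close both run before the error leaves run_request.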

Async pitfalls in detail

The event loop runs on a single OS thread. Every await is a potential yield point (control only returns to the loop if the awaited thing actually suspends), and between yield points your code runs without interruption. Three sharp edges:

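CPU-bound work blocks everything. Parsing a 5 MB JSON body, image processing, or running a regex over a huge string all stall the loop. Push it to a process pool: await asyncio.get_running_loop().run_in_executor(executor, fn, *args), where executor is a ProcessPoolExecutor created once at startup.
Sync I/O blocks everything. requests.get, time.sleep, psycopg2.connect, file I/O without aiofiles. These look innocent and they are not. Use async clients, offload the call with await asyncio.to_thread(fn, *args), or declare the endpoint with plain def so FastAPI runs it in its threadpool. Find offenders with asyncio debug mode (PYTHONASYNCIODEBUG=1, or loop.set_debug(True)) and a lowered loop.slow_callback_duration (default 0.1 s); the loop then logs every callback that holds it longer than that.
Forgotten await. On an AsyncSession, db.commit() returns a coroutine; if you never await it, the commit never happens, and the only runtime hint is a "coroutine ... was never awaited" RuntimeWarning when it is garbage-collected. Pyright flags these statically (its reportUnusedCoroutine check). Use it.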
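One way to see the first two edges is to measure how late a heartbeat task wakes up while other work runs. In this sketch, max_loop_lag, blocking and offloaded are hypothetical names, and time.sleep stands in for any sync call. asyncio.to_thread (Python 3.9+) is the standard-library route to a worker thread, which is fine for blocking I/O; for CPU-bound work, use run_in_executor with a ProcessPoolExecutor as described above, since threads still share the GIL.

```python
import asyncio
import time


async def max_loop_lag(work, interval=0.01):
    """Run `work` while a heartbeat task records its worst wake-up delay."""
    worst = 0.0
    stop = asyncio.Event()

    async def heartbeat():
        nonlocal worst
        last = time.perf_counter()
        while not stop.is_set():
            await asyncio.sleep(interval)
            now = time.perf_counter()
            worst = max(worst, now - last - interval)  # lateness beyond the requested sleep
            last = now

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(interval * 2)  # let the heartbeat start ticking
    await work()
    stop.set()
    await hb
    return worst


async def blocking():
    time.sleep(0.3)  # sync call: the whole loop freezes for 300 ms


async def offloaded():
    await asyncio.to_thread(time.sleep, 0.3)  # worker thread; the loop keeps ticking


blocked_lag = asyncio.run(max_loop_lag(blocking))
offloaded_lag = asyncio.run(max_loop_lag(offloaded))
print(f"blocking: {blocked_lag * 1000:.0f} ms late, offloaded: {offloaded_lag * 1000:.0f} ms late")
```

Every other request on that worker experiences the same lateness as the heartbeat, which is why a single stray sync call shows up as tail latency across the whole pod.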

BackgroundTasks vs Celery

BackgroundTasks run after the response is sent, in the same process: async task functions run on the event loop, plain def ones in Starlette's threadpool. Good for: send a confirmation email, log an event, invalidate a cache. Not good for: anything that needs retries, persistence across deploys, or distribution across machines.

Celery (or Arq, or Dramatiq) gives you a real queue with a separate worker pool. Use it for: PDF generation, third-party API calls that can fail, anything with a retry policy.

The trap: if you pass the request's DB session to a background task, the task holds a reference to a request-scoped object. The session closes when the request ends, the task runs afterwards, and its first query hits a dead session. Pass plain values (an invoice ID, not the ORM object or the session) and let the task open its own session.
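The trap can be reproduced without a database. This sketch is a toy under stated assumptions: FakeSession, session_scope and the two email functions are hypothetical, and a plain list of functools.partial objects plays the role of BackgroundTasks (whose add_task(func, *args) captures arguments the same way). Tasks run only after the request's session has closed.

```python
import asyncio
from contextlib import asynccontextmanager
from functools import partial


class FakeSession:
    """Hypothetical stand-in for an AsyncSession that refuses work once closed."""

    def __init__(self):
        self.closed = False

    async def get_invoice(self, invoice_id):
        if self.closed:
            raise RuntimeError("session is closed")
        return {"id": invoice_id}


@asynccontextmanager
async def session_scope():
    session = FakeSession()
    try:
        yield session
    finally:
        session.closed = True


async def email_with_session(db, invoice_id):  # the trap: reuses the request's session
    return await db.get_invoice(invoice_id)


async def email_with_id(invoice_id):  # the fix: only the ID crosses over
    async with session_scope() as db:
        return await db.get_invoice(invoice_id)


async def simulate_request():
    tasks = []  # plays the role of BackgroundTasks
    async with session_scope() as db:  # request-scoped, like get_db
        invoice = await db.get_invoice(42)
        tasks.append(partial(email_with_session, db, invoice["id"]))
        tasks.append(partial(email_with_id, invoice["id"]))
    # The request is over and its session closed; only now do background tasks run.
    results = []
    for task in tasks:
        try:
            results.append(await task())
        except RuntimeError as exc:
            results.append(f"failed: {exc}")
    return results


results = asyncio.run(simulate_request())
```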

Pydantic, the real bottleneck

Pydantic validates and serializes every request and response. On Pydantic v1, this can be 30 to 60 percent of your handler's CPU time. Pydantic v2 rewrote the core in Rust and is 5x to 50x faster depending on the model.
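Those two ranges interact through Amdahl's law: making validation faster only speeds up the share of CPU time it occupied. A quick check using the percentages quoted above, under the simplifying assumption that all validation cost shrinks uniformly by the quoted factor (overall_speedup is a hypothetical helper name):

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: whole-handler speedup when a fraction p of CPU time gets s times faster."""
    return 1 / ((1 - p) + p / s)


for p in (0.3, 0.6):  # validation's share of handler CPU time on v1
    for s in (5, 50):  # v2's quoted speedup range
        print(f"validation {p:.0%} of CPU, {s}x faster validator -> handler {overall_speedup(p, s):.2f}x")
```

Even a 50x faster validator cannot beat the ceiling of 1 / (1 - p): about 1.43x when validation was 30 percent of CPU time, 2.5x when it was 60 percent. That is still a large win, but the remaining time is your own code.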

Three tactics:

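Skip validation only where the data is already trusted. ConfigDict(frozen=True) buys immutability and hashability, not speed, and arbitrary_types_allowed=False is already the default. What actually skips the checks is Model.model_construct(**data), which builds an instance with no validation at all, so reserve it for data you produced yourself, such as rows read back from your own database.
Use Annotated[int, Field(strict=True)] (or ConfigDict(strict=True) for a whole model) for strict mode, which turns off type coercion: "42" is rejected for an int field instead of being converted.
For response models, consider response_model=None and return dicts when the shape is dynamic. You lose validation, you gain speed, though FastAPI still runs jsonable_encoder over the dict; returning a Response such as JSONResponse directly skips FastAPI's serialization entirely.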

Deployment topology

In production at Binocs:

4 uvicorn workers per pod (1 per vCPU on a 4 vCPU pod).
Pods behind an EKS service, fronted by an ALB.
HPA scaling on CPU at 70 percent.
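Lifespan enabled so startup/shutdown hooks run. With bare uvicorn that is --lifespan on; gunicorn's UvicornWorker uses uvicorn's default, auto, which already runs a FastAPI app's lifespan handlers.
Process count set at the supervisor level (gunicorn or systemd), not with uvicorn's own --workers flag, because uvicorn's worker management is less mature than gunicorn's.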

Actual command:

gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --bind 0.0.0.0:8000
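The same command can live in a gunicorn.conf.py, which gunicorn reads from the working directory by default. A sketch under assumptions: WEB_CONCURRENCY is the environment variable gunicorn itself consults for a default worker count, and the timeout values are illustrative, not Binocs's settings. Recent uvicorn releases deprecate the bundled uvicorn.workers module in favor of the separate uvicorn-worker package, so check which worker_class your pinned version expects.

```python
# gunicorn.conf.py (a sketch; timeout values are illustrative assumptions)
import os

# Same settings as the command-line form.
bind = "0.0.0.0:8000"
worker_class = "uvicorn.workers.UvicornWorker"

# One worker per vCPU. Read the count from the environment rather than
# multiprocessing.cpu_count(), which reports the node's CPUs, not the pod's limit.
workers = int(os.environ.get("WEB_CONCURRENCY", "4"))

# How long a silent worker survives before being killed, and how long a worker
# gets to finish in-flight requests on shutdown; keep the latter inside the
# pod's terminationGracePeriodSeconds so Kubernetes doesn't SIGKILL mid-drain.
timeout = 30
graceful_timeout = 25
```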

Logging: structured JSON via structlog, one log line per request including method, path, status, latency, tenant_id, user_id. Shipped to CloudWatch then to Grafana Loki.
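A one-line-per-request access log is a natural fit for pure ASGI middleware, because wrapping send is the only way to see the final status code. This sketch swaps structlog for json.dumps plus an injectable emit callable so it runs on the standard library alone; AccessLogMiddleware and the demo app are hypothetical, and tenant_id/user_id are left out because they would come from your auth layer.

```python
import asyncio
import json
import time


class AccessLogMiddleware:
    """Pure ASGI middleware (hypothetical) that emits one JSON line per HTTP request."""

    def __init__(self, app, emit=print):
        self.app = app
        self.emit = emit  # in production: a structlog/logging call instead of print

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)  # lifespan/websocket pass through untouched
            return

        start = time.perf_counter()
        status = 500  # reported if the app raises before starting a response

        async def send_wrapper(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            self.emit(json.dumps({
                "method": scope["method"],
                "path": scope["path"],
                "status": status,
                "latency_ms": round((time.perf_counter() - start) * 1000, 2),
                # tenant_id / user_id omitted: they would come from your auth layer
            }))


async def no_content(scope, receive, send):  # trivial inner app for the demo
    await send({"type": "http.response.start", "status": 204, "headers": []})
    await send({"type": "http.response.body", "body": b""})


async def demo():
    lines = []
    app = AccessLogMiddleware(no_content, emit=lines.append)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        pass  # a real server would write to the socket here

    await app({"type": "http", "method": "DELETE", "path": "/invoices/7"}, receive, send)
    return lines


log_lines = asyncio.run(demo())
record = json.loads(log_lines[0])
```

Emitting from the finally block means failed requests still produce a log line, with status 500 when the app died before responding.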

Testing

httpx.AsyncClient against the ASGI app, no network. Test database is a real Postgres in a container; transactional fixtures roll back after each test. Speed: 200 tests in 4 seconds.

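import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient

from app.main import app  # same module path as the gunicorn command

@pytest_asyncio.fixture
async def client():
    # ASGITransport runs the app in-process (replaces the removed app= shortcut)
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as c:
        yield c

@pytest.mark.asyncio
async def test_create_invoice(client, db):  # db: the transactional Postgres fixture
    r = await client.post("/invoices", json={"amount": 100})
    assert r.status_code == 201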
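The transactional-fixture pattern from the testing paragraph fits in a few lines. This sketch uses the standard library's sqlite3 in place of Postgres so it runs anywhere; rolled_back is a hypothetical helper name and the invoices table is illustrative.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn):
    """Run the block inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")  # discard whatever the test wrote


# isolation_level=None: sqlite3 issues no implicit BEGINs, so ours is the only one.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount INTEGER)")

with rolled_back(conn) as tx:  # one "test"
    tx.execute("INSERT INTO invoices (amount) VALUES (100)")
    rows_during = tx.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]

rows_after = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

In pytest this becomes a yield fixture (with rolled_back(conn) as tx: yield tx). The hard part against a real app is making the handler under test use that same connection or session, typically by pointing the DB dependency at it through app.dependency_overrides.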

Edge cases worth knowing

Streaming responses: return StreamingResponse(generator). Useful for large CSV exports. The client gets bytes as your generator yields them, so the full body never sits in memory (a reverse proxy in front can still buffer it).
WebSockets: first-class via Starlette. Accept, send_json, receive_json, close. The handler is a coroutine that lives for the lifetime of the connection.
Server-sent events: just a StreamingResponse with text/event-stream content type and a generator that yields data: ...\n\n lines.
File uploads: UploadFile is async-friendly. Large files spool to disk automatically.
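The generators behind the streaming and SSE cases are plain Python. A minimal sketch, with csv_chunks and sse_event as hypothetical helper names: the first yields a CSV export one line at a time, the second formats a single server-sent event, splitting multi-line payloads into several data: lines as the event-stream format requires.

```python
import csv
import io
import itertools


def csv_chunks(header, rows):
    """Yield a CSV export one line at a time instead of building it in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in itertools.chain([header], rows):
        writer.writerow(row)
        yield buf.getvalue()
        buf.seek(0)
        buf.truncate(0)  # reuse the buffer for the next line


def sse_event(data, event=None, event_id=None):
    """Format one server-sent event; a blank line terminates it."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A payload containing newlines must be split across several data: lines.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    return "\n".join(lines) + "\n\n"


export = "".join(csv_chunks(["id", "amount"], [(1, 100), (2, 250)]))
tick = sse_event("line one\nline two", event="tick", event_id="7")
```

Wired into FastAPI, these would be passed as StreamingResponse(csv_chunks(...), media_type="text/csv") and, for SSE, a generator yielding sse_event(...) strings with media_type="text/event-stream". Note that csv.writer ends rows with \r\n by default.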

