FastAPI - Deep Dive
ASGI internals, dependency injection mechanics, async pitfalls, deployment topology, and the real cost of Pydantic.
FastAPI is a thin layer of magic over solid building blocks. Understanding the layers is the difference between building a toy and shipping something that does 50 million requests a day.
The stack underneath
FastAPI is built on:
- Starlette: the ASGI framework that does routing, middleware, request/response objects, WebSockets, BackgroundTasks.
- Pydantic: data validation, serialization, settings.
- uvicorn (or hypercorn): the ASGI server that runs the event loop, hands off to your app.
- uvloop (optional): a libuv-backed event loop that drops in for asyncio's default and is roughly 2x to 4x faster in its authors' benchmarks. uvicorn uses it automatically when it is installed.
ASGI in one paragraph
ASGI is a calling convention. Your app is a callable that takes three arguments: scope (dict describing the connection), receive (async function to pull messages from the client), send (async function to push messages back). HTTP requests come as http.request messages, responses go as http.response.start then http.response.body. WebSockets, lifespan events, and HTTP/2 server push all ride on the same primitive. FastAPI hides this from you, but when you write middleware you touch it directly.
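To make the calling convention concrete, here is a minimal sketch of a raw ASGI app with no framework at all. The echo behavior and the call_once helper are illustrative assumptions; call_once plays the role of the server so the message flow is visible.

```python
import asyncio


async def app(scope, receive, send):
    """A bare ASGI app: read the whole request body, echo it back."""
    assert scope["type"] == "http"

    # The body may arrive in several http.request messages.
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            break

    # A response is always two steps: start (status, headers), then body.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"echo: " + body})


async def call_once(payload: bytes) -> list[dict]:
    """Hypothetical helper: drive the app by hand, as a server would."""
    sent: list[dict] = []

    async def receive():
        return {"type": "http.request", "body": payload, "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "POST", "path": "/", "headers": []}
    await app(scope, receive, send)
    return sent
```

Saved in a module, the same app object should run unchanged under uvicorn or hypercorn, which is the whole point of the convention.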
Dependency injection mechanics
Depends(callable) tells FastAPI to call that function and inject the return value. Dependencies can themselves take Depends, so you build trees. Common pattern:
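```python
from collections.abc import AsyncIterator
from fastapi import Depends, Header
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
# app, AsyncSessionLocal, User and UserOut are defined elsewhere in the project.

async def get_db() -> AsyncIterator[AsyncSession]:
    async with AsyncSessionLocal() as session:
        yield session

async def get_user(db: AsyncSession = Depends(get_db), token: str = Header(...)) -> User:
    result = await db.execute(select(User).where(User.token == token))
    return result.scalar_one()

@app.get("/me")
async def me(user: User = Depends(get_user)) -> UserOut:
    return user
```

Dependencies are cached within a single request by default. If get_db appears in five places, it runs once. You can opt out with Depends(get_db, use_cache=False).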
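Dependencies with yield get a teardown phase. The code after yield runs once the handler has finished; whether that is just before or just after the response goes out has changed between FastAPI releases, so check the behavior of the version you run. Cleanup survives exceptions only if the yield sits inside try/finally or a context manager such as async with. This is how DB sessions close cleanly even on exceptions.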
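The caching and teardown behavior can be modeled in a few lines of standard-library code. This is a toy model, not FastAPI's implementation: the resolve helper, the per-request cache dict, and the events list are assumptions made up for illustration.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events: list[str] = []  # records what happened, in order


@asynccontextmanager
async def get_db():
    events.append("open")
    try:
        yield {"name": "session"}  # stand-in for a real DB session
    finally:
        # Teardown: runs when the request's exit stack unwinds.
        events.append("close")


async def handle_request() -> None:
    cache: dict = {}  # one cache per request
    async with AsyncExitStack() as stack:

        async def resolve(dep):
            # A dependency requested twice in one request is entered once.
            if dep not in cache:
                cache[dep] = await stack.enter_async_context(dep())
            return cache[dep]

        db_for_user = await resolve(get_db)
        db_for_handler = await resolve(get_db)
        assert db_for_user is db_for_handler
        events.append("handler")
    # Leaving the stack ran the code after each yield, in reverse order.
```

Running handle_request() once leaves events with a single open, the handler, then a single close, which is the "runs once, cleans up once" behavior described above.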
Async pitfalls in detail
The event loop is a single OS thread. Every await is a yield point. Between yield points, your code runs to completion. Three sharp edges:
- CPU-bound work blocks everything. JSON parsing of a 5 MB body, image processing, regex on a huge string: all stall the loop. Push it to a process pool: await loop.run_in_executor(executor, fn, *args).
- Sync I/O blocks everything. requests.get, time.sleep, psycopg2.connect, file I/O without aiofiles. These look innocent and they are not. Find them with asyncio's debug mode (loop.set_debug(True)) and loop.slow_callback_duration.
- Forgotten await. db.commit() returns a coroutine that you never await. The commit never happens. Pyright with strict mode catches these. Use it.
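The first two edges are easy to see with a heartbeat task. The sketch below is illustrative only: blocking_call, heartbeat and count_ticks are made-up names, and a thread pool stands in for the executor because the blocking work here is a sleep. For CPU-bound work, a ProcessPoolExecutor would take its place.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(seconds: float) -> str:
    time.sleep(seconds)  # stands in for requests.get, psycopg2, etc.
    return "done"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Ticks every 10 ms, but only while the loop is free to run it.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def count_ticks(offload: bool) -> int:
    loop = asyncio.get_running_loop()
    ticks: list[float] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick

    if offload:
        # The loop keeps running other tasks while a worker thread blocks.
        with ThreadPoolExecutor(max_workers=1) as executor:
            await loop.run_in_executor(executor, blocking_call, 0.2)
    else:
        # Called directly, the sleep freezes every task on the loop.
        blocking_call(0.2)

    stop.set()
    await beat
    return len(ticks)
```

With offload=False the heartbeat manages a single tick before the loop freezes; with offload=True it keeps ticking for the whole 200 ms.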
BackgroundTasks vs Celery
BackgroundTasks runs after the response is sent, in the same process, on the same event loop. Good for: send a confirmation email, log an event, invalidate a cache. Not good for: anything that needs retries, persistence across deploys, or distribution across machines.
Celery (or Arq, or Dramatiq) gives you a real queue with a separate worker pool. Use it for: PDF generation, third-party API calls that can fail, anything with a retry policy.
The trap: BackgroundTasks share the request's DB session if you pass it in. The session closes when the request ends. The background task tries to query, the session is dead. Pass parameters by value, not references to request-scoped objects.
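A small simulation of the trap, under stated assumptions: FakeSession is a stand-in for a real DB session, and simulate_request mimics the order described above (request-scoped session closes, then queued tasks run). None of these names come from FastAPI.

```python
import asyncio


class FakeSession:
    """Stand-in for a DB session; real code would use AsyncSession."""

    def __init__(self) -> None:
        self.closed = False

    async def fetch_email(self, user_id: int) -> str:
        if self.closed:
            raise RuntimeError("session is closed")
        return f"user{user_id}@example.com"

    async def close(self) -> None:
        self.closed = True


async def send_email_bad(session: FakeSession, user_id: int) -> str:
    # Holds a reference to the request-scoped session.
    return await session.fetch_email(user_id)


async def send_email_good(user_id: int) -> str:
    # Receives a plain value and opens its own short-lived session.
    session = FakeSession()
    try:
        return await session.fetch_email(user_id)
    finally:
        await session.close()


async def simulate_request() -> dict[str, str]:
    request_session = FakeSession()
    # Queue tasks as (function, args), the way add_task(func, *args) does.
    queued = [
        ("bad", send_email_bad, (request_session, 7)),
        ("good", send_email_good, (7,)),
    ]
    await request_session.close()  # the request ends, the session is gone

    outcome: dict[str, str] = {}
    for name, func, args in queued:
        try:
            outcome[name] = await func(*args)
        except RuntimeError as exc:
            outcome[name] = f"failed: {exc}"
    return outcome
```

The task that was handed the session fails; the one that was handed an ID and opened its own session succeeds.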
Pydantic, the real bottleneck
Pydantic validates and serializes every request and response. On Pydantic v1, this can be 30 to 60 percent of your handler's CPU time. Pydantic v2 rewrote the core in Rust and is 5x to 50x faster depending on the model.
Three tactics:
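- Use model_config = ConfigDict(frozen=True) for models that never change after creation. arbitrary_types_allowed=False is already the default, and frozen buys immutability and hashability rather than raw speed, so measure before relying on it.
- Use Annotated[int, Field(strict=True)] for strict mode, which turns off type coercion on that field.
- For response models, consider response_model=None and return dicts when the shape is dynamic. You lose validation, you gain speed.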
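What strict and frozen actually change is easiest to see side by side. A minimal sketch, assuming Pydantic v2; LaxItem, StrictItem and the accepts helper are hypothetical names used only here.

```python
from typing import Annotated

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class LaxItem(BaseModel):
    qty: int  # default (lax) mode: the string "3" is coerced to 3


class StrictItem(BaseModel):
    model_config = ConfigDict(frozen=True)  # instances are immutable
    qty: Annotated[int, Field(strict=True)]  # no coercion on this field


def accepts(model: type[BaseModel], payload: dict) -> bool:
    """Return True if the payload validates against the model."""
    try:
        model.model_validate(payload)
        return True
    except ValidationError:
        return False


def can_mutate(item: BaseModel) -> bool:
    """Return True if assigning to qty is allowed."""
    try:
        item.qty = 99
        return True
    except ValidationError:
        return False
```

LaxItem takes {"qty": "3"}, StrictItem rejects it, and a StrictItem instance refuses assignment after creation.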
Deployment topology
In production at Binocs:
- 4 uvicorn workers per pod (1 per vCPU on a 4 vCPU pod).
- Pods behind an EKS service, fronted by an ALB.
- HPA scaling on CPU at 70 percent.
- --lifespan on so startup/shutdown hooks run.
- --workers only at the supervisor level (gunicorn or systemd), not via uvicorn's --workers flag, because uvicorn's worker management is less mature than gunicorn's.
Actual command:
gunicorn app.main:app -k uvicorn.workers.UvicornWorker -w 4 --bind 0.0.0.0:8000
Logging: structured JSON via structlog, one log line per request including method, path, status, latency, tenant_id, user_id. Shipped to CloudWatch then to Grafana Loki.
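One log line per request is a natural fit for a pure ASGI middleware. The sketch below is not the production setup: it swaps structlog for the standard library's logging and json modules, the AccessLogMiddleware name is made up, and tenant_id and user_id are left out because where they come from is application-specific.

```python
import json
import logging
import time

logger = logging.getLogger("access")


class AccessLogMiddleware:
    """Pure ASGI middleware: emit one JSON log line per HTTP request."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # WebSocket and lifespan traffic passes through untouched.
            await self.app(scope, receive, send)
            return

        status = {"code": 500}  # reported if the app dies before responding
        start = time.perf_counter()

        async def send_wrapper(message):
            # The status code only appears in the response.start message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            latency_ms = (time.perf_counter() - start) * 1000
            logger.info(json.dumps({
                "method": scope["method"],
                "path": scope["path"],
                "status": status["code"],
                "latency_ms": round(latency_ms, 2),
            }))
```

In a FastAPI app this would be registered with app.add_middleware(AccessLogMiddleware); the logger still needs a handler and level configured elsewhere.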
Testing
httpx.AsyncClient against the ASGI app, no network. Test database is a real Postgres in a container, transactional fixtures roll back per test. Speed: 200 tests in 4 seconds.
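```python
import pytest
from httpx import ASGITransport, AsyncClient
from app.main import app  # same module path as in the gunicorn command

# Assumes pytest-asyncio with asyncio_mode = "auto" (no markers needed).
@pytest.fixture
async def client():
    # Newer httpx releases dropped the app= shortcut; use ASGITransport.
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as c:
        yield c

async def test_create_invoice(client, db):  # db: the transactional fixture
    r = await client.post("/invoices", json={"amount": 100})
    assert r.status_code == 201
```

Edge cases worth knowing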
- Streaming responses: return StreamingResponse(generator). Useful for large CSV exports. The client gets bytes as your generator yields, no buffering.
- WebSockets: first-class via Starlette. Accept, send_json, receive_json, close. The handler is a coroutine that lives for the lifetime of the connection.
- Server-sent events: just a StreamingResponse with text/event-stream content type and a generator that yields data: ...\n\n lines.
- File uploads: UploadFile is async-friendly. Large files spool to disk automatically.
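The server-sent events case comes down to the generator. A minimal sketch of one, kept framework-free so it runs on its own; sse_frame, countdown and collect are hypothetical names, and the JSON payload shape is an assumption.

```python
import asyncio
import json
from collections.abc import AsyncIterator


def sse_frame(payload: dict) -> str:
    # One SSE event: a "data:" line followed by a blank line.
    return f"data: {json.dumps(payload)}\n\n"


async def countdown(start: int) -> AsyncIterator[str]:
    """Yield one SSE event per step, from start down to 1."""
    for n in range(start, 0, -1):
        yield sse_frame({"remaining": n})
        await asyncio.sleep(0)  # yield point; a real feed would wait for data


async def collect(gen: AsyncIterator[str]) -> list[str]:
    """Drain a generator into a list, for inspection outside a server."""
    return [chunk async for chunk in gen]
```

In a handler, the generator would be handed over as StreamingResponse(countdown(3), media_type="text/event-stream").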