Blocking, non-blocking, async I/O

Blocking parks the thread, non-blocking returns EAGAIN, and async has the kernel perform the operation and notify you on completion. Async wins at scale.

The four models

  1. Blocking I/O: read() parks the thread until data arrives. One thread per connection. Simple. Doesn't scale past a few thousand connections.

  2. Non-blocking I/O: fd set to O_NONBLOCK. read() returns immediately with EAGAIN if no data is available, so the caller has to retry. Retrying in a tight loop burns CPU, which makes this mode nearly useless on its own.

  3. I/O multiplexing: select, poll, epoll, kqueue. One thread asks the kernel "tell me which of these N fds are ready," sleeps until at least one is, then handles it. Foundation of Nginx, Redis, and Node.js.

  4. Asynchronous I/O: you submit the operation; the kernel does it; you get notified on completion. POSIX AIO (mostly unused), Windows IOCP, and Linux io_uring fit here. The completion hands you the actual data, not just a readiness signal.

In the first three, your own thread still performs the read; the only question is how it learns that the fd is ready. Only true async does the read in the background.
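A minimal sketch of models 2 and 3, assuming Python's standard `socket` and `selectors` modules and a `socketpair` standing in for a real client connection. The function name `demo_readiness` is made up for illustration. It shows the EAGAIN case first, then lets the kernel report readiness before the read.

```python
import selectors
import socket


def demo_readiness():
    # A connected pair of sockets stands in for a client and a server.
    a, b = socket.socketpair()
    a.setblocking(False)  # same effect as setting O_NONBLOCK on the fd

    # Model 2: nothing has been written yet, so recv() fails immediately
    # with EAGAIN/EWOULDBLOCK, which Python raises as BlockingIOError.
    try:
        a.recv(1024)
        got_eagain = False
    except BlockingIOError:
        got_eagain = True

    # Model 3: register the fd and let the kernel report readiness.
    sel = selectors.DefaultSelector()  # picks epoll/kqueue where available
    sel.register(a, selectors.EVENT_READ)

    idle = sel.select(timeout=0)   # nothing ready yet, so an empty list
    b.sendall(b"ping")
    ready = sel.select(timeout=1)  # returns once `a` is readable

    # Readiness only promises that recv() will not block;
    # the read itself is still done by this thread.
    data = ready[0][0].fileobj.recv(1024)

    sel.unregister(a)
    sel.close()
    a.close()
    b.close()
    return got_eagain, len(idle), len(ready), data


if __name__ == "__main__":
    print(demo_readiness())
```

The returned tuple records, in order, whether the first read hit EAGAIN, how many fds were ready before and after the write, and the bytes finally read. Note that the data only arrives when this thread calls `recv()`, which is exactly what separates multiplexing from true async.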

When each shines

Model                  Sweet spot
Blocking + threads     <1000 connections, simple code, low concurrency services
Non-blocking + epoll   10k-1M connections, network servers, proxies
io_uring               High-throughput storage and network, latest kernels
IOCP (Windows)         Same as io_uring, but on Windows
Three I/O models. Async actually does the work in the background.

The thread-per-connection trap

Apache prefork used one process per connection. Apache worker used one thread per connection. Both fall over around 10k concurrent connections because thread stacks (8MB reserved each by default on Linux) and context switches eat the box.

Nginx, Node, Redis use one or a few threads, each running an epoll loop, handling thousands of connections cooperatively. Same hardware, 100x the connections.
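The single-threaded loop described above can be sketched as follows. This is an illustrative toy, not how Nginx or Redis are implemented: `serve_echo` is a hypothetical name, `socketpair` replaces a listening socket and `accept()`, and each "connection" sends exactly one small message, so partial reads and writes are ignored.

```python
import selectors
import socket


def serve_echo(num_clients=50):
    """Answer one message per client using one thread and one selector."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(num_clients):
        client, server_side = socket.socketpair()
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)
        client.sendall(f"hello {i}".encode())
        clients.append(client)

    handled = 0
    while handled < num_clients:
        # One call reports every connection that has data waiting.
        for key, _events in sel.select(timeout=1):
            conn = key.fileobj
            data = conn.recv(1024)      # will not block: kernel said ready
            conn.sendall(data.upper())  # small reply fits the send buffer
            sel.unregister(conn)
            conn.close()
            handled += 1

    # Each client reads its reply; no per-connection thread was created.
    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    sel.close()
    return replies


if __name__ == "__main__":
    print(serve_echo(3))
```

The loop never waits on any single connection. It sleeps in `select()` until the kernel reports work, handles whatever is ready, and goes back to sleep, which is why one thread can keep up with thousands of mostly idle connections.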

Cost-per-connection in modern setups:

  • Thread-per-connection: ~10MB RAM, plus a context switch on every blocking syscall.
  • epoll: ~10KB RAM (fd table entry, kernel buffer), no per-connection thread.
  • io_uring: similar to epoll, with batched submission.
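To make those per-connection figures concrete, here is the arithmetic at 10k connections, using the rough ~10MB and ~10KB estimates from the list above and decimal units. The helper name `total_bytes` is made up for illustration. Treat the thread figure as an upper bound: most of a thread's stack is reserved virtual memory that is only backed by RAM once it is touched.

```python
KB = 10**3
MB = 10**6
GB = 10**9


def total_bytes(connections, per_connection_bytes):
    """Memory needed if every connection costs the same fixed amount."""
    return connections * per_connection_bytes


# 10k connections, one thread each at ~10MB per thread.
threads_10k = total_bytes(10_000, 10 * MB)

# 10k connections on an epoll loop at ~10KB per connection.
epoll_10k = total_bytes(10_000, 10 * KB)

if __name__ == "__main__":
    print(f"thread-per-connection: {threads_10k / GB:.0f} GB")
    print(f"epoll:                 {epoll_10k / MB:.0f} MB")
    print(f"ratio:                 {threads_10k // epoll_10k}x")
```

With these estimates the thread model needs about 100 GB where the epoll loop needs about 100 MB, a factor of 1000 on memory alone before counting scheduler overhead.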

The interview answer

"Blocking is one thread per connection, doesn't scale. Non-blocking with epoll lets one thread service thousands of fds: kernel tells you which are ready, you read them, repeat. io_uring goes further: you submit ops, the kernel does the work, you get the result. The C10K problem of 1999 was solved by epoll; the C10M problem of today gets solved by io_uring plus careful zero-copy. The rule is: never call a blocking syscall inside an event loop."
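The closing rule is easy to demonstrate. The sketch below assumes Python's `asyncio` as the event loop and uses `time.sleep` as a stand-in for any blocking syscall. The names `ticker` and `measure` and the delays are illustrative choices. A background ticker plays the role of every other connection the loop should be serving.

```python
import asyncio
import time


async def ticker(ticks, stop):
    # Stands in for all the other connections the loop must keep serving.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def measure(blocking_inside_loop):
    """Return how many times the ticker ran during a 0.2s blocking call."""
    ticks, stop = [], asyncio.Event()
    task = asyncio.create_task(ticker(ticks, stop))
    await asyncio.sleep(0)  # let the ticker record its first tick
    before = len(ticks)

    if blocking_inside_loop:
        # Wrong: the loop's only thread is parked, so nothing else runs.
        time.sleep(0.2)
    else:
        # Better: hand the blocking call to a worker thread and await it.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, time.sleep, 0.2)

    during = len(ticks) - before
    stop.set()
    await task
    return during


if __name__ == "__main__":
    print("ticks while blocking the loop:", asyncio.run(measure(True)))
    print("ticks while offloading:       ", asyncio.run(measure(False)))
```

Blocking inside the loop yields zero ticks, because every other task is frozen for the full 0.2 seconds. Offloading to the executor lets the ticker keep running; the exact count depends on scheduling, so no specific number is claimed here.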
