Shared memory and IPC

Shared memory is the fastest IPC because data is never copied between processes; pipes and sockets are easier to use, but every transfer goes through the kernel.

Mechanism	Speed	Use
Shared memory	Fastest, no copy	Large data, same machine, perf critical
Memory-mapped file	Fast	Producer/consumer with persistence
Pipe (anon)	Fast	Parent-child, byte stream
Named pipe (FIFO)	Fast	Same machine, unrelated processes
Unix domain socket	Fast	Same machine, message or stream
TCP/UDP socket	Slower	Network or local, same API
Signal	Tiny payload	Notifications only
Message queue (POSIX, SysV)	Medium	Structured messages, kernel-managed
eventfd, signalfd	Tiny	Wakeup primitives for event loops
dbus, gRPC, ZeroMQ	Easy	Higher level, message routing

Shared memory wins on raw throughput. Sockets win on usability. Pick based on whether you need bytes/sec or simplicity.

Shared memory in 4 calls

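// needs <fcntl.h>, <sys/mman.h>, <unistd.h>; check every return value in real code
int fd = shm_open("/myshm", O_CREAT|O_RDWR, 0660);   // 1. create or open the named object
ftruncate(fd, SIZE);                                 // 2. a new object has size 0, so set it
void *p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);  // 3. map it
close(fd);                                           // 4. the mapping outlives the descriptor
// any process that calls shm_open + mmap with the same name sees the same memory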

The kernel allocates physical pages backing the shared object. Both processes' page tables map those frames. Writes by one process reach the other without any copy or syscall, but with no ordering guarantee: you still need atomics or barriers for correctness.
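A slightly fuller sketch of the same calls with error handling and cleanup, assuming Linux. The helper names shm_map and shm_drop are hypothetical, chosen for this example only.

```c
#define _POSIX_C_SOURCE 200809L  /* expose shm_open/ftruncate under strict -std= modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or open) the named object, size it, and map it.
 * Returns the mapping, or NULL on any failure. */
void *shm_map(const char *name, size_t size)
{
    int fd = shm_open(name, O_CREAT | O_RDWR, 0660);
    if (fd == -1)
        return NULL;
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Unmap the region and remove the name. The pages themselves are freed
 * once the last remaining mapping of the object goes away. */
int shm_drop(const char *name, void *p, size_t size)
{
    int rc = munmap(p, size);
    if (shm_unlink(name) == -1)
        rc = -1;
    return rc;
}
```

Calling shm_map twice with the same name, even inside one process, returns two different addresses backed by the same pages: a byte written through one mapping can be read through the other.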

Synchronization is YOUR problem

Shared memory has no locks built in. You need:

Atomics for simple counters (C11 atomic_int, std::atomic in C++).
Process-shared mutexes: a pthread_mutex_t placed in shared memory and initialized by pthread_mutex_init with an attribute set to PTHREAD_PROCESS_SHARED (via pthread_mutexattr_setpshared).
Semaphores: POSIX sem_init with pshared=1, or named sem_open.
Lock-free ring buffers for high-throughput producer/consumer.
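As an illustration of the last item, here is a minimal single-producer, single-consumer ring sketch using C11 atomics. The struct layout, the names ring_push and ring_pop, and the capacity of 8 are assumptions made for this example. It relies on the atomics being lock-free on the target, which holds for 32-bit integers on mainstream platforms, and on exactly one process pushing and one popping.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u  /* hypothetical capacity; must be a power of two */

/* Lives entirely inside the shared mapping: fixed size, no pointers.
 * A zero-filled region (the state of a fresh shm object) is an empty ring. */
struct ring {
    _Atomic uint32_t head;  /* next slot to write; only the producer stores it */
    _Atomic uint32_t tail;  /* next slot to read; only the consumer stores it */
    uint32_t slots[RING_SLOTS];
};

/* Producer side. Returns false when the ring is full. */
bool ring_push(struct ring *r, uint32_t value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;
    r->slots[head % RING_SLOTS] = value;
    /* release: the slot write above is visible before the new head is */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool ring_pop(struct ring *r, uint32_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[tail % RING_SLOTS];
    /* release: the slot read above completes before the slot is handed back */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In use, the producer and consumer would each cast their own mmap result to struct ring * and call only their side of the API. Neither side ever blocks or enters the kernel; a full or empty ring is reported to the caller, who decides whether to spin, sleep, or wait on something like an eventfd.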

If two processes access the same memory without sync and at least one of them writes, that is a data race and the result is undefined. Same rules as multithreaded code.
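A small sketch of the atomics case: parent and child increment one counter in shared memory. For brevity it assumes an anonymous MAP_SHARED mapping inherited across fork instead of a named shm_open object; the function name count_in_two_processes is hypothetical.

```c
#define _DEFAULT_SOURCE 1  /* for MAP_ANONYMOUS under strict -std= modes */
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child each add `per_process` increments to one shared counter.
 * Returns the final count, or -1 on failure. */
long count_in_two_processes(int per_process)
{
    atomic_int *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);

    pid_t pid = fork();
    if (pid == -1) {
        munmap(counter, sizeof *counter);
        return -1;
    }

    /* Both processes run this loop against the same physical page. */
    for (int i = 0; i < per_process; i++)
        atomic_fetch_add(counter, 1);  /* one indivisible read-modify-write */

    if (pid == 0)
        _exit(0);  /* child: finished, skip the parent's cleanup */

    int status;
    if (waitpid(pid, &status, 0) == -1) {
        munmap(counter, sizeof *counter);
        return -1;
    }
    long total = atomic_load(counter);
    munmap(counter, sizeof *counter);
    return total;
}
```

With atomic_fetch_add the total is always twice per_process. Replacing it with a plain `(*counter)++` on a non-atomic int would be a data race, and lost updates would typically make the total come up short.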

Same physical pages, different virtual addresses in each process. So never store raw pointers in shared memory; store offsets from the start of the mapping instead.
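Because each process maps the region at its own address, linked structures in shared memory are usually built from offsets rather than pointers. A minimal sketch, assuming a hypothetical node layout where offset 0 is reserved to mean "end of list":

```c
#include <stddef.h>
#include <stdint.h>

/* A list node stored inside the shared region. `next` is a byte offset from
 * the start of the region, not a pointer; 0 means "end of list". */
struct node {
    uint32_t value;
    uint32_t next;
};

/* Turn an offset into a pointer that is valid in *this* process.
 * `base` is whatever mmap returned here. */
static struct node *node_at(void *base, uint32_t off)
{
    return off ? (struct node *)((char *)base + off) : NULL;
}

/* Sum the values of the list whose first node is at offset `first`. */
uint64_t list_sum(void *base, uint32_t first)
{
    uint64_t sum = 0;
    for (struct node *n = node_at(base, first); n; n = node_at(base, n->next))
        sum += n->value;
    return sum;
}
```

The same bytes give the same answer no matter where they are mapped, because every link is resolved against the local base at the moment it is followed.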

Why sockets are slower

Sending 1KB over a unix socket: userspace -> copy into kernel buffer (1 copy) -> kernel queues message -> peer reads (1 more copy). Two copies plus two syscalls.

Shared memory: write to a memory location. Zero copies, zero syscalls. The other process reads from the same location.

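For 1KB at low rates, sockets cost a few microseconds per message, which rarely matters. For 1GB/sec throughput, the copies kill you: every byte is copied into the kernel and back out again. Shared memory is the right answer for high-bandwidth IPC on one machine.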
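To make the socket path concrete, here is a sketch that pushes a small message through a unix socket pair inside one process. The function name socket_roundtrip is hypothetical, and the single read assumes a small message that arrives in one piece; a real stream reader would loop.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Push `len` bytes through a connected unix socket pair and read them back.
 * Returns the number of bytes received, or -1 on failure. */
ssize_t socket_roundtrip(const void *msg, void *out, size_t len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* syscall 1: the kernel copies from `msg` into a socket buffer */
    ssize_t sent = write(sv[0], msg, len);

    /* syscall 2: the kernel copies from the socket buffer into `out` */
    ssize_t got = sent == -1 ? -1 : read(sv[1], out, (size_t)sent);

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Every message pays for both calls and both copies. With shared memory the equivalent transfer is an ordinary store by one process and an ordinary load by the other.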

The interview answer

"Shared memory via shm_open + mmap is the fastest IPC: same physical pages mapped into two address spaces, no copies, no syscalls per access. The catch is you must synchronize yourself with atomics, process-shared mutexes, or lock-free structures. Pipes and unix sockets are easier but incur copies and syscalls. Sockets win on usability and span across machines; shared memory wins on raw throughput on one box. And never store pointers in shared memory: each process has a different virtual address for the same physical page."

Learn more

Docs: OSTEP, IPC chapter (OSTEP)
Docs: man 7 shm_overview (man7.org)
