Webhook ingestion (Spur question)
Receive untrusted async events at scale: verify signature, ack fast, process async, dedupe, and survive replay.
This was a Spur interview question. The naive answer is "expose a POST endpoint." That fails in production for six reasons. Here is the answer that does not.
The 200-OK-fast rule
Return 200 within 1 second. In the handler, do nothing beyond verifying the request and pushing the event into a queue; a worker processes it later. If you do business logic in the handler, you will eventually exceed the provider's timeout (Stripe is 30s, most are 5-10s), the provider will retry, and you will process the same event twice.
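A minimal sketch of the split between the fast synchronous path and the slow asynchronous path. The in-process queue.Queue stands in for a real queue such as SQS or Kafka, and the names handle_webhook, worker, and processed are hypothetical.

```python
import queue
import threading

# Assumption: an in-process queue stands in for a durable queue (SQS, Kafka).
events = queue.Queue()
processed = []


def handle_webhook(raw_body: bytes) -> int:
    """Synchronous path: enqueue and acknowledge, no business logic."""
    events.put(raw_body)
    return 200


def worker() -> None:
    """Asynchronous path: the slow business logic lives here."""
    while True:
        body = events.get()
        if body is None:  # sentinel used to stop the worker
            events.task_done()
            break
        processed.append(body)  # placeholder for real business logic
        events.task_done()


t = threading.Thread(target=worker, daemon=True)
t.start()

status = handle_webhook(b'{"id": "evt_1"}')

events.join()     # demo only: a real handler never waits for the worker
events.put(None)  # stop the worker
t.join()
```

The handler's latency is one queue push regardless of how long the worker takes, which is what keeps it inside the provider's timeout.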
Six things every webhook receiver must do
- Verify signature. HMAC-SHA256 of the raw body with a shared secret. Reject mismatches with 401. Never trust the body.
- Check timestamp. Reject events older than 5 minutes (Stripe's window). Prevents replay attacks where an attacker reuses a stolen payload later.
- Persist raw event before ack. Write the entire payload to durable storage (queue or DB) before returning 200. If your queue is down, return 500 so the provider retries. Never lose an event.
- Dedupe by event ID. Every provider sends an id field. Insert into a processed_events table with the ID as primary key. Conflict means duplicate, skip silently.
- Process async. Worker pulls from the queue, does the business logic, marks done. Decouples ingestion from processing.
- Make the handler idempotent. Even with dedupe, processors will see the same event twice during failovers. The business logic must be safe to run more than once.
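The ingest-side steps from the list (verify, check timestamp, persist before ack) could be sketched roughly as follows. This assumes the event body is JSON with a "created" epoch-seconds field, that stale events get a 400 (the list does not name a status for them), and that store is some durable write such as a queue push; SECRET, receive, and store are hypothetical names.

```python
import hashlib
import hmac
import json
import time
from typing import Callable, Optional

SECRET = b"whsec_example"   # hypothetical shared secret
MAX_AGE_SECONDS = 5 * 60    # the 5-minute window from the list above


def receive(raw_body: bytes, signature: str,
            store: Callable[[bytes], None],
            now: Optional[float] = None) -> int:
    """Return the HTTP status the receiver would send back."""
    now = time.time() if now is None else now

    # Verify the signature over the raw bytes before parsing anything.
    expected = hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return 401

    # Check the timestamp. "created" is an assumed field name; because it
    # sits inside the signed body, it can be trusted at this point.
    event = json.loads(raw_body)
    if now - event["created"] > MAX_AGE_SECONDS:
        return 400  # assumed status for stale events

    # Persist the raw payload before acknowledging. If the durable write
    # fails, answer 500 so the provider retries later.
    try:
        store(raw_body)
    except Exception:
        return 500
    return 200
```

Deduplication and the business logic are left to the worker, so this function stays limited to one HMAC, one JSON parse, and one write.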
Signature verification
import hmac, hashlib

expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
if not hmac.compare_digest(expected, header_signature):
    return 401

Use compare_digest, not ==. Constant-time comparison prevents timing attacks.
Retries and replay
Providers retry on non-2xx for hours or days. Stripe retries for 3 days with exponential backoff. Your processor must handle the same event 100 times without error.
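One way to make a processor tolerate repeated deliveries is to record the event ID and apply the side effect in the same transaction, so a duplicate hits the primary-key conflict and is skipped. A sketch using an in-memory SQLite database; the accounts table and the event fields account and amount are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (id TEXT PRIMARY KEY)")
# Hypothetical business table, used only to show a visible side effect.
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
db.execute("INSERT INTO accounts VALUES ('acct_1', 0)")
db.commit()


def process(event: dict) -> bool:
    """Apply the event once; return False if it was already processed."""
    try:
        # One transaction: the dedupe marker and the side effect
        # commit together or roll back together.
        with db:
            db.execute("INSERT INTO processed_events (id) VALUES (?)",
                       (event["id"],))
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                       (event["amount"], event["account"]))
    except sqlite3.IntegrityError:
        return False  # primary-key conflict means duplicate: skip silently
    return True


event = {"id": "evt_1", "account": "acct_1", "amount": 500}
results = [process(event) for _ in range(100)]
balance = db.execute("SELECT balance FROM accounts WHERE id = 'acct_1'").fetchone()[0]
```

After 100 deliveries of the same event the balance has moved once, and none of the 99 duplicates raised an error.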
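An attacker who captures one valid payload can otherwise replay it forever. The timestamp window stops this, provided the timestamp is covered by the signature (Stripe signs the timestamp together with the body); if it is not, the attacker can simply rewrite the timestamp. Replays inside the window are caught by the event-ID dedupe.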
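A sketch of a timestamp check that cannot be bypassed by editing the timestamp, loosely modeled on Stripe's approach of signing the timestamp joined to the body with a period. The function names and the exact header handling are assumptions; a real provider's documentation defines the actual format.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 5 * 60  # the 5-minute window


def sign(secret: bytes, timestamp: int, raw_body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", hex encoded."""
    signed_payload = str(timestamp).encode() + b"." + raw_body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, raw_body: bytes,
           signature: str, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    # Reject anything outside the tolerance window.
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False
    # The timestamp is part of the signed content, so changing it
    # invalidates the signature.
    return hmac.compare_digest(sign(secret, timestamp, raw_body), signature)
```

A captured request replayed an hour later fails the window check, and the same request with a freshened timestamp fails the signature check.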
Ordering
Webhooks are NOT ordered. You will receive subscription.updated before subscription.created sometimes. Two strategies:
- Reorder by timestamp. Stash out-of-order events, apply in order.
- Idempotent by version. If each event carries a version number (or another monotonically increasing marker for the object), only apply it if version > current.
Stripe ships both. Most teams use the version approach.
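The version approach can be sketched as a guard in front of the state update. The event fields object_id, version, and status are hypothetical, and the dict stands in for a database row.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    version: int = 0
    status: str = "unknown"


# Assumption: an in-memory dict stands in for the subscriptions table.
subscriptions = {}


def apply(event: dict) -> bool:
    """Apply the event only if it is newer than the stored state."""
    current = subscriptions.setdefault(event["object_id"], Subscription())
    if event["version"] <= current.version:
        return False  # stale or duplicate event: ignore it
    current.version = event["version"]
    current.status = event["status"]
    return True


# The "updated" event arrives before the "created" event.
first = apply({"object_id": "sub_1", "version": 2, "status": "active"})
second = apply({"object_id": "sub_1", "version": 1, "status": "incomplete"})
```

The late-arriving older event is dropped, so the stored state ends at version 2 whatever the delivery order. The same guard also makes redelivery of an already-applied event a no-op.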
The Spur question
The interviewer asked how I would build webhook ingestion for 10k events/sec. The answer: edge ingestion in front of a managed queue (SQS, Kafka), with a stateless verifier. Each pod handles one verification + one queue push, both <10ms. Horizontal scaling is trivial. Processors are separate, autoscale on queue depth. The whole architecture is "thin synchronous, fat asynchronous."
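A back-of-the-envelope check on the sizing, using Little's law (in-flight requests = arrival rate x time per request). The 20 ms per request is the upper bound implied by one verification plus one queue push at under 10 ms each; the 50 concurrent requests per pod and the 2x headroom are assumptions chosen only to make the arithmetic concrete.

```python
import math


def in_flight_requests(events_per_sec: int, ms_per_request: int) -> float:
    """Little's law: average concurrency = arrival rate * service time."""
    return events_per_sec * ms_per_request / 1000


def pods_needed(events_per_sec: int, ms_per_request: int,
                concurrency_per_pod: int, headroom: float = 2.0) -> int:
    """Pods required to hold the in-flight load with some spare capacity."""
    load = in_flight_requests(events_per_sec, ms_per_request) * headroom
    return math.ceil(load / concurrency_per_pod)


in_flight = in_flight_requests(10_000, 20)             # 200.0
pods = pods_needed(10_000, 20, concurrency_per_pod=50)  # 8
```

Under these assumptions, 10k events/sec is only about 200 requests in flight at any moment, which is why the synchronous tier stays small and most of the capacity planning moves to the queue and the processors.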
Learn more
- Stripe: Receiving webhooks (Stripe docs)
- GitHub: Securing webhooks with HMAC (GitHub docs)
- Svix: The ultimate guide to webhooks (Svix docs)