
Binocs - Stripe international payments

The full architecture of the three Stripe microservices at Binocs - checkout, tax-and-geo pricing, webhooks. Idempotency, reconciliation, 3DS, dispute handling, and the patterns that kept failed transactions at zero.

The brief

When I joined Binocs in July 2025 the Stripe integration was a single file in the monolith. It worked for US-card subscriptions and nothing else. The PM had a German customer waiting. I had a month. The brief was "make payments work everywhere, never lose money, never double-charge".

I ended up owning three microservices for 11 months. End of year - zero failed transactions in prod. This is how.

Why three services, not one

The temptation was to leave it in the monolith and add code. I argued for three services for one reason - the failure modes are different and I wanted them isolated.

Checkout - synchronous, user-facing, Stripe API can be slow or down, the user is staring at a spinner.
Tax-and-geo pricing - read-heavy, deterministic, our own DB, Stripe Tax as a dependency.
Webhooks - asynchronous, must be idempotent, must never lose an event, Stripe retries for 3 days.

If a webhook handler crashes, checkout is unaffected. If the Stripe API is slow, checkout degrades but the pricing service still answers from our DB. This is separation of concerns by failure mode, not by entity.

Three services split by failure mode.

Checkout service

The checkout service does one thing - create a Stripe Checkout Session and redirect the user. The trick is in the inputs.

Inputs - user ID, plan ID, optional coupon, optional seat count. The service asks the pricing service "what does this user pay" and gets back a price ID, currency, and tax behavior. Then it calls Stripe with an idempotency key derived from user_id + plan_id + a daily salt. If the same user clicks subscribe twice in a minute, they get the same Checkout Session, not two.
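A minimal sketch of that key derivation, not the production code. The function name, the "checkout:" prefix, and the choice of the UTC date as the daily salt are assumptions made for illustration.

```python
import hashlib
from datetime import date, datetime, timezone
from typing import Optional


def checkout_idempotency_key(user_id: str, plan_id: str, day: Optional[date] = None) -> str:
    """Derive a stable key from user, plan, and a daily salt (assumed: the UTC date)."""
    day = day or datetime.now(timezone.utc).date()
    raw = f"checkout:{user_id}:{plan_id}:{day.isoformat()}"
    # Hash so the key has a fixed length and does not expose internal IDs.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two clicks on the same day produce the same key.
    print(checkout_idempotency_key("user_42", "plan_pro", date(2025, 7, 1)))
```

The key would then be passed to Stripe as the idempotency key of the session-creation call. One caveat: Stripe rejects a reused key whose request parameters differ, so if the coupon or seat count can change between clicks, those values probably belong in the key too.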

The session URL gets returned to the frontend, and the frontend redirects the browser to it. Stripe hosts the card form (so card data never touches our servers and our PCI scope stays minimal), 3DS happens on the bank's page, and the user lands back on our success URL. The success URL does not write to our DB. It reads from the DB and shows a "subscription confirmed" page if the webhook has already landed, or a "we are processing your payment" page if not. The webhook is the writer.

This is the single most important pattern in Stripe. I have seen three companies get this wrong. The redirect is a UX hint, the webhook is the receipt.

Tax-and-geo pricing service

This service is the only thing that knows what a customer pays. The pricing model is a 3D matrix - plan x region x currency. Plus volume discounts, plus annual discounts, plus enterprise overrides held as exceptions in a separate table.
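The lookup order can be sketched as below, assuming enterprise overrides win over the matrix. All names and price IDs here are hypothetical, and volume and annual discounts are left out to keep the example short.

```python
from typing import Dict, Optional, Tuple

MatrixKey = Tuple[str, str, str]   # (plan, region, currency)
OverrideKey = Tuple[str, str]      # (customer_id, plan)


def resolve_price_id(
    matrix: Dict[MatrixKey, str],
    overrides: Dict[OverrideKey, str],
    customer_id: str,
    plan: str,
    region: str,
    currency: str,
) -> Optional[str]:
    """Return the Stripe price ID for a customer, or None if no price is defined."""
    # Enterprise exceptions live in their own table and are checked first.
    override = overrides.get((customer_id, plan))
    if override is not None:
        return override
    # Otherwise fall back to the plan x region x currency matrix.
    return matrix.get((plan, region, currency))


if __name__ == "__main__":
    matrix = {("pro", "eu", "eur"): "price_pro_eu_eur"}        # hypothetical IDs
    overrides = {("cus_enterprise", "pro"): "price_custom_1"}  # hypothetical IDs
    print(resolve_price_id(matrix, overrides, "cus_regular", "pro", "eu", "eur"))
```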

For tax I used Stripe Tax. The alternative was Avalara, which we evaluated, and the answer was "Stripe Tax is good enough for a startup and one less vendor to integrate". Stripe Tax handles the EU VAT rules, the UK reverse-charge for B2B, US sales tax by state, GST in India and Australia, and a hundred other edge cases I do not have to think about.

The service exposes one endpoint - GET /pricing?user_id=X&plan_id=Y. Response includes the Stripe price ID (so the checkout service does not have to look it up), the customer-displayed amount in their currency, the tax amount, and the tax behavior (inclusive or exclusive). This response is cached in Redis for 60 seconds with the user ID as the key, because pricing does not change minute-to-minute but a logged-in user hitting the pricing page hits this endpoint a lot.
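A sketch of the 60-second cache, using an in-memory dict as a stand-in for Redis so the example runs on its own. The class and method names are hypothetical. The key in this sketch includes the plan ID as well as the user ID, which is an assumption: the endpoint takes both parameters, so a key on the user ID alone could return the price of a different plan.

```python
import time
from typing import Callable, Dict, Tuple

Price = Dict[str, object]


class PricingCache:
    """Tiny TTL cache; a real deployment would use Redis with an expiring key."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Price]] = {}

    def get_or_compute(self, user_id: str, plan_id: str,
                       compute: Callable[[str, str], Price]) -> Price:
        key = f"pricing:{user_id}:{plan_id}"
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # still fresh, skip the DB
        value = compute(user_id, plan_id)      # the expensive lookup
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = PricingCache()
    price = cache.get_or_compute(
        "user_42", "pro",
        lambda u, p: {"price_id": "price_hypothetical", "currency": "eur"},
    )
    print(price)
```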

Webhook service

This is the service that kept me employed. Stripe retries a webhook for up to 3 days if you do not return a 2xx. That is good - we cannot lose events. But it means you will see some events more than once, and you must be idempotent or you will process the same payment twice.

The idempotency strategy - every Stripe event has an id like evt_1ABCxyz.... We have a stripe_events table with id as PRIMARY KEY. The webhook handler does INSERT INTO stripe_events (id, type, payload) VALUES (...). If the insert fails with a unique-violation, we return 200 and stop - we have already seen this event. If the insert succeeds, we process it inside a transaction and commit.
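The insert-then-process flow can be sketched with SQLite standing in for the real database. The function name and the `process` callback are hypothetical. The sketch assumes the insert and the processing share one transaction, so a handler failure rolls the insert back and the next Stripe retry is treated as a new event rather than a duplicate.

```python
import json
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE IF NOT EXISTS stripe_events (
    id      TEXT PRIMARY KEY,
    type    TEXT NOT NULL,
    payload TEXT NOT NULL
)
"""


def handle_event(conn: sqlite3.Connection, event: dict,
                 process: Callable[[sqlite3.Connection, dict], None]) -> str:
    """Return "duplicate" for an already-seen event, "processed" otherwise."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        try:
            conn.execute(
                "INSERT INTO stripe_events (id, type, payload) VALUES (?, ?, ?)",
                (event["id"], event["type"], json.dumps(event)),
            )
        except sqlite3.IntegrityError:
            # Primary-key violation: we have seen this event, acknowledge and stop.
            return "duplicate"
        # Same transaction as the insert: if this raises, the insert is undone.
        process(conn, event)
    return "processed"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    event = {"id": "evt_test_1", "type": "invoice.paid"}
    print(handle_event(conn, event, lambda c, e: None))
    print(handle_event(conn, event, lambda c, e: None))
```

Both outcomes map to a 200 response; only an exception from `process` should surface as a 5xx so that Stripe retries.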

Signature verification is non-negotiable. Stripe signs every webhook with HMAC-SHA256 using the webhook secret. We verify the signature against the raw request body before parsing it. A failed signature is a 400 and an alert, because it means either our secret is wrong or someone is sending us fake webhooks.
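For illustration, here is what that check looks like when written by hand with the standard library. In practice the official Stripe library does this for you; the function name below is hypothetical. The scheme follows Stripe's documented format: the `Stripe-Signature` header carries `t=<timestamp>` and one or more `v1=<signature>` values, and the signed payload is the timestamp, a dot, and the raw body.

```python
import hashlib
import hmac
import time
from typing import List, Optional


def verify_stripe_signature(raw_body: bytes, header: str, secret: str,
                            tolerance_seconds: int = 300,
                            now: Optional[float] = None) -> bool:
    """Check a Stripe-Signature header against the raw, unparsed request body."""
    timestamp: Optional[str] = None
    candidates: List[str] = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not candidates:
        return False
    try:
        signed_at = int(timestamp)
    except ValueError:
        return False
    # Reject old timestamps to limit replay of captured requests.
    current = time.time() if now is None else now
    if abs(current - signed_at) > tolerance_seconds:
        return False
    signed_payload = timestamp.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed_payload,
                        hashlib.sha256).hexdigest().encode("ascii")
    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected, c.encode("utf-8")) for c in candidates)
```

The body must be the exact bytes Stripe sent. Parsing the JSON and re-serializing it before verification changes the bytes and makes every signature fail.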

Reconciliation job

The webhook service catches every event Stripe sends. The reconciliation job catches the events Stripe does not send, or the bugs in our handlers.

Every hour, a job lists all subscriptions in Stripe that changed in the last 2 hours and all subscriptions in our DB that changed in the last 2 hours, then diffs them on state (active, past_due, canceled), amount, and currency. Running hourly over a 2-hour window means consecutive runs overlap, so a change near a run boundary is not missed. Any diff goes to a Slack channel I read every morning.
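The diff step itself is small. A sketch, assuming both sides have already been fetched and normalized into the same shape; the type and function names are hypothetical, and amounts are assumed to be in minor units.

```python
from typing import Dict, List, NamedTuple


class SubSnapshot(NamedTuple):
    status: str     # e.g. "active", "past_due", "canceled"
    amount: int     # minor units, e.g. cents
    currency: str


class Diff(NamedTuple):
    subscription_id: str
    field: str
    stripe_value: object
    db_value: object


def diff_subscriptions(stripe_side: Dict[str, SubSnapshot],
                       db_side: Dict[str, SubSnapshot]) -> List[Diff]:
    """Compare the two views and return one Diff per mismatching field."""
    diffs: List[Diff] = []
    for sub_id in sorted(set(stripe_side) | set(db_side)):
        s = stripe_side.get(sub_id)
        d = db_side.get(sub_id)
        if s is None or d is None:
            # Present on one side only: report which side has it.
            diffs.append(Diff(sub_id, "presence", s is not None, d is not None))
            continue
        for field in SubSnapshot._fields:
            if getattr(s, field) != getattr(d, field):
                diffs.append(Diff(sub_id, field, getattr(s, field), getattr(d, field)))
    return diffs
```

An empty list means the hour was clean. Anything else would be formatted and posted to the Slack channel.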

In 11 months the job fired maybe 6 times. Each time was a real bug - one was a missed webhook because the webhook secret got rotated and not updated, two were Stripe outages where webhooks were delayed past our 2-hour window, the rest were my own handler bugs. The job is cheap to run and catches the things you cannot predict.

3DS - the hard bug

The hardest bug of the year was Strong Customer Authentication. A German customer was getting redirected to her bank for 3DS, authenticating successfully, getting redirected back to us - and our success page was 500-ing because the bank's redirect dropped a query parameter we expected.

The wrong fix was to read every possible param from the redirect. The right fix was to stop depending on the redirect's parameters at all. The success page now looks up the order in our DB using only the session ID in the success URL, and if the DB does not have the order yet, it shows "we are processing your payment, this page will update". The webhook lands within seconds (sometimes minutes in extreme cases), the order appears in the DB, the page refreshes.

This is the lesson - the post-payment redirect is unreliable across banks, countries, and 3DS variants. The webhook is the only source of truth.
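The read-only success page reduces to a three-way branch. A sketch with hypothetical names, where a plain mapping stands in for the orders table:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PageResponse:
    status: int
    template: str


def render_success_page(session_id: Optional[str],
                        orders_by_session: Mapping[str, dict]) -> PageResponse:
    """Decide what to show after the redirect. Reads only, never writes."""
    if not session_id:
        # A malformed redirect is a client error, never a 500.
        return PageResponse(400, "missing_session")
    order = orders_by_session.get(session_id)
    if order is None:
        # The webhook has not landed yet; the page polls or refreshes later.
        return PageResponse(200, "processing")
    return PageResponse(200, "confirmed")


if __name__ == "__main__":
    orders = {"cs_test_123": {"status": "active"}}
    print(render_success_page("cs_test_123", orders))
    print(render_success_page("cs_test_456", orders))
```

Nothing the bank adds to or drops from the query string can change the outcome, because the only input is the session ID.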

Disputes and chargebacks

We had two disputes in 11 months. Both were friendly fraud (legitimate customer disputing a legitimate charge). The webhook charge.dispute.created fires, our handler creates an internal incident, the ops team gets the receipt and any usage evidence ready, we submit through the Stripe dashboard. Both were resolved in our favor.

The pattern I built was that every dispute triggered a Slack alert with the customer ID, the amount, the dispute reason, and a one-click link to the Stripe dashboard. Ops did not have to log in and search. Friction kills response time, and dispute response times affect win rates.
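A sketch of how such an alert could be assembled from the dispute object in a charge.dispute.created event. The function name is hypothetical, the dashboard URL pattern is an assumption that should be checked against your own account, and the customer ID is passed in separately because it comes from the charge, not from the dispute itself.

```python
from typing import Dict

# Assumed URL pattern for a dispute in the Stripe dashboard.
DASHBOARD_BASE = "https://dashboard.stripe.com"


def build_dispute_alert(dispute: Dict[str, object], customer_id: str) -> Dict[str, str]:
    """Build a Slack incoming-webhook payload for a new dispute."""
    url = f"{DASHBOARD_BASE}/disputes/{dispute['id']}"
    text = (
        f"New dispute for customer {customer_id}\n"
        # Amount is shown in minor units to avoid guessing the currency's decimals.
        f"Amount: {dispute['amount']} {dispute['currency']} (minor units)\n"
        f"Reason: {dispute['reason']}\n"
        f"<{url}|Open in Stripe dashboard>"  # Slack link syntax: <url|label>
    )
    return {"text": text}


if __name__ == "__main__":
    sample = {"id": "dp_test_1", "amount": 4900, "currency": "eur",
              "reason": "product_not_received"}
    print(build_dispute_alert(sample, "cus_test_1")["text"])
```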

What I would do differently

Two things.

First, I would have set up Stripe's test clocks from day one. We tested the subscription lifecycle by waiting for real time to pass, and it was slow. Test clocks let you fast-forward a subscription through a year of billing cycles in seconds.

Second, I would have written a webhook replay tool earlier. When a handler had a bug, we wanted to re-run the last 24 hours of events through the fixed code. I built this in month 4. It should have been month 1.
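A replay tool along these lines can be sketched against the same stripe_events table, with SQLite as a stand-in. The function name is hypothetical, and the sketch assumes the stored payload is the full event JSON, including Stripe's `created` timestamp in Unix seconds.

```python
import json
import sqlite3
from typing import Callable, List


def replay_events(conn: sqlite3.Connection, since_unix: int,
                  handler: Callable[[dict], None]) -> List[str]:
    """Re-run stored events created at or after since_unix, oldest first."""
    rows = conn.execute("SELECT payload FROM stripe_events").fetchall()
    events = [json.loads(row[0]) for row in rows]
    recent = sorted(
        (e for e in events if e.get("created", 0) >= since_unix),
        key=lambda e: e.get("created", 0),
    )
    replayed: List[str] = []
    for event in recent:
        handler(event)  # the fixed handler; it must itself be idempotent
        replayed.append(event["id"])
    return replayed
```

Replay bypasses the duplicate check on purpose, which is why it is only safe when the handlers tolerate seeing the same event again.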

What this taught me

Money systems force you to think in failure modes. Every line of code is "what happens if this fails". Every API call is "what if this is a retry". Every webhook is "what if this is the second time we see it". You stop writing the happy path first - you write the failure path, then add the happy path on top.

That habit is the thing I carry into every system I build now, payment or not.

Learn more

Stripe docs - idempotent requests
Stripe docs - webhooks
Stripe docs - Tax
Stripe docs - 3D Secure
Stripe engineering blog - online migrations at scale