Engagement · 2026 Q1Platform · Postgres · 6 weeks[example]

Cutting p95 checkout latency to 340 ms

The team was sure the database was the problem. It was not. The trace said so in the first afternoon, and the fix touched four lines of application code and one index.

Client: [example]
Team: Two from Kernwise, four from the platform team
Outcome: p95 1.4 s → 340 ms
Engagement: Six weeks · fixed scope

The checkout worked. Orders went through, money moved, nothing was on fire. But the last step — the one between the customer pressing pay and the confirmation screen — took 1.4 seconds at p95, and on mobile, over a real network, that was long enough that a measurable fraction of customers tapped twice or gave up. The team had spent two sprints adding read replicas and tuning Postgres, and p95 had moved from 1.5 seconds to 1.4. They brought us in to make the database faster.

We did not make the database faster. The database was not the problem.

The problem as we found it

The first thing we asked for was a trace. Not an average, not a dashboard — one real checkout request, captured end to end, with every span. They had the instrumentation; they had simply never looked at a single request, only at aggregates. The aggregate said "checkout is slow." The single request said exactly where.

Four spans dominated. Auth was 14 milliseconds, fine. The payment submit was 70 milliseconds, fine. Between them sat a cart re-fetch at 820 milliseconds and a tax computation at 1,100 milliseconds that overlapped it. Together those two spans were 93 percent of the wall-clock time, and neither of them touched the primary database in the way the team assumed.

The constraint we worked under

This was a fixed-scope engagement: six weeks, no new services, no schema migration that required downtime. The checkout could not regress in correctness — a single wrong total is worse than a slow one — and the team needed to own the change after we left, so nothing clever that only we could maintain.

The approach

The trace made the order of work obvious: fix the two slow spans, in the order of how much they cost, and measure after each change rather than batching them.

Fig 1The captured checkout trace. Two spans — cart fetch and tax compute — account for 93 percent of the 2.1-second worst case.

The changes that shipped

1 — Collapse the cart re-fetch

The cart was read once to render the page and re-read at submit "to be safe." The re-read pulled every line item with a correlated subquery per row. It was safety theatre: the cart had not changed between render and submit in any real flow. We removed the re-fetch and trusted the cart the client already held, validating only the total server-side.

-- on submit, per line item:
- SELECT *, (SELECT price FROM prices WHERE sku = li.sku
-            ORDER BY effective_at DESC LIMIT 1) AS unit_price
- FROM line_items li WHERE li.cart_id = $1;
+ SELECT total_cents FROM carts WHERE id = $1;  -- validate the held total only

That one change took the cart span from 820 milliseconds to 24.

2 — Precompute the tax bracket

Tax was computed by walking every jurisdiction rule at request time. The rules changed roughly monthly; the request path recomputed them on every checkout. We moved the bracket lookup behind a table that a nightly job refreshes, so the request reads one row instead of evaluating a ruleset. The application change was a single call site swapped to a taxBracketFor(zip) lookup.

3 — Add the index the re-fetch had been hiding

With the re-fetch gone, the remaining cart read had no supporting index because the old query had masked the need for one. A single composite index on carts (id, updated_at) took the last 40 milliseconds off.

4 — Stop awaiting the analytics write

The confirmation response awaited a write to the analytics pipeline that the customer never sees. We moved it off the response path so the confirmation returns as soon as the order is durable.

The outcome

Each change was measured on its own before the next went in, so the attribution is real, not a guess.

Metric	Before	After	Δ
p50 checkout	910 ms	180 ms	−80%
p95 checkout	1,400 ms	340 ms	−76%
p99 checkout	2,100 ms	520 ms	−75%
Net new services	—	0	—
Net new dependencies	—	0	—

The fastest request is the one you do not make. Most of this engagement was deleting work, not adding it.

— Kernwise · Engineering notes 031

What we took from it

The lesson was not about Postgres, and it was not about checkout.1 It was that the team had been optimising the thing they could see on the dashboard rather than the thing a single trace would have shown them in an afternoon.

Capture one real request before you tune anything. The aggregate tells you there is a problem; the trace tells you where.
The slowest span is rarely where the team's attention already is.2
Deleting work beats speeding it up. Two of the four changes removed code.
Measure after each change, not after the batch, or you will never know which one mattered.

Footnotes

The same shape — a re-fetch "to be safe" dominating a request — has shown up in three of our last six engagements. It is almost a pattern. ↑
If it were, the team would have already fixed it. The valuable findings are by definition in the places no one has been looking. ↑