Latency, in milliseconds and intent

The headline number on the trading-terminal rebuild is a 94 percent cut in P99 latency. The honest version is that we did very few clever things. Mostly we removed work from the hot path, and refused to put work back without a measurement saying we had to.

A lot of performance work, in our experience, is intent more than technique. You decide the order path is sacred. You enforce the decision at every PR. You do not care that the abstraction would be tidier with a generic cache; you care that the trader sees the fill in seven milliseconds, every time, on the worst Tuesday of the year.

A small budget, written down

We give every hot path an explicit budget, in microseconds, with an owner. The budget lives next to the code. A change that breaks it does not merge — not because a tool refuses, although a tool also refuses, but because the team has agreed that the budget is a contract.
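A budget like this can be small enough to read in one sitting. What follows is a minimal sketch, not the tool from the engagement: the names `LatencyBudget`, `p99` and `check_budget`, the path name, the owner and the 500-microsecond figure are all hypothetical, and the percentile uses the simple nearest-rank method on whatever samples a benchmark run produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """A written-down budget for one hot path (all values hypothetical)."""
    path: str          # name of the hot path the budget covers
    owner: str         # who answers for the budget
    p99_micros: float  # agreed P99 ceiling, in microseconds


def p99(samples_micros):
    """Return the nearest-rank 99th percentile of the samples."""
    ordered = sorted(samples_micros)
    if not ordered:
        raise ValueError("cannot compute P99 of an empty sample set")
    # Integer ceiling of 0.99 * n, which avoids floating-point rounding.
    rank = -(-99 * len(ordered) // 100)
    return ordered[rank - 1]


def check_budget(budget, samples_micros):
    """Return (within_budget, observed_p99) for one benchmark run."""
    observed = p99(samples_micros)
    return observed <= budget.p99_micros, observed


# Hypothetical budget that would sit next to the order-submit code.
ORDER_SUBMIT = LatencyBudget(
    path="order_submit", owner="execution-team", p99_micros=500.0
)
```

A merge check would call `check_budget(ORDER_SUBMIT, samples)` on fresh measurements and fail the build when the first element is `False`. The design choice that matters is the `owner` field: the tool can refuse, but a named person is what makes the number a contract.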

Latency is not a metric. It is a stance the team takes against entropy.

The WebGL price ladder, the Rust execution layer, the sub-millisecond order path — the fancy parts of that engagement — are the easy parts to talk about. The hard part is the year of small refusals that kept those parts from being dragged back into the average. There is no shortcut for that. There is only a team that has decided performance is something they do every day, and the discipline to keep deciding it.
