Clinical AI · HIPAA

Clinical Co-pilot

2025 · Academic medicine

An evaluation-first clinical co-pilot. Every answer is sourced, scored, and traceable to a citation set. Deployed inside two academic medical centers, used daily.

11s · Avg. answer time
0.94 · F1 vs. attending
100% · Citation traced
01 / Problem

The state of play.

The founders were attending physicians, and they had a clear constraint: a clinical co-pilot that could not be trusted to cite its sources was worse than no co-pilot at all. The first prototypes worked beautifully in a demo and badly in front of an attending. The team needed an evaluation harness before they needed a better model.

02 / Approach

What we built.

We started with the eval set, written by attending physicians at the two academic medical centers piloting the system — two hundred questions, every answer scored against a known reference. We built the retrieval and citation layer against the eval, then a LangGraph agent on top, then the model, in that order. SOC2 and HIPAA controls were wired in from the first commit.
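A minimal sketch of the kind of scoring an eval harness like this implies, assuming SQuAD-style token-overlap F1 against the attending-written reference answers (the actual rubric isn't specified here; `token_f1` and `run_eval` are hypothetical names):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count tokens shared between prediction and reference.
    overlap = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def run_eval(cases):
    """cases: iterable of (question, prediction, reference). Returns mean F1."""
    scores = [token_f1(pred, ref) for _, pred, ref in cases]
    return sum(scores) / len(scores)
```

Building this layer first means every later change to retrieval, the agent, or the model is a measurable delta on the same two hundred questions.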

Python · LangGraph · pgvector · HIPAA · SOC2
03 / Outcome

What shipped.

Eleven-second average answer time, end to end. F1 of 0.94 against attending consensus on the eval. One hundred percent of answers carry a traceable citation set. The system is in daily clinical use across two academic medical centers, and the eval runs on every commit.
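One way "the eval runs on every commit" can be enforced is a CI gate that fails the build on regression. A hedged sketch, assuming per-case results carry an F1 score and a citation set (`check_release` and the 0.90 threshold are illustrative, not the team's actual values):

```python
# Hypothetical CI gate: fail the build if mean eval F1 drops below a
# threshold, or if any answer lacks a resolvable citation set.
F1_THRESHOLD = 0.90

def check_release(results):
    """results: list of dicts, each with 'f1' (float) and 'citations' (list)."""
    mean_f1 = sum(r["f1"] for r in results) / len(results)
    assert mean_f1 >= F1_THRESHOLD, f"eval F1 {mean_f1:.2f} below gate"
    untraced = [r for r in results if not r["citations"]]
    assert not untraced, f"{len(untraced)} answers missing citations"
```

Wiring the citation check into the same gate is what makes the "100% citation traced" figure a property the pipeline enforces rather than a one-time audit.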

"They redesigned a category. Our board now uses our app as the reference for what good feels like."
Founder · Clinical AI startup
Engagement
11 months · in active development
Team
  • Lead engineer
  • AI systems engineer
  • Compliance engineer
  • Design lead
← Back to work