Clinical Co-pilot
An evaluation-first clinical co-pilot. Every answer is sourced, scored, and traceable to a citation set. Deployed inside two academic medical centers, used daily.
The state of play.
The founders were attending physicians, and they had a clear constraint: a clinical co-pilot that could not be trusted to cite its sources was worse than no co-pilot at all. The first prototypes worked beautifully in a demo and badly in front of an attending. The team needed an evaluation harness before they needed a better model.
What we built.
We started with the eval set, written by attending physicians at the two academic medical centers piloting the system: two hundred questions, every answer scored against a known reference. We built the retrieval and citation layer against the eval, then a LangGraph agent on top, then the model, in that order. SOC 2 and HIPAA controls were wired in from the first commit.
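
A minimal sketch of what an eval harness of this shape can look like. This is illustrative only: the JSONL file layout, the `answer_question` callable, and the choice to compute F1 over citation identifiers are assumptions for the sketch, not the team's actual code.

```python
"""Sketch of an eval harness: score every answer against a known reference.

Assumptions (not from the case study): the eval set is a JSONL file of
{"question": ..., "reference_citations": [...]} records, and the co-pilot
exposes an answer_question() callable returning a dict with a "citations"
list. F1 is computed over citation identifiers as one reasonable reading
of "scored against a known reference".
"""
import json


def f1(predicted: set[str], reference: set[str]) -> float:
    """Set-level F1 between predicted and reference citation identifiers."""
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def run_eval(eval_path: str, answer_question) -> dict:
    """Run the co-pilot over the whole eval set and aggregate the scores."""
    with open(eval_path) as f:
        cases = [json.loads(line) for line in f]
    scores, cited = [], 0
    for case in cases:
        result = answer_question(case["question"])  # hypothetical co-pilot API
        citations = set(result.get("citations", []))
        if citations:
            cited += 1
        scores.append(f1(citations, set(case["reference_citations"])))
    return {
        "mean_f1": sum(scores) / len(scores),
        "citation_coverage": cited / len(cases),
        "n": len(cases),
    }
```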
What shipped.
Eleven-second average answer time, end to end. F1 of 0.94 against attending consensus on the eval. One hundred percent of answers carry a traceable citation set. The system is in daily clinical use across two academic medical centers, and the eval runs on every commit.
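
One way a per-commit gate over that eval could be wired up, assuming a pytest-style CI job and the hypothetical `run_eval` and `answer_question` helpers from the sketch above; the thresholds simply mirror the shipped numbers and are not the team's actual configuration.

```python
# test_eval_gate.py: illustrative commit gate, not the team's CI config.
from copilot import answer_question   # hypothetical import
from eval_harness import run_eval     # hypothetical import


def test_eval_regression():
    metrics = run_eval("eval_set.jsonl", answer_question)
    assert metrics["mean_f1"] >= 0.94           # F1 against attending consensus
    assert metrics["citation_coverage"] == 1.0  # every answer must carry citations
```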
"They redesigned a category. Our board now uses our app as the reference for what good feels like."
- Lead engineer
- AI systems engineer
- Compliance engineer
- Design lead
