Healthcare-AI Case Study

Claim Analyzer

A Medicare Advantage denial-EOB triage cockpit, paired with an eval lab. Citation-grounded. Runs from the CSR seat. Counts 'refused to appeal correctly' as a real win.

Role: Architect + integrator
Status: Live
Started: 2026-05
URL: claims.jakerosow.com
Source: Private
// cockpit
left → middle → right, streamed in one pass from the same model call.

EOB (parsed)
  patient: MEM-49217
  provider: ROSOW MED
  service: 2026-04-14
  charge: $1,247.00
  allowed: $0.00
  denial codes:
    • CARC 50: non-covered services
    • RARC N115: local coverage determination

TRIAGE (streaming)
  class: medical-necessity
  appeal: YES
  action: file standard appeal
  confidence: 0.91
  citations:
    • 42 CFR §422.566(b)
    • CARC 50 (X12.org)
  streaming reasoning trace

ARTIFACT (appeal letter · draft)
  re: Standard Appeal
  member: MEM-49217
  service: 2026-04-14
  claim: CLM-8841726

Dear member,

This letter is your standard appeal request for the denial dated 2026-04-14, in which service code 99213 was returned under CARC 50 / RARC N115 as non-covered.

Under 42 CFR §422.566(b), Medicare Advantage organizations must reconsider an adverse organization determination upon timely request. We are filing this appeal within the 60-day window…

Claim Analyzer takes a Medicare Advantage denial EOB (Explanation of Benefits) and triages it. A member-services CSR (customer service representative) pastes the denial and gets back a triage: what kind of denial it is, whether it’s appealable, the recommended next action, and a reasoning trace tied to public citations. Alongside that comes a draft member letter: an appeal when one is warranted, an explanation when it isn’t.

Why it exists

Every healthcare-AI hiring conversation starts with “what have you built in our space?” Without something shipped, the answer is just resume narrative.

Claim Analyzer is the artifact I send instead. One demo URL I can drop into a cold email, built around a workflow I run every day as a Customer Care Associate at Health First Health Plans. The simulated user is a CSR whose job I actually do. The visitor is the hiring manager looking at the result.

What made it hard

“ChatGPT with an EOB textarea” is the lazy version of this, and every healthcare-AI hiring manager has already seen it. The bar I held myself to was workflow product, not LLM toy.

Citations had to be real. Every claim about appeal rights points back at a public source: the CARC/RARC dictionary or 42 CFR §422 Subpart M. No proprietary criteria sets. HIPAA-safe by design, with synthetic data only and identifier-format checks running in CI.
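A minimal sketch of how a citation-format gate like this could look. The function names and regular expressions here are hypothetical, not taken from the project. The check only confirms that a citation has the shape of a CARC code, a RARC code, or a 42 CFR §422 section. It does not prove the code exists in the published dictionary, which would need a lookup against the actual list.

```typescript
// Hypothetical allowlist of citation shapes; the patterns are illustrative assumptions.
const CITATION_PATTERNS: readonly RegExp[] = [
  /^CARC [A-Z]?\d{1,3}$/, // e.g. "CARC 50"
  /^RARC (?:MA|M|N)\d{1,4}$/, // e.g. "RARC N115"
  /^42 CFR §422\.\d{3}(?:\([a-z0-9]+\))*$/, // e.g. "42 CFR §422.566(b)"
];

// True when the citation matches one of the public-source shapes above.
function isPublicCitation(citation: string): boolean {
  const normalized = citation.trim();
  return CITATION_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Returns the citations that fail the format check, so a caller can reject the triage.
function findUngroundedCitations(citations: readonly string[]): string[] {
  return citations.filter((c) => !isPublicCitation(c));
}
```

A caller could treat any non-empty result from findUngroundedCitations as a failed triage rather than showing the citation to the CSR.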

// denial taxonomy
Both halves are scored. Correctly refusing to appeal counts as correct.
appealable (6)
  • medical-necessity
  • prior-auth-missing
  • coding-error
  • out-of-network
  • experimental / investigational
  • step-therapy
non-appealable (2)
  • duplicate-claim
  • cob-second-payer
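One way to encode the taxonomy above so the compiler enforces the appealable / non-appealable split. This is a sketch under assumptions: the identifier names are hypothetical, and the slug "experimental-investigational" is my stand-in for the "experimental / investigational" category.

```typescript
// The six appealable classes from the taxonomy (slug spelling is an assumption).
const APPEALABLE = [
  "medical-necessity",
  "prior-auth-missing",
  "coding-error",
  "out-of-network",
  "experimental-investigational",
  "step-therapy",
] as const;

// The two explicitly non-appealable classes.
const NON_APPEALABLE = ["duplicate-claim", "cob-second-payer"] as const;

type AppealableClass = (typeof APPEALABLE)[number];
type NonAppealableClass = (typeof NON_APPEALABLE)[number];
type DenialClass = AppealableClass | NonAppealableClass;

// Type guard: narrows a denial class to the appealable half.
function isAppealable(denialClass: DenialClass): denialClass is AppealableClass {
  return (APPEALABLE as readonly string[]).includes(denialClass);
}
```

Keeping both halves in one union means a triage result cannot carry a class outside the eight listed categories without a type error.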
// eval lab
n=18 scenarios · judge: Claude Opus 4.7 · target ≥ 90%.

variant               appeal correct   refuse correct   overall
v1 chain-of-thought   -/10             -/8
v2 citation-first     -/10             -/8
target                ≥ 9/10           ≥ 7/8            ≥ 90%

Status: pre-launch, awaiting first full run.
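A sketch of how the per-variant numbers in this table could be rolled up once a run exists. The ScenarioResult shape and the summarize function are hypothetical. The only values taken from the table are the 10/8 split and the 90% overall target.

```typescript
// One graded scenario: which half it belongs to, and whether the triage was judged correct.
interface ScenarioResult {
  expectAppeal: boolean; // true = appeal half, false = refuse half
  correct: boolean;
}

interface EvalSummary {
  appealCorrect: number;
  appealTotal: number;
  refuseCorrect: number;
  refuseTotal: number;
  overall: number; // fraction of all scenarios judged correct
  meetsTarget: boolean;
}

// Aggregates graded scenarios into the appeal / refuse / overall columns.
function summarize(results: readonly ScenarioResult[], target = 0.9): EvalSummary {
  const appeal = results.filter((r) => r.expectAppeal);
  const refuse = results.filter((r) => !r.expectAppeal);
  const appealCorrect = appeal.filter((r) => r.correct).length;
  const refuseCorrect = refuse.filter((r) => r.correct).length;
  const overall =
    results.length === 0 ? 0 : (appealCorrect + refuseCorrect) / results.length;
  return {
    appealCorrect,
    appealTotal: appeal.length,
    refuseCorrect,
    refuseTotal: refuse.length,
    overall,
    meetsTarget: overall >= target,
  };
}
```

Under this definition, landing exactly on both per-half floors (9/10 and 7/8) gives 16/18, about 88.9%, which is just under the 90% overall target. So the overall check is worth computing on its own rather than inferring it from the two halves.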
// pragmatic decisions

Three trade-offs worth naming.

Every choice below was the cheap one. Each has a real cost. Listing both halves on purpose.

01

CSR cockpit, not a member-direct chatbot

Chose

The simulated user is a member-services CSR. The visitor (usually a healthcare-AI hiring manager) sits in the CSR's seat. The right-pane artifact is what a CSR would actually send to a member.

Why

It's the workflow I run every day at Health First Health Plans, a Florida MA carrier. Putting the visitor in a different role from the real user is also the part that signals product taste, not just LLM plumbing.

Cost

A little extra friction on first landing while the visitor figures out 'oh, I'm the CSR right now.' The cockpit layout and a curated scenario gallery do most of the work to fix that.

02

Non-appealable denials are first-class, and the eval scores them

Chose

The taxonomy has 6 appealable categories plus 2 explicitly non-appealable ones (duplicate claim, COB second-payer). 'Correctly refused to appeal' counts the same as 'correctly appealed.'

Why

Refusing to appeal when an appeal is wrong is the most useful thing this tool can do, and it's the part that says I understand the workflow, not just the API.

Cost

The eval has to score multi-field triage instead of a single yes/no. Ground-truth labels for non-appealable scenarios can't be trusted to an LLM, so I hand-author and hand-label every one.
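To make "multi-field" concrete, here is a rough sketch of scoring one triage against a hand-authored label. The field names and the exact-match rule for the action are simplifying assumptions. The real eval uses a model judge, which this does not reproduce.

```typescript
// Hypothetical shape of a triage result and of a hand-written ground-truth label.
interface Triage {
  denialClass: string;
  appeal: boolean;
  action: string;
}

interface FieldScore {
  denialClass: boolean;
  appeal: boolean;
  action: boolean;
  allCorrect: boolean;
}

// Case- and whitespace-insensitive comparison for free-text fields.
const normalize = (s: string): string => s.trim().toLowerCase();

// Scores each field separately, then requires all three for a fully correct triage.
function scoreTriage(predicted: Triage, label: Triage): FieldScore {
  const denialClass = normalize(predicted.denialClass) === normalize(label.denialClass);
  const appeal = predicted.appeal === label.appeal;
  const action = normalize(predicted.action) === normalize(label.action);
  return { denialClass, appeal, action, allCorrect: denialClass && appeal && action };
}
```

Scoring per field makes it visible when a variant picks the right class but still recommends an appeal on a non-appealable denial, which a single yes/no score would hide.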

03

Direct @anthropic-ai/sdk over Vercel AI SDK, so cache hit rate is real

Chose

Direct @anthropic-ai/sdk with explicit cache_control breakpoints. Routes pinned to runtime: 'nodejs'.

Why

Cache hit rate is the methodology signal I most want to land. It's the visible version of 'I cared enough to instrument it.' The Vercel AI SDK's array-system limitation (vercel/ai #13308) blocks fine-grained cache_control placement, which would erase that signal.
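A sketch of what explicit breakpoint placement and hit-rate instrumentation could look like. The types below are local stand-ins that mirror the Messages API system-block and usage fields as I understand them, so the block runs without the SDK installed. The function names, and the particular definition of hit rate, are assumptions rather than the project's code.

```typescript
// Local stand-in for a system text block that can carry a cache breakpoint.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Local stand-in for the usage fields reported on a response.
interface Usage {
  input_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// Puts the breakpoint on the stable prefix (for example taxonomy plus citation
// dictionary) and leaves the per-request block uncached.
function buildSystemBlocks(stablePrefix: string, perRequest: string): SystemTextBlock[] {
  return [
    { type: "text", text: stablePrefix, cache_control: { type: "ephemeral" } },
    { type: "text", text: perRequest },
  ];
}

// Assumed definition: share of all input tokens that were served from cache.
function cacheHitRate(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = read + written + usage.input_tokens;
  return total === 0 ? 0 : read / total;
}
```

In a route handler, the array from buildSystemBlocks would be passed as the system parameter of a messages call, and cacheHitRate would be fed the usage object from the response. A first, cold request is expected to score 0 under this definition, since the prefix is written rather than read.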

Cost

I write my own stream-buffer plumbing and make the Edge-vs-Node calls myself, instead of leaning on an off-the-shelf abstraction. More surface area to test.
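As an illustration of the kind of stream-buffer plumbing this implies, here is a sketch that routes one streamed response into the three cockpit panes. The "@@pane:" marker convention and the PaneRouter class are hypothetical, invented for this example. It buffers partial lines so a marker split across two chunks is still recognized.

```typescript
type Pane = "eob" | "triage" | "artifact";

// Hypothetical marker line the model would emit to switch panes.
const PANE_MARKER = /^@@pane:(eob|triage|artifact)$/;

class PaneRouter {
  private partial = ""; // text after the last newline, not yet a complete line
  private current: Pane = "eob";
  private readonly emit: (pane: Pane, line: string) => void;

  constructor(emit: (pane: Pane, line: string) => void) {
    this.emit = emit;
  }

  // Accepts an arbitrary chunk and routes only the complete lines it now has.
  push(chunk: string): void {
    const lines = (this.partial + chunk).split("\n");
    this.partial = lines.pop() ?? "";
    for (const line of lines) this.route(line);
  }

  // Call once at end of stream to route a trailing line with no newline.
  flush(): void {
    if (this.partial !== "") this.route(this.partial);
    this.partial = "";
  }

  private route(line: string): void {
    const match = PANE_MARKER.exec(line);
    if (match) {
      this.current = match[1] as Pane; // marker lines switch panes and are not shown
      return;
    }
    this.emit(this.current, line);
  }
}
```

Emitting whole lines trades a little latency for simplicity. A token-level variant would need to hold back only text that could still turn out to be a marker prefix.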

// the stack

What it’s built on.
