Healthcare-AI Case Study


Claim Analyzer

A Medicare Advantage denial-EOB triage cockpit, paired with an eval lab. Citation-grounded. Runs from the CSR seat. Counts 'correctly refused to appeal' as a real win.

Role: Architect + integrator
Status: Live
Started: 2026-05
URL: claims.jakerosow.com
Source: Private

// cockpit

left → middle → right — streamed in one pass from the same model call.

EOB parsed

patient: MEM-49217
provider: ROSOW MED
service: 2026-04-14
charge: $1,247.00
allowed: $0.00

denial codes

CARC 50: non-covered services
RARC N115: local coverage determination

TRIAGE streaming

class: medical-necessity
appeal: YES
action: file standard appeal
confidence: 0.91

citations

42 CFR §422.566(b)
CARC 50 · X12.org

streaming reasoning trace

ARTIFACT appeal letter · draft

re: Standard Appeal
member: MEM-49217
service: 2026-04-14
claim: CLM-8841726

Dear member,

This letter is your standard appeal request for the denial dated 2026-04-14, in which service code 99213 was returned under CARC 50 / RARC N115 as non-covered.

Under 42 CFR §422.566(b), Medicare Advantage organizations must reconsider an adverse organization determination upon timely request. We are filing this appeal within the 60-day window…

A member-services CSR pastes the denial and gets back a triage: what kind of denial it is, whether it’s appealable, the recommended next action, and a reasoning trace tied to public citations. Alongside that comes a draft member letter: an appeal when one is warranted, an explanation when it isn’t.

Why it exists

Every healthcare-AI hiring conversation starts with “what have you built in our space?” Without something shipped, the answer is just resume narrative.

Claim Analyzer is the artifact I send instead. One demo URL I can drop into a cold email, built around a workflow I run every day as a Customer Care Associate at Health First Health Plans. The simulated user is a CSR whose job I actually do. The visitor is the hiring manager looking at the result.

What made it hard

“ChatGPT with an EOB textarea” is the lazy version of this, and every healthcare-AI hiring manager has already seen it. The bar I held myself to was a workflow product, not an LLM toy.

Citations had to be real. Every claim about appeal rights points back at a public source: the CARC/RARC dictionary or 42 CFR §422 Subpart M. No proprietary criteria sets. HIPAA-safe by design, with synthetic data only and identifier-format checks running in CI.
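A minimal sketch of what an identifier-format check could look like. The ID patterns are assumptions inferred from the sample values on this page (MEM-49217, CLM-8841726), and the `Fixture` shape and `checkFixture` name are hypothetical, not the project's actual CI code.

```typescript
// Assumed synthetic ID formats, inferred from the sample values on this page.
const SYNTHETIC_MEMBER_ID = /^MEM-\d{5}$/;
const SYNTHETIC_CLAIM_ID = /^CLM-\d{7}$/;
// A shape that looks like a real-world identifier (US SSN) and should never appear.
const SSN_LIKE = /\b\d{3}-\d{2}-\d{4}\b/;

// Hypothetical fixture shape for one synthetic scenario.
interface Fixture {
  memberId: string;
  claimId: string;
  eobText: string;
}

// Returns a list of human-readable problems; an empty list means the fixture passes.
function checkFixture(f: Fixture): string[] {
  const problems: string[] = [];
  if (!SYNTHETIC_MEMBER_ID.test(f.memberId)) {
    problems.push(`member id "${f.memberId}" is not in the synthetic MEM-##### format`);
  }
  if (!SYNTHETIC_CLAIM_ID.test(f.claimId)) {
    problems.push(`claim id "${f.claimId}" is not in the synthetic CLM-####### format`);
  }
  if (SSN_LIKE.test(f.eobText)) {
    problems.push("EOB text contains an SSN-shaped string");
  }
  return problems;
}
```

A CI step could run this over every fixture and fail the build when any list is non-empty.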

// denial taxonomy

Both halves are scored. Correctly refusing to appeal counts as correct.

appealable 6

medical-necessity
prior-auth-missing
coding-error
out-of-network
experimental / investigational
step-therapy

non-appealable 2

duplicate-claim
cob-second-payer
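One way the taxonomy above could be encoded so the compiler enforces it. This is a sketch: the constant and function names are hypothetical, and "experimental / investigational" is slugged as `experimental-investigational` purely for illustration.

```typescript
// Hypothetical encoding of the eight denial classes listed above.
const APPEALABLE = [
  "medical-necessity",
  "prior-auth-missing",
  "coding-error",
  "out-of-network",
  "experimental-investigational", // assumed slug for "experimental / investigational"
  "step-therapy",
] as const;

const NON_APPEALABLE = ["duplicate-claim", "cob-second-payer"] as const;

type AppealableClass = (typeof APPEALABLE)[number];
type NonAppealableClass = (typeof NON_APPEALABLE)[number];
type DenialClass = AppealableClass | NonAppealableClass;

// Type guard: narrows a denial class to the appealable subset.
function isAppealable(cls: DenialClass): cls is AppealableClass {
  return (APPEALABLE as readonly string[]).includes(cls);
}
```

Keeping the two lists as separate `as const` arrays means a new category has to be placed on one side or the other before the code compiles.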

// eval lab

n=18 scenarios · judge: Claude Opus 4.7 · target ≥ 90%.

variant               appeal-correct   refuse-correct   overall
v1 chain-of-thought   —/10             —/8              —
v2 citation-first     —/10             —/8              —
target                ≥ 9/10           ≥ 7/8            ≥ 90%

Pre-launch: awaiting first full run.
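The scorecard columns can be computed from per-scenario results roughly as follows. This is a sketch under assumptions: the `ScenarioResult` and `Scorecard` shapes and the `scoreRun` name are hypothetical, and each scenario is reduced to a single correct/incorrect verdict.

```typescript
// Hypothetical per-scenario verdict: which half of the eval it belongs to, and whether it passed.
interface ScenarioResult {
  expectedAppealable: boolean;
  correct: boolean;
}

interface Scorecard {
  appealCorrect: number;
  appealTotal: number;
  refuseCorrect: number;
  refuseTotal: number;
  overall: number; // fraction in [0, 1]
  pass: boolean;   // overall >= target
}

// Splits results into appeal and refuse buckets and checks the overall target.
function scoreRun(results: ScenarioResult[], target = 0.9): Scorecard {
  let appealCorrect = 0, appealTotal = 0, refuseCorrect = 0, refuseTotal = 0;
  for (const r of results) {
    if (r.expectedAppealable) {
      appealTotal += 1;
      if (r.correct) appealCorrect += 1;
    } else {
      refuseTotal += 1;
      if (r.correct) refuseCorrect += 1;
    }
  }
  const total = appealTotal + refuseTotal;
  const overall = total === 0 ? 0 : (appealCorrect + refuseCorrect) / total;
  return { appealCorrect, appealTotal, refuseCorrect, refuseTotal, overall, pass: overall >= target };
}
```

One arithmetic consequence worth noting: 9/10 plus 7/8 is 16/18, about 88.9%, so a run that exactly meets both per-bucket floors still falls short of a 90% overall target. With n=18, the overall target requires at least 17 correct.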

// pragmatic decisions

Three trade-offs worth naming.

Every choice below was the cheap one, and each has a real cost. I'm listing both halves on purpose.

CSR cockpit, not a member-direct chatbot

Chose

The simulated user is a member-services CSR. The visitor (usually a healthcare-AI hiring manager) sits in the CSR's seat. The right-pane artifact is what a CSR would actually send to a member.

Why

It's the workflow I run every day at Health First Health Plans, a Florida MA carrier. Putting the visitor in a different role from the real user is also the part that signals product taste, not just LLM plumbing.

Cost

A little extra friction on first landing while the visitor figures out 'oh, I'm the CSR right now.' The cockpit layout and a curated scenario gallery do most of the work to fix that.

Non-appealable denials are first-class, and the eval scores them

Chose

Taxonomy is 6 appealable categories plus 2 explicitly non-appealable ones (duplicate claim, COB second-payer). 'Correctly refused to appeal' counts the same as 'correctly appealed.'

Why

Refusing to appeal when an appeal is wrong is the most useful thing this tool can do, and it's the part that says I understand the workflow, not just the API.

Cost

The eval has to score multi-field triage instead of a single yes/no. Ground-truth labels for non-appealable scenarios can't be trusted to an LLM, so I hand-author and hand-label every one.
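A sketch of what multi-field scoring could mean in practice, assuming the structured part of a triage is compared field by field against a hand-authored label. The `Triage` shape and `gradeTriage` name are hypothetical. Exact-match comparison is an assumption here; it does not model whatever the LLM judge scores.

```typescript
// Hypothetical structured triage output, mirroring the cockpit's class / appeal / action fields.
interface Triage {
  denialClass: string;
  appealable: boolean;
  action: string;
}

type FieldGrades = Record<keyof Triage, boolean>;

// Compares a model triage to the hand-labeled ground truth, one field at a time.
function gradeTriage(expected: Triage, actual: Triage): { fields: FieldGrades; allCorrect: boolean } {
  const fields: FieldGrades = {
    denialClass: expected.denialClass === actual.denialClass,
    appealable: expected.appealable === actual.appealable,
    action: expected.action === actual.action,
  };
  return { fields, allCorrect: fields.denialClass && fields.appealable && fields.action };
}
```

Per-field grades make it visible when a variant gets the yes/no right but misclassifies the denial, which a single boolean would hide.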

Direct @anthropic-ai/sdk over Vercel AI SDK, so cache hit rate is real

Chose

Direct @anthropic-ai/sdk with explicit cache_control breakpoints. Routes pinned to runtime: 'nodejs'.

Why

Cache hit rate is the methodology signal I most want to land. It's the visible version of 'I cared enough to instrument it.' The Vercel AI SDK's array-system limitation (vercel/ai #13308) blocks fine-grained cache_control placement, which would erase that signal.

Cost

I write my own stream-buffer plumbing and make the Edge-vs-Node calls myself, instead of leaning on an off-the-shelf abstraction. More surface area to test.
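For illustration, here is roughly what an explicit breakpoint and a hit-rate calculation could look like. The types are declared locally to mirror the Anthropic Messages API shapes so the sketch runs without the SDK installed. `buildSystemBlocks`, `cacheHitRate`, and the split into instructions plus reference text are assumptions, not the project's actual code.

```typescript
// Local mirror of a Messages API system text block (assumption: typed here rather than imported).
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Local mirror of the usage fields the API reports on each response.
interface Usage {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Builds an array-form system prompt with one cache breakpoint after the stable reference text.
function buildSystemBlocks(instructions: string, referenceText: string): SystemTextBlock[] {
  return [
    { type: "text", text: instructions },
    // Everything up to and including this block is eligible for caching.
    { type: "text", text: referenceText, cache_control: { type: "ephemeral" } },
  ];
}

// Fraction of all input tokens that were served from cache on one request.
function cacheHitRate(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const created = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + created;
  return total === 0 ? 0 : read / total;
}
```

In a route handler the array would be passed as the `system` parameter of `messages.create`, and the `usage` object on the response would feed `cacheHitRate`.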

// the stack

What it’s built on.

Next.js 15 App Router
TypeScript 5
Tailwind 4
shadcn/ui
@anthropic-ai/sdk
Claude Sonnet 4.6
Claude Opus 4.7 · judge
ReadableStream
@upstash/ratelimit
jose · Edge cookies
X12.org · CARC/RARC
eCFR JSON · 42 CFR §422
Vercel
Cloudflare DNS
Claude Code
