Healthcare Support Specialist
A drop-in agent workspace for health-plan CSRs. Five staged contracts, a verbatim quoting rule, three explicit output branches, no orchestrator binary. The folder structure IS the pipeline.
- 01 intake: normalize · entities · clarify-flag (reads 00_question · glossary)
- 02 classify: intent as benefits / claims / PA / OOS (reads intent_taxonomy · rep_persona)
- 03 route: intent + entities → KB section IDs (reads _index.md)
- 04 extract: verbatim quotes · no paraphrase (reads acme_kb/* sections)
- 05 compose: rep-facing answer · 3 branches (reads voice_guide · rep_persona)
A drop-in agent workspace for member-services CSRs at a health plan. A rep types a member’s question; the workspace walks five staged contracts and replies with a KB section pointer, the verbatim passage, and a suggested talk-track. The folder structure is the pipeline; there’s no orchestrator binary and no test framework. The deliverable is the workspace itself.
Why it exists
Member-services reps spend their day translating a benefit grid, a denial code, or a prior-auth policy into a sentence the member on the phone can actually use. The KB is already written. The hard part is the routing: figuring out which section answers which question, and saying it back without paraphrasing the part that matters.
I work that seat at Health First Health Plans. The Healthcare Support Specialist is the workspace I’d want at it: a CSR types a question, an agentic harness walks five staged contracts, and the rep gets back the section ID, the verbatim quote, and a phrasing they can actually deliver. The fictional payer (Acme Health Plan) keeps the artifact portfolio-safe with no PHI and no proprietary content, but the shape mirrors a real CS workflow.
What made it hard
The lazy version is “one big system prompt that says here is a knowledge base, answer the question.” That works until the model paraphrases “after deductible” as “once you’ve met your deductible” on a benefit a member is going to be billed for, or invents a denial code that doesn’t exist in the WPC list, or confidently answers a network question the workspace was never scoped to handle in the first place.
The bar I held myself to was read-first auditability. A reviewer should be able to read three files in five minutes and know exactly what the pipeline does and where it gets each fact. Stages live in separate folders with separate contracts, quotes are always verbatim with a section ID attached, and every output picks one of three explicit branches without an implicit fallthrough. The seven worked example runs double as fixtures and as conformance tests.
Five layers, on purpose.
The structure follows Singer’s Interpretable Context Methodology: identity, shared resources, stage contracts, reference material, and per-run artifacts each live in their own layer. The first four are stable, the kind of thing a reviewer reads once. Only Layer 4 changes per question.
- Layer 0 · Workspace identity: one file orients any reader. Lives at / as 00_workspace.md
- Layer 1 · Shared resources: stable per-stage context. Lives in shared/ as intent_taxonomy.md · glossary.md · rep_persona.md · voice_guide.md
- Layer 2 · Stage contracts: 5 numbered folders, same 5-section schema. Lives in 01_intake/ … 05_compose/ as contract.md (Purpose / Inputs / Process / Outputs / Failure modes)
- Layer 3 · Reference material: the KB itself, canonical section IDs. Lives in reference/acme_kb/ as _index.md · benefits/ · claims/ · prior_auth/
- Layer 4 · Per-run artifacts: one directory per question, full stage chain. Lives in runs/<run_id>/ as 00_question · 01_intake … 05_compose-answer · _audit.md
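The four stable layers can be checked for presence with a few lines of Python. This is a minimal sketch, not something the workspace ships (it has no test framework). The folder names 02_classify and 03_route are assumptions inferred from the stage names; only 01_intake, 04_extract and 05_compose are spelled out above. The per-run layer is left out because its file list is elided.

```python
from pathlib import Path

# Stage folder names: 02_classify and 03_route are assumed, not confirmed.
STAGES = ["01_intake", "02_classify", "03_route", "04_extract", "05_compose"]
SHARED = ["intent_taxonomy.md", "glossary.md", "rep_persona.md", "voice_guide.md"]


def expected_paths() -> list[str]:
    """Relative paths for the four stable layers of the workspace."""
    paths = ["00_workspace.md"]
    paths += [f"shared/{name}" for name in SHARED]
    paths += [f"{stage}/contract.md" for stage in STAGES]
    paths.append("reference/acme_kb/_index.md")
    return paths


def missing_paths(root: Path) -> list[str]:
    """Return every expected path that is not a file under root."""
    return [p for p in expected_paths() if not (root / p).is_file()]
```

Calling missing_paths(Path(".")) from the workspace root returns an empty list when every stable-layer file is in place, and the list of gaps otherwise.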
Three trade-offs worth naming.
Each call below was the cheap one. Each has a real cost. Listing both halves on purpose.
The pipeline IS the folder structure
Five numbered stage directories (01_intake → 05_compose), each with a hand-authored contract.md on the same five-section schema (Purpose / Inputs / Process / Outputs / Failure modes). No orchestrator binary. AGENTS.md is the runbook, and any agentic CLI that auto-loads project instructions can run the pipeline against any committed run.
An ICM workspace is meant to be read first, run second. A reviewer can audit the whole pipeline by reading three files in five minutes, and a Claude Code or Codex session can re-run any stage on any committed run with zero setup. The structure is the spec; there's no second place where the pipeline lives.
Convention is the only enforcement. The runtime can't tell you a stage skipped a section of its Outputs schema; the next stage just reads what's there and returns something weaker. Retries and parallelism aren't built either, because they weren't worth it for a portfolio artifact. The eval surface is the seven example runs and a reviewer's eye.
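The gap that convention leaves open, a contract missing one of its five sections, is cheap to sketch as a lint. The following is an illustration only and not part of the workspace. It assumes each section appears as a markdown heading (for example "## Purpose"), which the write-up does not confirm.

```python
import re

REQUIRED_SECTIONS = ["Purpose", "Inputs", "Process", "Outputs", "Failure modes"]

# Assumption: sections are ATX markdown headings, one to six '#' characters.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", flags=re.MULTILINE)


def missing_sections(contract_text: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    headings = {m.group(1).lower() for m in HEADING.finditer(contract_text)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

A check like this only proves the headings exist. Whether the text under "Outputs" actually matches what the next stage reads is still a reviewer's call.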
Verbatim quoting at 04_extract, paraphrasing only at 05_compose
Stage 04 quotes KB passages verbatim with section IDs and never paraphrases. Only 05 is allowed to translate that quote into rep-friendly talk-track language, and a separate voice_guide.md hands it three substantive guards: preserve 'after deductible', name the network qualifier, name the criteria gate.
A benefit grid says what it says. Paraphrasing 'after deductible' as 'once you've met your deductible' reads friendlier and is a different statement to a regulator, or to a member who's later told they owe more than they planned for. Splitting extract from compose puts the source-of-truth quote on one page and the rep's talking line on the next, so each half is auditable on its own.
Two stages and two files for what could be one LLM call. More tool overhead per question, and one more place a stage author can drift the contract. The voice guards are guidance, not a regex; the compose stage has to actually read them, and a reviewer has to catch the drift if it doesn't.
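The verbatim rule itself reduces to a substring test, and the 'after deductible' guard to a phrase-survival test. The sketch below is hypothetical: the function names and the sample passage are made up for illustration and do not come from the Acme KB. It covers only the literal-phrase guard; the network-qualifier and criteria-gate guards are not fixed strings, so they stay a reading task.

```python
def is_verbatim(quote: str, section_text: str) -> bool:
    """True if the quote appears character-for-character in the KB section."""
    cleaned = quote.strip()
    return bool(cleaned) and cleaned in section_text


def dropped_phrases(quote: str, talk_track: str, guarded: list[str]) -> list[str]:
    """Guarded phrases that are in the quote but missing from the talk-track."""
    q, t = quote.lower(), talk_track.lower()
    return [p for p in guarded if p.lower() in q and p.lower() not in t]


# Made-up sample passage, used only to exercise the two checks.
section = "Specialist visit: 20% coinsurance after deductible."
quote = "20% coinsurance after deductible"
drifted = "You'll pay 20% once you've met your deductible."

print(is_verbatim(quote, section))                             # True
print(dropped_phrases(quote, drifted, ["after deductible"]))   # ['after deductible']
```

An exact substring match is deliberately strict: a quote that was re-wrapped or had its whitespace normalized will fail, which is the behavior a verbatim rule wants.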
Three output branches: normal · out-of-scope · needs-clarification
Every 05_compose output picks exactly one branch. Normal returns KB pointer + quote + talk-track. Out-of-scope returns a warm-transfer talk-track to the right team (provider services, eligibility, billing). Needs-clarification returns a question for the rep to ask the member, not a guess at the answer.
On a CSR floor, 'I don't know yet, go ask' is a more useful answer than a confident wrong one. Same energy as the Claim Analyzer counting 'correctly refused to appeal' as a win. The branches that aren't the happy path are the part of the pipeline I most want a reviewer to look at. They refuse cleanly when the inputs don't add up, and they hand the rep the next concrete move.
Compose carries three templates instead of one, and the warm-transfer phrasing has to stay in sync with the actual team boundaries at whichever payer adopts it (Acme's won't match). The seven example runs cover one needs-clarification and one out-of-scope path. Adding a fourth branch later means a fourth template plus new fixtures to back it.
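The "exactly one branch" rule can be pictured as a small validation table: each branch names the fields it must carry, and anything else is a problem. This is a sketch under assumptions. The field names (section_id, quote, talk_track, transfer_team, clarifying_question) are hypothetical labels for the pieces described above, not names taken from the workspace.

```python
from enum import Enum


class Branch(Enum):
    NORMAL = "normal"
    OUT_OF_SCOPE = "out-of-scope"
    NEEDS_CLARIFICATION = "needs-clarification"


# Hypothetical field names; the required pieces per branch follow the text.
REQUIRED = {
    Branch.NORMAL: {"section_id", "quote", "talk_track"},
    Branch.OUT_OF_SCOPE: {"transfer_team", "talk_track"},
    Branch.NEEDS_CLARIFICATION: {"clarifying_question"},
}


def validate(branch: Branch, fields: dict[str, str]) -> list[str]:
    """List what is missing from, or does not belong in, a compose output."""
    present = {name for name, value in fields.items() if value.strip()}
    required = REQUIRED[branch]
    problems = [f"missing {name}" for name in sorted(required - present)]
    problems += [f"unexpected {name}" for name in sorted(present - required)]
    return problems
```

Under this shape, a needs-clarification output that also carries a quote is flagged as "unexpected quote", which matches the intent that the branch asks a question instead of guessing at an answer. A fourth branch would add one enum member and one row in REQUIRED.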
What it’s built on.
- Markdown spec
- ICM methodology
- AGENTS.md runbook
- CLAUDE.md pointer
- Claude Code · drop-in
- Codex · Cursor · Aider compatible
- Anthropic prompt cache · warm-up
- Mermaid diagrams
- 5-stage pipeline
- 7 worked example runs
- Hand-authored Acme KB
- CMS SBC template
- WPC CARC/RARC
- CMS PA-policy guidance
- GitHub