87 types of UX interviews. Each one needs its own analysis prompt and a sample JSON output. I thought I’d knock them out with a template over a couple of evenings - turns out without a system it’s 174 files of manual work stretching over three weeks. In 4 hours with Claude Code subagents I built a pipeline that generates them on its own.
Context: I’m building a platform for AI-powered analysis of UX research. Discovery, usability, JTBD, exit interviews - 87 subtypes. Each subtype is analyzed differently. Each one needs a prompt + sample output with computed metrics like satisfaction_score, feature_demand_score, usability_score.
Sat down with Claude Opus 4.6 at 8pm. Got up at midnight with a working pipeline.
Phase 1: JSON Output Structure (~1 hour)
The first hour went into agreeing on the format. Boring work, but one broken format multiplies into 87 broken outputs.
7 open questions resolved through AskUserQuestion with a visual preview right in the terminal. Key forks:
- Single `sections[]` array instead of two separate ones - easier to parse on the backend
- Flat topics + hierarchical mindmap with IDs - both search and visualization covered
- `confidence: 0-1` required on every block - noise filtering
- Computed metrics: envelope + polymorphic scores - shared wrapper, but each interview type adds its own metrics inside
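The schema itself isn't reproduced in the post, so here is a hypothetical Python sketch of an output shaped by these four decisions. Everything beyond `sections`, `topics`, `mindmap`, and `confidence` - the IDs, titles, labels, and values - is invented for illustration:

```python
# Hypothetical example of the agreed shape: one sections[] array,
# flat topics plus an ID-linked mindmap, confidence on every block.
example_output = {
    "sections": [
        {"id": "s1", "title": "Key pain points", "confidence": 0.82},
        {"id": "s2", "title": "Feature requests", "confidence": 0.55},
    ],
    "topics": [
        {"id": "t1", "label": "onboarding friction"},
        {"id": "t2", "label": "pricing confusion"},
    ],
    # The mindmap references topic IDs instead of duplicating text,
    # so search (flat) and visualization (tree) stay in sync.
    "mindmap": {"id": "root", "children": [{"id": "t1"}, {"id": "t2"}]},
}

# The 0-1 confidence requirement is checkable in one pass:
assert all(0.0 <= s["confidence"] <= 1.0 for s in example_output["sections"])
```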
9 architectural decisions documented. Sounds bureaucratic. But when a subagent on the 80th prompt asks “what’s the confidence format?” - the answer is already in _common/schema.md.
Phase 2: Mass Generation Architecture (~30 min)
The main decision: one backend assistant, 87 swappable prompts. One JSON schema for all. Only the prompt changes. I went with a unified schema even though some interview types (say, diary study vs. card sorting) differ significantly - but at least the backend doesn’t turn into a zoo of parsers.
Folder structure:
```
prompts/
├── _common/              # Shared files
│   ├── schema.md         # JSON schema
│   └── metrics.md        # Metric formulas
├── discovery/
│   └── general/
│       ├── prompt.md     # Analysis prompt
│       └── example.json  # Output example
├── usability/
│   └── moderated/
│       ├── prompt.md
│       └── example.json
└── ... (85 subtypes)
```
Phase 3: Generator Subagent
Created .claude/agents/insights-prompt-agent.md - an autonomous agent on Opus. Each instance gets one subtype and produces two files: a prompt + a sample JSON.
Key decisions in agent design:
The agent doesn’t ask questions. Everything it needs comes from _common/ and a metrics synthesis doc. With 87 runs, any question to a human is a bottleneck that kills the whole autonomy.
The metrics synthesis doc is 53KB. Reading it whole is pointless: context gets polluted, hallucinations increase. The agent greps by subtype name. The first run showed that without this constraint the agent starts mixing up metrics from neighboring subtypes.
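The post doesn't show the exact grep the agent runs. Assuming the synthesis doc has one level-2 heading per subtype, the constraint amounts to extracting a single slice instead of feeding all 53KB into context - roughly this:

```python
import re

def extract_subtype_section(doc: str, subtype: str) -> str:
    """Return only the '## <subtype>' section of the synthesis doc.
    Assumes (hypothetically) that level-2 headings delimit subtypes."""
    pattern = rf"(?ms)^## {re.escape(subtype)}\n(.*?)(?=^## |\Z)"
    match = re.search(pattern, doc)
    if match is None:
        raise KeyError(f"no metrics section for subtype: {subtype}")
    return match.group(1).strip()
```

The point is the boundary, not the tool: whether it's `grep` or a regex, the agent never sees metrics from neighboring subtypes.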
Every example.json gets piped through python3 -m json.tool. Broken JSON gets caught immediately - not when the backend crashes on the 93rd subtype.
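The same check can also run as a pre-flight pass over the whole tree before anything reaches the backend. A sketch, with paths assumed from the folder structure above:

```python
import json
from pathlib import Path

def validate_examples(root: Path) -> list[str]:
    """Parse every example.json under root; return the paths that fail,
    so broken JSON surfaces now - not when the backend crashes later."""
    broken: list[str] = []
    for path in sorted(root.rglob("example.json")):
        try:
            json.loads(path.read_text())
        except json.JSONDecodeError:
            broken.append(str(path))
    return broken
```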
Orchestration: 18 batches of 5 subagents in parallel. Only the orchestrator updates the tracker - subagents don’t touch it. Otherwise race condition.
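The batch math is 87 subtypes / 5 parallel subagents = 18 batches (the last one short). A hedged sketch of the loop - `run_subagent` is a stub standing in for launching a Claude Code subagent; the structural point is that only the orchestrator writes the tracker:

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 5

def run_subagent(subtype: str) -> bool:
    """Stub: a real version would spawn the generator subagent
    for one subtype and report success or failure."""
    return True

def orchestrate(subtypes: list[str]) -> dict[str, str]:
    tracker: dict[str, str] = {}  # single writer: the orchestrator
    for i in range(0, len(subtypes), BATCH_SIZE):
        batch = subtypes[i:i + BATCH_SIZE]
        with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
            results = list(pool.map(run_subagent, batch))
        # Subagents never touch the tracker - no race condition.
        for subtype, ok in zip(batch, results):
            tracker[subtype] = "DONE" if ok else "FAILED"
    return tracker

# FAILED entries become the retry list for the next pass.
retry_list = [s for s, status in orchestrate(
    [f"subtype-{n}" for n in range(87)]).items() if status == "FAILED"]
```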
Phase 4: Metrics Research (~40 min)
Before generating 87 prompts, I needed to understand what computed metrics actually exist for UX research.
40 sources via Exa + 5 queries in Perplexity + synthesis in Claude. Result - a 53KB document classifying metrics by interview type.
Found 10 new metric types I hadn’t planned:
- `satisfaction_trend` - satisfaction dynamics across interviews
- `feature_advocacy` - how willing the user is to recommend a feature
- `task_completion_confidence` - confidence in task completion
Pattern: envelope + discriminated union. A shared wrapper with metric_type, value, confidence. Inside - specific fields for each metric type.
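Spelled out, the pattern might look like this - the wrapper fields `metric_type`, `value`, `confidence` are from the post; the type-specific payload fields are invented for illustration:

```python
# Shared envelope: every metric carries metric_type, value, confidence.
# The discriminant (metric_type) tells the consumer what else to expect.
satisfaction = {
    "metric_type": "satisfaction_score",
    "value": 0.74,
    "confidence": 0.81,
    "per_question": [0.9, 0.6, 0.72],  # hypothetical type-specific field
}

usability = {
    "metric_type": "usability_score",
    "value": 0.58,
    "confidence": 0.66,
    "failed_tasks": ["checkout"],  # a different payload, same wrapper
}

def dispatch(metric: dict) -> str:
    # One parser switches on the discriminant - no zoo of formats.
    return metric["metric_type"]
```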
Confidence thresholds: >=0.70 show it, 0.50-0.70 soft warning, <0.50 suppress it. Based on Gainsight’s health score practices, adapted for UX context.
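As a sketch, the threshold policy is a three-way branch (function name hypothetical):

```python
def display_policy(confidence: float) -> str:
    """Map a block's confidence to a UI decision:
    >=0.70 show, 0.50-0.70 show with a soft warning, <0.50 suppress."""
    if confidence >= 0.70:
        return "show"
    if confidence >= 0.50:
        return "warn"
    return "suppress"
```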
Phase 5: Agent Review via /btw - 8 Bugs With Zero Effort
A trick that levels up any Claude Code session. The /btw command launches a parallel agent right from the terminal - it works in the background while you’re busy with the main task. I asked it to thoroughly analyze all created files and find errors and inconsistencies. A couple of minutes later I got a report, copied it into the main terminal - and Claude fixed everything itself. 8 issues:
- Interactive skill in an autonomous agent - the skill asks questions, the agent is autonomous. Removed it, baked the logic straight into the prompt
- Race condition on the tracker - subagents were updating the tracker in parallel. Moved updates to the orchestrator
- No `mkdir -p` - the agent tried writing to nonexistent directories
- Reading 53KB in full - replaced with grep by subtype
- No JSON validation - added `python3 -m json.tool`
- First subtype specifics in the shared schema - generalized
- No error handling - added FAILED status + retry list
- No low-confidence example - added an example with `confidence: 0.48`
Three of the eight are potentially critical at scale. The tracker race condition would have broken half the batches. Reading 53KB in each of 87 agents is roughly 4.5MB of extra context - more hallucinations and slower execution. And the missing `mkdir -p` made the agent fail silently.
Result
In 4 hours:
- 9 architectural decisions
- 25+ files
- 40+ research sources
- 1 of 87 prompts fully complete (template for the rest)
- A pipeline for mass-generating the remaining 86
One autonomous subagent + an orchestrator with a tracker = scaling without losing quality. Writing one good prompt is half the battle. Designing a system that produces 87 with predictable quality - that’s where architecture matters.
And one more thing: the parallel review via /btw caught 8 bugs the main agent missed. I often forget to run it - but every time I do, something critical shows up.
Sources
- Anthropic: Claude Code Subagents
- Claude Code Multi-Agents and Subagents Guide - TURION.AI
- How to Use Claude Code Sub-Agents for Parallel Work - Tim Dietrich
- CascadeAgent: Prompt Engineering at Scale - ICLR 2026
- Orchestrated multi-agents sustain accuracy under clinical-scale workloads - Nature
- Agent Orchestration Patterns - gurusup.com
- Gainsight: Customer Health Scores
- Automated Qualitative Coding - UserCall