Why SDD
- METR study (July 2025): experienced open-source developers using AI tools took 19% longer on real tasks. Root cause - debugging loops from unstructured prompts. Confirmed in a follow-up study in February 2026.
- "Why do we sometimes get slop? Because we under-specified it" - Denis Kiselyov. The cause of low-quality code is always under-specification
- Vibe coding breaks at scale. A prototype in an evening - sure. A maintainable product - no. Monolith, 0.5% test coverage, process debt - a typical outcome after "wild vibe coding"
- 30+ frameworks in the past year. The industry is searching for answers. Medium, Thoughtworks Technology Radar (Assess ring), Augment Code - everyone is tracking SDD
- The formula: Spec → Plan → Tasks → Code. Instead of "write code from a prompt" - "write a spec, and the agent implements it"
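The formula above can be read as a chain of small data transformations. The sketch below is an illustration only, not any framework's API: `Spec`, `plan_from_spec`, `tasks_from_plan`, and `implement` are hypothetical names, and the agent is replaced by a stub.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """What to build: a goal plus acceptance criteria (hypothetical shape)."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)


def plan_from_spec(spec: Spec) -> list[str]:
    """Plan: one step per acceptance criterion."""
    return [f"Satisfy: {criterion}" for criterion in spec.acceptance_criteria]


def tasks_from_plan(plan: list[str]) -> list[dict]:
    """Tasks: numbered, individually checkable units of work."""
    return [{"id": i, "title": step, "done": False} for i, step in enumerate(plan, start=1)]


def implement(task: dict) -> dict:
    """Stand-in for the agent; a real setup would call a coding tool here."""
    return {**task, "done": True}


spec = Spec(
    goal="Password reset by email",
    acceptance_criteria=["Reset link expires after 1 hour", "Old password stops working"],
)
tasks = [implement(task) for task in tasks_from_plan(plan_from_spec(spec))]
for task in tasks:
    print(task["id"], task["title"], "done" if task["done"] else "open")
```

The point of the shape: every task traces back to a line in the spec, so a vague spec shows up as vague tasks before any code is written.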
SDD Maturity Model
Level 1: Spec-First
A spec per task. Can be discarded after implementation. Quick start, minimal ceremony. Good for prototypes and one-off tasks.
Level 2: Spec-Anchored
Spec is a living document. All changes start with updating the spec. Code follows the spec, not the other way around. Works for products in active development.
Level 3: Spec-as-Source
Spec is the only artifact. Code is compiled output. Radical approach: you change the spec, the agent regenerates the code. Tessl ($125M raised) is building a platform on this principle.
Category 1: SDD Frameworks (spec → plan → code)
BMAD-METHOD ~41k stars
12+ AI roles: Analyst, PM, Architect, Scrum Master, and others. Essentially simulates an agile team inside a single agent. Creates a PRD, architecture, and dev stories with full context. Requires Node.js v20+; v6-alpha is a full rewrite.
The most popular SDD framework. Powerful but complex - the learning curve is steep. For those willing to invest time in setup to get a full-fledged workflow.
"BMAD, OpenSpec, and all the rest - these are purely applied tools aimed at breaking a task into manageable pieces" - Ivan (QuintCode/NeuralStack)
GitHub Spec Kit
MIT license, by Microsoft/GitHub. Agent-agnostic - works with 8+ AI agents. Standardizes the specification format in Markdown. Not tied to a specific IDE or model.
Focuses on format, not process. Spec Kit defines how to write specs, not how to implement them. Pairs well with other frameworks.
OpenSpec
Semi-living specifications with delta markers. 20+ supported agents. Built for brownfield projects and iterative changes - when the code already exists and needs to evolve, not be written from scratch.
Key difference from Spec Kit: specs aren't static but "semi-living" - they track the delta between the current and desired state.
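To make "tracking the delta" concrete, here is a minimal sketch that compares a current and a desired set of requirements. It does not reproduce OpenSpec's actual file format or markers; the requirement IDs, the dictionary layout, and the `spec_delta` function are assumptions for illustration.

```python
def spec_delta(current: dict[str, str], desired: dict[str, str]) -> dict[str, list[str]]:
    """Classify requirement IDs by how the desired spec differs from the current one."""
    return {
        "added": sorted(desired.keys() - current.keys()),
        "removed": sorted(current.keys() - desired.keys()),
        "modified": sorted(
            key for key in current.keys() & desired.keys() if current[key] != desired[key]
        ),
    }


# Hypothetical requirements for an existing (brownfield) auth module.
current = {
    "auth-1": "Users log in with a password",
    "auth-2": "Sessions last 24 hours",
}
desired = {
    "auth-1": "Users log in with a password or passkey",
    "auth-3": "Lock the account after 5 failed attempts",
}
delta = spec_delta(current, desired)
print(delta)
```

An agent working on a brownfield project only needs the delta plus the surrounding code, not a full rewrite of the spec.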
GSD (Get Shit Done) - lightweight framework for prototypes. Minimal ceremony, quick start. When you need to build an MVP, not an enterprise process.
PromptX - structured prompts for the SDD approach. Organizes prompts into reusable templates.
LeanSpec - lightweight alternative to BMAD and Spec Kit. For those who find full frameworks overkill, but are tired of prompt chaos.
Category 2: Reasoning & Decision Making
QuintCode + FPF ~241 stars (FPF)
Two related tools. FPF (First Principles Framework, Anatoly Levenchuk) - a "thinking operating system for LLMs." Formal, complex, powerful. QuintCode (Ivan Zakutniy) - a practical wrapper around FPF for developers.
ADI Cycle in 5 phases:
- Hypothesize (Abduction) - generate 3-5 competing approaches.
- Verify (Deduction) - check logical consistency.
- Test (Induction) - gather evidence.
- Audit - WLNK analysis, check blind spots.
- Decide - create a Design Rationale Record.
Solves a key problem: three months later you won't remember why you chose that approach. The decision lives in a chat thread you'll never find again. QuintCode captures reasoning in Design Rationale Records.
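A Design Rationale Record can be as simple as a structured file committed next to the code. The sketch below shows one possible shape. It is not QuintCode's real format: the `DesignRationaleRecord` class, its field names, and the example decision are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class DesignRationaleRecord:
    """Hypothetical record shape; fields loosely mirror the five ADI phases."""
    question: str
    hypotheses: list[str]      # Hypothesize: competing approaches
    rejected: dict[str, str]   # Verify/Test: approach -> reason it was dropped
    audit_notes: str           # Audit: blind spots found
    decision: str              # Decide: the chosen approach
    decided_on: str            # ISO date, so the record sorts and greps well

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Made-up example decision, for illustration only.
record = DesignRationaleRecord(
    question="How should background jobs run?",
    hypotheses=["cron script", "message queue", "managed workflow service"],
    rejected={
        "message queue": "extra infrastructure for a low job volume",
        "managed workflow service": "cost not justified at current scale",
    },
    audit_notes="assumes job volume stays low; revisit if it grows",
    decision="cron script",
    decided_on=date(2026, 3, 16).isoformat(),
)
print(record.to_json())
```

Stored in the repository, a record like this survives long after the chat thread that produced it is gone.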
"Got 52 pages of documentation and about 280 feature files in Gherkin/BDD. Fed it to Claude Code - burned through a week's subscription limits overnight. The project came out pretty good, tests from the feature files actually work." Ivan (QuintCode) - on using FPF + ChatGPT Pro, 2 evenings of work
"Claude Code on its own suggested Kubernetes, EKS. QuintCode struggled, asked questions, and ultimately arrived at the cheapest but most stable solution - a single-node Docker Swarm." Ivan (QuintCode) - ADI cycle chose Docker Swarm over intuitive Kubernetes
"Learn to program, not a language. Same here - you need to learn to think." Ivan (QuintCode/NeuralStack)
"Nobody will think for you, not even FPF." Anatoly Levenchuk
SGR (Schema-Guided Reasoning) - structured output + chain of thought. A reasoning_steps field in the JSON response improves the accuracy of non-reasoning models and makes the output debuggable.
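A minimal sketch of the SGR idea, using only the standard library. The schema places reasoning_steps before the answer so the model writes its reasoning first. Apart from reasoning_steps, the field names, the `parse_response` helper, and the sample response are assumptions; no real model is called.

```python
import json

# Illustrative JSON Schema that a structured-output API could be asked to follow.
RESPONSE_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning_steps": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "answer": {"type": "string"},
    },
    "required": ["reasoning_steps", "answer"],
}


def parse_response(raw: str) -> dict:
    """Parse a model response and check the fields the schema requires."""
    data = json.loads(raw)
    for key in RESPONSE_SCHEMA["required"]:
        if key not in data:
            raise ValueError(f"missing field: {key}")
    if not isinstance(data["reasoning_steps"], list) or not data["reasoning_steps"]:
        raise ValueError("reasoning_steps must be a non-empty list")
    return data


# Hand-written sample standing in for a real model response.
raw = (
    '{"reasoning_steps": ["Order total is 120", "Free shipping starts at 100"], '
    '"answer": "free"}'
)
parsed = parse_response(raw)
for number, step in enumerate(parsed["reasoning_steps"], start=1):
    print(f"{number}. {step}")  # each step can be inspected when debugging
print("answer:", parsed["answer"])
```

When an answer is wrong, the steps show where the reasoning went off track, which is what makes the output debuggable.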
Category 3: Autonomous Loops
Ralph Loop / Smart Ralph
Autonomous AI agent: runs a coding tool (Claude Code / Codex) again and again until every item in the PRD is done. Each iteration gets fresh context. Simple principle, sometimes surprisingly effective.
Smart Ralph adds an SDD workflow on top: asks clarifying questions before generation, structures the process.
Critics call it "monkey coding" - just pressing the button until it works. It handles certain tasks well, but without a spec you can easily end up in an infinite loop.
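The loop itself fits in a few lines. This sketch simplifies it to one agent call per unfinished PRD item and adds an iteration cap as a guard against the infinite-loop risk. `ralph_loop`, `run_agent`, and the fake agent are hypothetical; a real setup would invoke a coding tool and check the result.

```python
from typing import Callable


def ralph_loop(
    prd_items: list[str],
    run_agent: Callable[[str], bool],
    max_iterations: int = 10,
) -> dict:
    """Re-run the agent on unfinished PRD items until all pass or the budget runs out."""
    remaining = list(prd_items)
    iterations = 0
    while remaining and iterations < max_iterations:
        iterations += 1
        # Each call stands for a fresh agent session with no carried-over context.
        remaining = [item for item in remaining if not run_agent(item)]
    return {"iterations": iterations, "unfinished": remaining}


# Fake agent for the demo: succeeds on an item the second time it sees it.
attempts: dict[str, int] = {}


def fake_agent(item: str) -> bool:
    attempts[item] = attempts.get(item, 0) + 1
    return attempts[item] >= 2


result = ralph_loop(["login form", "password reset"], fake_agent, max_iterations=5)
print(result)
```

The cap is the important design choice: an item the agent can never finish ends up in `unfinished` instead of burning tokens forever.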
"Letting an agent loose on production is how you become a headline" - Denis Kiselyov
Taskmaster AI - task management drop-in for Cursor, Lovable, Windsurf, Roo. PRD → tasks → implementation. Works with any AI chat.
Category 4: Session/Memory Management
Claude-Flow ~21k stars
Enterprise-grade AI agent orchestration platform. Multi-agent swarms, autonomous workflow coordination, distributed swarm intelligence, RAG integration, 100+ MCP tools. Native Claude Code and Codex integration.
Tested personally for autonomous MVP and POC development. It works - agents execute 80-90% of tasks correctly. The remaining 10-20% are minor bugs that surface during testing. Fixable in a couple of additional sessions, but don't expect fully autonomous bug-free builds just yet.
Vatsal Shah's Beginner's Guide is a great starting point - used it myself, highly recommend.
cc-sdd ~2,880 stars
Kiro-style SDD commands for 8 tools: Claude Code, Cursor, Gemini CLI, Codex CLI, GitHub Copilot, Qwen Code, OpenCode, Windsurf. "Stop losing 70% of development context."
Enforced workflow: Requirements → Design → Tasks. Won't let you skip planning and jump straight into code. Structured AI-DLC (AI-Driven Development Lifecycle).
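An enforced phase order can be modeled as a simple gate. The sketch below is not cc-sdd's implementation: the `Workflow` class is hypothetical, and the final "implementation" phase is an assumption added to show what the gate protects.

```python
PHASES = ["requirements", "design", "tasks", "implementation"]


class Workflow:
    """Hypothetical phase gate: a phase opens only after the previous one is approved."""

    def __init__(self) -> None:
        self.approved: list[str] = []

    def approve(self, phase: str) -> None:
        done = len(self.approved)
        expected = PHASES[done] if done < len(PHASES) else None
        if phase != expected:
            raise RuntimeError(f"cannot approve {phase!r}; next phase is {expected!r}")
        self.approved.append(phase)


workflow = Workflow()
workflow.approve("requirements")
try:
    workflow.approve("tasks")  # skipping design is rejected
except RuntimeError as error:
    print(error)
workflow.approve("design")
print(workflow.approved)
```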
Memory Bank / MBB (Denis Kiselyov) - project memory using the C4 model. Memory Bank Bible - seed principles for project structure. Progressive disclosure - gradually revealing context to the agent.
Category 5: Platforms & Orchestrators
Intent (Augment Code) - living spec platform. Bidirectional specs: code and spec stay in sync. Coordinator + specialists. $60/mo. augmentcode.com
Kiro (AWS) - static spec with EARS notation. Claude-focused via Amazon Bedrock (Claude 3.7 and 4.0, more models planned). Free (50 credits/mo). For AWS greenfield projects.
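EARS (Easy Approach to Requirements Syntax) writes each requirement as one of a few fixed sentence templates, such as "When <trigger>, the <system> shall <response>". As a rough sketch of how such requirements can be checked mechanically, the classifier below matches the five basic templates with regular expressions. It is not part of Kiro; `classify_requirement` and the sample sentences are assumptions, and compound templates are not handled.

```python
import re
from typing import Optional

# Keyword-led templates are checked before the generic "The ... shall ..." form.
EARS_PATTERNS = [
    ("event-driven", re.compile(r"^When .+, the .+ shall .+", re.IGNORECASE)),
    ("state-driven", re.compile(r"^While .+, the .+ shall .+", re.IGNORECASE)),
    ("unwanted behavior", re.compile(r"^If .+, then the .+ shall .+", re.IGNORECASE)),
    ("optional feature", re.compile(r"^Where .+, the .+ shall .+", re.IGNORECASE)),
    ("ubiquitous", re.compile(r"^The .+ shall .+", re.IGNORECASE)),
]


def classify_requirement(text: str) -> Optional[str]:
    """Return the name of the first matching EARS template, or None."""
    sentence = text.strip()
    for name, pattern in EARS_PATTERNS:
        if pattern.match(sentence):
            return name
    return None


samples = [
    "When the user submits the form, the system shall send a confirmation email.",
    "The API shall respond within 200 ms.",
    "Make login faster",
]
for sample in samples:
    print(classify_requirement(sample), "<-", sample)
```

A requirement that matches no template, like the last sample, is a hint that the spec line is still too vague for an agent to implement.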
Zenflow (Zencoder) - PRD-to-code orchestrator. PRD → spec review → plan → phased implementation via Claude + Codex.
Compyle (YC F25) - "Lovable for engineers" with an SDD approach. Asks clarifying questions before code generation. Claude Code under the hood. Multi-repo support, free.
Tessl ($125M) - radical spec-as-source approach. Spec = the only artifact, code is generated automatically. The most ambitious bet in the SDD space.
Devika - open-source autonomous SE agent. An alternative to Devin. Full cycle: planning → research → coding. Web interface.
Category 6: Patterns & Techniques
AI-TDD - iteratively running tests until they pass. Caveat: LLMs may cheat (stubs instead of real code).
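One cheap guard against the stub caveat is to scan generated code for placeholder bodies before trusting a green test run. The sketch below uses Python's `ast` module to flag functions whose body is only `pass`, `...`, or `raise NotImplementedError`. It is a narrow check under stated assumptions: `find_stub_functions` is a hypothetical helper, and it will not catch hard-coded return values or other ways of gaming tests.

```python
import ast


def _is_docstring(stmt: ast.stmt) -> bool:
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    )


def _is_placeholder(stmt: ast.stmt) -> bool:
    if isinstance(stmt, ast.Pass):
        return True
    if isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant):
        return stmt.value.value is Ellipsis
    if isinstance(stmt, ast.Raise) and stmt.exc is not None:
        # Handles both `raise NotImplementedError` and `raise NotImplementedError(...)`.
        target = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
        return isinstance(target, ast.Name) and target.id == "NotImplementedError"
    return False


def find_stub_functions(source: str) -> list[str]:
    """Return names of functions whose body contains only placeholder statements."""
    stubs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body[1:] if _is_docstring(node.body[0]) else node.body
            if all(_is_placeholder(stmt) for stmt in body):
                stubs.append(node.name)
    return stubs


# Made-up agent output: one real function, one stub.
generated = '''
def total(prices):
    return sum(prices)

def apply_discount(amount, code):
    raise NotImplementedError
'''
stubs = find_stub_functions(generated)
print(stubs)
```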
Metaprompting - an LLM iteratively improves a prompt through a TDD cycle. It's not the code that improves - it's the prompt.
Context Engineering - the key discipline: preparing context for AI agents. MD files as a navigation layer, dependency graphs.
AI-Ready Codebase - adapting legacy code for AI agents. Hierarchy of MD files, minimal documentation set, grounding the agent on existing code.
Reflection Pattern - the agent double-checks itself at each step, finds errors, restarts the block.
Judge Pattern - agent + judge sub-agents with iterative improvement. Up to 5 judges evaluate the result.
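The Judge Pattern can be sketched as a scoring loop. This is an illustration under assumptions, not a specific tool's implementation: judges are plain functions returning a score between 0 and 1, `refine_with_judges` is a hypothetical name, and the threshold, round limit, and toy judges are invented for the demo. In practice each judge and the `improve` step would be an LLM call.

```python
from statistics import mean
from typing import Callable

Judge = Callable[[str], float]  # returns a score in [0, 1]


def refine_with_judges(
    draft: str,
    improve: Callable[[str], str],
    judges: list[Judge],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Improve a draft until the mean judge score reaches the threshold or rounds run out."""
    if not 1 <= len(judges) <= 5:
        raise ValueError("expected between 1 and 5 judges")
    score = mean(judge(draft) for judge in judges)
    for _ in range(max_rounds):
        if score >= threshold:
            break
        draft = improve(draft)
        score = mean(judge(draft) for judge in judges)
    return draft, score


# Toy judges and improver standing in for sub-agents.
def has_tests(text: str) -> float:
    return 1.0 if "test" in text else 0.0


def has_docs(text: str) -> float:
    return 1.0 if "doc" in text else 0.0


def improve(text: str) -> str:
    return text + " test" if "test" not in text else text + " doc"


final, score = refine_with_judges("code", improve, [has_tests, has_docs])
print(final, score)
```

Averaging several narrow judges tends to be easier to debug than one judge with a long rubric, since each score points at a single concern.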
Comparison Table: Tier-1 Frameworks
| Framework | Focus | Complexity | IDE | Memory | Best for |
|---|---|---|---|---|---|
| BMAD-METHOD | Agile team of AI roles | High | Any | Docs-as-code | Teams, enterprise |
| Spec Kit | Spec format (standard) | Medium | Any | None | Cross-agent teams |
| OpenSpec | Brownfield, iterative specs | Medium | Any | Delta markers | Existing projects |
| QuintCode/FPF | Reasoning, decision rationale | High | Claude Code, Cursor | .fpf/context.md | Architects |
| Ralph/Smart Ralph | Autonomous loop to completion | Low | Claude Code, Codex | Fresh context | Automation |
| Claude-Flow | Agent orchestration, swarms | Medium | Claude Code, Codex | Persistent memory | MVP/POC, solo & teams |
| cc-sdd | AI-DLC lifecycle (8 IDEs) | Medium | 8 tools | Structured | Developers |
Expert Quotes
"Hallucinations aren't a bug - they're an architectural feature: the model fills in gaps when you don't set boundaries" Denis Kiselyov, DEKSDEN
"BMAD, OpenSpec, and all the rest are purely applied tools for breaking tasks into pieces. First FPF, then OpenSpec, BMAD, and so on" Ivan (QuintCode/NeuralStack)
"Learn to program, not a language. Same thing here - you need to learn to think" Ivan (QuintCode/NeuralStack)
"Stakhanovism 2026: there are plenty of machines (agents), what we need are quality supervisors" Anatoly Levenchuk
"Native language for specifications: precision of wording matters more than saving tokens" Rodion Mostovoy
"Vibe coding is neither magic nor garbage - the truth is in the middle" Pavel Molyanov
"Humans hallucinate no worse than LLMs - we also approximate and never question our own adequacy" Ivan (QuintCode/NeuralStack)
Sources
YouTube (AI-Driven Development)
- Real Spec-Driven Dev / FPF - Ivan (QuintCode), Rodion Mostovoy, Levenchuk (March 16, 2026)
- Agentic Engineering AI Workflow Part 2 - Denis Kiselyov, Rodion Mostovoy (March 13, 2026)
- Agentic Engineering AI Workflow Part 1 - Denis Kiselyov (March 6, 2026)
Articles
- Spec-Driven Development Is Eating Software Engineering: A Map of 30+ Frameworks - Vishal Mysore, Medium (March 2026)
- 6 Best SDD Tools for AI Coding - Augment Code
- Thoughtworks Technology Radar: SDD - Assess ring
- SDD is Waterfall in Markdown - a critical perspective on SDD
- Problems with Spec-Driven Development - Sibylline
- METR: Early 2025 AI-Experienced OS Dev Study - AI effectiveness research
- arXiv: SDD paper
- Claude-Flow Beginner's Guide - Vatsal Shah, step-by-step setup guide
Telegram Channels
- @ai_driven - Rodion Mostovoy, AI-Driven Development
- @deksden - Denis Kiselyov, DEKSDEN
- QuintCode / NeuralStack - Ivan Zakutniy
GitHub Repositories
- BMAD-METHOD ~41k stars
- Spec Kit - Microsoft/GitHub
- OpenSpec
- QuintCode
- FPF ~241 stars
- cc-sdd ~2,880 stars
- Claude-Flow ~21k stars
- Ralph
- Taskmaster AI
- Devika
More on AI-driven development, spec-driven workflows, and practical experiments:
Telegram: @prodfeatai