
$0.90 a month instead of $240 a year. Local transcription on Apple Silicon in 2.5 evenings.
I tried a bunch of SaaS transcription services. Otter.ai - $16.99/month. Fireflies Pro - $18/month. Notta Pro - $8.17/month. All of them have free plan limits and require sending audio to the cloud.
Then I saw PLAUD. A sleek AI voice recorder for $159-179. Pro subscription - ~$100/year for 1,200 minutes per month. Unlimited - ~$240/year. Or $30/month if you pay monthly.
I started doing the math. Realized I could run it locally.
What I built
TranscribeFlow - an open-source transcriber for Apple Silicon M1-M4.
Stack:
- MLX Whisper - transcription directly on the GPU. Free. Chose it over the OpenAI API because I wanted the audio to stay local, which matters a lot for NDA meetings. On M3 Pro, 10 minutes of audio takes about a minute and a half to process. M4 Pro - 50 seconds. Even on M1 with 8 GB - around three minutes.
- Pyannote - speaker diarization. 9,000 stars on GitHub, MIT license. In September 2025 they released the community-1 model with improved speaker counting. Also free.
- LLM post-processing - Gemini Flash cleans up the text, adds punctuation, suggests speaker names based on conversation context.
- AI insights with templates - IT meeting, business call, interview, brainstorm. Each template extracts what matters: decisions, action items, risks.
- Compare-view - transcript before and after processing side by side. You see exactly what the LLM changed.
- Mindmap - key decisions as a map. Handy for retrospectives.
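One step the stack implies is joining the two outputs: the transcriber produces timed text segments, the diarizer produces timed speaker turns. Below is a minimal sketch of one common way to do it, labeling each segment with the speaker whose turn overlaps it most. The `Segment` and `Turn` classes, the `assign_speakers` function, and the sample data are hypothetical illustrations, not TranscribeFlow's actual code or the output format of either library.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed chunk of speech (times in seconds). Hypothetical shape."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: who spoke between start and end. Hypothetical shape."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each segment with the speaker whose turn overlaps it the most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


# Made-up sample data for illustration only.
segments = [
    Segment(0.0, 4.0, "Let's start with the release plan."),
    Segment(4.0, 9.0, "Sure, the backend is ready."),
    Segment(20.0, 22.0, "Thanks, everyone."),
]
turns = [
    Turn(0.0, 4.5, "SPEAKER_00"),
    Turn(4.5, 9.5, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for speaker, text in labeled:
    print(f"{speaker}: {text}")
```

Segments that no turn covers fall back to `UNKNOWN` rather than being guessed, so gaps in the diarization stay visible for the LLM step or for manual review.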
There’s a fallback to cloud engines: ElevenLabs, Deepgram, AssemblyAI. Haven’t needed it yet.
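A fallback like this can be as simple as an ordered list of engines tried in turn. The sketch below shows the idea only: `transcribe_with_fallback`, `local_engine`, and `cloud_engine` are hypothetical stand-ins, not the real MLX Whisper, ElevenLabs, Deepgram, or AssemblyAI calls.

```python
def transcribe_with_fallback(audio_path, engines):
    """Try each (name, engine) pair in order; return the first successful result."""
    errors = []
    for name, engine in engines:
        try:
            return name, engine(audio_path)
        except Exception as exc:  # a real app would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


# Hypothetical stand-ins: the local engine fails, the cloud engine succeeds.
def local_engine(audio_path):
    raise RuntimeError("model not loaded")


def cloud_engine(audio_path):
    return f"transcript of {audio_path}"


used, text = transcribe_with_fallback(
    "meeting.wav",
    [("local", local_engine), ("cloud", cloud_engine)],
)
print(used, "->", text)
```

Putting the local engine first means audio leaves the machine only when local transcription actually fails.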
92% of the code was written autonomously by Claude Code agents. I set the direction, reviewed the output, and corrected architectural decisions.
Build vs Buy - when to build?
When I started doing the math, everyone around me said - just buy SaaS, don’t make it complicated. Market data backs this up: companies spend hundreds of thousands on custom solutions instead of buying ready-made ones.
Fair. For enterprise. For 50-person teams with budgets and deadlines.
My situation is different. My evenings cost $0. M3 Pro is already paid for. Claude Code subscription - $20/month, but I use it for a dozen other projects anyway.
“Build vs Buy” is not a binary choice. Transcription is free locally. AI processing - 6 cents per hour of audio. And I built a usable interface in 2.5 evenings. That’s the whole math.
The first evening I burned on Pyannote. Two hours on the timer - tracking down why it wouldn’t start on MPS (Metal Performance Shaders). Turned out you need a specific version of PyTorch. Pain. But I figured it out.
Economics
I calculate based on 15 hours of meetings per month - my actual volume.
| | TranscribeFlow | SaaS subscription |
|---|---|---|
| Transcription | $0 (local) | Included |
| Speaker diarization | $0 (Hugging Face) | Included |
| LLM processing + insights | ~$0.90 (Gemini Flash) | Included |
| Total/month | ~$0.90 | $8-30 |
| Total/year | ~$11 | $100-360 |
PLAUD separately: device $159-179 + subscription $100-240/year. Over two years - $360-660. My transcriber over two years - $22.
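The arithmetic behind these totals, as a small sketch. It uses only figures quoted in this post (15 hours a month, 6 cents of LLM processing per hour of audio, the PLAUD device and subscription prices). The variable names are illustrative, and amounts are kept in whole cents or dollars to avoid floating-point noise.

```python
HOURS_PER_MONTH = 15            # meeting volume stated in the post
LLM_CENTS_PER_AUDIO_HOUR = 6    # Gemini Flash post-processing, per the post

# Local pipeline: transcription and diarization cost $0, so only the LLM counts.
monthly_cents = HOURS_PER_MONTH * LLM_CENTS_PER_AUDIO_HOUR
yearly_cents = monthly_cents * 12
two_year_cents = monthly_cents * 24

# PLAUD over two years: one-time device price plus two years of subscription.
plaud_low_usd = 159 + 2 * 100    # cheaper device, Pro plan
plaud_high_usd = 179 + 2 * 240   # pricier device, Unlimited plan

print(f"Local, per month: ${monthly_cents / 100:.2f}")
print(f"Local, per year: ${yearly_cents / 100:.2f}")
print(f"Local, two years: ${two_year_cents / 100:.2f}")
print(f"PLAUD, two years: ${plaud_low_usd}-{plaud_high_usd}")
```

Changing `HOURS_PER_MONTH` is the quickest way to check whether the comparison still holds for a different meeting load.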
Not sure yet that TranscribeFlow covers 100% of cases. Live meetings with bad microphones - haven’t tested. Zoom recordings - works great. Voice memos - also fine.
What I learned
“Build vs Buy” in 2026 is a different question than five years ago. Back then “build” meant months of development and a team of engineers. Now Claude Code agents write 92% of the code, and I handle architecture and review. Showed what the process looks like at a vibe-coding workshop.
Market price of transcription - from $0.10 to $4.00 per minute. Locally on Apple Silicon - $0.001 per minute (LLM processing only). That’s a 100-4,000x difference.
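A quick check of that ratio, assuming the 6 cents per hour of audio quoted earlier is the only variable cost of the local setup. The names are illustrative.

```python
LLM_USD_PER_AUDIO_HOUR = 0.06            # LLM processing cost from the post
MARKET_USD_PER_MINUTE = (0.10, 4.00)     # market price range from the post

# Convert the hourly LLM cost to a per-minute cost.
local_usd_per_minute = LLM_USD_PER_AUDIO_HOUR / 60

# How many times more expensive each market price is, rounded to a whole number.
ratios = [round(price / local_usd_per_minute) for price in MARKET_USD_PER_MINUTE]

print(f"Local: ${local_usd_per_minute:.3f} per minute")
for price, ratio in zip(MARKET_USD_PER_MINUTE, ratios):
    print(f"Market ${price:.2f} per minute is about {ratio:,}x the local cost")
```

The comparison leaves out hardware and electricity, matching the "LLM processing only" framing above.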
Chose open-source for control. And for the savings, not going to lie. Meeting audio doesn’t go to the cloud. Insight templates are mine. Want to add a new format - takes 10 minutes.
The downside: no mobile app, no cross-device sync, no polished onboarding. If you need polish - go SaaS. If you want control and zero variable costs - build it yourself.
TranscribeFlow on GitHub (open source)