Build vs Buy: I built a transcriber in 2.5 evenings instead of paying for a subscription

Sergey Golubev 2026-02-09 4 min read

Local transcriber on Apple Silicon

$0.90 a month instead of $240 a year. Local transcription on Apple Silicon in 2.5 evenings.

I tried a bunch of SaaS transcription services. Otter.ai - $16.99/month. Fireflies Pro - $18/month. Notta Pro - $8.17/month. All of them cap their free plans, and all of them require sending audio to the cloud.

Then I saw PLAUD. A sleek AI voice recorder for $159-179. Pro subscription - ~$100/year for 1,200 minutes per month. Unlimited - ~$240/year. Or $30/month if you pay monthly.

I started doing the math and realized I could run it locally.

What I built

TranscribeFlow - an open-source transcriber for Apple Silicon M1-M4.

Stack:

  • MLX Whisper - transcription directly on the GPU. Free. I chose it over the OpenAI API because I wanted everything to stay local, which matters a lot for NDA meetings. On an M3 Pro, 10 minutes of audio takes about a minute and a half to process. On an M4 Pro, 50 seconds. Even on an M1 with 8 GB, around three minutes.
  • Pyannote - speaker diarization. 9,000 stars on GitHub, MIT license. In September 2025 they released the community-1 model with improved speaker counting. Also free.
  • LLM post-processing - Gemini Flash cleans up the text, adds punctuation, and suggests speaker names based on the conversation context.
  • AI insights with templates - IT meeting, business call, interview, brainstorm. Each template extracts what matters: decisions, action items, risks.
  • Compare-view - transcript before and after processing side by side. You see exactly what the LLM changed.
  • Mindmap - key decisions as a map. Handy for retrospectives.
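Getting from Whisper’s text plus Pyannote’s speakers to a labeled transcript means giving each transcribed segment a speaker. The post doesn’t say how TranscribeFlow does this. A common approach, sketched here as an assumption, is to assign each segment to the speaker whose turns overlap it the most. `Segment`, `Turn`, and `assign_speakers` are hypothetical names; the dataclasses stand in for Whisper’s timestamped segments and Pyannote’s speaker turns.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed span of audio in seconds (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: who spoke from start to end (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[tuple[str, str]]:
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several short turns.
        totals: dict[str, float] = {}
        for turn in turns:
            o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        # Segments with no overlapping turn (e.g. silence gaps) get a placeholder.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append((speaker, seg.text))
    return labeled


segments = [
    Segment(0.0, 4.0, "Let's start with the release plan."),
    Segment(4.0, 9.5, "I think we can ship on Friday."),
]
turns = [
    Turn(0.0, 4.2, "SPEAKER_00"),
    Turn(4.2, 9.5, "SPEAKER_01"),
]
for speaker, text in assign_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Summing overlap per speaker, rather than picking the single longest turn, keeps someone who talks in several short bursts from losing a segment to one longer turn that only clips its edge.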

There’s a fallback to cloud engines: ElevenLabs, Deepgram, AssemblyAI. Haven’t needed it yet.
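The post doesn’t describe how the fallback is wired. Here is a minimal sketch of one way to do it, assuming each engine is simply a function from an audio path to text: try the local engine first and move down the list on any error. `transcribe_with_fallback`, `local_mlx`, and `cloud_stub` are hypothetical names, and the engines are stubs, not real MLX Whisper or cloud API clients.

```python
from typing import Callable

# Hypothetical signature: an engine takes an audio path and returns transcript text.
Engine = Callable[[str], str]


def transcribe_with_fallback(audio_path: str, engines: list[tuple[str, Engine]]) -> tuple[str, str]:
    """Try engines in order; return (engine_name, text) from the first that succeeds."""
    errors = []
    for name, engine in engines:
        try:
            return name, engine(audio_path)
        except Exception as exc:  # any failure moves on to the next engine
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


def local_mlx(path: str) -> str:
    # Stub standing in for a local MLX Whisper call; here it simulates running out of memory.
    raise MemoryError("not enough unified memory")


def cloud_stub(path: str) -> str:
    # Stub standing in for a cloud client (ElevenLabs, Deepgram, AssemblyAI, ...).
    return f"transcript of {path}"


print(transcribe_with_fallback("meeting.m4a", [("mlx-whisper", local_mlx), ("cloud", cloud_stub)]))
# → ('cloud', 'transcript of meeting.m4a')
```

Keeping the local engine first in the list means the cloud is only touched when local processing actually fails, which preserves the "audio stays on the machine" default.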

92% of the code was written autonomously by Claude Code agents. I set the direction, reviewed the output, and corrected architectural decisions.

Build vs Buy - when to build?

When I started doing the math, everyone around me said the same thing: just buy SaaS, don’t make it complicated. Market data backs this up: companies spend hundreds of thousands of dollars on custom solutions when they could buy ready-made ones.

Fair. For enterprise. For 50-person teams with budgets and deadlines.

My situation is different. My evenings cost $0. M3 Pro is already paid for. Claude Code subscription - $20/month, but I use it for a dozen other projects anyway.

“Build vs Buy” is not a binary choice. Transcription is free locally. AI processing - 6 cents per hour of audio. And I built a usable interface in 2.5 evenings. That’s the whole math.

I burned the first evening on Pyannote: two hours on the timer, tracking down why it wouldn’t run on MPS (Metal Performance Shaders). It turned out to need a specific version of PyTorch. Painful, but I figured it out.

Economics

The calculation assumes 15 hours of meetings per month, my actual volume.

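| Cost (15 h of audio/month) | TranscribeFlow        | SaaS subscription |
|----------------------------|-----------------------|-------------------|
| Transcription              | $0 (local)            | Included          |
| Speaker diarization        | $0 (Hugging Face)     | Included          |
| LLM processing + insights  | ~$0.90 (Gemini Flash) | Included          |
| Total/month                | ~$0.90                | $8-30             |
| Total/year                 | ~$11                  | $100-360          |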

PLAUD separately: device $159-179 + subscription $100-240/year. Over two years - $360-660. My transcriber over two years - $22.
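The table and the two-year comparison are simple arithmetic. Here it is spelled out so you can plug in your own meeting volume. The inputs are the figures from this post (15 hours a month, ~6 cents per hour of LLM processing, PLAUD’s list prices); the variable names are just for illustration.

```python
# Inputs from the post: 15 hours of meetings a month,
# ~6 cents of Gemini Flash processing per hour of audio, local transcription free.
HOURS_PER_MONTH = 15
LLM_COST_PER_HOUR = 0.06  # USD

monthly = HOURS_PER_MONTH * LLM_COST_PER_HOUR
yearly = monthly * 12
two_years_local = yearly * 2

# PLAUD: one-time device price plus two years of subscription (low and high ends).
plaud_low = 159 + 2 * 100
plaud_high = 179 + 2 * 240

print(f"TranscribeFlow: ${monthly:.2f}/month, ${yearly:.2f}/year, ${two_years_local:.2f} over two years")
print(f"PLAUD over two years: ${plaud_low}-{plaud_high}")
```

This prints $0.90/month, $10.80/year, and $21.60 over two years for the local setup (the ~$11 and $22 above, rounded), against $359-659 for PLAUD.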

I’m not sure yet that TranscribeFlow covers 100% of cases. Live meetings with bad microphones - haven’t tested. Zoom recordings - work great. Voice memos - also fine.

What I learned

“Build vs Buy” in 2026 is a different question than it was five years ago. Back then, “build” meant months of development and a team of engineers. On this project, Claude Code agents wrote 92% of the code, and I handled architecture and review. I showed what the process looks like at a vibe-coding workshop.

Market prices for transcription range from $0.10 to $4.00 per minute. Locally on Apple Silicon, it’s $0.001 per minute (LLM processing only). That’s a 100-4,000x difference.
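The per-minute figure and the 100-4,000x range follow directly from the 6-cents-per-hour LLM cost. A quick check, using only the numbers quoted in this post:

```python
# Local cost per minute: only the LLM step costs money (~$0.06 per hour of audio).
local_per_minute = 0.06 / 60  # USD per minute

# Market price range quoted above, USD per minute.
market_low, market_high = 0.10, 4.00

ratio_low = market_low / local_per_minute
ratio_high = market_high / local_per_minute
print(f"local: ${local_per_minute:.3f}/min, {ratio_low:.0f}x to {ratio_high:,.0f}x cheaper")
```
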

I chose open source for control. And for the savings, not going to lie. Meeting audio doesn’t go to the cloud. The insight templates are mine. Adding a new format takes 10 minutes.
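If templates are plain data, adding a format really is a ten-minute job. Below is a minimal sketch, assuming templates are a dictionary of sections to extract that gets rendered into an LLM prompt. `TEMPLATES`, `build_insights_prompt`, and the field lists are hypothetical, not TranscribeFlow’s actual template format, and the call to Gemini Flash itself is left out.

```python
# Hypothetical template registry: each meeting type lists what the LLM should extract.
TEMPLATES: dict[str, list[str]] = {
    "it_meeting": ["decisions", "action items", "risks"],
    "business_call": ["agreements", "action items", "open questions"],
    "interview": ["key answers", "strengths", "concerns"],
    "brainstorm": ["ideas", "themes", "next steps"],
}


def build_insights_prompt(template: str, transcript: str) -> str:
    """Render a template name and a transcript into an instruction for the LLM."""
    fields = TEMPLATES[template]  # unknown template names raise KeyError on purpose
    sections = "\n".join(f"- {field}" for field in fields)
    return (
        "Read the meeting transcript below and extract the following sections:\n"
        f"{sections}\n"
        "Attribute each item to a speaker where possible.\n\n"
        f"Transcript:\n{transcript}"
    )


# Adding a new format is one more dictionary entry.
TEMPLATES["retrospective"] = ["what went well", "what went wrong", "action items"]
print(build_insights_prompt("retrospective", "SPEAKER_00: The deploy broke twice."))
```

Keeping templates as data rather than code also makes them easy to store in a config file and edit without touching the pipeline.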

The downside: no mobile app, no cross-device sync, no polished onboarding. If you need polish - go SaaS. If you want control and zero variable costs - build it yourself.

TranscribeFlow on GitHub (open source)

Sources

  1. AI Transcription Pricing Comparison 2025
  2. Otter vs Notta vs Fireflies vs tl;dv - 2026 Comparison
  3. Apple Silicon Whisper Performance
  4. Pyannote Audio - GitHub
  5. Build vs Buy AI Tools
  6. PLAUD AI Membership Plans