
$0.44 for a 10-minute product demo. With voice, screen sharing, and live Q&A. Not a recording. A live agent.
Naoma.ai shipped exactly that. 6-person startup (ex-PandaDoc), pre-seed $440K. Their agent opens a browser, clicks through the UI, walks through features, and talks to the prospect by voice. $5-10 per demo instead of $50-100 for an hour of a salesperson’s time.
First reaction - a sales tool, okay. Then I brainstormed with AI and found at least 4 directions beyond sales where this kind of agent could solve real problems.
Where it works beyond sales
Product management - the most interesting angle for me:
- Demos of new features for stakeholders. Not a slide deck with screenshots - a live walkthrough of the UI. The stakeholder asks questions, the agent answers and shows the relevant screen
- Feedback collection. Agent shows a prototype, asks questions, records answers. No more coordinating schedules a week out for a “quick review”
- Product onboarding for new team members. Instead of “ask Masha, she’ll show you” - agent available 24/7
Customer Success:
- Training clients on new features. Instead of recording a video (which goes stale after the next release) - an interactive agent that always shows the current UI
- Troubleshooting. “Show me where the issue is” - agent walks through the steps and explains
HR / internal ops:
- Demos of internal tools for new hires. CRM, Jira, internal systems. Instead of a 40-page wiki - a live walkthrough
Marketing:
- Interactive demo right on the landing page. Not a video, not screenshots - the visitor asks questions and sees the real product. Personalized to their role
What’s inside - technically
Went through GitHub and docs - the stack turned out simpler than I expected:
Browser control: Browser-Use (Python, open-source, 78K+ GitHub stars) or Playwright. Browser-Use figures out where to click based on DOM and screenshots. Playwright - if you want to script the scenarios manually.
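For the manually scripted route, a scenario can be kept as plain data and replayed against a page object. This is a minimal sketch, not Naoma's implementation: `Step`, `run_scenario`, `RecordingPage`, the URL, and the selectors are all hypothetical. The runner only assumes the page has `goto()` and `click()` methods, which Playwright's sync `Page` provides, so a real page should slot in, though only the fake is exercised here.

```python
from dataclasses import dataclass
from typing import Protocol


class PageLike(Protocol):
    """The two page methods this sketch relies on: goto() and click()."""

    def goto(self, url: str) -> object: ...
    def click(self, selector: str) -> object: ...


@dataclass(frozen=True)
class Step:
    action: str      # "goto" or "click"
    target: str      # URL for goto, CSS selector for click
    narration: str   # what the agent says while performing the step


# Hypothetical scenario: the URL and selectors are placeholders, not a real product.
FILTER_DEMO = [
    Step("goto", "https://app.example.com/projects", "This is the project list."),
    Step("click", "#filters-button", "Filters live behind this button."),
    Step("click", "[data-filter='status']", "Here we narrow the list by status."),
]


def run_scenario(page: PageLike, steps: list[Step]) -> list[str]:
    """Execute each step on the page and return the narration lines in order."""
    spoken = []
    for step in steps:
        if step.action == "goto":
            page.goto(step.target)
        elif step.action == "click":
            page.click(step.target)
        else:
            raise ValueError(f"unknown action: {step.action}")
        spoken.append(step.narration)
    return spoken


class RecordingPage:
    """Stand-in page that records calls, so the sketch runs without a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def click(self, selector: str) -> None:
        self.calls.append(("click", selector))
```

Keeping narration next to each action means the voice layer can speak the returned lines while the browser moves, and the same scenario can be checked against the fake page before it touches a real one.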
Speech recognition (STT): Deepgram Nova-2 ($0.0043/min - cheapest), OpenAI Whisper API ($0.006/min), Google Speech-to-Text ($0.006/min), AssemblyAI ($0.012/min). Deepgram has the lowest latency (~100ms), but worth testing all options depending on your language needs.
Voice synthesis (TTS): OpenAI TTS ($0.015/1K chars - solid price/quality), ElevenLabs ($0.08-0.18/1K chars - best quality on the market), Google Cloud TTS ($0.016/1K chars). ElevenLabs sounds more natural but, at these prices, costs roughly 5-12x more than OpenAI TTS.
Full voice pipeline: You can wire STT + LLM + TTS separately, or grab something ready-made. OpenAI Realtime API - speech-to-speech with no intermediate steps, latency ~200-400ms. Vapi.ai - orchestrator platform that connects STT/LLM/TTS for you ($0.05/min + provider costs). LiveKit Agents - open-source voice agent framework with WebRTC out of the box.
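If you wire STT + LLM + TTS yourself, one conversational turn is three calls in sequence, and timing each stage shows where the latency goes. The sketch below assumes each provider is wrapped in a plain callable; `run_turn`, `TurnResult`, and the `fake_*` functions are hypothetical stand-ins, not any vendor's SDK.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    transcript: str               # what the prospect said
    reply: str                    # what the agent answers
    audio: bytes                  # synthesized answer
    latency_ms: dict[str, float]  # per-stage wall-clock time


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> TurnResult:
    """Run one conversational turn: speech -> text -> reply -> speech."""
    latency: dict[str, float] = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        out = fn(arg)
        latency[name] = (time.perf_counter() - start) * 1000
        return out

    transcript = timed("stt", stt, audio_in)
    reply = timed("llm", llm, transcript)
    audio_out = timed("tts", tts, reply)
    return TurnResult(transcript, reply, audio_out, latency)


# Fake providers so the sketch runs offline. Real ones would wrap an STT API,
# an LLM API, and a TTS API respectively.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"Sure, let me show you: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Because the stages run back to back, their latencies add up. That is the main argument for a speech-to-speech API or a streaming framework once the prototype works.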
Screen streaming: LiveKit (open-source, has a cloud), Daily.co, Twilio Video. LiveKit is the best option if you need both voice and video in one solution.
LLM orchestrator (the brain): GPT-4o-mini is cheap ($0.15/1M input tokens), fast, and enough for routine navigation. Gemini 2.5 Flash is an alternative that still needs testing. Ideally you want routing: simple actions go to a cheap model, complex questions to a smart one.
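A rough sketch of that routing idea, assuming a keyword heuristic decides what counts as "routine". The hint list, the word limit, and the function name `pick_model` are illustrative guesses. A real router might use a classifier or ask the cheap model to escalate.

```python
CHEAP_MODEL = "gpt-4o-mini"  # routine navigation
SMART_MODEL = "gpt-4o"       # open-ended or strategic questions

# Assumed heuristic: UI verbs signal routine navigation requests.
NAVIGATION_HINTS = ("show", "open", "click", "go to", "where is", "scroll")
MAX_ROUTINE_WORDS = 12  # arbitrary cutoff; longer requests go to the smart model


def pick_model(utterance: str) -> str:
    """Return the model name that should handle this utterance."""
    text = utterance.lower().strip()
    is_short = len(text.split()) <= MAX_ROUTINE_WORDS
    is_navigation = any(hint in text for hint in NAVIGATION_HINTS)
    return CHEAP_MODEL if (is_short and is_navigation) else SMART_MODEL
```

With these rules, "Show me the new filtering feature" stays on the cheap model, while "How does this fit into our Q3 strategy?" is escalated.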
Cost for a 10-minute demo:
| Option | Cost |
|---|---|
| Budget (GPT-4o-mini + Deepgram + OpenAI TTS) | $0.44 |
| Standard (OpenAI Realtime API) | $1.97 |
| Premium (ElevenLabs + avatar) | $2.82 |
$0.44. Even if the agent runs 100 demos a day - that’s $44. A single hour of a salesperson’s time ($50-100) costs more.
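To sanity-check numbers like these for your own product, the per-unit prices quoted above can go into a small estimator. The usage figures (characters spoken, input tokens) are my assumptions, not measurements. The sketch covers only STT, TTS, and LLM input, so it will not reproduce the table's totals, which presumably include items the breakdown above does not itemize.

```python
from dataclasses import dataclass

# Unit prices quoted in the post.
STT_PER_MIN = 0.0043       # Deepgram Nova-2, $ per minute
TTS_PER_1K_CHARS = 0.015   # OpenAI TTS, $ per 1K characters
LLM_PER_1M_INPUT = 0.15    # GPT-4o-mini, $ per 1M input tokens


@dataclass
class DemoUsage:
    minutes: float
    spoken_chars: int      # characters the agent says aloud (assumption)
    llm_input_tokens: int  # DOM, screenshots, transcript sent to the model (assumption)


def voice_and_llm_cost(usage: DemoUsage) -> dict[str, float]:
    """Itemize STT, TTS, and LLM-input cost for one demo, in dollars."""
    stt = usage.minutes * STT_PER_MIN
    tts = usage.spoken_chars / 1000 * TTS_PER_1K_CHARS
    llm = usage.llm_input_tokens / 1_000_000 * LLM_PER_1M_INPUT
    return {"stt": stt, "tts": tts, "llm_input": llm, "total": stt + tts + llm}


def daily_cost(cost_per_demo: float, demos_per_day: int) -> float:
    """Scale a per-demo cost to a day of demos."""
    return cost_per_demo * demos_per_day


# Assumed usage for a 10-minute demo: about 4,500 spoken characters
# and 200K input tokens. Both are guesses to replace with real logs.
example = voice_and_llm_cost(DemoUsage(minutes=10, spoken_chars=4500, llm_input_tokens=200_000))
```

Swapping in real usage logs from a prototype is the quickest way to see which line item dominates before committing to a stack.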
The market
Static demo platforms (Walnut, Navattic, Storylane) - a mature market, analysts estimate around $500M. All of them show recorded scenarios. A live AI agent with voice is a niche nobody has really claimed yet.
Meanwhile, AI SDR agents have pulled in $100M+ over the past year. PLG (product-led growth) is also pushing in this direction: let users try the product without a sales call. As a PM, I find that the most interesting trend - less friction in the funnel.
How to build it yourself
Minimal path:
- Browser-Use for UI control - installs in a minute, Python
- LiveKit for WebRTC streaming - voice and screen in one SDK
- OpenAI Realtime API or Vapi.ai for the voice pipeline
- GPT-4o-mini for routine navigation + GPT-4o / Claude Sonnet for complex questions
Alternative budget stack: Deepgram (STT) + Google Cloud TTS + Gemini 2.5 Flash. Cheaper, but more integration work.
A minimal prototype for a single product - realistically doable over a weekend. A full agent handling edge cases - a couple of weeks.
Added it to my side project list. Not sure yet which voice pipeline to pick - Realtime API is simpler, ElevenLabs might sound more natural. Need to test.
What I learned
As a PM, I see the main potential not in replacing salespeople but in collecting feedback. An interactive agent that shows a prototype, asks questions, and records the answers - that’s what hooked me.
Trade-off: the agent doesn’t understand context at a human level yet. If a stakeholder or client asks “how does this fit into our Q3 strategy?” - the agent will struggle, though giving it additional context about the client could partly offset this. But for “show me the new filtering feature and how it works” - it should be more than enough.