Open source voice agent testing

Test agents from Retell, VAPI, Bland, and LiveKit. Run autonomous simulations. Evaluate with LLM judges.

Try it now

uv tool install voicetest
# serve the local web ui
voicetest serve
# or autoload a demo into it
voicetest demo --serve

Opens the Web UI with a sample healthcare agent and 8 test cases.
Add your Groq API key in Settings (free, no credit card).

Web UI: visual test management
CLI: fast iteration, CI/CD ready

How it works

Import

  • Retell CF & LLM
  • VAPI
  • Bland
  • LiveKit
  • XLSForm
  • Custom Python

Simulate

  • Autonomous multi-turn
  • Configurable LLMs
  • Real-time streaming
  • BYOK (bring your own keys)

Evaluate

  • LLM judges with reasoning
  • Pass/fail with scores
  • Global compliance metrics
  • Export results
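The Simulate and Evaluate stages above can be pictured as a loop that alternates caller and agent turns, followed by a judge that returns a verdict with its reasoning. The sketch below is only an illustration of that flow and is not voicetest's API: `simulate`, `scripted_caller`, `demo_agent`, `keyword_judge`, and `Verdict` are all hypothetical names, and the LLM-driven caller and judge are replaced with deterministic stand-ins so the example runs without any API key.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Turn = tuple[str, str]  # (speaker, text)


@dataclass
class Verdict:
    passed: bool
    score: float
    reasoning: str


def simulate(agent: Callable[[list[Turn]], str],
             caller: Callable[[list[Turn]], Optional[str]],
             max_turns: int = 6) -> list[Turn]:
    """Alternate caller and agent turns until the caller hangs up."""
    transcript: list[Turn] = []
    for _ in range(max_turns):
        user_text = caller(transcript)
        if user_text is None:  # caller has nothing more to say
            break
        transcript.append(("user", user_text))
        transcript.append(("agent", agent(transcript)))
    return transcript


def scripted_caller(lines: list[str]) -> Callable[[list[Turn]], Optional[str]]:
    """Stand-in for an LLM-simulated caller: replays fixed lines in order."""
    def caller(transcript: list[Turn]) -> Optional[str]:
        spoken = sum(1 for speaker, _ in transcript if speaker == "user")
        return lines[spoken] if spoken < len(lines) else None
    return caller


def demo_agent(transcript: list[Turn]) -> str:
    """Stand-in for the agent under test."""
    last_user = transcript[-1][1].lower()
    if "appointment" in last_user:
        return "I can help you book an appointment. What day works for you?"
    return "Could you tell me more about what you need?"


def keyword_judge(transcript: list[Turn], required_phrase: str) -> Verdict:
    """Stand-in for an LLM judge: passes if the agent said the phrase."""
    agent_text = " ".join(t for s, t in transcript if s == "agent").lower()
    hit = required_phrase.lower() in agent_text
    reason = (f"agent mentioned '{required_phrase}'" if hit
              else f"agent never mentioned '{required_phrase}'")
    return Verdict(passed=hit, score=1.0 if hit else 0.0, reasoning=reason)


transcript = simulate(demo_agent, scripted_caller(["Hi", "I need an appointment"]))
verdict = keyword_judge(transcript, "book an appointment")
print(verdict)
```

In the real tool the caller and judge would be backed by configurable LLMs; only the shape of the loop and the verdict carries over from this sketch.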

Platform Integration

Import from and push agents to your platform via API

Retell, VAPI, Bland, LiveKit

Multiple Interfaces

CLI
Fast iteration, CI/CD
Web UI
Visual test management
REST API
Integrate anywhere
DuckDB
Query your results

CI/CD Integration

Catch regressions before they reach production

# .github/workflows/voice-tests.yml
name: Voice Agent Tests
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      - run: uv tool install voicetest
      - run: voicetest run --agent agents/agent.json --tests agents/tests.json --all
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}

Test on every PR

Run voice agent simulations automatically before merging changes.

Block bad deploys

Fail the build when agents don't meet evaluation criteria.

Track quality over time

Export results to monitor agent performance across releases.
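If a pipeline needs its own gate on exported results, a small script can turn them into an exit code. This is a minimal sketch under stated assumptions: the page does not document the export format, so the JSON shape used here (a list of objects with `name` and `passed` fields) and the function names are hypothetical.

```python
import json


def failed_tests(results: list[dict]) -> list[str]:
    """Return the names of tests that did not pass (hypothetical schema)."""
    return [r["name"] for r in results if not r.get("passed", False)]


def exit_code(results: list[dict]) -> int:
    """0 when every test passed, 1 otherwise, so CI can fail the build."""
    return 1 if failed_tests(results) else 0


def main(path: str) -> int:
    """Load a results file, report failures, and return the exit code."""
    with open(path, encoding="utf-8") as fh:
        results = json.load(fh)
    for name in failed_tests(results):
        print(f"FAILED: {name}")
    return exit_code(results)


# Inline data standing in for an exported results file.
sample = json.loads(
    '[{"name": "books appointment", "passed": true},'
    ' {"name": "verifies identity", "passed": false}]'
)
print(failed_tests(sample), exit_code(sample))
```

A CI step could call `sys.exit(main("results.json"))` after the test run; the file name is likewise a placeholder.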

Claude Code Passthrough

Have a Claude Pro or Max plan? Use your existing subscription as the LLM backend — no additional API keys needed.

# .voicetest/settings.toml
[models]
agent = "claudecode/sonnet"
simulator = "claudecode/haiku"
judge = "claudecode/sonnet"

No API keys to manage

Uses your Claude Code CLI authentication directly.

Flat-rate usage

Run tests against your plan subscription instead of per-token API billing.

Sonnet, Opus, Haiku

Mix models per role — use Haiku for simulation, Sonnet for judging.

Open Source

Apache 2.0 licensed. Contributions welcome.

voicetest Cloud

Want hosted testing with team features? We're building it. Join the waitlist to get early access.
