Test voice agents built on Retell, VAPI, Bland, or LiveKit. Run autonomous simulations. Evaluate with LLM judges.
Opens the web UI with a sample healthcare agent and 8 test cases.
Add your Groq API key in Settings (free, no credit card).
Import agents from your platform and push them back via API.
Catch regressions before they reach production
Run voice agent simulations automatically before merging changes.
Fail the build when agents don't meet evaluation criteria.
Export results to monitor agent performance across releases.
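The three CI steps above can be sketched as a small gate script. This is a minimal illustration, not the tool's actual interface: the JSON shape (a list of objects with `name` and `passed` fields), the names `CaseResult`, `load_results`, and `gate`, and the `min_pass_rate` parameter are all assumptions made for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    passed: bool


def load_results(raw: str) -> list[CaseResult]:
    """Parse exported results. The JSON shape used here is an assumption."""
    return [CaseResult(item["name"], bool(item["passed"])) for item in json.loads(raw)]


def gate(results: list[CaseResult], min_pass_rate: float = 1.0) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    if not results:
        return 1  # treat an empty run as a failure rather than a silent pass
    pass_rate = sum(r.passed for r in results) / len(results)
    for r in results:
        if not r.passed:
            print(f"FAILED: {r.name}")
    print(f"pass rate: {pass_rate:.0%} (required: {min_pass_rate:.0%})")
    return 0 if pass_rate >= min_pass_rate else 1
```

In a CI job, a wrapper could read the exported file and pass the return value of `gate` to `sys.exit`, so that a non-zero code fails the build. Archiving the same file per release gives a simple history of agent performance.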
Have a Claude Pro or Max plan? Use your existing subscription as the LLM backend — no additional API keys needed.
Uses your Claude Code CLI authentication directly.
Run tests on your subscription plan instead of paying per-token API rates.
Mix models per role — use Haiku for simulation, Sonnet for judging.
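One way to picture per-role model selection is a small mapping from role to model. The sketch below is hypothetical: the role names `simulator` and `judge`, the `RoleModels` type, and the plain strings `"haiku"` and `"sonnet"` are illustrative labels, not the tool's real configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleModels:
    """Hypothetical per-role model assignment."""
    simulator: str = "haiku"   # cheaper, faster model plays the caller
    judge: str = "sonnet"      # stronger model scores the transcript


def model_for(role: str, models: RoleModels) -> str:
    """Look up the model label for a role; unknown roles raise ValueError."""
    if role not in ("simulator", "judge"):
        raise ValueError(f"unknown role: {role}")
    return getattr(models, role)
```

The design idea is that simulation produces many turns and benefits from a cheaper model, while judging happens once per conversation and benefits from a stronger one.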
Want hosted testing with team features? We're building it. Join the waitlist to get early access.