Core Concepts
Agent graphs
An agent is represented as an AgentGraph: a directed graph of nodes connected by transitions. Each node has a prompt, a type, and outgoing edges that control conversation flow. The graph has a single entry_node_id where every conversation starts.
Voicetest's AgentGraph is the unified internal representation that every importer converts to and every exporter renders from. See Features: Format conversion for the round-trip pipeline.
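As a rough illustration of the structure described above, the sketch below models a graph with Python dataclasses. Only entry_node_id comes from this page; the class names, the other field names, and the validate helper are hypothetical and are not Voicetest's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    target_node_id: str  # hypothetical field name
    condition: str       # hypothetical: a prompt, an equation, or "always"


@dataclass
class Node:
    id: str
    type: str            # e.g. "conversation", "logic", "extract", "end", "transfer"
    prompt: str = ""
    transitions: list[Transition] = field(default_factory=list)


@dataclass
class AgentGraph:
    entry_node_id: str
    nodes: dict[str, Node]

    def validate(self) -> list[str]:
        """Return a list of structural problems (empty if none were found)."""
        problems = []
        if self.entry_node_id not in self.nodes:
            problems.append(f"entry node {self.entry_node_id!r} is not defined")
        for node in self.nodes.values():
            for t in node.transitions:
                if t.target_node_id not in self.nodes:
                    problems.append(f"{node.id} -> {t.target_node_id}: unknown target")
        return problems


graph = AgentGraph(
    entry_node_id="greeting",
    nodes={
        "greeting": Node(
            "greeting",
            "conversation",
            "Greet the caller.",
            [Transition("goodbye", "Caller is done")],
        ),
        "goodbye": Node("goodbye", "end", "Thank the caller and say goodbye."),
    },
)
print(graph.validate())
```

Keeping nodes in a dictionary keyed by id makes the entry-node and edge-target checks simple lookups.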
Node types
| Type | LLM call | Speaks? | Routing |
|---|---|---|---|
| Conversation | Yes | Yes | LLM picks a transition via prompt match, or falls back to an always edge |
| Logic | No | No | Evaluates equations top-to-bottom; first match wins |
| Extract | Yes (extraction) | No | LLM extracts variables from the conversation, then equations route |
| End | Optional | If prompted | Terminates the call. With a state_prompt, the agent speaks one final turn before ending |
| Transfer | Optional | If prompted | Same as End structurally, but the call status reflects a transfer rather than a hangup |
Any node type can also be a global node — reachable from any conversation node without explicit edges. See Global Nodes below.
Conversation nodes are the standard building block — they generate a spoken response and use LLM judgment (or an always edge) to choose the next node.
Logic nodes (also called branch nodes) have no prompt and produce no speech. All their transitions use equation or always conditions, evaluated deterministically without an LLM call.
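The first-match-wins rule for logic nodes can be sketched as follows. This is a simplified illustration, not the engine's code: the route function, the tuple layout, and the use of None to stand in for an always edge are all assumptions made for the example.

```python
from typing import Callable, Optional

# Hypothetical layout: (target_node_id, predicate).
# A predicate of None stands in for an "always" edge.
Transition = tuple[str, Optional[Callable[[dict], bool]]]


def route(transitions: list[Transition], variables: dict) -> Optional[str]:
    """Walk transitions top to bottom and return the first matching target."""
    for target, predicate in transitions:
        if predicate is None or predicate(variables):
            return target
    return None  # no transition matched and there was no "always" edge


transitions = [
    ("vip_queue", lambda v: v.get("tier") == "gold"),
    ("adult_flow", lambda v: float(v.get("age", 0)) >= 18),
    ("fallback", None),
]

# Both of the first two conditions hold here, but the earlier one wins.
print(route(transitions, {"tier": "gold", "age": "30"}))
print(route(transitions, {"tier": "free", "age": "12"}))
```

Because evaluation stops at the first match, the order of transitions matters, and in this sketch an always edge only acts as a fallback when it is listed last.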
Extract nodes combine LLM extraction with deterministic routing. They define variables_to_extract (each with a name, description, type, and optional choices). The engine calls the LLM once to extract all variables from the conversation history, stores them as dynamic variables, then evaluates equation transitions using the extracted values.
End nodes terminate the call cleanly. If state_prompt is empty, the call ends immediately; if non-empty, the agent generates one final response (e.g. "Thanks for calling, goodbye.") before the engine sets end_call_invoked.
Transfer nodes are structurally identical to End but mark the disconnect as a transfer rather than a hangup — useful for handoffs to a human agent or another phone number.
For a visual walkthrough with all five node types in one diagram, see Five Node Types and Global Interrupts.
Global nodes
Global nodes are reachable from any conversation node in the flow without requiring explicit edges from every source. Useful for "transfer to manager," "I want to start over," and similar interrupts that should fire from anywhere.
Each global node has a global_node_setting containing:
- condition: An LLM prompt that triggers entry (e.g., "Caller wants to cancel")
- go_back_conditions: LLM-prompted conditions that return to the originating node
The engine appends global node conditions to every conversation node's transition options. The LLM sees both local transitions and global entry conditions, and picks the best match. When a global node is entered, the engine pushes the originating node onto a stack. Go-back conditions target the originator; on go-back the stack is popped and the conversation resumes at the originating node with full transcript context.
Stacking: Global nodes can trigger other global nodes. Each go-back pops one level.
Zero global nodes: When a flow has no global nodes, no extra conditions are appended and the flow behaves exactly as it did before global-node support was added. The format_transitions signature is backward-compatible.
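The push and pop behavior described above can be pictured with a small state holder. This is a minimal sketch under assumptions: the class name, method names, and node ids are invented for illustration and do not reflect the engine's internals.

```python
class GlobalNodeStack:
    """Tracks the current node plus the originators to return to (hypothetical)."""

    def __init__(self, current_node_id: str) -> None:
        self.current = current_node_id
        self._return_stack: list[str] = []

    def enter_global(self, global_node_id: str) -> None:
        # Remember where we came from, then jump to the global node.
        self._return_stack.append(self.current)
        self.current = global_node_id

    def go_back(self) -> None:
        # Each go-back pops exactly one level.
        if not self._return_stack:
            raise RuntimeError("no originating node to return to")
        self.current = self._return_stack.pop()

    @property
    def depth(self) -> int:
        return len(self._return_stack)


state = GlobalNodeStack("collect_details")
state.enter_global("cancel_request")       # global node entered from a conversation node
state.enter_global("transfer_to_manager")  # a global node triggered from another global node
state.go_back()
print(state.current)  # cancel_request
state.go_back()
print(state.current)  # collect_details
```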
Dynamic variables
Prompts can reference dynamic variables using {{variable_name}} syntax. Variables come from two sources:
- Test case dynamic_variables: set before the conversation starts (e.g., {{caller_name}}, {{account_id}})
- Extract node output: populated during the conversation when an extract node fires
Expansion order: snippet references {%name%} are resolved first, then {{variable}} placeholders are substituted into the result. Unknown variables are left as-is.
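The two-pass expansion order can be sketched with regular expressions. The expand function and its regexes are illustrative assumptions rather than Voicetest's implementation; in particular, leaving unknown snippet references untouched is a choice made for this example, since the text above only specifies that behavior for unknown variables.

```python
import re

SNIPPET_RE = re.compile(r"\{%(\w+)%\}")
VARIABLE_RE = re.compile(r"\{\{(\w+)\}\}")


def expand(prompt: str, snippets: dict, variables: dict) -> str:
    """Resolve {%snippet%} references first, then {{variable}} placeholders."""

    def sub_snippet(match: re.Match) -> str:
        # Assumption: unknown snippets are left as-is.
        return snippets.get(match.group(1), match.group(0))

    def sub_variable(match: re.Match) -> str:
        name = match.group(1)
        # Unknown variables are left as-is.
        return str(variables[name]) if name in variables else match.group(0)

    with_snippets = SNIPPET_RE.sub(sub_snippet, prompt)
    return VARIABLE_RE.sub(sub_variable, with_snippets)


snippets = {"disclaimer": "This call may be recorded, {{caller_name}}."}
variables = {"caller_name": "Jane", "account_id": "12345"}
prompt = "{%disclaimer%} Account {{account_id}}, tier {{tier}}."

print(expand(prompt, snippets, variables))
# This call may be recorded, Jane. Account 12345, tier {{tier}}.
```

Resolving snippets first is what lets a snippet body contain its own {{variable}} placeholders, as the disclaimer does here.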
Equations
Equation conditions on transitions support these operators:
| Operator | Example | Notes |
|---|---|---|
| == | status == "active" | String equality |
| != | tier != "free" | String inequality |
| > >= < <= | age >= 18 | Numeric coercion; non-numeric values return false |
| contains | notes contains "urgent" | Substring match |
| not_contains | reply not_contains "err" | Substring absence |
| exists | email exists | Variable is set |
| not_exist | phone not_exist | Variable is absent |
Multiple clauses combine with logical_operator: "and" (default, all must match) or "or" (any must match).
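To make the operator semantics concrete, here is a minimal evaluator sketch. The function names and the (left, operator, right) clause layout are hypothetical, and one behavior is an assumption not stated above: a missing variable fails every operator except not_exist.

```python
from typing import Any, Optional


def _as_number(value: Any) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def eval_clause(left: str, operator: str, right: Any, variables: dict) -> bool:
    if operator == "exists":
        return left in variables
    if operator == "not_exist":
        return left not in variables
    if left not in variables:
        return False  # assumption: comparisons on a missing variable never match
    value = variables[left]
    if operator == "==":
        return str(value) == str(right)
    if operator == "!=":
        return str(value) != str(right)
    if operator == "contains":
        return str(right) in str(value)
    if operator == "not_contains":
        return str(right) not in str(value)
    if operator in (">", ">=", "<", "<="):
        a, b = _as_number(value), _as_number(right)
        if a is None or b is None:
            return False  # non-numeric values return false
        return {">": a > b, ">=": a >= b, "<": a < b, "<=": a <= b}[operator]
    raise ValueError(f"unsupported operator: {operator!r}")


def eval_equation(clauses: list, variables: dict, logical_operator: str = "and") -> bool:
    results = (eval_clause(left, op, right, variables) for left, op, right in clauses)
    return all(results) if logical_operator == "and" else any(results)


variables = {"status": "active", "age": "21", "notes": "urgent: callback"}
print(eval_equation([("status", "==", "active"), ("age", ">=", 18)], variables))
print(eval_equation([("email", "exists", None), ("status", "==", "active")], variables, "or"))
```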
Test cases
Test cases define simulated conversations to run against an agent. Two types are supported.
LLM tests (type: "llm") use a judge LLM to evaluate semantic metrics against the transcript:
{
  "name": "Customer billing inquiry",
  "user_prompt": "## Identity\nYour name is Jane.\n\n## Goal\nGet help with a charge on your bill.",
  "metrics": ["Agent greeted the customer and addressed the billing concern"],
  "dynamic_variables": {"caller_name": "Jane", "account_id": "12345"},
  "type": "llm"
}
Rule tests (type: "rule") use deterministic pattern matching — no LLM involved in judging:
{
  "name": "No PII leakage",
  "user_prompt": "You mention your full SSN 123-45-6789 mid-conversation.",
  "excludes": ["123-45-6789", "123456789"],
  "patterns": ["REF-[A-Z0-9]+"],
  "type": "rule"
}
| Field | Applies to | Description |
|---|---|---|
| name | both | Display name, also used to select tests via --test |
| user_prompt | both | Persona and goal description for the simulated user |
| dynamic_variables | both | Key-value pairs injected into {{var}} placeholders before the conversation |
| tool_mocks | both | Stub tool responses for tools the agent calls during the conversation |
| llm_model | both | Per-test agent model override (only honored when test_model_precedence is on) |
| metrics | LLM | List of natural-language criteria the judge scores (0–1) against the transcript |
| includes | rule | Substrings that must appear in the transcript |
| excludes | rule | Substrings that must not appear in the transcript |
| patterns | rule | Regex patterns that must match somewhere in the transcript |
The legacy type values "simulation" and "unit" are still accepted and are mapped to "llm" and "rule" respectively, but new test cases should use the canonical names.
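The rule-test fields (includes, excludes, patterns) amount to deterministic checks over the transcript text. The sketch below shows one way such a judge could work; judge_rule_test is a hypothetical helper, and case-sensitive matching over a single flattened transcript string is an assumption.

```python
import re


def judge_rule_test(transcript_text, includes=(), excludes=(), patterns=()):
    """Return a list of failure messages; an empty list means the test passes."""
    failures = []
    for needle in includes:
        if needle not in transcript_text:
            failures.append(f"missing required substring: {needle!r}")
    for needle in excludes:
        if needle in transcript_text:
            failures.append(f"forbidden substring present: {needle!r}")
    for pattern in patterns:
        # The pattern only has to match somewhere in the transcript.
        if re.search(pattern, transcript_text) is None:
            failures.append(f"pattern did not match: {pattern!r}")
    return failures


rule = {
    "excludes": ["123-45-6789", "123456789"],
    "patterns": ["REF-[A-Z0-9]+"],
}

clean = "Agent: Your reference is REF-8K2Q. I cannot repeat sensitive numbers."
leaky = "Agent: I have your SSN as 123-45-6789."

print(judge_rule_test(clean, **rule))
print(judge_rule_test(leaky, **rule))
```

Returning the list of failures rather than a bare boolean keeps the reason for a failed rule visible in the result.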
Runs and results
A Run is a recorded execution of one or more test cases against an agent at a specific point in time. Each Run contains a list of Result records, one per test case (or one per conversation, when imported from a transcript dump).
| Run kind | How it's created | Result status values |
|---|---|---|
| Simulated | voicetest run --all or "Run" in the Web UI | pass, fail, error |
| Imported | voicetest import-call --transcript ... | imported |
| Replay | voicetest replay <run-id> | pass (passive capture) |
Each Result captures:
- transcript: list of user/assistant/tool messages
- metric_results: score and reasoning per LLM metric
- audio_metric_results: same shape, evaluated against the TTS/STT round-tripped transcript
- nodes_visited and tools_called: the path through the graph and any tool invocations
- turn_count, duration_ms, end_reason: call metadata
- error_message: populated when status="error"
Runs persist to DuckDB at .voicetest/data.duckdb (configurable via VOICETEST_DB_PATH). Both simulated and imported runs render side by side in the runs UI, can be exported as JSON, and can be replayed against the agent's current graph.
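For scripts that need to locate the same database file, the lookup can be approximated as below. The resolve_db_path helper is hypothetical, and falling back to the default when the variable is set but empty is an assumption, not documented behavior.

```python
import os
from pathlib import Path

DEFAULT_DB_PATH = ".voicetest/data.duckdb"


def resolve_db_path(env=None) -> Path:
    """Return the path from VOICETEST_DB_PATH, or the default location."""
    env = os.environ if env is None else env
    # Assumption: an empty value is treated the same as an unset variable.
    return Path(env.get("VOICETEST_DB_PATH") or DEFAULT_DB_PATH)


print(resolve_db_path())
```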
Snippets
Named, reusable text blocks defined at the agent level and referenced in prompts via {%snippet_name%}. Useful for compliance disclaimers, sign-off phrases, or any text repeated across multiple node prompts.
See Features: Prompt snippets for the snippet system and the DRY-analysis tooling that finds candidates for extraction.
See it in action
- Recipe: Regression-test prompt changes — uses the run/result model to compare two snapshots of an agent.
- Recipe: Import call history — turns production calls into Runs you can replay.
- Recipe: Diagnose a failing test — walks the graph to find which node owns a failing metric.