
τ-Voice Examples

February 2026

τ-Voice extends τ-bench to live, full-duplex voice interactions, where both sides speak and listen at once, people interrupt each other, and calls take place in noisy environments. Instead of testing agents on clean audio recorded in a quiet room, τ-Voice simulates realistic conditions: accents, street noise, burst sounds, connection drops, and natural turn-taking dynamics.

A simulated τ-Voice call in the retail domain. The main timeline shows six minutes of overlapping speech, interruptions, and noise. Inset A decomposes the audio the agent receives; Inset B highlights turn-taking dynamics.

The examples below show how these conditions affect agent performance. The same agent can succeed on a task under clean audio and fail on the identical task under realistic conditions. A full blog post with detailed results is coming soon.

🔊 Sample Conversations  ·  Clean vs. Realistic
Same task, different conditions
Task 14 succeeds under clean audio but fails when realistic effects are applied, even though the task and the agent are unchanged.
Clean: Gemini (success)
Realistic: Gemini (logical failure)
Transcription failures
Both conversations fail because of transcription errors. In the clean call, characters spelled out verbally trip up the agent; in the realistic call, accent and background noise compound the problem.
Clean: xAI (transcription failure)
Realistic: xAI (transcription failure)
Logical failures
Both conversations fail because of reasoning errors, such as applying the wrong policy or missing a constraint, regardless of audio quality.
Clean: OpenAI (logical failure)
Realistic: Gemini (logical failure)

Annotated Speech Activity Timeline

The interactive visualization below annotates the realistic Task 14 audio with speech-activity markers: user and agent speech, interruptions, noise effects, backchannels, and more. Press play to step through the conversation with a synchronized playhead.

📊 Speech Activity Timeline: Retail, Gemini (interactive figure with tracks for the User, in a busy-street environment, and the Agent, plotted against time in seconds)