# Live Interview

Real-time AI interview via WebSocket audio streaming — architecture and candidate experience.

The live interview is a real-time audio conversation between the candidate and an AI interviewer powered by the Agno Interview Agent. Audio streams bidirectionally through the Vert.x WebSocket edge — candidate voice is transcribed, processed, and responded to in near real-time.

Page: `/interview/:sessionId`

The interview page connects to the Vert.x WebSocket edge on mount, streams the candidate's microphone audio, and plays back TTS audio responses from the AI agent.

| UI Element | Description |
| --- | --- |
| AI avatar | Animated agent indicator (speaking / listening state) |
| Candidate video | Local camera preview (candidate sees themselves) |
| Transcript panel | Real-time conversation transcript (optional, recruiter-configurable) |
| Audio level meter | Visual indicator of microphone input level |
| Mute button | Toggles the microphone (does not end the session) |
| End interview | Terminates the session and publishes the `interview.completed` event |

## WebSocket Connection

```typescript
// Connect to the Vert.x edge after the start call
const wsUrl = sessionStorage.getItem('interview_ws_url') ?? 'ws://localhost:8080';
const sessionId = params.sessionId;
const tenantId = sessionStorage.getItem('interview_tenant_id');

// Encode query parameters; tenantId may be null if the start call did not run
const ws = new WebSocket(
  `${wsUrl}?sessionId=${encodeURIComponent(sessionId)}&tenantId=${encodeURIComponent(tenantId ?? '')}`
);

ws.onopen = () => {
  // Start streaming microphone audio
  startAudioCapture(ws);
};

ws.onmessage = (event) => {
  // Receive TTS audio blob from the AI agent
  playAudioBlob(event.data);
};
```

## Audio Format

| Direction | Format | Details |
| --- | --- | --- |
| Candidate → Agent | 16 kHz PCM | Raw PCM audio chunks, ~100 ms each |
| Agent → Candidate | MP3 / OGG | TTS audio blob, played via `HTMLAudioElement` |
| Video (optional) | WebM chunks | Kafka topic `video.candidate.stream` (30 s segments) |
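The Web Audio API delivers samples as `Float32Array` values in the range [-1, 1], so the candidate → agent path must convert them to 16-bit PCM before sending. A minimal sketch of that conversion (the helper name is illustrative, not part of the actual codebase):

```typescript
// Convert a Float32 sample buffer (Web Audio range [-1, 1]) into
// signed 16-bit PCM, the format the Vert.x edge expects.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp to [-1, 1] before scaling to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, input[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

Each ~100 ms chunk of converted samples can then be sent as a binary WebSocket frame via `ws.send(pcm.buffer)`.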

## Audio Pipeline

```text
Candidate Browser
    │
    ├── MediaStream (getUserMedia)
    │     └── AudioContext → ScriptProcessorNode → 16kHz PCM
    │
    ├── WebSocket → Vert.x Edge (ws://localhost:8080)
    │     │
    │     └── Buffers chunks, forwards to Agno Interview Agent (HTTP)
    │               │
    │               ├── Whisper transcription → candidate text
    │               ├── LLM (Claude/GPT-4) → agent response text
    │               └── TTS (ElevenLabs/OpenAI TTS) → audio blob
    │
    └── WebSocket ← Vert.x Edge ← Agent audio blob
          │
          └── HTMLAudioElement.play() → candidate hears AI response
```
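Since the edge can deliver both binary TTS audio and text frames (e.g. transcript updates for the optional transcript panel), the client needs to distinguish the two in `onmessage`. The framing below is an assumption for illustration; the actual protocol is defined by the Vert.x edge:

```typescript
// Illustrative demultiplexer: text frames are treated as JSON control
// messages, binary frames as TTS audio. Exact framing is an assumption.
type AgentMessage =
  | { kind: "audio"; payload: ArrayBuffer }
  | { kind: "control"; payload: Record<string, unknown> };

function demuxMessage(data: string | ArrayBuffer): AgentMessage {
  if (typeof data === "string") {
    return { kind: "control", payload: JSON.parse(data) };
  }
  return { kind: "audio", payload: data };
}
```

Audio payloads would then be handed to `playAudioBlob`, while control payloads update UI state such as the transcript panel.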

## Interview Completion

When the AI agent determines the interview is complete (all plan sections covered), or the candidate clicks End Interview, the Vert.x edge publishes the `interview.completed` Kafka event, which triggers the Assessor Agent.

| Trigger | Who | State After |
| --- | --- | --- |
| All plan sections covered | Agno Interview Agent | `COMPLETED` |
| Candidate ends session | Vert.x (via WebSocket close) | `COMPLETED` |
| Admin cancels | API Gateway | `CANCELLED` |
| Session timeout (2 h) | Vert.x (timer) | `COMPLETED` |
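A sketch of what the `interview.completed` event might carry, based on the triggers above. The field names and trigger values are illustrative assumptions; the actual schema is owned by the Vert.x edge:

```typescript
// Hypothetical payload for the interview.completed Kafka event.
interface InterviewCompletedEvent {
  sessionId: string;
  tenantId: string;
  trigger: "PLAN_COMPLETE" | "CANDIDATE_ENDED" | "TIMEOUT";
  completedAt: string; // ISO-8601 timestamp
}

function buildCompletedEvent(
  sessionId: string,
  tenantId: string,
  trigger: InterviewCompletedEvent["trigger"],
): InterviewCompletedEvent {
  return { sessionId, tenantId, trigger, completedAt: new Date().toISOString() };
}
```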

## Reconnection

If the WebSocket disconnects (network interruption), the frontend attempts to reconnect using exponential backoff. The session remains active on the server for 10 minutes after disconnect.

```typescript
let ws: WebSocket;

function reconnect(attempt = 0) {
  // Exponential backoff, capped at 30 s: 1s, 2s, 4s, ..., 30s
  const delay = Math.min(1000 * 2 ** attempt, 30_000);
  setTimeout(() => {
    ws = new WebSocket(wsUrl);
    ws.onopen = () => { /* resume session; backoff resets on success */ };
    // onclose fires for both failed attempts and dropped connections,
    // so retry from there rather than onerror
    ws.onclose = () => reconnect(attempt + 1);
  }, delay);
}
```

The interview session state (plan section, progress, transcript) is maintained server-side in Redis. Reconnecting candidates resume from the exact point of disconnection without losing interview progress.
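A sketch of the server-side session state and how it survives a reconnect via JSON serialization. The field names are illustrative assumptions, not the actual Redis schema:

```typescript
// Assumed shape of the session state the edge keeps in Redis, e.g.
// under a key like interview:session:<sessionId> with a 10-minute TTL.
interface InterviewSessionState {
  sessionId: string;
  planSection: number; // index of the current interview plan section
  transcript: { speaker: "agent" | "candidate"; text: string }[];
  lastActivityAt: string; // ISO-8601; drives the 10-minute disconnect expiry
}

function serializeState(state: InterviewSessionState): string {
  return JSON.stringify(state);
}

function deserializeState(raw: string): InterviewSessionState {
  return JSON.parse(raw) as InterviewSessionState;
}
```

On reconnect, the edge would load and deserialize this state, letting the agent continue from the same plan section and transcript position.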