Architecture

Kafka Events

Complete Kafka topic topology — planning, audio pipeline, video pipeline, assessment, and workflow events.

Apache Kafka decouples the API Gateway, Vert.x WebSocket edge, and Agno Python agents. All state transitions publish events; consumers react asynchronously without tight coupling. The audio pipeline uses 100-partition topics so up to 100 interview sessions can be active simultaneously without partition contention. Kafka is self-hosted via Strimzi on GKE with PLAINTEXT auth — no token refresh, no planned rebalances.

Planning & Approval Topics

TopicPartitionsPublished ByConsumed ByTrigger
skill.validation.requested3API GatewayAgno Planner Agent (7777)Data complete, start plan generation
interview.plan.created3Agno Planner AgentAPI GatewayPlan generation complete
approval.requested3Agno Planner AgentAPI GatewayPlan ready for recruiter review
approval.approved3API GatewayAgno Planner AgentRecruiter approves plan
approval.rejected3API GatewayAgno Planner AgentRecruiter rejects plan
interview.approved3API GatewayAgno Interviewer Agent (7778)Plan approved, create interview session
interview.info_needed3API GatewayNotification ServiceHITL: missing critical data
interview.modification_requested3API GatewayAgno Planner AgentRecruiter requests plan changes

Audio Pipeline Topics

The audio topics use 100 partitions with snappy compression for high throughput. Each message is keyed by sessionId so all chunks for a session land on the same partition, preserving order. 100 partitions means up to 100 interview sessions can stream audio concurrently without contention.

TopicPartitionsKeyFormatFlow
audio.candidate.spoken100 (snappy)sessionIdWebM/Opus blob, base64 JSONBrowser → Vert.x → Kafka → Agno Interviewer
audio.agent.spoken100 (snappy)sessionIdPCM L16 24kHz, base64 JSONAgno Interviewer → Kafka → Vert.x → Browser
audio.candidate.transcribed10sessionIdSTT text JSONAgno Interviewer → optional downstream consumers
agent.audio.response10 (snappy)sessionIdPCM L16 24kHz, base64 JSONLegacy (superseded by audio.agent.spoken)
agent.text.response3sessionIdText JSONAgent text-only responses
audio.candidate.spoken — message schema
{
  "interviewId": "int_abc123",
  "sessionId":   "session_1708782896...",
  "tenantId":    "tenant-uuid",
  "audioData":   "base64_webm_opus_blob",
  "chunkIndex":  12,
  "sampleRate":  16000,
  "encoding":    "LINEAR16",
  "timestamp":   "2025-02-24T12:34:56.789Z"
}
audio.agent.spoken — message schema
{
  "session_id":     "session_1708782896...",
  "data":           "base64_pcm_l16_24k",
  "encoding":       "PCM_L16_24K",
  "text":           "That's a great answer. Let's move to system design...",
  "timestamp":      "2025-02-24T12:35:10.123Z"
}

Video Pipeline Topics

Video chunks are sent from the browser as combined WebM (vp8 video + opus audio — candidate mic + agent TTS) in 30-second segments. Video topics use a 1-hour retention for cost efficiency.

TopicPartitionsRetentionPublished ByConsumed By
video.candidate.stream101 hourVert.x (from WebSocket VIDEO frames)VideoStorageConsumerService (NestJS) → disk
video.candidate.snapshot31 hourOptional proctoring serviceProctoring analysis service

Assessment & Completion Topics

TopicPartitionsPublished ByConsumed ByTrigger
interview.completed3Agno Interviewer AgentAPI Gateway + Agno AssessorAll plan sections covered or session ended
interview.assessment.ready3Agno Assessor Agent (7779)API GatewayAssessment generated, awaiting recruiter review
interview.assessment.completed3Agno Assessor AgentAPI GatewayRecruiter approved final verdict
interview.assessment.rejected3Agno Assessor AgentAPI GatewayRecruiter rejected assessment

Workflow State Topics

TopicPartitionsPublished ByNotes
workflow.pending3API GatewayPlan ready, recruiter review gate
workflow.approved3API GatewayPlan approved
workflow.rejected3API GatewayPlan rejected (terminal)
workflow.scheduled3API GatewayInterview scheduled
workflow.in_progress3API GatewayLive interview started
workflow.completed3API GatewayInterview session ended
workflow.cancelled3API GatewayCancelled from any state

Message Envelope

Planning and workflow messages share a common envelope with a topic-specific payload field.

skill.validation.requested
{
  "eventType": "skill.validation.requested",
  "tenantId":  "tenant-uuid",
  "runId":     "agno-run-uuid",
  "interviewId": "interview-uuid",
  "timestamp": "2025-02-24T10:01:00.000Z",
  "payload": {
    "skills":           ["TypeScript", "React", "Node.js"],
    "jobDescription":   "Build scalable platform services...",
    "level":            "SENIOR",
    "callbackUrl":      "https://ats.example.com/webhook"
  }
}
interview.plan.created
{
  "eventType": "interview.plan.created",
  "tenantId":  "tenant-uuid",
  "runId":     "agno-run-uuid",
  "interviewId": "interview-uuid",
  "timestamp": "2025-02-24T10:45:00.000Z",
  "payload": {
    "planId":         "plan-uuid",
    "validatedSkills": ["TypeScript", "React", "Node.js"],
    "duration":        60,
    "greetingScript":  "Welcome, Jane! Today we'll explore your TypeScript skills...",
    "inmailDraft":     "Hi Jane,

Your interview is ready: {{INTERVIEW_LINK}}
..."
  }
}

Interviewer Autoscaling

The Interviewer Agent is IO-bound — threads spend over 90% of their time waiting on Google STT, Gemini LLM, and Google TTS responses. CPU stays around 20% even at full load, which makes CPU-based autoscaling ineffective.

The system uses KEDA (Kubernetes Event-driven Autoscaling) to scale the Interviewer on Kafka consumer lag for audio.candidate.spoken. When messages queue up, new Interviewer pods are added within 30 seconds. Scale-down is intentionally slow (5-minute window) since interviews last 45–90 minutes and evicting an active pod mid-interview would drop the session.

AgentScaling SignalMin PodsMax Pods
InterviewerKafka consumer lag (audio.candidate.spoken)220
PlannerCPU utilization (60%)28
AssessorCPU utilization (60%)28
At max scale: 20 Interviewer pods × 50 sessions per pod = 1,000 concurrent interviews. The practical ceiling is 100 active audio streams simultaneously — one per Kafka partition. Google STT quota (300 concurrent streams by default) is typically the first external limit hit.

Local Development

Kafka is optional for local development — the API Gateway detects broker availability on startup. Start Kafka with Docker Compose:

bash
# Start Kafka + Zookeeper
docker-compose up -d kafka zookeeper

# List topics
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --list --bootstrap-server <kafka-broker>

# Consume messages from audio pipeline
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --topic audio.candidate.spoken \
  --bootstrap-server <kafka-broker> \
  --from-beginning
Reset consumer group offsets (dev only)
# When the Agno interviewer agent has a backlog of old test messages
docker exec kafka /opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server <kafka-broker> \
  --group agno-interviewer-processor \
  --reset-offsets --topic audio.candidate.spoken \
  --to-latest --execute
.env — NestJS Gateway
KAFKA_BROKERS=<kafka-broker>
KAFKA_CLIENT_ID=teamcast-gateway
KAFKA_GROUP_ID=api-gateway-consumer
Kafka consumer lag per group is visible at /api/v1/admin/kafka/status. High lag on audio.candidate.spoken indicates the Agno Interviewer Agent is down or overloaded. High lag on skill.validation.requested indicates the Planner Agent is down.
Was this page helpful?