Architecture
Kafka Events
Complete Kafka topic topology — planning, audio pipeline, video pipeline, assessment, and workflow events.
Apache Kafka decouples the API Gateway, Vert.x WebSocket edge, and Agno Python agents. All state transitions publish events; consumers react asynchronously without tight coupling. The audio pipeline uses 100-partition topics so up to 100 interview sessions can be active simultaneously without partition contention. Kafka is self-hosted via Strimzi on GKE with PLAINTEXT auth — no token refresh, no planned rebalances.
Planning & Approval Topics
| Topic | Partitions | Published By | Consumed By | Trigger |
|---|---|---|---|---|
| skill.validation.requested | 3 | API Gateway | Agno Planner Agent (7777) | Data complete, start plan generation |
| interview.plan.created | 3 | Agno Planner Agent | API Gateway | Plan generation complete |
| approval.requested | 3 | Agno Planner Agent | API Gateway | Plan ready for recruiter review |
| approval.approved | 3 | API Gateway | Agno Planner Agent | Recruiter approves plan |
| approval.rejected | 3 | API Gateway | Agno Planner Agent | Recruiter rejects plan |
| interview.approved | 3 | API Gateway | Agno Interviewer Agent (7778) | Plan approved, create interview session |
| interview.info_needed | 3 | API Gateway | Notification Service | HITL: missing critical data |
| interview.modification_requested | 3 | API Gateway | Agno Planner Agent | Recruiter requests plan changes |
Audio Pipeline Topics
The audio topics use 100 partitions with snappy compression for high throughput. Each message is keyed by sessionId so all chunks for a session land on the same partition, preserving order. 100 partitions means up to 100 interview sessions can stream audio concurrently without contention.
| Topic | Partitions | Key | Format | Flow |
|---|---|---|---|---|
| audio.candidate.spoken | 100 (snappy) | sessionId | WebM/Opus blob, base64 JSON | Browser → Vert.x → Kafka → Agno Interviewer |
| audio.agent.spoken | 100 (snappy) | sessionId | PCM L16 24kHz, base64 JSON | Agno Interviewer → Kafka → Vert.x → Browser |
| audio.candidate.transcribed | 10 | sessionId | STT text JSON | Agno Interviewer → optional downstream consumers |
| agent.audio.response | 10 (snappy) | sessionId | PCM L16 24kHz, base64 JSON | Legacy (superseded by audio.agent.spoken) |
| agent.text.response | 3 | sessionId | Text JSON | Agent text-only responses |
{
"interviewId": "int_abc123",
"sessionId": "session_1708782896...",
"tenantId": "tenant-uuid",
"audioData": "base64_webm_opus_blob",
"chunkIndex": 12,
"sampleRate": 16000,
"encoding": "LINEAR16",
"timestamp": "2025-02-24T12:34:56.789Z"
}{
"session_id": "session_1708782896...",
"data": "base64_pcm_l16_24k",
"encoding": "PCM_L16_24K",
"text": "That's a great answer. Let's move to system design...",
"timestamp": "2025-02-24T12:35:10.123Z"
}Video Pipeline Topics
Video chunks are sent from the browser as combined WebM (vp8 video + opus audio — candidate mic + agent TTS) in 30-second segments. Video topics use a 1-hour retention for cost efficiency.
| Topic | Partitions | Retention | Published By | Consumed By |
|---|---|---|---|---|
| video.candidate.stream | 10 | 1 hour | Vert.x (from WebSocket VIDEO frames) | VideoStorageConsumerService (NestJS) → disk |
| video.candidate.snapshot | 3 | 1 hour | Optional proctoring service | Proctoring analysis service |
Assessment & Completion Topics
| Topic | Partitions | Published By | Consumed By | Trigger |
|---|---|---|---|---|
| interview.completed | 3 | Agno Interviewer Agent | API Gateway + Agno Assessor | All plan sections covered or session ended |
| interview.assessment.ready | 3 | Agno Assessor Agent (7779) | API Gateway | Assessment generated, awaiting recruiter review |
| interview.assessment.completed | 3 | Agno Assessor Agent | API Gateway | Recruiter approved final verdict |
| interview.assessment.rejected | 3 | Agno Assessor Agent | API Gateway | Recruiter rejected assessment |
Workflow State Topics
| Topic | Partitions | Published By | Notes |
|---|---|---|---|
| workflow.pending | 3 | API Gateway | Plan ready, recruiter review gate |
| workflow.approved | 3 | API Gateway | Plan approved |
| workflow.rejected | 3 | API Gateway | Plan rejected (terminal) |
| workflow.scheduled | 3 | API Gateway | Interview scheduled |
| workflow.in_progress | 3 | API Gateway | Live interview started |
| workflow.completed | 3 | API Gateway | Interview session ended |
| workflow.cancelled | 3 | API Gateway | Cancelled from any state |
Message Envelope
Planning and workflow messages share a common envelope with a topic-specific payload field.
{
"eventType": "skill.validation.requested",
"tenantId": "tenant-uuid",
"runId": "agno-run-uuid",
"interviewId": "interview-uuid",
"timestamp": "2025-02-24T10:01:00.000Z",
"payload": {
"skills": ["TypeScript", "React", "Node.js"],
"jobDescription": "Build scalable platform services...",
"level": "SENIOR",
"callbackUrl": "https://ats.example.com/webhook"
}
}{
"eventType": "interview.plan.created",
"tenantId": "tenant-uuid",
"runId": "agno-run-uuid",
"interviewId": "interview-uuid",
"timestamp": "2025-02-24T10:45:00.000Z",
"payload": {
"planId": "plan-uuid",
"validatedSkills": ["TypeScript", "React", "Node.js"],
"duration": 60,
"greetingScript": "Welcome, Jane! Today we'll explore your TypeScript skills...",
"inmailDraft": "Hi Jane,
Your interview is ready: {{INTERVIEW_LINK}}
..."
}
}Interviewer Autoscaling
The Interviewer Agent is IO-bound — threads spend over 90% of their time waiting on Google STT, Gemini LLM, and Google TTS responses. CPU stays around 20% even at full load, which makes CPU-based autoscaling ineffective.
The system uses KEDA (Kubernetes Event-driven Autoscaling) to scale the Interviewer on Kafka consumer lag for audio.candidate.spoken. When messages queue up, new Interviewer pods are added within 30 seconds. Scale-down is intentionally slow (5-minute window) since interviews last 45–90 minutes and evicting an active pod mid-interview would drop the session.
| Agent | Scaling Signal | Min Pods | Max Pods |
|---|---|---|---|
| Interviewer | Kafka consumer lag (audio.candidate.spoken) | 2 | 20 |
| Planner | CPU utilization (60%) | 2 | 8 |
| Assessor | CPU utilization (60%) | 2 | 8 |
Local Development
Kafka is optional for local development — the API Gateway detects broker availability on startup. Start Kafka with Docker Compose:
# Start Kafka + Zookeeper
docker-compose up -d kafka zookeeper
# List topics
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
--list --bootstrap-server <kafka-broker>
# Consume messages from audio pipeline
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
--topic audio.candidate.spoken \
--bootstrap-server <kafka-broker> \
--from-beginning# When the Agno interviewer agent has a backlog of old test messages
docker exec kafka /opt/kafka/bin/kafka-consumer-groups.sh \
--bootstrap-server <kafka-broker> \
--group agno-interviewer-processor \
--reset-offsets --topic audio.candidate.spoken \
--to-latest --executeKAFKA_BROKERS=<kafka-broker>
KAFKA_CLIENT_ID=teamcast-gateway
KAFKA_GROUP_ID=api-gateway-consumer/api/v1/admin/kafka/status. High lag on audio.candidate.spoken indicates the Agno Interviewer Agent is down or overloaded. High lag on skill.validation.requested indicates the Planner Agent is down.