Architecture

Kafka Events

Complete Kafka topic topology — planning, audio pipeline, video pipeline, assessment, and workflow events.

Apache Kafka decouples the API Gateway, Vert.x WebSocket edge, and Agno Python agents. All state transitions publish events; consumers react asynchronously without tight coupling. The audio pipeline uses 100-partition topics so up to 100 interview sessions can be active simultaneously without partition contention. Kafka is self-hosted via Strimzi on GKE with PLAINTEXT auth — no token refresh, no planned rebalances.

Planning & Approval Topics

Topic	Partitions	Published By	Consumed By	Trigger
skill.validation.requested	3	API Gateway	Agno Planner Agent (7777)	Data complete, start plan generation
interview.plan.created	3	Agno Planner Agent	API Gateway	Plan generation complete
approval.requested	3	Agno Planner Agent	API Gateway	Plan ready for recruiter review
approval.approved	3	API Gateway	Agno Planner Agent	Recruiter approves plan
approval.rejected	3	API Gateway	Agno Planner Agent	Recruiter rejects plan
interview.approved	3	API Gateway	Agno Interviewer Agent (7778)	Plan approved, create interview session
interview.info_needed	3	API Gateway	Notification Service	HITL: missing critical data
interview.modification_requested	3	API Gateway	Agno Planner Agent	Recruiter requests plan changes

Audio Pipeline Topics

The audio topics use 100 partitions with snappy compression for high throughput. Each message is keyed by sessionId so all chunks for a session land on the same partition, preserving order. 100 partitions means up to 100 interview sessions can stream audio concurrently without contention.

Topic	Partitions	Key	Format	Flow
audio.candidate.spoken	100 (snappy)	sessionId	WebM/Opus blob, base64 JSON	Browser → Vert.x → Kafka → Agno Interviewer
audio.agent.spoken	100 (snappy)	sessionId	PCM L16 24kHz, base64 JSON	Agno Interviewer → Kafka → Vert.x → Browser
audio.candidate.transcribed	10	sessionId	STT text JSON	Agno Interviewer → optional downstream consumers
agent.audio.response	10 (snappy)	sessionId	PCM L16 24kHz, base64 JSON	Legacy (superseded by audio.agent.spoken)
agent.text.response	3	sessionId	Text JSON	Agent text-only responses

audio.candidate.spoken — message schema

{
  "interviewId": "int_abc123",
  "sessionId":   "session_1708782896...",
  "tenantId":    "tenant-uuid",
  "audioData":   "base64_webm_opus_blob",
  "chunkIndex":  12,
  "sampleRate":  16000,
  "encoding":    "LINEAR16",
  "timestamp":   "2025-02-24T12:34:56.789Z"
}

audio.agent.spoken — message schema

{
  "session_id":     "session_1708782896...",
  "data":           "base64_pcm_l16_24k",
  "encoding":       "PCM_L16_24K",
  "text":           "That's a great answer. Let's move to system design...",
  "timestamp":      "2025-02-24T12:35:10.123Z"
}

Video Pipeline Topics

Video chunks are sent from the browser as combined WebM (vp8 video + opus audio — candidate mic + agent TTS) in 30-second segments. Video topics use a 1-hour retention for cost efficiency.

Topic	Partitions	Retention	Published By	Consumed By
video.candidate.stream	10	1 hour	Vert.x (from WebSocket VIDEO frames)	VideoStorageConsumerService (NestJS) → disk
video.candidate.snapshot	3	1 hour	Optional proctoring service	Proctoring analysis service

Assessment & Completion Topics

Topic	Partitions	Published By	Consumed By	Trigger
interview.completed	3	Agno Interviewer Agent	API Gateway + Agno Assessor	All plan sections covered or session ended
interview.assessment.ready	3	Agno Assessor Agent (7779)	API Gateway	Assessment generated, awaiting recruiter review
interview.assessment.completed	3	Agno Assessor Agent	API Gateway	Recruiter approved final verdict
interview.assessment.rejected	3	Agno Assessor Agent	API Gateway	Recruiter rejected assessment

Workflow State Topics

Topic	Partitions	Published By	Notes
workflow.pending	3	API Gateway	Plan ready, recruiter review gate
workflow.approved	3	API Gateway	Plan approved
workflow.rejected	3	API Gateway	Plan rejected (terminal)
workflow.scheduled	3	API Gateway	Interview scheduled
workflow.in_progress	3	API Gateway	Live interview started
workflow.completed	3	API Gateway	Interview session ended
workflow.cancelled	3	API Gateway	Cancelled from any state

Message Envelope

Planning and workflow messages share a common envelope with a topic-specific payload field.

skill.validation.requested

{
  "eventType": "skill.validation.requested",
  "tenantId":  "tenant-uuid",
  "runId":     "agno-run-uuid",
  "interviewId": "interview-uuid",
  "timestamp": "2025-02-24T10:01:00.000Z",
  "payload": {
    "skills":           ["TypeScript", "React", "Node.js"],
    "jobDescription":   "Build scalable platform services...",
    "level":            "SENIOR",
    "callbackUrl":      "https://ats.example.com/webhook"
  }
}

interview.plan.created

{
  "eventType": "interview.plan.created",
  "tenantId":  "tenant-uuid",
  "runId":     "agno-run-uuid",
  "interviewId": "interview-uuid",
  "timestamp": "2025-02-24T10:45:00.000Z",
  "payload": {
    "planId":         "plan-uuid",
    "validatedSkills": ["TypeScript", "React", "Node.js"],
    "duration":        60,
    "greetingScript":  "Welcome, Jane! Today we'll explore your TypeScript skills...",
    "inmailDraft":     "Hi Jane,

Your interview is ready: {{INTERVIEW_LINK}}
..."
  }
}

Interviewer Autoscaling

The Interviewer Agent is IO-bound — threads spend over 90% of their time waiting on Google STT, Gemini LLM, and Google TTS responses. CPU stays around 20% even at full load, which makes CPU-based autoscaling ineffective.

The system uses KEDA (Kubernetes Event-driven Autoscaling) to scale the Interviewer on Kafka consumer lag for audio.candidate.spoken. When messages queue up, new Interviewer pods are added within 30 seconds. Scale-down is intentionally slow (5-minute window) since interviews last 45–90 minutes and evicting an active pod mid-interview would drop the session.

Agent	Scaling Signal	Min Pods	Max Pods
Interviewer	Kafka consumer lag (audio.candidate.spoken)	2	20
Planner	CPU utilization (60%)	2	8
Assessor	CPU utilization (60%)	2	8

At max scale: 20 Interviewer pods × 50 sessions per pod = 1,000 concurrent interviews. The practical ceiling is 100 active audio streams simultaneously — one per Kafka partition. Google STT quota (300 concurrent streams by default) is typically the first external limit hit.

Local Development

Kafka is optional for local development — the API Gateway detects broker availability on startup. Start Kafka with Docker Compose:

bash

# Start Kafka + Zookeeper
docker-compose up -d kafka zookeeper

# List topics
docker exec kafka /opt/kafka/bin/kafka-topics.sh \
  --list --bootstrap-server <kafka-broker>

# Consume messages from audio pipeline
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --topic audio.candidate.spoken \
  --bootstrap-server <kafka-broker> \
  --from-beginning

Reset consumer group offsets (dev only)

# When the Agno interviewer agent has a backlog of old test messages
docker exec kafka /opt/kafka/bin/kafka-consumer-groups.sh \
  --bootstrap-server <kafka-broker> \
  --group agno-interviewer-processor \
  --reset-offsets --topic audio.candidate.spoken \
  --to-latest --execute

.env — NestJS Gateway

KAFKA_BROKERS=<kafka-broker>
KAFKA_CLIENT_ID=teamcast-gateway
KAFKA_GROUP_ID=api-gateway-consumer

Kafka consumer lag per group is visible at /api/v1/admin/kafka/status. High lag on audio.candidate.spoken indicates the Agno Interviewer Agent is down or overloaded. High lag on skill.validation.requested indicates the Planner Agent is down.

Was this page helpful?

Previous← Multi-tenancy

NextAgent Integration →