Architecture
Agent Integration
How the Agno Python agents integrate with the NestJS API Gateway, Vert.x WebSocket edge, Kafka, and Redis.
The platform runs three Agno Python agents, each with a dedicated HTTP API and Kafka integration. The Planner Agent generates interview plans from qualification statements, the Hiring Assistant conducts live sessions via audio, and the Assessor Agent produces structured verdicts with a recruiter HITL gate.
Agent Summary
| Agent | K8s Service | Trigger | Output | LLM |
|---|---|---|---|---|
| Planner | planner-agent | Kafka: skill.validation.requested | Plan JSON → Kafka: interview.plan.created | Vertex AI Gemini 2.0 Flash (fine-tuned) |
| Hiring Assistant | interviewer-agent | WebSocket GREETING_REQUEST + Kafka: audio.candidate.spoken | TTS PCM L16 24kHz → WebSocket / Kafka: audio.agent.spoken | Vertex AI Gemini 2.0 Flash (fine-tuned) |
| Assessor | assessor-agent | REST run creation (after interview.completed) | Assessment JSON → Kafka: interview.assessment.ready | Vertex AI Gemini 2.0 Flash (fine-tuned) |
Planner Agent
The Planner Agent consumes skill.validation.requested, validates qualifications via pgvector similarity search, generates a structured interview plan using Vertex AI Gemini 2.0 Flash (fine-tuned), and publishes the result to interview.plan.created. The event payload includes qualification statements when provided — Teamcast maps them internally to skills and subdimension requirements without requiring the partner to maintain a taxonomy.
Kafka: skill.validation.requested
│ (position, level, qualifications[]?, skills[]?, jobDescription?, company context)
▼
Agno Planner Agent (Python, port 7777)
├── pgvector similarity search → map qualifications/skills to knowledge base
├── Extract qualification criteria:
│ • must_have ← critical requirements (e.g. "5+ years TypeScript")
│ • nice_to_have ← preferred but not blocking (e.g. "GraphQL experience")
├── Build prompt: qualifications + company context + candidate profile
├── Vertex AI Gemini 2.0 Flash (fine-tuned) → generate structured plan JSON:
│ • competencies + scoring rubrics (with is_required flag)
│ • questions with follow-ups
│ • must_have[] ← qualification criteria array
│ • nice_to_have[] ← preferred criteria array
│ • greeting_script ← Hiring Assistant's opening for this specific interview
│ • inmail_draft ← LinkedIn InMail with {{INTERVIEW_LINK}} placeholder
└── Kafka: interview.plan.created → API Gateway saves plan
│
▼
API Gateway: state → PENDING (or APPROVED if autoApprovePlans)
Webhook: interview.plan_generated → tenant callbackUrl| Endpoint | Method | Description |
|---|---|---|
| /api/v1/runs | POST | Create a new planning run |
| /api/v1/runs/:id | GET | Get run status and result |
| /api/v1/runs/:id/continue | POST | Continue run with recruiter modification feedback |
| /api/v1/runs/:id/cancel | DELETE | Cancel an in-progress run |
| /health/live | GET | Liveness probe |
| /health/ready | GET | Readiness probe (checks Kafka + pgvector) |
# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
export KAFKA_BROKERS=<kafka-broker>
export DATABASE_URL=postgresql://...
uvicorn agents.planner.main:app --host 0.0.0.0Hiring Assistant
The Hiring Assistant conducts the live interview via audio. Audio chunks from the candidate arrive via Kafka, are transcribed by Google Cloud STT, processed by Vertex AI LLM, and the response is synthesized by Gemini 2.5 Flash TTS (raw PCM L16 24kHz mono, with emotion tags) and published back to Kafka for Vert.x to stream to the browser.
Kafka: audio.candidate.spoken (key=sessionId)
│ WebM/Opus blob OR streaming STT chunk from browser
▼
Agno Hiring Assistant (Python, port 7778)
│ Per-session FIFO worker thread (ThreadPoolExecutor — sessions never block each other)
├── Load session state from Redis
│ (plan, current_question_idx, follow_up_idx, greeting_sent, messages[], pending_answers[])
├── Streaming STT path (streaming="chunk"/"end")
│ └── StreamingSTTSession → accumulates partial transcript; final cached in _streaming_transcripts
├── Batch STT path (full audio blob)
│ └── Reuse cached streaming transcript if ≥ 12 words; else call Google Cloud STT batch API
│ RecognitionConfig: LINEAR16, sample_rate=48000, en-US, enhanced model, auto_punctuation
├── Answer debounce: FRAGMENT_DEBOUNCE (1.5s) for short; ANSWER_DEBOUNCE (0.8s) for substantial
├── Confusion / off-topic detection → redirect question
├── Compute context: remaining_minutes, questions_remaining, follow_ups_remaining, evidence_requirements
├── Vertex AI Gemini 2.0 Flash (fine-tuned)
│ └── System prompt includes: time remaining, question progress, follow-up probes left,
│ evidence to verify, NVC techniques (mirroring, labeling, calibrated questions)
├── Follow-up logic: exhaust plan follow_ups[] before advancing to next question
└── Gemini 2.5 Flash TTS (Vertex AI, voice: Kore)
└── emotion tags: [encouraging], [thoughtful], [pause], etc.
└── Raw PCM L16 24kHz mono output
└── Kafka: audio.agent.spoken (key=sessionId, base64 PCM in JSON)
│
▼
Vert.x Kafka consumer → WebSocket { type: "AGENT_RESPONSE", data: base64_pcm }
│
▼
Browser: AgentResponseHandler decodes Int16→Float32 + plays via AudioContext| Feature | Description |
|---|---|
| Proactive greeting | Checks greeting_sent Redis flag on first audio chunk — speaks greeting only once per session |
| Time-aware questioning | LLM receives remaining_minutes, questions_remaining, current_question_num — paces interview correctly and escalates urgency near the end |
| Follow-up probing | Per-question follow_ups list from plan is exhausted before advancing; follow_ups_remaining passed to LLM so it knows how many probes remain |
| Evidence requirements | observable_evidence_types from ontology forwarded to LLM system prompt — interviewer knows what to verify before moving on |
| Answer debounce | FRAGMENT_DEBOUNCE_SECS (1.5 s) for short answers, ANSWER_DEBOUNCE_SECS (0.8 s) for substantial answers — merges split utterances from VAD pauses |
| Confusion detection | Detects when candidate does not understand — rephrase or offer hint |
| Off-topic detection | Detects teaching/advice requests — redirects back to interview flow |
| Per-session queue isolation | Each session has a dedicated FIFO worker thread in ThreadPoolExecutor — concurrent sessions never block each other |
| Sequential queue | Audio responses queued per session — never overlapping playback |
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/runs | POST | Create interview session (called by NestJS after plan approved) |
| /session/:id/greet | POST | Trigger greeting TTS |
| /health/live | GET | Liveness probe |
| /health/ready | GET | Readiness probe (checks Kafka + Redis + GCP credentials) |
Redis Session Structure
{
"session_id": "session_1708782896...",
"interview_id": "int_abc123",
"tenant_id": "tenant-123",
"current_question_idx": 2,
"follow_up_idx": 1,
"greeting_sent": true,
"messages": [
{ "role": "agent", "content": "Tell me about yourself..." },
{ "role": "candidate", "content": "I have 5 years of experience..." }
],
"pending_answers": [
"I have 5 years of TypeScript experience, mostly in React and Node.js services."
],
"plan": {
"questions": [
{
"text": "Tell me about a time you improved system performance.",
"follow_ups": ["What specific metrics changed?", "How did you measure baseline?"],
"evidence_requirements": ["Quantified outcome", "Tooling used"]
}
],
"total_duration": 60,
"greeting_script": "Welcome! Today we'll be exploring your TypeScript and distributed systems experience..."
},
"started_at": "2025-02-24T12:34:56Z",
"ended_at": null
}# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json # Must be OS env var, not just dotenv
export KAFKA_BROKERS=<kafka-broker>
export REDIS_HOST=<redis-host>
PYTHONUNBUFFERED=1 uvicorn agents.interviewer.main:app --host 0.0.0.0 --log-level infoGOOGLE_APPLICATION_CREDENTIALS must be set as an OS environment variable before any Google client is instantiated. Setting it only in .env via Pydantic settings does not work — use export or set it in the shell before launching the process.Assessor Agent
The Assessor Agent generates a structured assessment after the interview ends. It awaits recruiter verdict via a HITL gate before finalizing the report. Recruiters can also chat with the assessor via /api/v1/chat to ask questions about the assessment.
{
"assessmentId": "assessment-uuid",
"overallScore": 82,
"recommendation": "STRONG_HIRE",
"finalVerdict": "HIRE",
"verdictBy": "recruiter-uuid",
"summary": "The candidate demonstrated strong TypeScript and distributed systems skills...",
"strengths": ["Strong system design thinking", "Clear communication"],
"weaknesses": ["Limited experience with distributed tracing"],
"mustHaveMet": 3,
"mustHaveTotal": 3,
"mustHaveEvaluations": [
{
"criterion": "5+ years TypeScript experience",
"category": "MUST_HAVE",
"met": true,
"confidence": 0.95,
"evidence": "Demonstrated advanced generics, decorators, and strict mode patterns"
}
],
"niceToHaveEvaluations": [
{
"criterion": "GraphQL experience",
"category": "NICE_TO_HAVE",
"met": false,
"confidence": 0.8,
"evidence": "GraphQL was not discussed during the session"
}
],
"required_competency_failures": [],
"competencyScores": [
{
"competency": "TypeScript Proficiency",
"weight": 0.35,
"aggregated_score": 4.2,
"is_required": true,
"below_minimum": false
}
]
}| Endpoint | Method | Description |
|---|---|---|
| /api/v1/runs | POST | Create assessment run (called by NestJS after interview.completed) |
| /api/v1/runs/:id/continue | POST | Continue with recruiter verdict data (approve / reject) |
| /api/v1/chat | POST | Recruiter Q&A chat about assessment results |
| /health/live | GET | Liveness probe |
| /health/ready | GET | Readiness probe |
Health Check (All Agents)
curl http://<planner-host>/health/ready # Planner (default dev port: 7777)
curl http://<interviewer-host>/health/ready # Hiring Assistant (default dev port: 7778)
curl http://<assessor-host>/health/ready # Assessor (default dev port: 7779){ "status": "ready", "kafka": "connected", "redis": "connected" }NestJS Agent URLs
AGNO_PLANNER_URL=http://planner-agent
AGNO_INTERVIEWER_URL=http://interviewer-agent
AGNO_ASSESSOR_URL=http://assessor-agent
A2A_INTERNAL_TOKEN=<secret> # Shared auth between NestJS and agentsskill.validation.requested after data is complete — the INFO_NEEDED state is invisible to agents.