Architecture

Agent Integration

How the Agno Python agents integrate with the NestJS API Gateway, Vert.x WebSocket edge, Kafka, and Redis.

The platform runs three Agno Python agents, each with a dedicated HTTP API and Kafka integration. The Planner Agent generates interview plans from qualification statements, the Hiring Assistant conducts live sessions via audio, and the Assessor Agent produces structured verdicts with a recruiter HITL gate.

Agent Summary

AgentK8s ServiceTriggerOutputLLM
Plannerplanner-agentKafka: skill.validation.requestedPlan JSON → Kafka: interview.plan.createdVertex AI Gemini 2.0 Flash (fine-tuned)
Hiring Assistantinterviewer-agentWebSocket GREETING_REQUEST + Kafka: audio.candidate.spokenTTS PCM L16 24kHz → WebSocket / Kafka: audio.agent.spokenVertex AI Gemini 2.0 Flash (fine-tuned)
Assessorassessor-agentREST run creation (after interview.completed)Assessment JSON → Kafka: interview.assessment.readyVertex AI Gemini 2.0 Flash (fine-tuned)

Planner Agent

The Planner Agent consumes skill.validation.requested, validates qualifications via pgvector similarity search, generates a structured interview plan using Vertex AI Gemini 2.0 Flash (fine-tuned), and publishes the result to interview.plan.created. The event payload includes qualification statements when provided — Teamcast maps them internally to skills and subdimension requirements without requiring the partner to maintain a taxonomy.

text
Kafka: skill.validation.requested
    │  (position, level, qualifications[]?, skills[]?, jobDescription?, company context)
    ▼
Agno Planner Agent (Python, port 7777)
    ├── pgvector similarity search → map qualifications/skills to knowledge base
    ├── Extract qualification criteria:
    │     • must_have  ← critical requirements (e.g. "5+ years TypeScript")
    │     • nice_to_have ← preferred but not blocking (e.g. "GraphQL experience")
    ├── Build prompt: qualifications + company context + candidate profile
    ├── Vertex AI Gemini 2.0 Flash (fine-tuned) → generate structured plan JSON:
    │     • competencies + scoring rubrics (with is_required flag)
    │     • questions with follow-ups
    │     • must_have[]     ← qualification criteria array
    │     • nice_to_have[]  ← preferred criteria array
    │     • greeting_script ← Hiring Assistant's opening for this specific interview
    │     • inmail_draft    ← LinkedIn InMail with {{INTERVIEW_LINK}} placeholder
    └── Kafka: interview.plan.created → API Gateway saves plan
                   │
                   ▼
          API Gateway: state → PENDING (or APPROVED if autoApprovePlans)
          Webhook: interview.plan_generated → tenant callbackUrl
EndpointMethodDescription
/api/v1/runsPOSTCreate a new planning run
/api/v1/runs/:idGETGet run status and result
/api/v1/runs/:id/continuePOSTContinue run with recruiter modification feedback
/api/v1/runs/:id/cancelDELETECancel an in-progress run
/health/liveGETLiveness probe
/health/readyGETReadiness probe (checks Kafka + pgvector)
Planner startup
# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
export KAFKA_BROKERS=<kafka-broker>
export DATABASE_URL=postgresql://...

uvicorn agents.planner.main:app --host 0.0.0.0

Hiring Assistant

The Hiring Assistant conducts the live interview via audio. Audio chunks from the candidate arrive via Kafka, are transcribed by Google Cloud STT, processed by Vertex AI LLM, and the response is synthesized by Gemini 2.5 Flash TTS (raw PCM L16 24kHz mono, with emotion tags) and published back to Kafka for Vert.x to stream to the browser.

text
Kafka: audio.candidate.spoken (key=sessionId)
    │  WebM/Opus blob OR streaming STT chunk from browser
    ▼
Agno Hiring Assistant (Python, port 7778)
    │  Per-session FIFO worker thread (ThreadPoolExecutor — sessions never block each other)
    ├── Load session state from Redis
    │     (plan, current_question_idx, follow_up_idx, greeting_sent, messages[], pending_answers[])
    ├── Streaming STT path (streaming="chunk"/"end")
    │     └── StreamingSTTSession → accumulates partial transcript; final cached in _streaming_transcripts
    ├── Batch STT path (full audio blob)
    │     └── Reuse cached streaming transcript if ≥ 12 words; else call Google Cloud STT batch API
    │           RecognitionConfig: LINEAR16, sample_rate=48000, en-US, enhanced model, auto_punctuation
    ├── Answer debounce: FRAGMENT_DEBOUNCE (1.5s) for short; ANSWER_DEBOUNCE (0.8s) for substantial
    ├── Confusion / off-topic detection → redirect question
    ├── Compute context: remaining_minutes, questions_remaining, follow_ups_remaining, evidence_requirements
    ├── Vertex AI Gemini 2.0 Flash (fine-tuned)
    │     └── System prompt includes: time remaining, question progress, follow-up probes left,
    │           evidence to verify, NVC techniques (mirroring, labeling, calibrated questions)
    ├── Follow-up logic: exhaust plan follow_ups[] before advancing to next question
    └── Gemini 2.5 Flash TTS (Vertex AI, voice: Kore)
          └── emotion tags: [encouraging], [thoughtful], [pause], etc.
          └── Raw PCM L16 24kHz mono output
          └── Kafka: audio.agent.spoken (key=sessionId, base64 PCM in JSON)
                  │
                  ▼
          Vert.x Kafka consumer → WebSocket { type: "AGENT_RESPONSE", data: base64_pcm }
                  │
                  ▼
          Browser: AgentResponseHandler decodes Int16→Float32 + plays via AudioContext
FeatureDescription
Proactive greetingChecks greeting_sent Redis flag on first audio chunk — speaks greeting only once per session
Time-aware questioningLLM receives remaining_minutes, questions_remaining, current_question_num — paces interview correctly and escalates urgency near the end
Follow-up probingPer-question follow_ups list from plan is exhausted before advancing; follow_ups_remaining passed to LLM so it knows how many probes remain
Evidence requirementsobservable_evidence_types from ontology forwarded to LLM system prompt — interviewer knows what to verify before moving on
Answer debounceFRAGMENT_DEBOUNCE_SECS (1.5 s) for short answers, ANSWER_DEBOUNCE_SECS (0.8 s) for substantial answers — merges split utterances from VAD pauses
Confusion detectionDetects when candidate does not understand — rephrase or offer hint
Off-topic detectionDetects teaching/advice requests — redirects back to interview flow
Per-session queue isolationEach session has a dedicated FIFO worker thread in ThreadPoolExecutor — concurrent sessions never block each other
Sequential queueAudio responses queued per session — never overlapping playback
EndpointMethodDescription
/api/v1/runsPOSTCreate interview session (called by NestJS after plan approved)
/session/:id/greetPOSTTrigger greeting TTS
/health/liveGETLiveness probe
/health/readyGETReadiness probe (checks Kafka + Redis + GCP credentials)

Redis Session Structure

Redis key: session:{sessionId} TTL: 7200 s
{
  "session_id": "session_1708782896...",
  "interview_id": "int_abc123",
  "tenant_id": "tenant-123",
  "current_question_idx": 2,
  "follow_up_idx": 1,
  "greeting_sent": true,
  "messages": [
    { "role": "agent",     "content": "Tell me about yourself..." },
    { "role": "candidate", "content": "I have 5 years of experience..." }
  ],
  "pending_answers": [
    "I have 5 years of TypeScript experience, mostly in React and Node.js services."
  ],
  "plan": {
    "questions": [
      {
        "text": "Tell me about a time you improved system performance.",
        "follow_ups": ["What specific metrics changed?", "How did you measure baseline?"],
        "evidence_requirements": ["Quantified outcome", "Tooling used"]
      }
    ],
    "total_duration": 60,
    "greeting_script": "Welcome! Today we'll be exploring your TypeScript and distributed systems experience..."
  },
  "started_at": "2025-02-24T12:34:56Z",
  "ended_at": null
}
Hiring Assistant startup
# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json  # Must be OS env var, not just dotenv
export KAFKA_BROKERS=<kafka-broker>
export REDIS_HOST=<redis-host>

PYTHONUNBUFFERED=1 uvicorn agents.interviewer.main:app --host 0.0.0.0 --log-level info
GOOGLE_APPLICATION_CREDENTIALS must be set as an OS environment variable before any Google client is instantiated. Setting it only in .env via Pydantic settings does not work — use export or set it in the shell before launching the process.

Assessor Agent

The Assessor Agent generates a structured assessment after the interview ends. It awaits recruiter verdict via a HITL gate before finalizing the report. Recruiters can also chat with the assessor via /api/v1/chat to ask questions about the assessment.

Assessment output schema
{
  "assessmentId": "assessment-uuid",
  "overallScore": 82,
  "recommendation": "STRONG_HIRE",
  "finalVerdict": "HIRE",
  "verdictBy": "recruiter-uuid",
  "summary": "The candidate demonstrated strong TypeScript and distributed systems skills...",
  "strengths": ["Strong system design thinking", "Clear communication"],
  "weaknesses": ["Limited experience with distributed tracing"],

  "mustHaveMet": 3,
  "mustHaveTotal": 3,
  "mustHaveEvaluations": [
    {
      "criterion": "5+ years TypeScript experience",
      "category": "MUST_HAVE",
      "met": true,
      "confidence": 0.95,
      "evidence": "Demonstrated advanced generics, decorators, and strict mode patterns"
    }
  ],
  "niceToHaveEvaluations": [
    {
      "criterion": "GraphQL experience",
      "category": "NICE_TO_HAVE",
      "met": false,
      "confidence": 0.8,
      "evidence": "GraphQL was not discussed during the session"
    }
  ],
  "required_competency_failures": [],

  "competencyScores": [
    {
      "competency": "TypeScript Proficiency",
      "weight": 0.35,
      "aggregated_score": 4.2,
      "is_required": true,
      "below_minimum": false
    }
  ]
}
EndpointMethodDescription
/api/v1/runsPOSTCreate assessment run (called by NestJS after interview.completed)
/api/v1/runs/:id/continuePOSTContinue with recruiter verdict data (approve / reject)
/api/v1/chatPOSTRecruiter Q&A chat about assessment results
/health/liveGETLiveness probe
/health/readyGETReadiness probe

Health Check (All Agents)

bash
curl http://<planner-host>/health/ready      # Planner (default dev port: 7777)
curl http://<interviewer-host>/health/ready  # Hiring Assistant (default dev port: 7778)
curl http://<assessor-host>/health/ready     # Assessor (default dev port: 7779)
Response 200
{ "status": "ready", "kafka": "connected", "redis": "connected" }

NestJS Agent URLs

.env — NestJS Gateway
AGNO_PLANNER_URL=http://planner-agent
AGNO_INTERVIEWER_URL=http://interviewer-agent
AGNO_ASSESSOR_URL=http://assessor-agent
A2A_INTERNAL_TOKEN=<secret>  # Shared auth between NestJS and agents
All three agents are transparent to the HITL INFO_NEEDED workflow. Agents only receive skill.validation.requested after data is complete — the INFO_NEEDED state is invisible to agents.
Was this page helpful?