Architecture

Agent Integration

How the Agno Python agents integrate with the NestJS API Gateway, Vert.x WebSocket edge, Kafka, and Redis.

The platform runs three Agno Python agents, each with a dedicated HTTP API and Kafka integration. The Planner Agent generates interview plans from qualification statements, the Hiring Assistant conducts live sessions via audio, and the Assessor Agent produces structured verdicts with a recruiter HITL gate.

Agent Summary

Agent	K8s Service	Trigger	Output	LLM
Planner	planner-agent	Kafka: skill.validation.requested	Plan JSON → Kafka: interview.plan.created	Vertex AI Gemini 2.0 Flash (fine-tuned)
Hiring Assistant	interviewer-agent	WebSocket GREETING_REQUEST + Kafka: audio.candidate.spoken	TTS PCM L16 24kHz → WebSocket / Kafka: audio.agent.spoken	Vertex AI Gemini 2.0 Flash (fine-tuned)
Assessor	assessor-agent	REST run creation (after interview.completed)	Assessment JSON → Kafka: interview.assessment.ready	Vertex AI Gemini 2.0 Flash (fine-tuned)

Planner Agent

The Planner Agent consumes skill.validation.requested, validates qualifications via pgvector similarity search, generates a structured interview plan using Vertex AI Gemini 2.0 Flash (fine-tuned), and publishes the result to interview.plan.created. The event payload includes qualification statements when provided — Teamcast maps them internally to skills and subdimension requirements without requiring the partner to maintain a taxonomy.

text

Kafka: skill.validation.requested
    │  (position, level, qualifications[]?, skills[]?, jobDescription?, company context)
    ▼
Agno Planner Agent (Python, port 7777)
    ├── pgvector similarity search → map qualifications/skills to knowledge base
    ├── Extract qualification criteria:
    │     • must_have  ← critical requirements (e.g. "5+ years TypeScript")
    │     • nice_to_have ← preferred but not blocking (e.g. "GraphQL experience")
    ├── Build prompt: qualifications + company context + candidate profile
    ├── Vertex AI Gemini 2.0 Flash (fine-tuned) → generate structured plan JSON:
    │     • competencies + scoring rubrics (with is_required flag)
    │     • questions with follow-ups
    │     • must_have[]     ← qualification criteria array
    │     • nice_to_have[]  ← preferred criteria array
    │     • greeting_script ← Hiring Assistant's opening for this specific interview
    │     • inmail_draft    ← LinkedIn InMail with {{INTERVIEW_LINK}} placeholder
    └── Kafka: interview.plan.created → API Gateway saves plan
                   │
                   ▼
          API Gateway: state → PENDING (or APPROVED if autoApprovePlans)
          Webhook: interview.plan_generated → tenant callbackUrl

Endpoint	Method	Description
/api/v1/runs	POST	Create a new planning run
/api/v1/runs/:id	GET	Get run status and result
/api/v1/runs/:id/continue	POST	Continue run with recruiter modification feedback
/api/v1/runs/:id/cancel	DELETE	Cancel an in-progress run
/health/live	GET	Liveness probe
/health/ready	GET	Readiness probe (checks Kafka + pgvector)

Planner startup

# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
export KAFKA_BROKERS=<kafka-broker>
export DATABASE_URL=postgresql://...

uvicorn agents.planner.main:app --host 0.0.0.0

Hiring Assistant

The Hiring Assistant conducts the live interview via audio. Audio chunks from the candidate arrive via Kafka, are transcribed by Google Cloud STT, processed by Vertex AI LLM, and the response is synthesized by Gemini 2.5 Flash TTS (raw PCM L16 24kHz mono, with emotion tags) and published back to Kafka for Vert.x to stream to the browser.

text

Kafka: audio.candidate.spoken (key=sessionId)
    │  WebM/Opus blob OR streaming STT chunk from browser
    ▼
Agno Hiring Assistant (Python, port 7778)
    │  Per-session FIFO worker thread (ThreadPoolExecutor — sessions never block each other)
    ├── Load session state from Redis
    │     (plan, current_question_idx, follow_up_idx, greeting_sent, messages[], pending_answers[])
    ├── Streaming STT path (streaming="chunk"/"end")
    │     └── StreamingSTTSession → accumulates partial transcript; final cached in _streaming_transcripts
    ├── Batch STT path (full audio blob)
    │     └── Reuse cached streaming transcript if ≥ 12 words; else call Google Cloud STT batch API
    │           RecognitionConfig: LINEAR16, sample_rate=48000, en-US, enhanced model, auto_punctuation
    ├── Answer debounce: FRAGMENT_DEBOUNCE (1.5s) for short; ANSWER_DEBOUNCE (0.8s) for substantial
    ├── Confusion / off-topic detection → redirect question
    ├── Compute context: remaining_minutes, questions_remaining, follow_ups_remaining, evidence_requirements
    ├── Vertex AI Gemini 2.0 Flash (fine-tuned)
    │     └── System prompt includes: time remaining, question progress, follow-up probes left,
    │           evidence to verify, NVC techniques (mirroring, labeling, calibrated questions)
    ├── Follow-up logic: exhaust plan follow_ups[] before advancing to next question
    └── Gemini 2.5 Flash TTS (Vertex AI, voice: Kore)
          └── emotion tags: [encouraging], [thoughtful], [pause], etc.
          └── Raw PCM L16 24kHz mono output
          └── Kafka: audio.agent.spoken (key=sessionId, base64 PCM in JSON)
                  │
                  ▼
          Vert.x Kafka consumer → WebSocket { type: "AGENT_RESPONSE", data: base64_pcm }
                  │
                  ▼
          Browser: AgentResponseHandler decodes Int16→Float32 + plays via AudioContext

Feature	Description
Proactive greeting	Checks greeting_sent Redis flag on first audio chunk — speaks greeting only once per session
Time-aware questioning	LLM receives remaining_minutes, questions_remaining, current_question_num — paces interview correctly and escalates urgency near the end
Follow-up probing	Per-question follow_ups list from plan is exhausted before advancing; follow_ups_remaining passed to LLM so it knows how many probes remain
Evidence requirements	observable_evidence_types from ontology forwarded to LLM system prompt — interviewer knows what to verify before moving on
Answer debounce	FRAGMENT_DEBOUNCE_SECS (1.5 s) for short answers, ANSWER_DEBOUNCE_SECS (0.8 s) for substantial answers — merges split utterances from VAD pauses
Confusion detection	Detects when candidate does not understand — rephrase or offer hint
Off-topic detection	Detects teaching/advice requests — redirects back to interview flow
Per-session queue isolation	Each session has a dedicated FIFO worker thread in ThreadPoolExecutor — concurrent sessions never block each other
Sequential queue	Audio responses queued per session — never overlapping playback

Endpoint	Method	Description
/api/v1/runs	POST	Create interview session (called by NestJS after plan approved)
/session/:id/greet	POST	Trigger greeting TTS
/health/live	GET	Liveness probe
/health/ready	GET	Readiness probe (checks Kafka + Redis + GCP credentials)

Redis Session Structure

Redis key: session:{sessionId} TTL: 7200 s

{
  "session_id": "session_1708782896...",
  "interview_id": "int_abc123",
  "tenant_id": "tenant-123",
  "current_question_idx": 2,
  "follow_up_idx": 1,
  "greeting_sent": true,
  "messages": [
    { "role": "agent",     "content": "Tell me about yourself..." },
    { "role": "candidate", "content": "I have 5 years of experience..." }
  ],
  "pending_answers": [
    "I have 5 years of TypeScript experience, mostly in React and Node.js services."
  ],
  "plan": {
    "questions": [
      {
        "text": "Tell me about a time you improved system performance.",
        "follow_ups": ["What specific metrics changed?", "How did you measure baseline?"],
        "evidence_requirements": ["Quantified outcome", "Tooling used"]
      }
    ],
    "total_duration": 60,
    "greeting_script": "Welcome! Today we'll be exploring your TypeScript and distributed systems experience..."
  },
  "started_at": "2025-02-24T12:34:56Z",
  "ended_at": null
}

Hiring Assistant startup

# From apps/agno-agents/
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json  # Must be OS env var, not just dotenv
export KAFKA_BROKERS=<kafka-broker>
export REDIS_HOST=<redis-host>

PYTHONUNBUFFERED=1 uvicorn agents.interviewer.main:app --host 0.0.0.0 --log-level info

GOOGLE_APPLICATION_CREDENTIALS must be set as an OS environment variable before any Google client is instantiated. Setting it only in .env via Pydantic settings does not work — use export or set it in the shell before launching the process.

Assessor Agent

The Assessor Agent generates a structured assessment after the interview ends. It awaits recruiter verdict via a HITL gate before finalizing the report. Recruiters can also chat with the assessor via /api/v1/chat to ask questions about the assessment.

Assessment output schema

{
  "assessmentId": "assessment-uuid",
  "overallScore": 82,
  "recommendation": "STRONG_HIRE",
  "finalVerdict": "HIRE",
  "verdictBy": "recruiter-uuid",
  "summary": "The candidate demonstrated strong TypeScript and distributed systems skills...",
  "strengths": ["Strong system design thinking", "Clear communication"],
  "weaknesses": ["Limited experience with distributed tracing"],

  "mustHaveMet": 3,
  "mustHaveTotal": 3,
  "mustHaveEvaluations": [
    {
      "criterion": "5+ years TypeScript experience",
      "category": "MUST_HAVE",
      "met": true,
      "confidence": 0.95,
      "evidence": "Demonstrated advanced generics, decorators, and strict mode patterns"
    }
  ],
  "niceToHaveEvaluations": [
    {
      "criterion": "GraphQL experience",
      "category": "NICE_TO_HAVE",
      "met": false,
      "confidence": 0.8,
      "evidence": "GraphQL was not discussed during the session"
    }
  ],
  "required_competency_failures": [],

  "competencyScores": [
    {
      "competency": "TypeScript Proficiency",
      "weight": 0.35,
      "aggregated_score": 4.2,
      "is_required": true,
      "below_minimum": false
    }
  ]
}

Endpoint	Method	Description
/api/v1/runs	POST	Create assessment run (called by NestJS after interview.completed)
/api/v1/runs/:id/continue	POST	Continue with recruiter verdict data (approve / reject)
/api/v1/chat	POST	Recruiter Q&A chat about assessment results
/health/live	GET	Liveness probe
/health/ready	GET	Readiness probe

Health Check (All Agents)

bash

curl http://<planner-host>/health/ready      # Planner (default dev port: 7777)
curl http://<interviewer-host>/health/ready  # Hiring Assistant (default dev port: 7778)
curl http://<assessor-host>/health/ready     # Assessor (default dev port: 7779)

Response 200

{ "status": "ready", "kafka": "connected", "redis": "connected" }

NestJS Agent URLs

.env — NestJS Gateway

AGNO_PLANNER_URL=http://planner-agent
AGNO_INTERVIEWER_URL=http://interviewer-agent
AGNO_ASSESSOR_URL=http://assessor-agent
A2A_INTERNAL_TOKEN=<secret>  # Shared auth between NestJS and agents

All three agents are transparent to the HITL INFO_NEEDED workflow. Agents only receive skill.validation.requested after data is complete — the INFO_NEEDED state is invisible to agents.

Was this page helpful?

Previous← Kafka Events

NextAgent A2A API →