AI Models Overview

Teamcast Maya runs three purpose-built fine-tuned language models — Planner, Assessor, and Interviewer — each trained on production-aligned synthetic data to deliver expert-level hiring decisions.

All three agents are fine-tuned on Gemini 2.5 Flash (gemini-2.5-flash) using Google Vertex AI Supervised Fine-Tuning (SFT) with LoRA adapters (rank 16). Each agent runs on its own dedicated fine-tuned endpoint. The base Flash model is used as a fallback when a fine-tuned endpoint is not configured for a given agent.
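
The fallback rule above can be sketched as a small lookup. This is a minimal illustration rather than Maya's actual code: the environment variable names and the `resolve_model` helper are hypothetical, and only the base model ID `gemini-2.5-flash` comes from this page.

```python
import os
from typing import Mapping, Optional

BASE_MODEL = "gemini-2.5-flash"  # base model ID stated on this page

# Hypothetical environment variable names, one per agent (assumption).
ENDPOINT_ENV_VARS = {
    "planner": "MAYA_PLANNER_ENDPOINT",
    "assessor": "MAYA_ASSESSOR_ENDPOINT",
    "interviewer": "MAYA_INTERVIEWER_ENDPOINT",
}


def resolve_model(agent: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the agent's fine-tuned endpoint, or the base model if none is set."""
    if agent not in ENDPOINT_ENV_VARS:
        raise ValueError(f"unknown agent: {agent!r}")
    env = os.environ if env is None else env
    # An unset or blank variable counts as "not configured".
    endpoint = env.get(ENDPOINT_ENV_VARS[agent], "").strip()
    return endpoint or BASE_MODEL
```

Passing `env` explicitly keeps the helper easy to test; in a running service it would default to the process environment.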

The Three Agents

Maya's AI pipeline is composed of three specialized agents that handle distinct stages of the interview lifecycle. Each agent is independently fine-tuned on domain-specific data, allowing specialization without interference between tasks.

| Agent | Role | Input | Output |
| --- | --- | --- | --- |
| Planner | Generates a competency-based interview plan tailored to the role, level, and candidate profile | Job description, required skills, candidate resume, interview duration | Structured plan: competencies, rubric sub-dimensions with 1-5 scale indicators, questions, skills coverage map |
| Interviewer | Conducts the live interview in real-time: asks questions, probes answers, adapts to responses using 12 interaction modes | Interview plan, conversation history, candidate answer (via STT), mode hint, evidence requirements | Next interviewer question or follow-up using professional techniques (mirroring, labeling, calibrated questions, probing) |
| Assessor | Evaluates the completed interview transcript against rubrics and produces a hiring recommendation | Full transcript with turn indices, competency rubrics, must-have/nice-to-have criteria | Structured assessment: 5 evidence components per sub-dimension (1-5), transcript_turns, recommendation (STRONG_HIRE / HIRE / MAYBE / NO_HIRE) |
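
To make the Assessor's output contract concrete, here is a rough sketch of a structural check a caller might run on a response. The JSON layout is an assumption: the top-level `sub_dimensions` list, the placement of `evidence_components` and `transcript_turns` inside each entry, and the snake_case spellings of the five component names are all hypothetical. Only the field names `evidence_components` and `transcript_turns`, the five components, the 1-5 scale, and the four recommendation labels come from this page.

```python
import json

RECOMMENDATIONS = ("STRONG_HIRE", "HIRE", "MAYBE", "NO_HIRE")
# Key spellings are assumed; the five component names come from this page.
EVIDENCE_COMPONENTS = frozenset({
    "completeness", "reasoning_clarity", "outcome_strength",
    "ownership", "evidence_confidence",
})


def _is_score(value) -> bool:
    """True for a number on the 1-5 scale (booleans are rejected)."""
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 1 <= value <= 5)


def validate_assessment(raw: str) -> list:
    """Return a list of problems in an Assessor response; empty means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = []
    if data.get("recommendation") not in RECOMMENDATIONS:
        problems.append("recommendation is missing or not a known label")
    subs = data.get("sub_dimensions")  # hypothetical field name
    if not isinstance(subs, list):
        return problems + ["sub_dimensions is missing or not a list"]
    for i, sub in enumerate(subs):
        if not isinstance(sub, dict):
            problems.append(f"sub_dimensions[{i}] is not an object")
            continue
        comps = sub.get("evidence_components")
        if not isinstance(comps, dict) or set(comps) != EVIDENCE_COMPONENTS:
            problems.append(f"sub_dimensions[{i}]: expected exactly five evidence components")
        else:
            for name, score in comps.items():
                if not _is_score(score):
                    problems.append(f"sub_dimensions[{i}].{name} is not on the 1-5 scale")
        turns = sub.get("transcript_turns")
        if not isinstance(turns, list) or not all(isinstance(t, int) for t in turns):
            problems.append(f"sub_dimensions[{i}]: transcript_turns must be a list of turn indices")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every schema violation in a single pass.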

Why Fine-Tuning?

Off-the-shelf large language models lack the domain-specific calibration required for consistent, fair hiring decisions. Fine-tuning on production-aligned synthetic data produces models that:

  • Follow structured output schemas (JSON) reliably — 90-100% valid JSON across all agents
  • Apply consistent scoring rubrics with 5 evidence components per sub-dimension (completeness, reasoning clarity, outcome strength, ownership, evidence confidence)
  • Use professional interview techniques (labeling, calibrated questions, mirroring, probing) across 12 interaction modes
  • Respect hiring thresholds precisely — STRONG_HIRE requires overall score >= 4.5, HIRE >= 3.5 on the 1-5 scale
  • Never leak answers or provide advice to candidates during live interviews (100% safety rate)
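
The hiring thresholds in the list above can be illustrated with a short mapping function. Only the STRONG_HIRE (>= 4.5) and HIRE (>= 3.5) cut-offs are stated on this page. The boundary between MAYBE and NO_HIRE is not documented here, so the sketch takes it as a required `maybe_threshold` argument, and the `recommend` helper itself is hypothetical.

```python
def recommend(overall_score: float, *, maybe_threshold: float) -> str:
    """Map an overall 1-5 score to a recommendation label.

    The 4.5 and 3.5 cut-offs come from this page; `maybe_threshold`
    is caller-supplied because the page does not state that boundary.
    """
    if not 1.0 <= overall_score <= 5.0:
        raise ValueError("overall_score must be on the 1-5 scale")
    if overall_score >= 4.5:
        return "STRONG_HIRE"
    if overall_score >= 3.5:
        return "HIRE"
    if overall_score >= maybe_threshold:
        return "MAYBE"
    return "NO_HIRE"


# Illustrative call; 2.5 is a placeholder boundary, not a documented value.
label = recommend(3.8, maybe_threshold=2.5)
```

Because the comparisons are inclusive, a score of exactly 4.5 maps to STRONG_HIRE and exactly 3.5 maps to HIRE, matching the ">=" wording in the list.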

Model Versions

| Version | Base Model | Released | Key Changes | Status |
| --- | --- | --- | --- | --- |
| v1 | Gemini 2.0 Flash | Feb 2026 | Initial fine-tuning on extracted production data; LoRA rank 4, 3 epochs | Removed |
| v2 | Gemini 2.0 Flash | Feb 2026 | Production-format aligned training data, borderline calibration, LoRA rank 8, 5 epochs | Deprecated |
| v3 | Gemini 2.5 Flash | Mar 2026 | 1-5 scale, evidence_components, transcript_turns, 12 interviewer modes, LoRA rank 16, 5 epochs | Live |

v1 endpoints have been removed. v2 endpoints on Gemini 2.0 Flash are deprecated and will be decommissioned. All production traffic now uses v3 endpoints on Gemini 2.5 Flash.

Inference Performance

All three agents run on dedicated Vertex AI fine-tuned endpoints in us-central1. Latencies are observed from the agent pod to the Vertex AI endpoint and back — network round-trip included.

| Agent | Typical Inference Time | Notes |
| --- | --- | --- |
| Planner | 20–40s per plan | Called once per interview; full plan generation including competency rubrics and sub-dimensions |
| Interviewer | 2–5s per turn | Real-time constraint; called per candidate answer with full context |
| Assessor | 25–40s per assessment | Called once post-interview; full transcript analysis with evidence components |

The Interviewer LLM is the primary latency driver in the per-turn audio pipeline. Combined with Google STT (200-500ms) and Google TTS (300-600ms), the total end-to-end audio round-trip measured in production sessions is 1.4s to 3.2s per conversational turn.
