AI Models Overview
Teamcast Maya runs three purpose-built, fine-tuned language models (Planner, Interviewer, and Assessor), each trained on production-aligned synthetic data to support consistent, expert-level hiring decisions.
All three agents are fine-tuned from Gemini 2.5 Flash (gemini-2.5-flash) using Google Vertex AI Supervised Fine-Tuning (SFT) with LoRA adapters (rank 16). Each agent runs on its own dedicated fine-tuned endpoint. If no fine-tuned endpoint is configured for an agent, that agent falls back to the base Flash model.
The Three Agents
Maya's AI pipeline is composed of three specialized agents that handle distinct stages of the interview lifecycle. Each agent is independently fine-tuned on domain-specific data, allowing specialization without interference between tasks.
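Each agent has its own fine-tuned endpoint, with the base Flash model as the fallback, so model selection amounts to a per-agent lookup. The sketch below is hypothetical: the `ModelTarget` type, the `resolve_target` function, the lowercase agent keys, and the placeholder endpoint strings are illustrative assumptions, not Maya's actual configuration or code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Base model ID as named on this page; used when no fine-tuned endpoint is set.
BASE_MODEL = "gemini-2.5-flash"

# Hypothetical agent keys for the three agents described above.
AGENTS = ("planner", "interviewer", "assessor")


@dataclass(frozen=True)
class ModelTarget:
    agent: str
    model: str        # fine-tuned endpoint name, or the base model ID
    fine_tuned: bool  # False when the agent falls back to the base model


def resolve_target(agent: str, endpoints: Mapping[str, Optional[str]]) -> ModelTarget:
    """Return the agent's dedicated fine-tuned endpoint, else the base Flash model."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent!r}")
    endpoint = endpoints.get(agent)
    if endpoint:  # a non-empty endpoint is configured for this agent
        return ModelTarget(agent, endpoint, fine_tuned=True)
    return ModelTarget(agent, BASE_MODEL, fine_tuned=False)


# Usage: only two endpoints are configured, so the Assessor falls back.
endpoints = {
    "planner": "projects/PROJECT/locations/us-central1/endpoints/PLANNER_ID",
    "interviewer": "projects/PROJECT/locations/us-central1/endpoints/INTERVIEWER_ID",
}
print(resolve_target("assessor", endpoints))
```

Returning the `fine_tuned` flag alongside the model name keeps the fallback visible to callers, for example for logging which agents are running on the base model.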
| Agent | Role | Input | Output |
|---|---|---|---|
| Planner | Generates a competency-based interview plan tailored to the role, level, and candidate profile | Job description, required skills, candidate resume, interview duration | Structured plan: competencies, rubric sub-dimensions with 1-5 scale indicators, questions, skills coverage map |
| Interviewer | Conducts the live interview in real time: asks questions, probes answers, and adapts to responses using 12 interaction modes | Interview plan, conversation history, candidate answer (via speech-to-text), mode hint, evidence requirements | Next interviewer question or follow-up, using professional techniques (mirroring, labeling, calibrated questions, probing) |
| Assessor | Evaluates the completed interview transcript against the rubrics and produces a hiring recommendation | Full transcript with turn indices, competency rubrics, must-have/nice-to-have criteria | Structured assessment: scores (1-5) for 5 evidence components per sub-dimension, transcript_turns, and a recommendation (STRONG_HIRE / HIRE / MAYBE / NO_HIRE) |
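To make the Assessor's output shape concrete, here is a minimal sketch of it as Python dataclasses. The recommendation values, the five evidence components, and the names `evidence_components` and `transcript_turns` come from this page; the snake_case component keys, the class names, and the range validation are assumptions for illustration, not the production schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Recommendation(str, Enum):
    STRONG_HIRE = "STRONG_HIRE"
    HIRE = "HIRE"
    MAYBE = "MAYBE"
    NO_HIRE = "NO_HIRE"


# The five evidence components named on this page; snake_case keys are assumed.
EVIDENCE_COMPONENTS = (
    "completeness",
    "reasoning_clarity",
    "outcome_strength",
    "ownership",
    "evidence_confidence",
)


@dataclass
class SubDimensionScore:
    name: str
    evidence_components: Dict[str, int]  # each component scored on the 1-5 scale
    transcript_turns: List[int]          # transcript turn indices used as evidence

    def __post_init__(self) -> None:
        # Require exactly the five documented components.
        if set(self.evidence_components) != set(EVIDENCE_COMPONENTS):
            raise ValueError(f"{self.name}: expected components {EVIDENCE_COMPONENTS}")
        # Every component score must fall on the 1-5 scale.
        for key, score in self.evidence_components.items():
            if not 1 <= score <= 5:
                raise ValueError(f"{self.name}.{key}: score {score} outside 1-5")


@dataclass
class Assessment:
    sub_dimensions: List[SubDimensionScore]
    recommendation: Recommendation
```

Validating in `__post_init__` rejects a malformed sub-dimension as soon as it is constructed, before it can reach the recommendation logic.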
Why Fine-Tuning?
Off-the-shelf large language models lack the domain-specific calibration required for consistent, fair hiring decisions. Fine-tuning on production-aligned synthetic data produces models that:
- Reliably follow structured JSON output schemas, producing 90-100% valid JSON across all agents
- Apply consistent scoring rubrics with 5 evidence components per sub-dimension (completeness, reasoning clarity, outcome strength, ownership, evidence confidence)
- Use professional interview techniques (labeling, calibrated questions, mirroring, probing) across 12 interaction modes
- Respect hiring thresholds precisely: STRONG_HIRE requires an overall score >= 4.5 and HIRE requires >= 3.5 on the 1-5 scale
- Never leak answers or provide advice to candidates during live interviews (100% safety rate)
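Two of the properties above, valid JSON output and respect for the hiring thresholds, can be checked mechanically on raw model responses. The sketch below assumes a hypothetical `overall_score` field and the helper names `valid_json_rate` and `check_assessor_output`. It enforces only the two documented minimums (4.5 for STRONG_HIRE, 3.5 for HIRE), because this page does not state where MAYBE ends and NO_HIRE begins.

```python
import json
from typing import List

# Minimum overall scores documented on this page (1-5 scale).
STRONG_HIRE_MIN = 4.5
HIRE_MIN = 3.5
RECOMMENDATIONS = {"STRONG_HIRE", "HIRE", "MAYBE", "NO_HIRE"}


def valid_json_rate(responses: List[str]) -> float:
    """Fraction of raw model responses that parse as JSON (0.0 for no responses)."""
    if not responses:
        return 0.0
    ok = 0
    for raw in responses:
        try:
            json.loads(raw)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(responses)


def check_assessor_output(raw: str) -> List[str]:
    """Return problems found in one raw Assessor response; an empty list means OK."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    score = data.get("overall_score")  # hypothetical field name
    rec = data.get("recommendation")
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 1 <= score <= 5:
        return [f"overall_score {score!r} is not a number in 1-5"]
    if rec not in RECOMMENDATIONS:
        return [f"unknown recommendation {rec!r}"]

    # Only the documented minimums are enforced; MAYBE vs NO_HIRE is not checked.
    if rec == "STRONG_HIRE" and score < STRONG_HIRE_MIN:
        return [f"STRONG_HIRE requires >= {STRONG_HIRE_MIN}, got {score}"]
    if rec == "HIRE" and score < HIRE_MIN:
        return [f"HIRE requires >= {HIRE_MIN}, got {score}"]
    return []
```

For example, `check_assessor_output('{"overall_score": 4.2, "recommendation": "STRONG_HIRE"}')` reports a threshold violation, while a score of 4.6 with the same recommendation passes.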
Model Versions
| Version | Base Model | Released | Key Changes | Status |
|---|---|---|---|---|
| v1 | Gemini 2.0 Flash | Feb 2026 | Initial fine-tuning on extracted production data — LoRA rank 4, 3 epochs | Removed |
| v2 | Gemini 2.0 Flash | Feb 2026 | Production-format aligned training data, borderline calibration, LoRA rank 8, 5 epochs | Deprecated |
| v3 | Gemini 2.5 Flash | Mar 2026 | 1-5 scale, evidence_components, transcript_turns, 12 interviewer modes, LoRA rank 16, 5 epochs | Live |
Inference Performance
All three agents run on dedicated Vertex AI fine-tuned endpoints in us-central1. Latencies are measured end to end, from the agent pod to the Vertex AI endpoint and back, so they include the network round trip.
| Agent | Typical Inference Time | Notes |
|---|---|---|
| Planner | 20–40s per plan | Called once per interview — full plan generation including competency rubrics and sub-dimensions |
| Interviewer | 2–5s per turn | Real-time constraint — called per candidate answer with full context |
| Assessor | 25–40s per assessment | Called once post-interview — full transcript analysis with evidence components |
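Because these figures are measured from the agent pod, a simple wall-clock wrapper around each endpoint call captures the same round trip. The sketch below is illustrative: `timed_call`, `summarize`, and `above_typical` are hypothetical helpers, and `TYPICAL_RANGE_S` copies the typical ranges from the table above rather than any configured timeout.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")

# Typical inference-time ranges from the table above, in seconds.
TYPICAL_RANGE_S: Dict[str, Tuple[float, float]] = {
    "planner": (20.0, 40.0),
    "interviewer": (2.0, 5.0),
    "assessor": (25.0, 40.0),
}


def timed_call(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn once and return (result, elapsed wall-clock seconds).

    If fn issues the endpoint request from the agent pod, the elapsed time
    includes the full network round trip.
    """
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Median, approximate p95, and max of observed latencies (needs >= 2 samples)."""
    p95 = statistics.quantiles(latencies, n=20)[-1]  # last of 19 cut points
    return {"p50": statistics.median(latencies), "p95": p95, "max": max(latencies)}


def above_typical(agent: str, latencies: List[float]) -> List[float]:
    """Samples that exceed the upper end of the agent's typical range."""
    _, upper = TYPICAL_RANGE_S[agent]
    return [s for s in latencies if s > upper]
```

In practice `fn` would wrap the actual Vertex AI request. A stand-in such as `lambda: time.sleep(0.01)` is enough to exercise the helpers. Flagging Interviewer turns above 5 s is the most useful check, since that agent sits on the real-time path.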
Explore
Fine-Tuning Methodology
Training pipeline, data generation, hyperparameters, and the Vertex AI SFT workflow.
Model Specifications
Per-agent endpoint details, system prompt design, input/output schemas, and inference config.
Evaluation Results
v3 benchmark metrics comparing fine-tuned vs base Gemini 2.5 Flash.