AI Models Overview

Teamcast Maya runs three purpose-built fine-tuned language models — Planner, Assessor, and Interviewer — each trained on production-aligned synthetic data to deliver expert-level hiring decisions.

All three agents are fine-tuned on Gemini 2.5 Flash (gemini-2.5-flash) using Google Vertex AI Supervised Fine-Tuning (SFT) with LoRA adapters (rank 16). Each agent runs on its own dedicated fine-tuned endpoint. The base Flash model is used as a fallback when a fine-tuned endpoint is not configured for a given agent.
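The fallback rule above can be sketched as a small lookup. This is a minimal illustration, not Maya's actual configuration code: the `resolve_model` function, the lowercase agent keys, and the shape of the `endpoints` mapping are all assumptions made for the example. Only the base model ID comes from the text.

```python
from typing import Mapping, Optional

BASE_MODEL = "gemini-2.5-flash"  # base model ID stated in the text
AGENTS = ("planner", "assessor", "interviewer")  # hypothetical key names


def resolve_model(agent: str, endpoints: Mapping[str, Optional[str]]) -> str:
    """Return the agent's fine-tuned endpoint, or the base model if none is configured."""
    if agent not in AGENTS:
        raise ValueError(f"unknown agent: {agent!r}")
    endpoint = endpoints.get(agent)
    # Treat a missing key, None, and a blank string all as "not configured".
    if endpoint and endpoint.strip():
        return endpoint.strip()
    return BASE_MODEL
```

With a mapping that configures only the Planner, `resolve_model("planner", ...)` returns that endpoint while the other two agents resolve to `gemini-2.5-flash`.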

The Three Agents

Maya's AI pipeline is composed of three specialized agents that handle distinct stages of the interview lifecycle. Each agent is independently fine-tuned on domain-specific data, allowing specialization without interference between tasks.

Agent	Role	Input	Output
Planner	Generates a competency-based interview plan tailored to the role, level, and candidate profile	Job description, required skills, candidate resume, interview duration	Structured plan: competencies, rubric sub-dimensions with 1-5 scale indicators, questions, skills coverage map
Interviewer	Conducts the live interview in real time: asks questions, probes answers, and adapts to responses using 12 interaction modes	Interview plan, conversation history, candidate answer (via STT), mode hint, evidence requirements	Next interviewer question or follow-up using professional techniques (mirroring, labeling, calibrated questions, probing)
Assessor	Evaluates the completed interview transcript against rubrics and produces a hiring recommendation	Full transcript with turn indices, competency rubrics, must-have/nice-to-have criteria	Structured assessment: 5 evidence components per sub-dimension (1-5), transcript_turns, recommendation (STRONG_HIRE / HIRE / MAYBE / NO_HIRE)
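To make the Assessor's output contract concrete, the sketch below checks a parsed assessment against the constraints named in the table: a recommendation from the four allowed values, and five evidence components per sub-dimension scored on the 1-5 scale. The JSON layout is an assumption. The `sub_dimensions` key, the snake_case component keys, and the placement of `transcript_turns` inside each sub-dimension are hypothetical; the real schema is described under Model Specifications.

```python
from typing import Any, Dict, List

# Component names follow this page; the snake_case spelling is an assumption.
EVIDENCE_COMPONENTS = (
    "completeness",
    "reasoning_clarity",
    "outcome_strength",
    "ownership",
    "evidence_confidence",
)
RECOMMENDATIONS = ("STRONG_HIRE", "HIRE", "MAYBE", "NO_HIRE")


def _is_score(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 1 <= value <= 5
    )


def validate_assessment(assessment: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the shape is valid."""
    errors: List[str] = []
    if assessment.get("recommendation") not in RECOMMENDATIONS:
        errors.append("recommendation must be one of " + ", ".join(RECOMMENDATIONS))
    # "sub_dimensions" is a hypothetical key used for this illustration.
    for i, sub in enumerate(assessment.get("sub_dimensions", [])):
        components = sub.get("evidence_components", {})
        for name in EVIDENCE_COMPONENTS:
            if not _is_score(components.get(name)):
                errors.append(f"sub_dimensions[{i}].{name}: expected a score from 1 to 5")
        turns = sub.get("transcript_turns", [])
        if not all(isinstance(t, int) and not isinstance(t, bool) for t in turns):
            errors.append(f"sub_dimensions[{i}].transcript_turns: expected integer turn indices")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller log every schema violation in a single pass.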

Why Fine-Tuning?

Off-the-shelf large language models lack the domain-specific calibration required for consistent, fair hiring decisions. Fine-tuning on production-aligned synthetic data produces models that:

Follow structured output schemas (JSON) reliably — 90-100% valid JSON across all agents
Apply consistent scoring rubrics with 5 evidence components per sub-dimension (completeness, reasoning clarity, outcome strength, ownership, evidence confidence)
Use professional interview techniques (labeling, calibrated questions, mirroring, probing) across 12 interaction modes
Respect hiring thresholds precisely — STRONG_HIRE requires overall score >= 4.5, HIRE >= 3.5 on the 1-5 scale
Never leak answers or provide advice to candidates during live interviews (100% safety rate)
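The two documented thresholds can be expressed as a consistency check on a model-produced recommendation. This is a sketch under stated assumptions: the function name `respects_thresholds` is hypothetical, and because this page gives no cut-offs for MAYBE or NO_HIRE, the check deliberately places no constraint on those two values.

```python
# Minimum overall scores stated on this page (1-5 scale).
MIN_SCORE = {"STRONG_HIRE": 4.5, "HIRE": 3.5}
RECOMMENDATIONS = ("STRONG_HIRE", "HIRE", "MAYBE", "NO_HIRE")


def respects_thresholds(recommendation: str, overall_score: float) -> bool:
    """Check that a recommendation does not exceed what the overall score allows."""
    if recommendation not in RECOMMENDATIONS:
        raise ValueError(f"unknown recommendation: {recommendation!r}")
    if not 1 <= overall_score <= 5:
        raise ValueError("overall_score must be on the 1-5 scale")
    minimum = MIN_SCORE.get(recommendation)
    # MAYBE and NO_HIRE have no documented minimum, so they always pass.
    return minimum is None or overall_score >= minimum
```

For example, `respects_thresholds("STRONG_HIRE", 4.4)` is `False`, while `respects_thresholds("HIRE", 3.5)` is `True` because the threshold is inclusive.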

Model Versions

Version	Base Model	Released	Key Changes	Status
v1	Gemini 2.0 Flash	Feb 2026	Initial fine-tuning on extracted production data — LoRA rank 4, 3 epochs	Removed
v2	Gemini 2.0 Flash	Feb 2026	Production-format aligned training data, borderline calibration, LoRA rank 8, 5 epochs	Deprecated
v3	Gemini 2.5 Flash	Mar 2026	1-5 scale, evidence_components, transcript_turns, 12 interviewer modes, LoRA rank 16, 5 epochs	Live

v1 endpoints have been removed. v2 endpoints on Gemini 2.0 Flash are deprecated and will be decommissioned. All production traffic now uses v3 endpoints on Gemini 2.5 Flash.

Inference Performance

All three agents run on dedicated Vertex AI fine-tuned endpoints in us-central1. Latencies are measured from the agent pod to the Vertex AI endpoint and back, so the network round trip is included.

Agent	Typical Inference Time	Notes
Planner	20–40s per plan	Called once per interview — full plan generation including competency rubrics and sub-dimensions
Interviewer	2–5s per turn	Real-time constraint — called per candidate answer with full context
Assessor	25–40s per assessment	Called once post-interview — full transcript analysis with evidence components
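Round-trip figures like those in the table are typically collected by timing the call from the caller's side. The helper below is a generic sketch of that measurement, not Maya's instrumentation: `timed_call` is a hypothetical name, and the callable passed in stands in for whatever client code actually invokes the endpoint.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(call: Callable[[], T]) -> Tuple[T, float]:
    """Run `call` and return its result with the elapsed wall-clock seconds."""
    # perf_counter is monotonic, so the difference is never negative.
    start = time.perf_counter()
    result = call()
    elapsed = time.perf_counter() - start
    return result, elapsed
```

Wrapping the request in a zero-argument callable, for example `timed_call(lambda: send_request(payload))` with a hypothetical `send_request`, captures the request, inference, and response together, which matches the "pod to endpoint and back" definition used here.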

The Interviewer LLM is the primary latency driver in the per-turn audio pipeline. Combined with Google STT (200–500ms) and Google TTS (300–600ms), the total end-to-end audio round trip measured in production sessions is 1.4–3.2s per conversational turn.

Explore

Fine-Tuning Methodology

Training pipeline, data generation, hyperparameters, and the Vertex AI SFT workflow.

Model Specifications

Per-agent endpoint details, system prompt design, input/output schemas, and inference config.

Evaluation Results

v3 benchmark metrics comparing fine-tuned vs base Gemini 2.5 Flash.
