AI Models
Model Specifications
Per-agent inference configuration, input/output schemas, and architecture details for Teamcast Maya's three fine-tuned agents on Gemini 2.5 Flash.
These values are defined in k8s/configmap.yaml and injected at deployment time, with no hardcoded values in application code.
Planner Agent
Inference Configuration
| Field | Value |
|---|---|
| Base model | Gemini 2.5 Flash (fine-tuned, v3) |
| LoRA adapter rank | 16 |
| Temperature | 0.7 |
| Max output tokens | 8,192 (core plan) + 2,048 (supplementary) |
| Response format | JSON (response_mime_type) |
| Concurrency | Two parallel async calls |
Split-Call Architecture
The planner makes two independent Vertex AI calls concurrently to reduce latency. Call A generates the core interview plan; Call B generates supplementary content. Both complete in parallel.
| Call | Outputs | Max Tokens |
|---|---|---|
| A — Core | Competencies with rubric sub-dimensions (1-5 scale indicators), questions, skills coverage map | 8,192 |
| B — Supplementary | Greeting script, candidate outreach draft, must-have and nice-to-have criteria | 2,048 |
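The split-call pattern above can be sketched with `asyncio.gather`. This is a minimal illustration rather than the production planner: `call_model` is a hypothetical stand-in for a Vertex AI request and returns canned, empty values, and merging the two results into one flat object is an assumption based on the flat output schema.

```python
import asyncio

# Token limits taken from the table above.
CORE_MAX_TOKENS = 8192
SUPPLEMENTARY_MAX_TOKENS = 2048


async def call_model(kind, request, max_output_tokens):
    """Hypothetical stand-in for one model request; returns canned JSON fields."""
    await asyncio.sleep(0.01)  # simulate network latency
    if kind == "core":
        return {"competencies": [], "skills_coverage": {}}
    return {
        "greeting_script": "",
        "inmail_draft": "",
        "must_have": [],
        "nice_to_have": [],
    }


async def generate_plan(request):
    # Start both calls together; gather returns results in argument order.
    core, supplementary = await asyncio.gather(
        call_model("core", request, CORE_MAX_TOKENS),
        call_model("supplementary", request, SUPPLEMENTARY_MAX_TOKENS),
    )
    # Assumed merge step: combine both partial outputs into one plan object.
    return {**core, **supplementary}


plan = asyncio.run(generate_plan({"position": "Senior Backend Engineer"}))
```

Because the two calls share no state, total latency is bounded by the slower call rather than the sum of both.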
Input Schema
```json
{
  "position": "Senior Backend Engineer",
  "level": "senior",
  "job_description": "We are looking for...",
  "skills": ["Python", "PostgreSQL", "Redis"],
  "candidate_name": "Jane Doe",
  "resume_text": "5 years experience at...",
  "duration_minutes": 45
}
```
Output Schema (v3)
```json
{
  "competencies": [
    {
      "name": "System Design",
      "weight": 0.3,
      "is_required": true,
      "minimum_acceptable_score": 3.0,
      "sub_dimensions": [
        {
          "name": "Architecture Thinking",
          "indicators": {
            "1": "Cannot articulate basic system components",
            "2": "Identifies components but misses interactions",
            "3": "Describes reasonable architecture with trade-offs",
            "4": "Strong architecture with clear scaling strategy",
            "5": "Exceptional design with novel approaches"
          }
        }
      ],
      "questions": ["Design a rate limiter at scale.", "..."]
    }
  ],
  "skills_coverage": { "Python": true, "PostgreSQL": true },
  "greeting_script": "Welcome Jane, thanks for joining today...",
  "inmail_draft": "Hi Jane, I reviewed your profile...",
  "must_have": ["3+ years Python", "distributed systems experience"],
  "nice_to_have": ["Kubernetes", "Go"]
}
```
Assessor Agent
Inference Configuration
| Field | Value |
|---|---|
| Base model | Gemini 2.5 Flash (fine-tuned, v3) |
| LoRA adapter rank | 16 |
| Temperature | 0.1 (near-deterministic) |
| Max output tokens | 8,192 |
| Response format | JSON (response_mime_type) |
| Post-processing | Two-layer scoring + recommendation enforcement |
Two-Layer Scoring Architecture
Layer 1 (LLM): scores 5 evidence components per sub-dimension on a 1-5 scale.
Layer 2 (deterministic): aggregates the evidence components into agent_score, aggregated_score, and overall_score using scoring_engine.py.
The LLM sets overall_score to 0.0 as a placeholder; Layer 2 always recomputes it.
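To make the Layer 2 step concrete, here is a minimal sketch of a deterministic aggregation pass. The actual formulas in scoring_engine.py are not documented here, so every rule below is an assumption: an unweighted mean of the five evidence components per sub-dimension, a mean of sub-dimension scores per competency, and a weight-normalized mean across competencies. The function names are hypothetical, and agent_score and exclusion handling are left out.

```python
import copy
from statistics import fmean


def sub_dimension_score(evidence_components):
    """Assumed rule: unweighted mean of the five 1-5 evidence components."""
    return fmean(evidence_components.values())


def apply_layer_two(llm_output):
    """Return a copy of the LLM output with the placeholder scores filled in."""
    result = copy.deepcopy(llm_output)
    weighted_sum = 0.0
    total_weight = 0.0
    for competency in result["competency_scores"]:
        # Assumed rule: a competency score is the mean of its sub-dimension scores.
        sub_scores = [
            sub_dimension_score(sub["evidence_components"])
            for sub in competency["sub_dimension_scores"]
        ]
        competency["aggregated_score"] = round(fmean(sub_scores), 2)
        weighted_sum += competency["weight"] * competency["aggregated_score"]
        total_weight += competency["weight"]
    # Assumed rule: overall score is the weight-normalized mean across competencies.
    result["overall_score"] = round(weighted_sum / total_weight, 2)
    return result


llm_output = {
    "overall_score": 0.0,  # placeholder written by the LLM
    "competency_scores": [
        {
            "competency": "System Design",
            "weight": 0.3,
            "sub_dimension_scores": [
                {
                    "sub_dimension": "Architecture Thinking",
                    "evidence_components": {
                        "completeness": 4,
                        "reasoning_clarity": 3,
                        "outcome_strength": 4,
                        "ownership": 5,
                        "evidence_confidence": 4,
                    },
                }
            ],
            "aggregated_score": 0.0,  # placeholder written by the LLM
        }
    ],
}

scored = apply_layer_two(llm_output)
```

Keeping this pass outside the model means the same evidence components always produce the same scores, whatever the sampling temperature.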
Recommendation Thresholds (1-5 Scale)
The assessor outputs a hiring recommendation label. A post-processing step overrides the model label if it conflicts with the overall_score computed by Layer 2.
| Recommendation | Overall Score | Additional |
|---|---|---|
| STRONG_HIRE | >= 4.5 | No required competency failures |
| HIRE | >= 3.5 | No required competency failures |
| MAYBE | >= 2.5, or has required competency failures | |
| NO_HIRE | < 2.5 | |
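The override step can be sketched as a small pure function. This is an illustration under stated assumptions, not the shipped post-processor: the function names are hypothetical, the rows are assumed to be evaluated top to bottom with the first match winning, and any required competency failure is assumed to be enough for MAYBE. The table does not settle what happens to a score below 2.5 that also has failures; under this reading it is MAYBE.

```python
def expected_recommendation(overall_score, required_failures):
    """Map a Layer 2 score and failure list to a label, checking rows in table order."""
    has_failures = bool(required_failures)
    if overall_score >= 4.5 and not has_failures:
        return "STRONG_HIRE"
    if overall_score >= 3.5 and not has_failures:
        return "HIRE"
    if overall_score >= 2.5 or has_failures:
        return "MAYBE"
    return "NO_HIRE"


def enforce_recommendation(assessment):
    """Return the assessment with the model label replaced if it conflicts."""
    expected = expected_recommendation(
        assessment["overall_score"],
        assessment["required_competency_failures"],
    )
    if assessment["recommendation"] != expected:
        return {**assessment, "recommendation": expected}
    return assessment


# The model said HIRE, but the recomputed score falls below the HIRE threshold.
fixed = enforce_recommendation(
    {
        "overall_score": 2.0,
        "recommendation": "HIRE",
        "required_competency_failures": [],
    }
)
```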
Output Schema (v3)
```json
{
  "overall_score": 0.0,
  "competency_scores": [
    {
      "competency": "System Design",
      "weight": 0.3,
      "sub_dimension_scores": [
        {
          "sub_dimension": "Architecture Thinking",
          "evidence_components": {
            "completeness": 4,
            "reasoning_clarity": 3,
            "outcome_strength": 4,
            "ownership": 5,
            "evidence_confidence": 4
          },
          "transcript_turns": [5, 6, 12],
          "exclusion_triggered": false,
          "exclusion_reason": "",
          "observations": "Candidate described a rate limiter with clear outcomes..."
        }
      ],
      "aggregated_score": 0.0
    }
  ],
  "question_scores": [
    {
      "question_id": "q1",
      "question_text": "Design a rate limiter at scale.",
      "score": 4,
      "max_score": 5,
      "feedback": "Strong architecture with clear scaling strategy",
      "evidence_turns": [5, 6, 7]
    }
  ],
  "strengths": ["Clear communication", "Strong problem-solving"],
  "weaknesses": ["Limited experience with advanced topics"],
  "recommendation": "HIRE",
  "required_competency_failures": [],
  "summary": "Overall assessment addressing three core evaluation questions...",
  "must_have_evaluations": [
    { "criterion": "5+ years Python", "met": true, "confidence": 0.9, "evidence": "..." }
  ],
  "nice_to_have_evaluations": [
    { "criterion": "Kubernetes experience", "met": false, "confidence": 0.8, "evidence": "..." }
  ]
}
```
Note: overall_score and aggregated_score are placeholders (0.0) in the LLM output. The evaluation engine computes them deterministically from evidence_components.
Interviewer Agent
Inference Configuration
| Field | Value |
|---|---|
| Base model | Gemini 2.5 Flash (fine-tuned, v3 — 500 examples across 12 modes) |
| LoRA adapter rank | 16 |
| Temperature | 0.7 |
| Max output tokens | 512 |
| Response format | Plain text |
| System prompt | Company/job context, progress block, evidence requirements, communication techniques |
12 Interaction Modes
The interviewer receives a [SYSTEM HINT] with each request indicating the interaction mode. Each mode triggers different response behavior:
| Mode | Behavior |
|---|---|
| greeting | Open the interview with context and first question |
| answer | Acknowledge candidate answer and transition to next question |
| follow_up | Probe deeper on a partial or surface-level answer |
| evidence_probe | Request specific evidence (metrics, outcomes, examples) |
| rephrase | Rephrase question when candidate seems confused |
| off_topic | Redirect candidate back to the interview topic |
| prompt_elaboration | Encourage candidate to elaborate on a brief answer |
| closing | Wrap up the interview professionally |
| time_warning | Alert candidate about remaining time |
| time_up | End the interview when time expires |
| candidate_stop | Handle candidate requesting to stop |
| interruption | Handle mid-answer transitions |
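A caller might validate the mode before attaching the hint, as in the sketch below. The mode names come from the table above; the exact text that follows `[SYSTEM HINT]` is not specified in this document, so the `mode=<name>` format and the `build_system_hint` helper are assumptions for illustration only.

```python
# The 12 interaction modes listed in the table above.
INTERACTION_MODES = frozenset(
    {
        "greeting",
        "answer",
        "follow_up",
        "evidence_probe",
        "rephrase",
        "off_topic",
        "prompt_elaboration",
        "closing",
        "time_warning",
        "time_up",
        "candidate_stop",
        "interruption",
    }
)


def build_system_hint(mode):
    """Return a hint line for a known mode; the 'mode=' format is hypothetical."""
    if mode not in INTERACTION_MODES:
        raise ValueError(f"unknown interaction mode: {mode!r}")
    return f"[SYSTEM HINT] mode={mode}"


hint = build_system_hint("follow_up")
```

Rejecting unknown modes at the call site keeps a typo from reaching the model as an untrained hint.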
Real-Time Audio Pipeline
The interviewer operates in real-time during a live audio session. Each candidate utterance flows through the full pipeline before a response is delivered:
| Stage | Technology | Notes |
|---|---|---|
| Candidate audio capture | Browser audio (WebM/Opus) | 48kHz sample rate |
| Transport | WebSocket edge layer | Low-latency binary streaming |
| Speech-to-text | Google Cloud Speech-to-Text | Streaming recognition |
| Response generation | Vertex AI fine-tuned endpoint | Full conversation history + mode hint + evidence requirements |
| Text-to-speech | Gemini 2.5 Flash TTS (Vertex AI) | Kore voice, PCM L16 24kHz mono, emotion tags |
| Audio delivery | WebSocket to browser | Binary audio frames |
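For the delivery stage, the TTS format in the table fixes the byte rate of the outgoing audio: 24,000 samples per second, 2 bytes per sample (16-bit linear PCM), one channel. The sketch below derives that rate and splits a PCM buffer into fixed-size binary frames. The 20 ms frame duration and the `iter_frames` helper are assumptions; the document does not state how frames are sized.

```python
SAMPLE_RATE_HZ = 24_000  # from the text-to-speech row
BYTES_PER_SAMPLE = 2     # L16 is 16-bit linear PCM
CHANNELS = 1             # mono
FRAME_MS = 20            # assumption: frame duration is not stated in the source

BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
FRAME_BYTES = BYTES_PER_SECOND * FRAME_MS // 1000


def iter_frames(pcm, frame_bytes=FRAME_BYTES):
    """Yield consecutive slices of the PCM buffer; the last one may be shorter."""
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


one_second_of_silence = bytes(BYTES_PER_SECOND)
frames = list(iter_frames(one_second_of_silence))
```

Under these assumptions one second of audio is 48,000 bytes, sent as 50 frames of 960 bytes each.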
Output Behaviour
The interviewer outputs plain text — the next question or follow-up. The model is trained to:
- Use professional interview techniques (labeling, mirroring, calibrated questions, specificity probing, tactical empathy)
- Keep responses to 2-6 sentences (100% compliance in v3 evaluation)
- Never give advice, hints, or reveal correct answers (100% safety rate)
- Probe incomplete answers with targeted follow-up questions
- Transition naturally between competency areas
- Track evidence requirements and probe for missing evidence signals
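The 2-6 sentence limit in the list above is easy to check mechanically. The sketch below is one naive way to do it and is not the evaluation harness behind the reported compliance figure: it splits on terminal punctuation followed by whitespace, so abbreviations such as "e.g. this" can be miscounted. Both function names are hypothetical.

```python
import re

MIN_SENTENCES = 2
MAX_SENTENCES = 6


def count_sentences(text):
    """Naive count: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def within_length_limit(text):
    """True when the response has between 2 and 6 sentences inclusive."""
    return MIN_SENTENCES <= count_sentences(text) <= MAX_SENTENCES


reply = "Thanks for walking me through that. What metrics did you track?"
ok = within_length_limit(reply)
```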
Deprecated Endpoints
| Agent | Version | Base Model | Status |
|---|---|---|---|
| Planner | v2 | Gemini 2.0 Flash | Deprecated — replaced by v3 |
| Assessor | v2 | Gemini 2.0 Flash | Deprecated — replaced by v3 |
| Interviewer | v1 | Gemini 2.0 Flash | Deprecated — replaced by v3 |