Study Design Guide
How to compose effective study scripts for UserTold.ai. Covers mode selection, segment sequencing, field-by-field guidance, anti-patterns, annotated examples, and the full SegmentV2 field reference.
Quick Schema Reference
Before writing a script, make sure you have the required fields for each segment mode.
| Field | Required | Type | Notes |
|---|---|---|---|
version | yes | 2 | Must be exactly 2 |
goals | yes | array | [{ id, description }] objects |
segments | yes | array | Segment objects |
segments[].id | yes | string | Unique within script |
segments[].mode | yes | string | talk \| speak \| observe |
segments[].title | yes | string | Display label |
segments[].speak_text | yes for speak | string | Spoken text delivered by AI |
segments[].talk | recommended for talk | object | { system_prompt?, goals? } |
segments[].instruction | required for observe | string | Task instruction shown to participant |
segments[].conductor_context | required for observe | string | AI-only context for stuck detection |
A speak segment missing speak_text, or an observe segment missing instruction or conductor_context, will fail validation.
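A minimal script that passes validation looks like this (the goal, segment id, and spoken text are illustrative):

```json
{
  "version": 2,
  "goals": [
    { "id": "g_ease", "description": "Evaluate ease of first-time setup" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Intro",
      "mode": "speak",
      "speak_text": "Welcome! I'll walk you through a short session."
    }
  ]
}
```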
Modes
Every segment runs in one of three modes. Picking the right mode is the most important decision you make for each segment.
talk — Conversational Interview
The AI conducts a natural voice conversation: asks questions, listens, follows up.
Use when you need:
- Open-ended discovery (triggers, motivations, decision criteria)
- Follow-up probing on specific answers
- Rapport building at session start/end
- Debrief after observation
Key fields: talk.system_prompt, talk.goals
The system_prompt shapes the interviewer's personality, question style, and focus. Goals tell the AI which research objectives this segment should pursue.
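Putting those fields together, a talk segment might look like this (the id, prompt text, and goal id are illustrative):

```json
{
  "id": "seg_motivation",
  "title": "Triggers & Motivations",
  "mode": "talk",
  "talk": {
    "system_prompt": "Ask one concrete question at a time about the participant's most recent purchase. Probe for the trigger event before moving on.",
    "goals": ["g_trigger"]
  }
}
```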
speak — AI Monologue
The AI delivers a scripted message, then waits or advances. No back-and-forth.
Use when you need:
- Task instructions before an observe segment
- Welcome/intro messages
- Consent language or disclaimers
- Transitions between study phases
Key fields: speak_text, speak_submode, speak_interruptible
Keep speak_text concise. Participants tune out after ~30 seconds of monologue.
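A typical speak segment is only a few fields (the text is illustrative; speak_submode and speak_interruptible are shown here at their documented defaults):

```json
{
  "id": "seg_welcome",
  "title": "Welcome",
  "mode": "speak",
  "speak_text": "Thanks for joining. In a moment I'll give you a short task. There are no right or wrong answers.",
  "speak_submode": "speak_balanced",
  "speak_interruptible": true
}
```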
observe — Silent Observation
The AI watches the participant use your product. It stays quiet unless the stuck detector fires.
Use when you need:
- Usability testing (watch a task end-to-end)
- Workflow observation (see how they actually work)
- Any scenario where interruption would bias behavior
Key fields: instruction, conductor_context, max_duration_s, suppress_interventions
The instruction is shown to the participant. The conductor_context is AI-only background knowledge that helps the evaluator understand what's normal vs. stuck.
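A sketch of an observe segment showing the participant-facing instruction alongside AI-only context (the task and flow details are illustrative):

```json
{
  "id": "seg_checkout",
  "title": "Checkout Task",
  "mode": "observe",
  "instruction": "Complete a purchase of any item, from product page through to the confirmation screen.",
  "conductor_context": "Expected flow: product page, add to cart, checkout, confirmation. Repeated scrolling without clicks usually means the participant is stuck.",
  "max_duration_s": 360
}
```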
Segment Sequencing
Order matters. The right sequence produces richer data than any individual segment.
The Core Pattern: speak → observe → talk
Most usability studies follow this arc:
- speak — Set up the task ("I'd like you to complete a purchase...")
- observe — Watch them do it (silent, no leading)
- talk — Debrief on what happened ("What were you thinking when you paused on the payment page?")
Why this order works:
- Speak gives clear instructions without a conversation that might bias behavior
- Observe captures natural behavior before you ask about it
- Talk references concrete moments the participant just experienced
JTBD Pattern: talk → talk → talk → talk
Pure interview. Each segment narrows focus:
- Rapport & context — Establish the recent event
- Trigger & job — What happened, what they needed
- Friction & workarounds — Where things broke, what they did instead
- Wrap-up — Confirm understanding, capture the key moment
Exploration Pattern: talk → observe → talk
Start with context, then watch, then probe:
- Context — Understand their routine and tools
- Demo — Watch them do the thing
- Probe — Dig into what you observed
Principles
- Never start with observe. Participants need context first — at minimum a speak segment with instructions.
- Never end with observe. Always debrief. The richest insights come from asking "why did you do X?" after watching X happen.
- Use speak for transitions, not conversations. If you need back-and-forth, use talk.
- Limit observe segments to 5-7 minutes. Beyond that, participants lose focus and data quality drops. Use max_duration_s.
Field-by-Field Guidance
system_prompt (talk mode)
The system_prompt defines the interviewer's behavior for a talk segment. It's the most impactful field in the entire script.
Good system_prompt traits:
- Specifies question style (one question at a time, concrete, no leading)
- Names the evidence to pursue (behaviors, not opinions)
- Sets boundaries (no solution selling, no roadmap talk)
- Includes 2-3 example follow-up questions
Example:
Ask only about concrete recent behavior, not hypothetical futures.
For each friction point mentioned, ask:
- "What did you click or type next?"
- "What did you expect to happen?"
- "What happened instead?"
Do not move on until you capture behavior + consequence.
Avoid:
- Generic prompts ("Ask good questions about the user experience")
- Long preambles that dilute the core instruction
- Contradictory rules ("Be concise" + "Always ask 3 follow-ups")
conductor_context (observe mode)
Background knowledge for the AI evaluator — NOT shown to the participant.
Good conductor_context traits:
- Describes expected vs. stuck behavior
- Mentions UI elements that are commonly missed
- Provides domain-specific context
Example:
The "Submit" button is below the fold on mobile. Users frequently scroll past it.
Expected flow: fill form → scroll down → tap Submit → see confirmation.
If the user scrolls up and down repeatedly, they are likely stuck.
Avoid:
- Empty string (wastes an opportunity to help the evaluator)
- Participant-facing language (this is AI-only)
instruction (observe mode)
Shown to the participant. Tells them what to do.
Good instruction traits:
- One clear task
- Concrete start and end points
- No hints about how to complete it
Example: "Complete a purchase of any item, from product page through to the confirmation screen."
Avoid:
- Multiple tasks in one instruction
- Hints: "Click the blue button to check out" (biases behavior)
- Vague: "Use the product" (no clear success criteria)
advance_when
Tells the conductor when to auto-advance to the next segment. Can be deterministic or LLM-judged.
Deterministic (URL-based):
url:https://example.com/confirmation
Advances when the participant navigates to a URL matching this prefix.
LLM-judged (natural language):
The participant has completed the checkout and sees a confirmation message.
The evaluator checks this condition periodically.
Tips:
- Prefer URL-based when possible — it's instant and deterministic.
- For talk segments, advance_when is rarely needed. Let the AI judge conversation completeness via goals.
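Both forms go in the same field; the conductor treats a url: prefix as deterministic and anything else as an LLM-judged condition. A sketch of the deterministic form (the URL is illustrative):

```json
{
  "id": "seg_task",
  "title": "Checkout",
  "mode": "observe",
  "instruction": "Complete a purchase through to the confirmation screen.",
  "conductor_context": "Expected flow ends on the confirmation page.",
  "advance_when": "url:https://example.com/confirmation"
}
```

For the LLM-judged form, replace the string with plain language, e.g. "The participant has completed the checkout and sees a confirmation message."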
goals (study-level and talk-level)
Study-level goals define what the entire study should learn. Talk-level goals (talk.goals) tell a specific segment which study goals to pursue.
Good goals:
- Observable and specific: "Capture the exact trigger event and context"
- Outcome-oriented: "Identify workarounds used when the primary flow fails"
Avoid:
- Vague: "Understand the user" (understand what, specifically?)
- Too many: 3-5 goals per study is ideal. More than 7 dilutes focus.
- Duplicated: Don't repeat the same goal in every segment. Assign goals to the segments where they're most relevant.
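Study-level goals are defined once and then referenced by id from the segments where they matter (ids and descriptions are illustrative):

```json
{
  "version": 2,
  "goals": [
    { "id": "g_trigger", "description": "Capture the exact trigger event and context" },
    { "id": "g_workaround", "description": "Identify workarounds used when the primary flow fails" }
  ],
  "segments": [
    {
      "id": "seg_trigger",
      "title": "Trigger",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask what happened right before they started the task.",
        "goals": ["g_trigger"]
      }
    }
  ]
}
```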
max_duration_s
Safety valve for observe segments. Auto-advances after this many seconds.
Recommendations:
- Observe segments: 300-420s (5-7 minutes)
- Speak segments: rarely needed (they're short by nature)
- Talk segments: rarely needed (conversation has natural endings)
suppress_interventions
Disables stuck detection for a segment. Use for think-aloud protocols where pauses and hesitation are expected and valuable, not signs of being stuck.
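A think-aloud observe segment might combine suppress_interventions with a conductor_context that explains why pauses are expected (the task wording is illustrative):

```json
{
  "id": "seg_thinkaloud",
  "title": "Think-Aloud Task",
  "mode": "observe",
  "instruction": "Work through the task and narrate your thinking as you go.",
  "conductor_context": "Pauses are expected while the participant verbalizes. Do not treat hesitation as being stuck.",
  "suppress_interventions": true,
  "max_duration_s": 420
}
```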
step_up_if
Natural language hint for when the conductor should intervene (even if the user isn't technically stuck).
Example: "The participant has been on the pricing page for more than 60 seconds without interacting."
skip_if
Natural language condition for skipping this segment entirely.
Example: "The participant already described their trigger event in the previous segment."
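Flow-control fields can combine on a single segment. A sketch reusing the conditions above (the segment id and task are illustrative):

```json
{
  "id": "seg_pricing",
  "title": "Pricing Review",
  "mode": "observe",
  "instruction": "Find a plan that fits your team and open its details.",
  "conductor_context": "Expected flow: pricing page, plan card, plan details.",
  "skip_if": "The participant already compared plans in the previous segment.",
  "step_up_if": "The participant has been on the pricing page for more than 60 seconds without interacting.",
  "max_duration_s": 300
}
```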
Common Anti-Patterns
1. All-talk studies with no observation
Problem: Five talk segments in a row. You're collecting opinions, not behavior. Fix: Add at least one observe segment. Watch them do the thing, then talk about it.
2. Missing conductor_context
Problem: Observe segment with empty conductor_context. The AI has no idea what "stuck" looks like for your specific task.
Fix: Always describe expected behavior, common failure points, and what stuck looks like.
3. Vague goals
Problem: Goals like "Understand the user experience" or "Learn about needs." Fix: Make goals concrete and observable: "Capture the exact workaround used when export fails."
4. Starting with observe
Problem: First segment is observe. Participant has no idea what they're supposed to do. Fix: Always precede observe with speak (task instructions) or talk (context gathering).
5. No debrief after observe
Problem: Observe segment followed by session end. You captured behavior but never asked why. Fix: Always follow observe with a talk debrief that references what you just watched.
6. Giant system_prompt
Problem: 500-word system_prompt that tries to cover every scenario. The AI loses focus.
Fix: Keep prompts to 3-5 concrete rules. Use talk.goals to direct focus rather than long prompts.
7. Too many goals
Problem: 10 goals across 3 segments. None get adequate coverage. Fix: 3-5 goals per study. Each goal assigned to 1-2 segments where it's most relevant.
8. No max_duration_s on observe
Problem: Observe segment runs indefinitely. Participant wanders for 15 minutes. Fix: Set max_duration_s to 300-420 for observe segments. Use advance_when for early completion.
Annotated Examples
JTBD Interview
Why this works: Pure talk study is appropriate here because JTBD is retrospective — you're asking about past behavior, not observing current behavior. Each segment narrows the aperture from context to trigger to friction to alternatives.
{
"version": 2,
"defaults": {
"system_prompt": "You are a product interviewer. Ask one concrete question at a time. Prefer behavior evidence over opinions."
},
"goals": [
{ "id": "g_trigger", "description": "Capture the exact trigger event and context" },
{ "id": "g_outcome", "description": "Capture desired outcome and success criteria" },
{ "id": "g_friction", "description": "Capture specific friction points during execution" },
{ "id": "g_workaround", "description": "Capture workarounds or alternate paths used" },
{ "id": "g_decision", "description": "Capture tradeoffs and stop/go decisions" }
],
"segments": [
{
"id": "seg_rapport",
"title": "Rapport & Context",
"mode": "talk",
"talk": {
"system_prompt": "Set a 1-sentence boundary and ask for the last real time the participant did the target task end-to-end.",
"goals": ["g_trigger"]
}
},
{
"id": "seg_trigger",
"title": "Trigger & Job",
"mode": "talk",
"talk": {
"system_prompt": "Ask: what happened right before they started, what they needed done, what success meant. Follow with one strict probe per answer.",
"goals": ["g_trigger", "g_outcome"]
}
},
{
"id": "seg_jtbd",
"title": "Friction & Workarounds",
"mode": "talk",
"talk": {
"system_prompt": "For every friction mention, ask: what did you click next, what obstacle appeared, what did you do to keep going, what did it cost? Do not move on until you capture behavior + consequence.",
"goals": ["g_friction", "g_workaround", "g_decision"]
}
},
{
"id": "seg_compare",
"title": "Alternatives",
"mode": "talk",
"talk": {
"system_prompt": "Ask only about alternatives already used, not hypothetical futures. Push for one concrete example per alternative.",
"goals": ["g_outcome", "g_decision"]
}
},
{
"id": "seg_wrap",
"title": "Wrap Up",
"mode": "talk",
"talk": {
"system_prompt": "Summarize what you heard. Ask: is there one concrete moment that shows what matters most?",
"goals": ["g_decision"]
}
}
]
}
Design notes:
- `defaults.system_prompt` sets the baseline interviewer persona; individual segments override it with specific focus areas.
- Goals are distributed across segments: `g_trigger` is covered in the rapport and trigger segments, while `g_decision` spans the JTBD, compare, and wrap-up segments.
- Each segment's `talk.system_prompt` includes concrete example questions, not just topic descriptions.
Usability Test
Why this works: The speak → observe → talk arc captures natural behavior (observe) bookended by setup (speak) and reflection (talk). The speak segment prevents bias by delivering instructions without conversation.
{
"version": 2,
"goals": [
{ "id": "g_completion", "description": "Evaluate task completion and ease of use" },
{ "id": "g_friction", "description": "Identify points of confusion or friction" }
],
"segments": [
{
"id": "seg_intro",
"title": "Task Instructions",
"mode": "speak",
"speak_text": "Hi there — thanks for joining. I'll give you a task to complete. While you work, speak out loud about what you're doing and anything that feels confusing. There are no right or wrong answers.",
"speak_submode": "speak_balanced"
},
{
"id": "seg_task",
"title": "Complete the Task",
"mode": "observe",
"instruction": "Complete a purchase from product page through to confirmation.",
"conductor_context": "Expected flow: browse → add to cart → checkout → payment → confirmation. The payment form requires scrolling on mobile. Users who tap 'Back' from payment often cannot find their cart again.",
"max_duration_s": 420,
"advance_when": "url:https://example.com/confirmation"
},
{
"id": "seg_debrief",
"title": "Debrief",
"mode": "talk",
"talk": {
"system_prompt": "Ask what was easy, what was confusing, and what outcome they expected. Reference specific moments you observed. Follow with one concrete improvement question.",
"goals": ["g_completion", "g_friction"]
}
}
]
}
Design notes:
- `conductor_context` tells the evaluator what "stuck" looks like for this specific task, which is essential for good interventions.
- `advance_when` uses URL matching for deterministic advancement when the task is done.
- `max_duration_s: 420` (7 minutes) prevents indefinite observation.
- The debrief prompt says "reference specific moments": the AI has the observation transcript and can ask about concrete behavior.
Exploration Study
Why this works: Starts with talk to understand context, moves to observe to see reality (not just what they say), then probes the gap between what they described and what you observed.
{
"version": 2,
"goals": [
{ "id": "g_workflow", "description": "Understand daily routines and workflows" },
{ "id": "g_needs", "description": "Discover unmet needs and workarounds" }
],
"segments": [
{
"id": "seg_context",
"title": "Context & Routine",
"mode": "talk",
"talk": {
"system_prompt": "Explore the participant's daily routine, tools, habits. Ask about the last time they did the target task. Get concrete, recent examples — not general descriptions.",
"goals": ["g_workflow"]
}
},
{
"id": "seg_demo",
"title": "Show Current Workflow",
"mode": "observe",
"instruction": "Show how you currently do this task from start to finish.",
"conductor_context": "We are watching their current workflow to identify friction and workarounds. Note any copy-paste between tools, manual steps that could be automated, or moments of hesitation.",
"max_duration_s": 360
},
{
"id": "seg_probe",
"title": "Pain Points & Needs",
"mode": "talk",
"talk": {
"system_prompt": "Dig into frustrations and workarounds observed in the demo. Ask: why do you do it that way? What breaks? What would you change? Look for unmet needs behind stated preferences.",
"goals": ["g_needs"]
}
}
]
}
Design notes:
- The talk → observe → talk pattern lets you compare what people say (context) with what they do (demo), then probe the differences.
- `conductor_context` in the demo segment primes the AI to watch for specific signals (copy-paste, manual steps, hesitation).
- The probe segment explicitly references the demo: "Dig into frustrations and workarounds observed."
SegmentV2 Field Reference
Required Fields
| Field | Type | Description |
|---|---|---|
id | string | Unique segment identifier (e.g. "seg_intro"). Used in advance_when, skip_if, and logging. |
title | string | Human-readable segment name. Shown in the dashboard and logs. |
mode | "talk" \| "speak" \| "observe" | Interaction mode for this segment. |
Talk Mode Fields
| Field | Type | Default | Description |
|---|---|---|---|
talk.system_prompt | string | Inherits from defaults.system_prompt | Interviewer persona and question strategy for this segment. |
talk.goals | string[] | [] | IDs of study goals this segment should pursue. |
talk.tools | LLMTool[] | [] | Custom tools available to the interviewer LLM. |
Speak Mode Fields
| Field | Type | Default | Description |
|---|---|---|---|
speak_text | string | — | Text for the AI to speak. Required for speak segments. |
speak_submode | "speak_fast" \| "speak_balanced" \| "speak_rich" | "speak_balanced" | Voice quality/speed tradeoff. |
speak_interruptible | boolean | true | Whether the participant can interrupt by speaking. |
Observe Mode Fields
| Field | Type | Default | Description |
|---|---|---|---|
instruction | string | — | Task instruction shown to the participant. |
conductor_context | string | "" | AI-only background knowledge for the evaluator/stuck detector. |
Segment Flow Control
| Field | Type | Default | Description |
|---|---|---|---|
advance_when | string | — | When to auto-advance. url:<prefix> for deterministic, or natural language for LLM-judged. |
skip_if | string | — | Natural language condition to skip this segment entirely. |
step_up_if | string | — | Natural language hint for when to intervene proactively. |
max_duration_s | number | — | Auto-advance after this many seconds. |
suppress_interventions | boolean | false | Disable stuck detection (for think-aloud protocols). |
Host Actions
| Field | Type | Default | Description |
|---|---|---|---|
host_actions_on_enter | HostAction[] | [] | Actions executed when entering this segment (e.g. navigate to URL). |
host_actions_on_exit | HostAction[] | [] | Actions executed when leaving this segment. |
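The HostAction shape isn't specified in this guide; assuming a navigate-style action with "type" and "url" keys (both are assumptions, not confirmed API), a segment that opens the product page on entry might look like:

```json
{
  "id": "seg_task",
  "title": "Task",
  "mode": "observe",
  "instruction": "Complete a purchase through to confirmation.",
  "conductor_context": "Expected flow: product page, cart, checkout, confirmation.",
  "host_actions_on_enter": [
    { "type": "navigate", "url": "https://example.com/products" }
  ]
}
```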
StudyScriptV2 Top-Level Fields
| Field | Type | Description |
|---|---|---|
version | 2 | Always 2. |
segments | SegmentV2[] | Ordered list of segments. |
goals | StudyGoal[] | Study-level goals: { id: string, description: string }. |
defaults.system_prompt | string | Fallback system_prompt for talk segments that don't specify one. |
defaults.voice | string | Default TTS voice. |
defaults.speak_submode | SpeakSubmode | Default speak quality/speed. |
defaults.language | string | Session language hint. |
See also
- Studies — create and manage studies
- Quickstart — end-to-end setup including study creation