Study Design Guide

How to compose effective study scripts for UserTold.ai. Covers mode selection, segment sequencing, field-by-field guidance, anti-patterns, annotated examples, and the full SegmentV2 field reference.


Quick Schema Reference

Before writing a script, make sure you have the required fields for each segment mode.

| Field | Required | Type | Notes |
|---|---|---|---|
| version | yes | 2 | Must be exactly 2 |
| goals | yes | array | [{ id, description }] objects |
| segments | yes | array | Segment objects |
| segments[].id | yes | string | Unique within script |
| segments[].mode | yes | string | talk \| speak \| observe |
| segments[].title | yes | string | Display label |
| segments[].speak_text | yes for speak | string | Spoken text delivered by AI |
| segments[].talk | recommended for talk | object | { system_prompt?, goals? } |
| segments[].instruction | required for observe | string | Task instruction shown to participant |
| segments[].conductor_context | required for observe | string | AI-only context for stuck detection |

A speak segment with no speak_text, or an observe segment missing both instruction and conductor_context, will fail validation.
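Putting those required fields together, a minimal script that should pass validation looks like this (ids, goal text, and spoken text are illustrative placeholders, not fixed values):

```json
{
  "version": 2,
  "goals": [
    { "id": "g_friction", "description": "Capture one concrete friction point" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Welcome",
      "mode": "speak",
      "speak_text": "Thanks for joining. I'll ask a few questions about how you work."
    },
    {
      "id": "seg_interview",
      "title": "Interview",
      "mode": "talk",
      "talk": { "goals": ["g_friction"] }
    }
  ]
}
```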


Modes

Every segment runs in one of three modes. Picking the right mode is the single most important decision per segment.

talk — Conversational Interview

The AI conducts a natural voice conversation: asks questions, listens, follows up.

Use when you need:

  • Open-ended discovery (triggers, motivations, decision criteria)
  • Follow-up probing on specific answers
  • Rapport building at session start/end
  • Debrief after observation

Key fields: talk.system_prompt, talk.goals

The system_prompt shapes the interviewer's personality, question style, and focus. Goals tell the AI which research objectives this segment should pursue.
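For example, a talk segment pairing a focused system_prompt with a single study goal might look like this sketch (the goal id g_trigger is assumed to be defined in the study-level goals array):

```json
{
  "id": "seg_trigger",
  "title": "Trigger & Job",
  "mode": "talk",
  "talk": {
    "system_prompt": "Ask one concrete question at a time about the most recent time this happened. No hypotheticals.",
    "goals": ["g_trigger"]
  }
}
```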

speak — AI Monologue

The AI delivers a scripted message, then waits or advances. No back-and-forth.

Use when you need:

  • Task instructions before an observe segment
  • Welcome/intro messages
  • Consent language or disclaimers
  • Transitions between study phases

Key fields: speak_text, speak_submode, speak_interruptible

Keep speak_text concise. Participants tune out after ~30 seconds of monologue.
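A typical speak segment used to set up a task might look like this sketch (the spoken text is illustrative):

```json
{
  "id": "seg_instructions",
  "title": "Task Instructions",
  "mode": "speak",
  "speak_text": "I'd like you to complete a purchase of any item. Please think out loud as you go.",
  "speak_submode": "speak_balanced",
  "speak_interruptible": true
}
```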

observe — Silent Observation

The AI watches the participant use your product. It stays quiet unless the stuck detector fires.

Use when you need:

  • Usability testing (watch a task end-to-end)
  • Workflow observation (see how they actually work)
  • Any scenario where interruption would bias behavior

Key fields: instruction, conductor_context, max_duration_s, suppress_interventions

The instruction is shown to the participant. The conductor_context is AI-only background knowledge that helps the evaluator understand what's normal vs. stuck.
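A sketch of an observe segment that keeps the two audiences separate (task details and timings are illustrative):

```json
{
  "id": "seg_task",
  "title": "Complete the Task",
  "mode": "observe",
  "instruction": "Complete a purchase of any item, from product page to confirmation.",
  "conductor_context": "Expected flow: product page → cart → checkout → confirmation. Repeated back-and-forth navigation usually means stuck.",
  "max_duration_s": 360
}
```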


Segment Sequencing

Order matters. The right sequence produces richer data than any individual segment.

The Core Pattern: speak → observe → talk

Most usability studies follow this arc:

  1. speak — Set up the task ("I'd like you to complete a purchase...")
  2. observe — Watch them do it (silent, no leading)
  3. talk — Debrief on what happened ("What were you thinking when you paused on the payment page?")

Why this order works:

  • Speak gives clear instructions without a conversation that might bias behavior
  • Observe captures natural behavior before you ask about it
  • Talk references concrete moments the participant just experienced
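Sketched as a minimal segments array (the "..." values are placeholders to fill in, not literal content):

```json
"segments": [
  { "id": "seg_setup", "title": "Task Instructions", "mode": "speak", "speak_text": "..." },
  { "id": "seg_task", "title": "Complete the Task", "mode": "observe", "instruction": "...", "conductor_context": "..." },
  { "id": "seg_debrief", "title": "Debrief", "mode": "talk", "talk": { "system_prompt": "..." } }
]
```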

JTBD Pattern: talk → talk → talk → talk

Pure interview. Each segment narrows focus:

  1. Rapport & context — Establish the recent event
  2. Trigger & job — What happened, what they needed
  3. Friction & workarounds — Where things broke, what they did instead
  4. Wrap-up — Confirm understanding, capture the key moment

Exploration Pattern: talk → observe → talk

Start with context, then watch, then probe:

  1. Context — Understand their routine and tools
  2. Demo — Watch them do the thing
  3. Probe — Dig into what you observed

Principles

  • Never start with observe. Participants need context first — at minimum a speak segment with instructions.
  • Never end with observe. Always debrief. The richest insights come from asking "why did you do X?" after watching X happen.
  • Use speak for transitions, not conversations. If you need back-and-forth, use talk.
  • Limit observe segments to 5-7 minutes. Beyond that, participants lose focus and data quality drops. Use max_duration_s.

Field-by-Field Guidance

system_prompt (talk mode)

The system_prompt defines the interviewer's behavior for a talk segment. It's the most impactful field in the entire script.

Good system_prompt traits:

  • Specifies question style (one question at a time, concrete, no leading)
  • Names the evidence to pursue (behaviors, not opinions)
  • Sets boundaries (no solution selling, no roadmap talk)
  • Includes 2-3 example follow-up questions

Example:

Ask only about concrete recent behavior, not hypothetical futures.
For each friction point mentioned, ask:
- "What did you click or type next?"
- "What did you expect to happen?"
- "What happened instead?"
Do not move on until you capture behavior + consequence.

Avoid:

  • Generic prompts ("Ask good questions about the user experience")
  • Long preambles that dilute the core instruction
  • Contradictory rules ("Be concise" + "Always ask 3 follow-ups")

conductor_context (observe mode)

Background knowledge for the AI evaluator — NOT shown to the participant.

Good conductor_context traits:

  • Describes expected vs. stuck behavior
  • Mentions UI elements that are commonly missed
  • Provides domain-specific context

Example:

The "Submit" button is below the fold on mobile. Users frequently scroll past it.
Expected flow: fill form → scroll down → tap Submit → see confirmation.
If the user scrolls up and down repeatedly, they are likely stuck.

Avoid:

  • Empty string (wastes an opportunity to help the evaluator)
  • Participant-facing language (this is AI-only)

instruction (observe mode)

Shown to the participant. Tells them what to do.

Good instruction traits:

  • One clear task
  • Concrete start and end points
  • No hints about how to complete it

Example: "Complete a purchase of any item, from product page through to the confirmation screen."

Avoid:

  • Multiple tasks in one instruction
  • Hints: "Click the blue button to check out" (biases behavior)
  • Vague: "Use the product" (no clear success criteria)

advance_when

Tells the conductor when to auto-advance to the next segment. Can be deterministic or LLM-judged.

Deterministic (URL-based):

url:https://example.com/confirmation

Advances when the participant navigates to a URL matching this prefix.

LLM-judged (natural language):

The participant has completed the checkout and sees a confirmation message.

The evaluator checks this condition periodically.

Tips:

  • Prefer URL-based when possible — it's instant and deterministic.
  • For talk segments, advance_when is rarely needed. Let the AI judge conversation completeness via goals.
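Both forms occupy the same string field on the segment. A sketch of the URL-based form (the URL and task details are illustrative):

```json
{
  "id": "seg_checkout",
  "title": "Checkout",
  "mode": "observe",
  "instruction": "Complete checkout for any item.",
  "conductor_context": "Standard checkout: cart → payment → confirmation.",
  "advance_when": "url:https://example.com/confirmation"
}
```

For the LLM-judged form, replace the value with a natural-language condition, e.g. "advance_when": "The participant sees a confirmation message."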

goals (study-level and talk-level)

Study-level goals define what the entire study should learn. Talk-level goals (talk.goals) tell a specific segment which study goals to pursue.

Good goals:

  • Observable and specific: "Capture the exact trigger event and context"
  • Outcome-oriented: "Identify workarounds used when the primary flow fails"

Avoid:

  • Vague: "Understand the user" (understand what, specifically?)
  • Too many: 3-5 goals per study is ideal. More than 7 dilutes focus.
  • Duplicated: Don't repeat the same goal in every segment. Assign goals to the segments where they're most relevant.

max_duration_s

Safety valve for observe segments. Auto-advances after this many seconds.

Recommendations:

  • Observe segments: 300-420s (5-7 minutes)
  • Speak segments: rarely needed (they're short by nature)
  • Talk segments: rarely needed (conversation has natural endings)

suppress_interventions

Disables stuck detection for a segment. Use for think-aloud protocols where pauses and hesitation are expected and valuable, not signs of being stuck.
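A sketch of a think-aloud observe segment with interventions suppressed (wording and timing are illustrative):

```json
{
  "id": "seg_thinkaloud",
  "title": "Think Aloud",
  "mode": "observe",
  "instruction": "Work through the task as you normally would, narrating your thoughts as you go.",
  "conductor_context": "Pauses are expected while the participant narrates; do not treat them as stuck.",
  "suppress_interventions": true,
  "max_duration_s": 420
}
```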

step_up_if

Natural language hint for when the conductor should intervene (even if the user isn't technically stuck).

Example: "The participant has been on the pricing page for more than 60 seconds without interacting."

skip_if

Natural language condition for skipping this segment entirely.

Example: "The participant already described their trigger event in the previous segment."
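Both step_up_if and skip_if are plain strings on the segment. A sketch combining them on a talk segment (conditions are illustrative; g_trigger is an assumed study-level goal id):

```json
{
  "id": "seg_trigger",
  "title": "Trigger & Job",
  "mode": "talk",
  "skip_if": "The participant already described their trigger event in the previous segment.",
  "step_up_if": "The participant gives only general answers with no concrete recent example.",
  "talk": { "goals": ["g_trigger"] }
}
```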


Common Anti-Patterns

1. All-talk studies with no observation

Problem: Five talk segments in a row. You're collecting opinions, not behavior.
Fix: Add at least one observe segment. Watch them do the thing, then talk about it.

2. Missing conductor_context

Problem: Observe segment with empty conductor_context. The AI has no idea what "stuck" looks like for your specific task.
Fix: Always describe expected behavior, common failure points, and what stuck looks like.

3. Vague goals

Problem: Goals like "Understand the user experience" or "Learn about needs."
Fix: Make goals concrete and observable: "Capture the exact workaround used when export fails."

4. Starting with observe

Problem: First segment is observe. Participant has no idea what they're supposed to do.
Fix: Always precede observe with speak (task instructions) or talk (context gathering).

5. No debrief after observe

Problem: Observe segment followed by session end. You captured behavior but never asked why.
Fix: Always follow observe with a talk debrief that references what you just watched.

6. Giant system_prompt

Problem: 500-word system_prompt that tries to cover every scenario. The AI loses focus.
Fix: Keep prompts to 3-5 concrete rules. Use talk.goals to direct focus rather than long prompts.

7. Too many goals

Problem: 10 goals across 3 segments. None get adequate coverage.
Fix: 3-5 goals per study. Each goal assigned to 1-2 segments where it's most relevant.

8. No max_duration_s on observe

Problem: Observe segment runs indefinitely. Participant wanders for 15 minutes.
Fix: Set max_duration_s to 300-420 for observe segments. Use advance_when for early completion.


Annotated Examples

JTBD Interview

Why this works: Pure talk study is appropriate here because JTBD is retrospective — you're asking about past behavior, not observing current behavior. Each segment narrows the aperture from context to trigger to friction to alternatives.

{
  "version": 2,
  "defaults": {
    "system_prompt": "You are a product interviewer. Ask one concrete question at a time. Prefer behavior evidence over opinions."
  },
  "goals": [
    { "id": "g_trigger", "description": "Capture the exact trigger event and context" },
    { "id": "g_outcome", "description": "Capture desired outcome and success criteria" },
    { "id": "g_friction", "description": "Capture specific friction points during execution" },
    { "id": "g_workaround", "description": "Capture workarounds or alternate paths used" },
    { "id": "g_decision", "description": "Capture tradeoffs and stop/go decisions" }
  ],
  "segments": [
    {
      "id": "seg_rapport",
      "title": "Rapport & Context",
      "mode": "talk",
      "talk": {
        "system_prompt": "Set a 1-sentence boundary and ask for the last real time the participant did the target task end-to-end.",
        "goals": ["g_trigger"]
      }
    },
    {
      "id": "seg_trigger",
      "title": "Trigger & Job",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask: what happened right before they started, what they needed done, what success meant. Follow with one strict probe per answer.",
        "goals": ["g_trigger", "g_outcome"]
      }
    },
    {
      "id": "seg_jtbd",
      "title": "Friction & Workarounds",
      "mode": "talk",
      "talk": {
        "system_prompt": "For every friction mention, ask: what did you click next, what obstacle appeared, what did you do to keep going, what did it cost? Do not move on until you capture behavior + consequence.",
        "goals": ["g_friction", "g_workaround", "g_decision"]
      }
    },
    {
      "id": "seg_compare",
      "title": "Alternatives",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask only about alternatives already used, not hypothetical futures. Push for one concrete example per alternative.",
        "goals": ["g_outcome", "g_decision"]
      }
    },
    {
      "id": "seg_wrap",
      "title": "Wrap Up",
      "mode": "talk",
      "talk": {
        "system_prompt": "Summarize what you heard. Ask: is there one concrete moment that shows what matters most?",
        "goals": ["g_decision"]
      }
    }
  ]
}

Design notes:

  • defaults.system_prompt sets the baseline interviewer persona — individual segments override with specific focus areas.
  • Goals are distributed across segments. g_trigger is covered in rapport and trigger segments; g_decision spans JTBD, compare, and wrap-up.
  • Each segment's talk.system_prompt includes concrete example questions, not just topic descriptions.

Usability Test

Why this works: The speak → observe → talk arc captures natural behavior (observe) bookended by setup (speak) and reflection (talk). The speak segment prevents bias by delivering instructions without conversation.

{
  "version": 2,
  "goals": [
    { "id": "g_completion", "description": "Evaluate task completion and ease of use" },
    { "id": "g_friction", "description": "Identify points of confusion or friction" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Task Instructions",
      "mode": "speak",
      "speak_text": "Hi there — thanks for joining. I'll give you a task to complete. While you work, speak out loud about what you're doing and anything that feels confusing. There are no right or wrong answers.",
      "speak_submode": "speak_balanced"
    },
    {
      "id": "seg_task",
      "title": "Complete the Task",
      "mode": "observe",
      "instruction": "Complete a purchase from product page through to confirmation.",
      "conductor_context": "Expected flow: browse → add to cart → checkout → payment → confirmation. The payment form requires scrolling on mobile. Users who tap 'Back' from payment often cannot find their cart again.",
      "max_duration_s": 420,
      "advance_when": "url:https://example.com/confirmation"
    },
    {
      "id": "seg_debrief",
      "title": "Debrief",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask what was easy, what was confusing, and what outcome they expected. Reference specific moments you observed. Follow with one concrete improvement question.",
        "goals": ["g_completion", "g_friction"]
      }
    }
  ]
}

Design notes:

  • conductor_context tells the evaluator what "stuck" looks like for this specific task — essential for good interventions.
  • advance_when uses URL matching for deterministic advancement when the task is done.
  • max_duration_s: 420 (7 minutes) prevents indefinite observation.
  • The debrief prompt says "reference specific moments" — the AI has the observation transcript and can ask about concrete behavior.

Exploration Study

Why this works: Starts with talk to understand context, moves to observe to see reality (not just what they say), then probes the gap between what they described and what you observed.

{
  "version": 2,
  "goals": [
    { "id": "g_workflow", "description": "Understand daily routines and workflows" },
    { "id": "g_needs", "description": "Discover unmet needs and workarounds" }
  ],
  "segments": [
    {
      "id": "seg_context",
      "title": "Context & Routine",
      "mode": "talk",
      "talk": {
        "system_prompt": "Explore the participant's daily routine, tools, habits. Ask about the last time they did the target task. Get concrete, recent examples — not general descriptions.",
        "goals": ["g_workflow"]
      }
    },
    {
      "id": "seg_demo",
      "title": "Show Current Workflow",
      "mode": "observe",
      "instruction": "Show how you currently do this task from start to finish.",
      "conductor_context": "We are watching their current workflow to identify friction and workarounds. Note any copy-paste between tools, manual steps that could be automated, or moments of hesitation.",
      "max_duration_s": 360
    },
    {
      "id": "seg_probe",
      "title": "Pain Points & Needs",
      "mode": "talk",
      "talk": {
        "system_prompt": "Dig into frustrations and workarounds observed in the demo. Ask: why do you do it that way? What breaks? What would you change? Look for unmet needs behind stated preferences.",
        "goals": ["g_needs"]
      }
    }
  ]
}

Design notes:

  • The talk → observe → talk pattern lets you compare what people say (context) with what they do (demo), then probe the differences.
  • conductor_context in the demo segment primes the AI to watch for specific signals (copy-paste, manual steps, hesitation).
  • The probe segment explicitly references the demo: "Dig into frustrations and workarounds observed."

SegmentV2 Field Reference

Required Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique segment identifier (e.g. "seg_intro"). Used in advance_when, skip_if, and logging. |
| title | string | Human-readable segment name. Shown in the dashboard and logs. |
| mode | "talk" \| "speak" \| "observe" | Interaction mode for this segment. |

Talk Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| talk.system_prompt | string | Inherits from defaults.system_prompt | Interviewer persona and question strategy for this segment. |
| talk.goals | string[] | [] | IDs of study goals this segment should pursue. |
| talk.tools | LLMTool[] | [] | Custom tools available to the interviewer LLM. |

Speak Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| speak_text | string | — | Text for the AI to speak. Required for speak segments. |
| speak_submode | "speak_fast" \| "speak_balanced" \| "speak_rich" | "speak_balanced" | Voice quality/speed tradeoff. |
| speak_interruptible | boolean | true | Whether the participant can interrupt by speaking. |

Observe Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| instruction | string | — | Task instruction shown to the participant. |
| conductor_context | string | "" | AI-only background knowledge for the evaluator/stuck detector. |

Segment Flow Control

| Field | Type | Default | Description |
|---|---|---|---|
| advance_when | string | — | When to auto-advance. url:<prefix> for deterministic, or natural language for LLM-judged. |
| skip_if | string | — | Natural language condition to skip this segment entirely. |
| step_up_if | string | — | Natural language hint for when to intervene proactively. |
| max_duration_s | number | — | Auto-advance after this many seconds. |
| suppress_interventions | boolean | false | Disable stuck detection (for think-aloud protocols). |

Host Actions

| Field | Type | Default | Description |
|---|---|---|---|
| host_actions_on_enter | HostAction[] | [] | Actions executed when entering this segment (e.g. navigate to URL). |
| host_actions_on_exit | HostAction[] | [] | Actions executed when leaving this segment. |

StudyScriptV2 Top-Level Fields

| Field | Type | Description |
|---|---|---|
| version | 2 | Always 2. |
| segments | SegmentV2[] | Ordered list of segments. |
| goals | StudyGoal[] | Study-level goals: { id: string, description: string }. |
| defaults.system_prompt | string | Fallback system_prompt for talk segments that don't specify one. |
| defaults.voice | string | Default TTS voice. |
| defaults.speak_submode | SpeakSubmode | Default speak quality/speed. |
| defaults.language | string | Session language hint. |

See also

  • Studies — create and manage studies
  • Quickstart — end-to-end setup including study creation