Study Design Guide

How to compose effective study scripts for UserTold.ai. Covers mode selection, segment sequencing, field-by-field guidance, anti-patterns, annotated examples, and the full SegmentV2 field reference.


Quick Schema Reference

Before writing a script, make sure you have the required fields for each segment mode.

| Field | Required | Type | Notes |
|---|---|---|---|
| version | yes | 2 | Must be exactly 2 |
| goals | yes | array | [{ id, description }] objects |
| segments | yes | array | Segment objects |
| segments[].id | yes | string | Unique within script |
| segments[].mode | yes | string | talk \| speak \| observe |
| segments[].title | yes | string | Display label |
| segments[].speak_text | yes for speak | string | Spoken text delivered by AI |
| segments[].talk | recommended for talk | object | { system_prompt?, goals? } |
| segments[].instruction | required for observe | string | Task instruction shown to participant |
| segments[].conductor_context | required for observe | string | AI-only context for stuck detection |

A speak segment with no speak_text, or an observe segment missing both instruction and conductor_context, will fail validation.
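Putting those required fields together, a minimal script that should pass validation looks like this (ids, goal text, and spoken text are illustrative placeholders, not fixed values):

```json
{
  "version": 2,
  "goals": [
    { "id": "g_friction", "description": "Capture one concrete friction point" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Welcome",
      "mode": "speak",
      "speak_text": "Thanks for joining. I'll ask a few questions about how you work."
    },
    {
      "id": "seg_interview",
      "title": "Interview",
      "mode": "talk",
      "talk": { "goals": ["g_friction"] }
    }
  ]
}
```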


Modes

Every segment runs in one of three modes. Picking the right mode is the single most important decision per segment.

talk — Conversational Interview

The AI conducts a natural voice conversation: asks questions, listens, follows up.

Use when you need:

  • Open-ended discovery (triggers, motivations, decision criteria)
  • Follow-up probing on specific answers
  • Rapport building at session start/end
  • Debrief after observation

Key fields: talk.system_prompt, talk.goals

The system_prompt shapes the interviewer's personality, question style, and focus. Goals tell the AI which research objectives this segment should pursue.
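For example, a talk segment pairing a focused system_prompt with a single study goal might look like this sketch (the goal id g_trigger is assumed to be defined in the study-level goals array):

```json
{
  "id": "seg_trigger",
  "title": "Trigger & Job",
  "mode": "talk",
  "talk": {
    "system_prompt": "Ask one concrete question at a time about the most recent time this happened. No hypotheticals.",
    "goals": ["g_trigger"]
  }
}
```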

speak — AI Monologue

The AI delivers a scripted message, then waits or advances. No back-and-forth.

Use when you need:

  • Task instructions before an observe segment
  • Welcome/intro messages
  • Consent language or disclaimers
  • Transitions between study phases

Key fields: speak_text, speak_submode, speak_interruptible

Keep speak_text concise. Participants tune out after ~30 seconds of monologue.
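A typical speak segment used to set up a task might look like this sketch (the spoken text is illustrative):

```json
{
  "id": "seg_instructions",
  "title": "Task Instructions",
  "mode": "speak",
  "speak_text": "I'd like you to complete a purchase of any item. Please think out loud as you go.",
  "speak_submode": "speak_balanced",
  "speak_interruptible": true
}
```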

observe — Silent Observation

The AI watches the participant use your product. It stays quiet unless the stuck detector fires.

Use when you need:

  • Usability testing (watch a task end-to-end)
  • Workflow observation (see how they actually work)
  • Any scenario where interruption would bias behavior

Key fields: instruction, conductor_context, max_duration_s, suppress_interventions

The instruction is shown to the participant. The conductor_context is AI-only background knowledge that helps the evaluator understand what's normal vs. stuck.
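A sketch of an observe segment that keeps the two audiences separate (task details and timings are illustrative):

```json
{
  "id": "seg_task",
  "title": "Complete the Task",
  "mode": "observe",
  "instruction": "Complete a purchase of any item, from product page to confirmation.",
  "conductor_context": "Expected flow: product page → cart → checkout → confirmation. Repeated back-and-forth navigation usually means stuck.",
  "max_duration_s": 360
}
```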


Segment Sequencing

Order matters. The right sequence produces richer data than any individual segment.

The Core Pattern: speak → observe → talk

Most usability studies follow this arc:

  1. speak — Set up the task ("I'd like you to complete a purchase...")
  2. observe — Watch them do it (silent, no leading)
  3. talk — Debrief on what happened ("What were you thinking when you paused on the payment page?")

Why this order works:

  • Speak gives clear instructions without a conversation that might bias behavior
  • Observe captures natural behavior before you ask about it
  • Talk references concrete moments the participant just experienced
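Sketched as a minimal segments array (the "..." values are placeholders to fill in, not literal content):

```json
"segments": [
  { "id": "seg_setup", "title": "Task Instructions", "mode": "speak", "speak_text": "..." },
  { "id": "seg_task", "title": "Complete the Task", "mode": "observe", "instruction": "...", "conductor_context": "..." },
  { "id": "seg_debrief", "title": "Debrief", "mode": "talk", "talk": { "system_prompt": "..." } }
]
```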

JTBD Pattern: talk → talk → talk → talk

Pure interview. Each segment narrows focus:

  1. Rapport & context — Establish the recent event
  2. Trigger & job — What happened, what they needed
  3. Friction & workarounds — Where things broke, what they did instead
  4. Wrap-up — Confirm understanding, capture the key moment

Exploration Pattern: talk → observe → talk

Start with context, then watch, then probe:

  1. Context — Understand their routine and tools
  2. Demo — Watch them do the thing
  3. Probe — Dig into what you observed

Principles

  • Never start with observe. Participants need context first — at minimum a speak segment with instructions.
  • Never end with observe. Always debrief. The richest insights come from asking "why did you do X?" after watching X happen.
  • Use speak for transitions, not conversations. If you need back-and-forth, use talk.
  • Limit observe segments to 5-7 minutes. Beyond that, participants lose focus and data quality drops. Use max_duration_s.

Field-by-Field Guidance

system_prompt (talk mode)

The system_prompt defines the interviewer's behavior for a talk segment. It's the most impactful field in the entire script.

Good system_prompt traits:

  • Specifies question style (one question at a time, concrete, no leading)
  • Names the evidence to pursue (behaviors, not opinions)
  • Sets boundaries (no solution selling, no roadmap talk)
  • Includes 2-3 example follow-up questions

Example:

Ask only about concrete recent behavior, not hypothetical futures.
For each friction point mentioned, ask:
- "What did you click or type next?"
- "What did you expect to happen?"
- "What happened instead?"
Do not move on until you capture behavior + consequence.

Avoid:

  • Generic prompts ("Ask good questions about the user experience")
  • Long preambles that dilute the core instruction
  • Contradictory rules ("Be concise" + "Always ask 3 follow-ups")

conductor_context (observe mode)

Background knowledge for the AI evaluator — NOT shown to the participant.

Good conductor_context traits:

  • Describes expected vs. stuck behavior
  • Mentions UI elements that are commonly missed
  • Provides domain-specific context

Example:

The "Submit" button is below the fold on mobile. Users frequently scroll past it.
Expected flow: fill form → scroll down → tap Submit → see confirmation.
If the user scrolls up and down repeatedly, they are likely stuck.

Avoid:

  • Empty string (wastes an opportunity to help the evaluator)
  • Participant-facing language (this is AI-only)

instruction (observe mode)

Shown to the participant. Tells them what to do.

Good instruction traits:

  • One clear task
  • Concrete start and end points
  • No hints about how to complete it

Example: "Complete a purchase of any item, from product page through to the confirmation screen."

Avoid:

  • Multiple tasks in one instruction
  • Hints: "Click the blue button to check out" (biases behavior)
  • Vague: "Use the product" (no clear success criteria)

advance_when

Tells the conductor when to auto-advance to the next segment. Can be deterministic or LLM-judged.

Deterministic (URL-based):

url:https://example.com/confirmation

Advances when the participant navigates to a URL matching this prefix.

LLM-judged (natural language):

The participant has completed the checkout and sees a confirmation message.

The evaluator checks this condition periodically.

Tips:

  • Prefer URL-based when possible — it's instant and deterministic.
  • For talk segments, advance_when is rarely needed. Let the AI judge conversation completeness via goals.
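Both forms occupy the same string field on the segment. A sketch of the URL-based form (the URL and task details are illustrative):

```json
{
  "id": "seg_checkout",
  "title": "Checkout",
  "mode": "observe",
  "instruction": "Complete checkout for any item.",
  "conductor_context": "Standard checkout: cart → payment → confirmation.",
  "advance_when": "url:https://example.com/confirmation"
}
```

For the LLM-judged form, replace the value with a natural-language condition, e.g. "advance_when": "The participant sees a confirmation message."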

goals (study-level and talk-level)

Study-level goals define what the entire study should learn. Talk-level goals (talk.goals) tell a specific segment which study goals to pursue.

Good goals:

  • Observable and specific: "Capture the exact trigger event and context"
  • Outcome-oriented: "Identify workarounds used when the primary flow fails"

Avoid:

  • Vague: "Understand the user" (understand what, specifically?)
  • Too many: 3-5 goals per study is ideal. More than 7 dilutes focus.
  • Duplicated: Don't repeat the same goal in every segment. Assign goals to the segments where they're most relevant.

max_duration_s

Safety valve for observe segments. Auto-advances after this many seconds.

Recommendations:

  • Observe segments: 300-420s (5-7 minutes)
  • Speak segments: rarely needed (they're short by nature)
  • Talk segments: rarely needed (conversation has natural endings)

suppress_interventions

Disables stuck detection for a segment. Use for think-aloud protocols where pauses and hesitation are expected and valuable, not signs of being stuck.
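A sketch of a think-aloud observe segment with interventions suppressed (wording and timing are illustrative):

```json
{
  "id": "seg_thinkaloud",
  "title": "Think Aloud",
  "mode": "observe",
  "instruction": "Work through the task as you normally would, narrating your thoughts as you go.",
  "conductor_context": "Pauses are expected while the participant narrates; do not treat them as stuck.",
  "suppress_interventions": true,
  "max_duration_s": 420
}
```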

step_up_if

Natural language hint for when the conductor should intervene (even if the user isn't technically stuck).

Example: "The participant has been on the pricing page for more than 60 seconds without interacting."

skip_if

Natural language condition for skipping this segment entirely.

Example: "The participant already described their trigger event in the previous segment."
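Both step_up_if and skip_if are plain strings on the segment. A sketch combining them on a talk segment (conditions are illustrative; g_trigger is an assumed study-level goal id):

```json
{
  "id": "seg_trigger",
  "title": "Trigger & Job",
  "mode": "talk",
  "skip_if": "The participant already described their trigger event in the previous segment.",
  "step_up_if": "The participant gives only general answers with no concrete recent example.",
  "talk": { "goals": ["g_trigger"] }
}
```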


Common Anti-Patterns

1. All-talk studies with no observation

Problem: Five talk segments in a row. You're collecting opinions, not behavior.
Fix: Add at least one observe segment. Watch them do the thing, then talk about it.

2. Missing conductor_context

Problem: Observe segment with empty conductor_context. The AI has no idea what "stuck" looks like for your specific task.
Fix: Always describe expected behavior, common failure points, and what stuck looks like.

3. Vague goals

Problem: Goals like "Understand the user experience" or "Learn about needs."
Fix: Make goals concrete and observable: "Capture the exact workaround used when export fails."

4. Starting with observe

Problem: First segment is observe. Participant has no idea what they're supposed to do.
Fix: Always precede observe with speak (task instructions) or talk (context gathering).

5. No debrief after observe

Problem: Observe segment followed by session end. You captured behavior but never asked why.
Fix: Always follow observe with a talk debrief that references what you just watched.

6. Giant system_prompt

Problem: 500-word system_prompt that tries to cover every scenario. The AI loses focus.
Fix: Keep prompts to 3-5 concrete rules. Use talk.goals to direct focus rather than long prompts.

7. Too many goals

Problem: 10 goals across 3 segments. None get adequate coverage.
Fix: 3-5 goals per study. Each goal assigned to 1-2 segments where it's most relevant.

8. No max_duration_s on observe

Problem: Observe segment runs indefinitely. Participant wanders for 15 minutes.
Fix: Set max_duration_s to 300-420 for observe segments. Use advance_when for early completion.


Annotated Examples

JTBD Interview

Why this works: Pure talk study is appropriate here because JTBD is retrospective — you're asking about past behavior, not observing current behavior. Each segment narrows the aperture from context to trigger to friction to alternatives.

{
  "version": 2,
  "defaults": {
    "system_prompt": "You are a product interviewer. Ask one concrete question at a time. Prefer behavior evidence over opinions."
  },
  "goals": [
    { "id": "g_trigger", "description": "Capture the exact trigger event and context" },
    { "id": "g_outcome", "description": "Capture desired outcome and success criteria" },
    { "id": "g_friction", "description": "Capture specific friction points during execution" },
    { "id": "g_workaround", "description": "Capture workarounds or alternate paths used" },
    { "id": "g_decision", "description": "Capture tradeoffs and stop/go decisions" }
  ],
  "segments": [
    {
      "id": "seg_rapport",
      "title": "Rapport & Context",
      "mode": "talk",
      "talk": {
        "system_prompt": "Set a 1-sentence boundary and ask for the last real time the participant did the target task end-to-end.",
        "goals": ["g_trigger"]
      }
    },
    {
      "id": "seg_trigger",
      "title": "Trigger & Job",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask: what happened right before they started, what they needed done, what success meant. Follow with one strict probe per answer.",
        "goals": ["g_trigger", "g_outcome"]
      }
    },
    {
      "id": "seg_jtbd",
      "title": "Friction & Workarounds",
      "mode": "talk",
      "talk": {
        "system_prompt": "For every friction mention, ask: what did you click next, what obstacle appeared, what did you do to keep going, what did it cost? Do not move on until you capture behavior + consequence.",
        "goals": ["g_friction", "g_workaround", "g_decision"]
      }
    },
    {
      "id": "seg_compare",
      "title": "Alternatives",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask only about alternatives already used, not hypothetical futures. Push for one concrete example per alternative.",
        "goals": ["g_outcome", "g_decision"]
      }
    },
    {
      "id": "seg_wrap",
      "title": "Wrap Up",
      "mode": "talk",
      "talk": {
        "system_prompt": "Summarize what you heard. Ask: is there one concrete moment that shows what matters most?",
        "goals": ["g_decision"]
      }
    }
  ]
}

Design notes:

  • defaults.system_prompt sets the baseline interviewer persona — individual segments override with specific focus areas.
  • Goals are distributed across segments. g_trigger is covered in rapport and trigger segments; g_decision spans JTBD, compare, and wrap-up.
  • Each segment's talk.system_prompt includes concrete example questions, not just topic descriptions.

Usability Test

Why this works: The speak → observe → talk arc captures natural behavior (observe) bookended by setup (speak) and reflection (talk). The speak segment prevents bias by delivering instructions without conversation.

{
  "version": 2,
  "goals": [
    { "id": "g_completion", "description": "Evaluate task completion and ease of use" },
    { "id": "g_friction", "description": "Identify points of confusion or friction" }
  ],
  "segments": [
    {
      "id": "seg_intro",
      "title": "Task Instructions",
      "mode": "speak",
      "speak_text": "Hi there — thanks for joining. I'll give you a task to complete. While you work, speak out loud about what you're doing and anything that feels confusing. There are no right or wrong answers.",
      "speak_submode": "speak_balanced"
    },
    {
      "id": "seg_task",
      "title": "Complete the Task",
      "mode": "observe",
      "instruction": "Complete a purchase from product page through to confirmation.",
      "conductor_context": "Expected flow: browse → add to cart → checkout → payment → confirmation. The payment form requires scrolling on mobile. Users who tap 'Back' from payment often cannot find their cart again.",
      "max_duration_s": 420,
      "advance_when": "url:https://example.com/confirmation"
    },
    {
      "id": "seg_debrief",
      "title": "Debrief",
      "mode": "talk",
      "talk": {
        "system_prompt": "Ask what was easy, what was confusing, and what outcome they expected. Reference specific moments you observed. Follow with one concrete improvement question.",
        "goals": ["g_completion", "g_friction"]
      }
    }
  ]
}

Design notes:

  • conductor_context tells the evaluator what "stuck" looks like for this specific task — essential for good interventions.
  • advance_when uses URL matching for deterministic advancement when the task is done.
  • max_duration_s: 420 (7 minutes) prevents indefinite observation.
  • The debrief prompt says "reference specific moments" — the AI has the observation transcript and can ask about concrete behavior.

Exploration Study

Why this works: Starts with talk to understand context, moves to observe to see reality (not just what they say), then probes the gap between what they described and what you observed.

{
  "version": 2,
  "goals": [
    { "id": "g_workflow", "description": "Understand daily routines and workflows" },
    { "id": "g_needs", "description": "Discover unmet needs and workarounds" }
  ],
  "segments": [
    {
      "id": "seg_context",
      "title": "Context & Routine",
      "mode": "talk",
      "talk": {
        "system_prompt": "Explore the participant's daily routine, tools, habits. Ask about the last time they did the target task. Get concrete, recent examples — not general descriptions.",
        "goals": ["g_workflow"]
      }
    },
    {
      "id": "seg_demo",
      "title": "Show Current Workflow",
      "mode": "observe",
      "instruction": "Show how you currently do this task from start to finish.",
      "conductor_context": "We are watching their current workflow to identify friction and workarounds. Note any copy-paste between tools, manual steps that could be automated, or moments of hesitation.",
      "max_duration_s": 360
    },
    {
      "id": "seg_probe",
      "title": "Pain Points & Needs",
      "mode": "talk",
      "talk": {
        "system_prompt": "Dig into frustrations and workarounds observed in the demo. Ask: why do you do it that way? What breaks? What would you change? Look for unmet needs behind stated preferences.",
        "goals": ["g_needs"]
      }
    }
  ]
}

Design notes:

  • The talk → observe → talk pattern lets you compare what people say (context) with what they do (demo), then probe the differences.
  • conductor_context in the demo segment primes the AI to watch for specific signals (copy-paste, manual steps, hesitation).
  • The probe segment explicitly references the demo: "Dig into frustrations and workarounds observed."

SegmentV2 Field Reference

Required Fields

| Field | Type | Description |
|---|---|---|
| id | string | Unique segment identifier (e.g. "seg_intro"). Used in advance_when, skip_if, and logging. |
| title | string | Human-readable segment name. Shown in the dashboard and logs. |
| mode | "talk" \| "speak" \| "observe" | Interaction mode for this segment. |

Talk Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| talk.system_prompt | string | Inherits from defaults.system_prompt | Interviewer persona and question strategy for this segment. |
| talk.goals | string[] | [] | IDs of study goals this segment should pursue. |
| talk.tools | LLMTool[] | [] | Custom tools available to the interviewer LLM. |

Speak Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| speak_text | string | — | Text for the AI to speak. Required for speak segments. |
| speak_submode | "speak_fast" \| "speak_balanced" \| "speak_rich" | "speak_balanced" | Voice quality/speed tradeoff. |
| speak_interruptible | boolean | true | Whether the participant can interrupt by speaking. |

Observe Mode Fields

| Field | Type | Default | Description |
|---|---|---|---|
| instruction | string | — | Task instruction shown to the participant. |
| conductor_context | string | "" | AI-only background knowledge for the evaluator/stuck detector. |

Segment Flow Control

| Field | Type | Default | Description |
|---|---|---|---|
| advance_when | string | — | When to auto-advance. url:<prefix> for deterministic, or natural language for LLM-judged. |
| skip_if | string | — | Natural language condition to skip this segment entirely. |
| step_up_if | string | — | Natural language hint for when to intervene proactively. |
| max_duration_s | number | — | Auto-advance after this many seconds. |
| suppress_interventions | boolean | false | Disable stuck detection (for think-aloud protocols). |

Host Actions

| Field | Type | Default | Description |
|---|---|---|---|
| host_actions_on_enter | HostAction[] | [] | Actions executed when entering this segment (e.g. navigate to URL). |
| host_actions_on_exit | HostAction[] | [] | Actions executed when leaving this segment. |

StudyScriptV2 Top-Level Fields

| Field | Type | Description |
|---|---|---|
| version | 2 | Always 2. |
| segments | SegmentV2[] | Ordered list of segments. |
| goals | StudyGoal[] | Study-level goals: { id: string, description: string }. |
| defaults.system_prompt | string | Fallback system_prompt for talk segments that don't specify one. |
| defaults.voice | string | Default TTS voice. |
| defaults.speak_submode | SpeakSubmode | Default speak quality/speed. |
| defaults.language | string | Session language hint. |

See also

  • Studies — create and manage studies
  • Quickstart — end-to-end setup including study creation