Usability testing is the practice of watching real users attempt tasks with your product. It is the single most effective way to find usability problems before they reach production.
Why Usability Testing Works
Designers and developers suffer from the "curse of knowledge." You built the interface, so you know how it works. Usability testing breaks this curse by showing you what happens when someone who does not share your knowledge tries to use what you built.
Jakob Nielsen's research shows that 5 users find roughly 85% of usability problems. You do not need hundreds of participants — you need a handful of sessions and the willingness to watch and learn.
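That figure comes from a simple discovery model: if each participant independently uncovers a fraction p of the problems (roughly 0.31 in Nielsen and Landauer's data), the share found by n participants is 1 − (1 − p)^n. A quick sketch, with p as an assumed average rather than a constant of nature:

```python
def discovery_rate(n_users: int, p: float = 0.31) -> float:
    """Expected share of usability problems found by n_users,
    assuming each user independently finds a fraction p of them."""
    return 1 - (1 - p) ** n_users

for n in (1, 3, 5, 15):
    print(f"{n:2d} users -> {discovery_rate(n):.0%}")
```

Five users lands near 85%, and the curve flattens quickly, which is why the usual advice is several small rounds of testing rather than one large one.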
Task-Based Testing
Usability tests are built around tasks. A task is a realistic scenario that asks the participant to accomplish a goal.
Writing Good Tasks
| Poor Task | Better Task |
|---|---|
| "Find the settings page" | "You want to change your notification preferences. How would you do that?" |
| "Click the checkout button" | "You've decided to buy these two items. Complete your purchase." |
| "Use the search feature" | "You're looking for a winter jacket under $100. Find one you'd buy." |
Good tasks follow these principles:
- Goal-oriented — describe what the user wants to achieve, not which UI element to use
- Realistic — tasks should reflect real scenarios, not edge cases
- Specific enough — provide enough context for the participant to act
- No hints — do not mention UI element names or locations
Task Difficulty Progression
Start with an easy task to build confidence, then increase difficulty:
- Simple navigation task (warm-up)
- Core workflow task (primary focus)
- Complex multi-step task (stress test)
- Recovery task (handling errors or edge cases)
The Think-Aloud Protocol
The think-aloud protocol asks participants to verbalize their thoughts while completing tasks. It is the most widely used technique in usability testing:
"As you work through the tasks, please think out loud.
Tell me what you're looking at, what you're thinking,
and what you're trying to do. There are no wrong answers —
I'm testing the design, not you."

What you learn from think-aloud:
- Expectations — "I'd expect this button to take me to..."
- Confusion — "I'm not sure what this means..."
- Decision-making — "I'm choosing this option because..."
- Satisfaction — "Oh, that was easy" or "That was frustrating"
When Participants Go Silent
Participants often forget to think aloud when they are concentrating. Gently prompt them:
- "What are you thinking right now?"
- "What are you looking at?"
- "What do you expect to happen?"
Do not ask these questions at critical moments when you want to observe natural behavior — just prompt when there has been extended silence.
Running a Session
Before the Session
- Prepare the prototype or product — ensure it is in the correct starting state
- Test your recording setup — screen recording, audio, and camera
- Print your script and tasks — have them accessible but do not read rigidly
- Set up the note-taking template — timestamps, tasks, observations, quotes
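One way to set that template up before the session, sketched here in Python; the column names and file name are illustrative choices, not a standard:

```python
import csv

# Columns mirror the checklist above: timestamps, tasks, observations, quotes
FIELDS = ["timestamp", "task", "observation", "quote"]

with open("session_notes.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    # Example row showing the kind of entry logged mid-session
    writer.writerow({
        "timestamp": "00:12:30",
        "task": "Change notification preferences",
        "observation": "Hesitated over the account menu before finding Settings",
        "quote": "I'm not sure what this means...",
    })
```

A spreadsheet works just as well; the point is deciding the columns before the session so notes stay comparable across participants.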
During the Session (60 minutes typical)
| Phase | Duration | Activities |
|---|---|---|
| Welcome | 5 min | Introductions, consent, explain the process |
| Background | 5 min | Brief questions about their experience level |
| Tasks | 35-40 min | 4-6 tasks with think-aloud |
| Debrief | 10 min | Overall impressions, questions, thank you |
Moderator Behaviors
Do:
- Stay neutral — do not react to mistakes or successes
- Let participants struggle — resist the urge to help
- Ask follow-up questions — "Why did you click there?" "What did you expect?"
- Note body language — sighs, hesitation, leaning forward
Do not:
- Say "You're doing great" (implies a right way to do it)
- Explain the interface when they are confused
- Ask leading questions — "Did you notice the menu at the top?"
- Rush through tasks to fit the schedule
When Participants Are Stuck
If a participant is completely stuck for over a minute:
- Ask what they would do if they were at home alone
- Ask them to try a different approach
- As a last resort, provide a small hint and note that you intervened
Never let frustration build to the point where the participant feels bad about themselves.
Remote vs In-Person
Remote Testing
Advantages:
- Recruit from anywhere — broader, more diverse participants
- Participants use their own devices and environment
- Easier to schedule
- Lower cost (no lab, no travel)
Tools: Zoom, UserTesting, Lookback, Maze
Challenges: harder to read body language, potential technical issues, less control over environment
In-Person Testing
Advantages:
- Better rapport with participants
- Full body language observation
- Easier to test physical products or complex prototypes
- Stakeholders can observe from behind a one-way mirror or in another room
Challenges: limited to local participants, higher cost, requires a dedicated space
Unmoderated Remote Testing
Participants complete tasks on their own time with no moderator present. Tools like Maze or UserTesting record the screen and audio:
- Best for: large sample sizes, simple tasks, quantitative metrics
- Limitations: no follow-up questions, no ability to probe deeper, lower data quality per session
Measuring Results
Track both qualitative observations and quantitative metrics:
- Task success rate — percentage of participants who completed the task
- Time on task — how long the task took (compare across design iterations)
- Error rate — number of wrong clicks or wrong paths before success
- Severity ratings — rate each usability problem from cosmetic to critical
- SUS (System Usability Scale) — a standardized 10-question post-test questionnaire
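SUS scoring follows a fixed recipe: odd-numbered items are positively worded and score as (response - 1), even-numbered items are negatively worded and score as (5 - response); the sum is multiplied by 2.5 to give a 0-100 score. A minimal sketch:

```python
def sus_score(responses: list[int]) -> float:
    """Score one participant's SUS questionnaire (ten 1-5 ratings)."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten ratings between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        if i % 2 == 1:           # odd items are positively worded
            total += r - 1
        else:                    # even items are negatively worded
            total += 5 - r
    return total * 2.5           # scale the 0-40 sum to 0-100
```

Average the per-participant scores across the study; a score around 68 is often cited as the industry average benchmark.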
A severity scale example:
| Rating | Severity | Description |
|---|---|---|
| 1 | Cosmetic | Minor issue, fix if time allows |
| 2 | Minor | Users are slightly delayed but recover quickly |
| 3 | Major | Users are significantly delayed or need help |
| 4 | Critical | Users cannot complete the task at all |
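One lightweight way to log issues against this scale is to record each problem with its severity and how many participants hit it, then sort the most severe and widespread problems to the top. A sketch (the issue names and field names are illustrative):

```python
from dataclasses import dataclass

SEVERITY = {1: "Cosmetic", 2: "Minor", 3: "Major", 4: "Critical"}

@dataclass
class Issue:
    description: str
    severity: int    # 1-4, per the table above
    affected: int    # number of participants who hit the issue

def prioritize(issues: list[Issue]) -> list[Issue]:
    # Highest severity first; ties broken by how many participants were affected
    return sorted(issues, key=lambda i: (i.severity, i.affected), reverse=True)

issues = [
    Issue("Label wording on export button unclear", 1, 2),
    Issue("Checkout fails with saved card", 4, 3),
    Issue("Search filters hard to find", 3, 4),
]
```

Sorting by severity alone is a design choice; some teams weight frequency more heavily, since a minor issue that every participant hits may matter more than a major one seen once.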
After the Session
Within 24 hours:
- Write a session debrief with top findings
- Clip key video moments (30-60 second clips are powerful for stakeholder buy-in)
- Log usability issues with severity ratings
- Note patterns emerging across sessions
After all sessions, compile a findings report with prioritized recommendations.