Antoine Pezé

How to run a Heuristic Evaluation


TL;DR

Summary

A heuristic evaluation is an interface inspection method where a product is analyzed against a grid of established usability criteria. It quickly identifies the major ergonomic issues without recruiting users.

Goals

Detect usability issues in an interface systematically, prioritize them by severity, and produce an audit report that the product team can act on.


What is a heuristic evaluation?

A heuristic evaluation is a usability inspection method where experts review an interface against a set of established ergonomic principles, called heuristics.

The method was formalized by Jakob Nielsen and Rolf Molich in 1990, then refined by Nielsen in 1994 with the publication of his 10 usability heuristics. The principle is simple: a small group of expert evaluators reviews an interface using established design principles. Each violation of a principle is logged, described, and ranked by severity.

The goal is to give the team a fast, structured diagnosis of usability issues, without having to recruit users. In practice, a well-run heuristic evaluation can detect between 60 and 80% of an interface’s major usability issues.

Why use this method?

Several situations make heuristic evaluation particularly relevant:

  • At the start of a project: you inherit an existing product and need a quick state of play before planning user research.
  • When the budget is tight: you can’t afford to recruit users for tests, but you still need to identify the most critical issues.
  • Before a user test: by fixing the most obvious issues upfront, you avoid having your tests “polluted” by trivial errors, and you focus sessions on the real questions.
  • As part of a redesign: to document the weaknesses of the current interface and justify redesign choices to stakeholders.

What heuristic evaluation is not

Don’t confuse heuristic evaluation with user testing. In a heuristic evaluation, experts inspect the interface. In a user test, real users interact with the product. The two methods are complementary: heuristic evaluation detects ergonomic principle violations. User testing reveals real behaviors and misunderstandings that even an expert can’t anticipate.


Nielsen’s 10 heuristics

Jakob Nielsen formalized 10 general principles of interface design, each covering a fundamental aspect of usability. Here are the 10 heuristics, each explained with concrete examples.

1. Visibility of system status

The system should always keep users informed of what is happening through appropriate feedback within reasonable time.

Concrete examples:

  • A progress bar while a file is downloading.
  • A “Saving…” indicator when the user saves a document in Google Docs.
  • A notification badge on the messaging icon showing the number of unread messages.

Typical violation: a form that submits without any visual feedback, leaving the user to wonder if their action was received.

2. Match between the system and the real world

The system should speak the user’s language, with words, phrases, and concepts familiar to them, rather than system-oriented terms. Real-world conventions should be followed, and information should appear in a natural and logical order.

Concrete examples:

  • An e-commerce app uses “Cart” rather than “Order queue.”
  • A mail app shows messages from newest to oldest, like a stack of mail.
  • A digital calendar reuses the visual cues of a paper agenda.

Typical violation: an error message that displays “Error 500: Internal Server Error” instead of “Something went wrong, please try again in a few moments.”

3. User control and freedom

Users often pick functions by mistake and need a clearly marked “emergency exit” to leave the unwanted state, without having to go through an extended process. The system should support undo and redo.

Concrete examples:

  • The Ctrl+Z shortcut to undo an action in any editor.
  • A visible “Back” button at every step of a checkout funnel.
  • The ability to recover a deleted email from the trash for 30 days.

Typical violation: a 5-step signup process with no way to go back and edit information entered earlier.

4. Consistency and standards

Users shouldn’t have to wonder whether different words, situations, or actions mean the same thing. Follow platform and industry conventions.

Concrete examples:

  • All primary action buttons use the same color across the app.
  • The logo at the top left always returns to the home page.
  • Hyperlinks are underlined or in a color distinct from the body text.

Typical violation: a mobile app where the swipe-left gesture deletes an item on one screen, but archives an item on another screen.

5. Error prevention

Better than designing good error messages is preventing errors from happening in the first place. Eliminate error-prone conditions, or check for them and present users with a confirmation option before they commit to an action.

Concrete examples:

  • A date field with a calendar picker rather than a free-text input.
  • Disabling the “Submit” button until all required fields are filled.
  • An “Are you sure you want to delete this item?” confirmation before a destructive action.

Typical violation: a form that accepts any phone number format and only returns an error after submission.

6. Recognition rather than recall

Minimize the user’s memory load by making objects, actions, and options visible. The user shouldn’t have to remember information from one screen to another.

Concrete examples:

  • Recent search history shown in a search engine.
  • A checkout funnel that shows the order summary at every step.
  • Autocomplete suggestions in a search field.

Typical violation: a multi-step setup process where the user has to remember choices made in earlier steps without any visible recap.

7. Flexibility and efficiency of use

Accelerators, invisible to the novice user, can speed up interaction for the expert user. The system should serve novice and expert users alike.

Concrete examples:

  • Keyboard shortcuts in Figma or Photoshop.
  • The ability to create reusable templates in an emailing tool.
  • Slash commands in Notion or Slack.

Typical violation: a project management app where every task must be created via a 4-step form, with no quick-entry option.

8. Aesthetic and minimalist design

Dialogues should not contain irrelevant or rarely needed information. Every extra unit of information competes with the relevant ones and reduces their relative visibility.

Concrete examples:

  • The Google home page: a logo, a search field, two buttons.
  • A dashboard that surfaces key indicators in the foreground and pushes details to secondary pages.
  • A signup form that only asks for email and password, with no extra fields.

Typical violation: a home page packed with promotional banners, widgets, and secondary content that drown out the main action expected from the user.

9. Help users recognize, diagnose, and recover from errors

Error messages should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution.

Concrete examples:

  • “This password is too short. Use at least 8 characters, including a number and an uppercase letter.”
  • “This email is already in use. Would you like to log in or reset your password?”
  • A form field that turns red with an explanatory message as soon as the user leaves the field.

Typical violation: a “Validation error” message with no indication of the affected field or the nature of the problem.

10. Help and documentation

Even if the system is ideally usable without documentation, it may be necessary to provide help. Such information should be easy to find, focused on the user’s task, list concrete steps, and not be too long.

Concrete examples:

  • Tooltips when hovering over icons in a toolbar.
  • A help center with a search bar and articles organized by topic.
  • An onboarding assistant that guides the user during their first use.

Typical violation: a complex app with no contextual help, with only a 200-page PDF as documentation.


The Bastien-Scapin criteria: the French alternative

Alongside Nielsen’s work, the French researchers Christian Bastien and Dominique Scapin of INRIA proposed an alternative grid of 8 ergonomic criteria in 1993. These criteria are widely used in French-speaking environments and offer finer granularity than Nielsen’s heuristics, with 18 sub-criteria in total.

1. Guidance

All means used to advise, orient, inform, and guide the user when interacting with the system.

Sub-criteria:

  • Prompting: the elements that guide the user toward expected actions (labels, instructions, input examples).
  • Grouping/Distinction of items: the visual organization of elements by location and format (proximity, separators, colors).
  • Immediate feedback: the system’s response to each user action.
  • Legibility: the lexical characteristics of information presentation (character size, spacing, contrast).

2. Workload

Reducing the user’s perceptual, mnemonic, and physical workload.

Sub-criteria:

  • Brevity: limiting input and reading work (default values, allowed abbreviations).
  • Information density: don’t overload the screen with unnecessary information.

3. Explicit control

The control the user has over the processing of their actions.

Sub-criteria:

  • Explicit user actions: the system only executes actions the user requests, and only when they request them.
  • User control: the user must be able to interrupt, cancel, resume, or abandon a process underway.

4. Adaptability

The system’s ability to adapt to the user’s context and needs.

Sub-criteria:

  • Flexibility: the different ways to reach the same goal.
  • Account for user experience: the system adapts to the user’s level of expertise (novice vs. expert).

5. Error management

The mechanisms to prevent, detect, and recover from errors.

Sub-criteria:

  • Error protection: prevent errors from occurring.
  • Quality of error messages: relevance, legibility, and precision of messages.
  • Error correction: the means provided to correct errors.

6. Consistency

The constancy of design choices (codes, names, formats, procedures) from one screen to another.

7. Significance of codes and names

The match between an object or piece of information displayed and its referent. Codes and names should be meaningful to the user.

8. Compatibility

The match between user characteristics (memory, perception, habits) and the system’s interaction organization (inputs, outputs, dialogues).

Nielsen or Bastien-Scapin?

Both grids are valid. My take: use Nielsen if you work in an international context or with a team used to Anglo-Saxon vocabulary. Use Bastien-Scapin if you work in a French-speaking academic context or if you need finer granularity, especially on guidance and workload. In practice, I often start with Nielsen for its simplicity and supplement it with Bastien-Scapin when I need to dig into a specific aspect.


How to run a heuristic evaluation step by step

Step 1: Define the evaluation scope (1 to 2 hours)

Before starting, precisely scope what you’ll evaluate. A full audit of an entire product is rarely useful. Focus on critical journeys.

  1. List the main user journeys of the product (signup, purchase, search, configuration, etc.).
  2. Identify the 3 to 5 journeys that are most critical for the business or most used by your users.
  3. For each journey, list the screens that compose it.
  4. Document these journeys in a table or in Miro so each evaluator follows the same path.

By the end of this step, you should have a clear list of screens and journeys to evaluate, shared with all evaluators.

Step 2: Build the team of evaluators (variable)

Nielsen’s research showed that a single evaluator only detects on average 35% of usability issues. With 3 evaluators, you reach about 60%. With 5 evaluators, you reach about 75%. Beyond 5, marginal returns drop sharply.
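These diminishing returns can be modeled with the Nielsen-Landauer formula, found(i) = N(1 − (1 − λ)^i), where λ is the probability that a single evaluator spots a given issue. A minimal sketch; the default λ of 0.31 is the average Nielsen reported across projects, but it is an assumption here and varies widely in practice:

```python
def coverage(evaluators: int, detection_rate: float = 0.31) -> float:
    """Expected share of issues found by `evaluators` independent reviewers,
    per the Nielsen-Landauer model: 1 - (1 - lambda)^i."""
    return 1 - (1 - detection_rate) ** evaluators

for i in range(1, 8):
    print(f"{i} evaluator(s): {coverage(i):.0%} of issues found")
```

Whatever the exact λ, each additional evaluator adds less coverage than the previous one, which is why the curve flattens past 5 evaluators.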

My recommendation: 3 to 5 evaluators.

Evaluators should have at least some background in ergonomics or UX. They can be UX designers, product designers, project managers trained in UX, or ergonomists. If you don’t have enough internal experts, you can quickly train product team members on Nielsen’s heuristics: 30 minutes of presentation is enough for them to understand the principles and be able to spot the most obvious violations.

Important point: each evaluator must work alone. If evaluators confer during the inspection, they risk influencing one another and converging on the same conclusions, which reduces the total number of issues detected.

Step 3: Prepare the evaluation grid (30 minutes)

Create a table (Google Sheets, Notion, Excel) with the following columns:

#  | Screen    | Heuristic violated               | Issue description                                | Recommendation                    | Severity (0-4) | Evaluator
1  | Home page | H1 - Visibility of system status | The user doesn’t know if their search is running | Add a loading indicator (spinner) | 3              | A. Smith

The severity scale is:

  • 0 - Not a usability issue: aesthetic violation but no impact on use.
  • 1 - Cosmetic issue: fix only if time permits.
  • 2 - Minor issue: slight friction, low priority.
  • 3 - Major issue: significant friction, high priority. The user can be blocked or heavily slowed down.
  • 4 - Catastrophic issue: the user is blocked or quits the product. Must be fixed before going to production.

Share this grid with each evaluator with clear instructions: walk through each screen in the defined order, log each violation as a separate row, even if the same heuristic is violated multiple times.
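If you prefer to collect findings programmatically (for instance, to merge the evaluators’ sheets later), each grid row can be modeled as a small record. The field and class names here are illustrative, not a standard:

```python
from dataclasses import dataclass

# Severity scale from the grid above.
SEVERITY_LABELS = {
    0: "Not a usability issue",
    1: "Cosmetic issue",
    2: "Minor issue",
    3: "Major issue",
    4: "Catastrophic issue",
}

@dataclass
class Finding:
    screen: str
    heuristic: str      # e.g. "H1 - Visibility of system status"
    description: str
    recommendation: str
    severity: int       # 0-4, per the scale above
    evaluator: str

    def __post_init__(self) -> None:
        # Reject scores outside the 0-4 scale at logging time.
        if self.severity not in SEVERITY_LABELS:
            raise ValueError(f"severity must be 0-4, got {self.severity}")

f = Finding("Home page", "H1 - Visibility of system status",
            "The user doesn’t know if their search is running",
            "Add a loading indicator (spinner)", 3, "A. Smith")
print(SEVERITY_LABELS[f.severity])  # → Major issue
```

Validating the severity score at entry time avoids cleaning up out-of-scale values during consolidation.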

Step 4: Run the individual evaluation (1 to 2 hours per evaluator)

Each evaluator goes through the interface alone, at their own pace. I recommend doing two passes:

  1. First pass (30 to 45 minutes): walk through all the screens to get familiar with the product, understand the navigation logic, and the functional scope.
  2. Second pass (30 to 60 minutes): revisit each screen in detail and systematically log violations using the heuristics grid.

For each violation identified, the evaluator should:

  • Identify the affected screen.
  • Cite the violated heuristic(s).
  • Describe the issue factually.
  • Propose a concrete recommendation if possible.
  • Assign a provisional severity score.

Practical tip: take screenshots and annotate them directly. A tool like Markup Hero, Shottr, or even your system’s native annotation tools is enough. It makes the report much more compelling for non-UX stakeholders.

Step 5: Consolidate the results (1 to 2 hours)

Once all evaluators have completed their individual inspection, gather the results into a single consolidated table.

  1. Deduplicate identical issues. Multiple evaluators will likely have spotted the same issues. Merge them into a single row and note how many evaluators flagged that issue. The more evaluators detect an issue, the more severe and visible it likely is.

  2. Reconcile severity scores. For each issue, average the scores assigned by the different evaluators. In case of major disagreement (one evaluator scores 1 and another scores 4), hold a short discussion to settle on a final score.

  3. Sort by descending severity. Severity 4 and 3 issues should appear first.

  4. Categorize by heuristic. This helps identify “families” of recurring issues. If you have 12 violations of heuristic 1 (Visibility of system status), it points to a systemic feedback issue in the product.
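The four consolidation steps above can be sketched in a few lines. Here findings are merged on an exact (screen, heuristic, issue) key for simplicity; in practice deduplication is done by hand or with fuzzy matching, and the sample data is illustrative:

```python
from collections import defaultdict
from statistics import mean

# One row per violation, as logged by each evaluator (illustrative data).
raw = [
    {"screen": "Home", "heuristic": "H1", "issue": "No search feedback", "severity": 3, "evaluator": "A"},
    {"screen": "Home", "heuristic": "H1", "issue": "No search feedback", "severity": 4, "evaluator": "B"},
    {"screen": "Checkout", "heuristic": "H3", "issue": "No back button", "severity": 4, "evaluator": "A"},
]

# 1. Deduplicate: group rows that describe the same issue.
merged = defaultdict(list)
for row in raw:
    merged[(row["screen"], row["heuristic"], row["issue"])].append(row)

consolidated = [
    {
        "screen": screen, "heuristic": heuristic, "issue": issue,
        # 2. Reconcile scores by averaging (discuss large disagreements).
        "severity": mean(r["severity"] for r in rows),
        # Keep the evaluator count: widely-flagged issues are likely more visible.
        "found_by": len(rows),
    }
    for (screen, heuristic, issue), rows in merged.items()
]

# 3. Sort by descending severity.
consolidated.sort(key=lambda r: r["severity"], reverse=True)

# 4. Count violations per heuristic to spot systemic issues.
per_heuristic = defaultdict(int)
for r in consolidated:
    per_heuristic[r["heuristic"]] += 1
```

With this sample data, the checkout issue (severity 4) sorts above the merged feedback issue (average severity 3.5, flagged by 2 evaluators).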

Step 6: Write the report and prioritize (2 to 3 hours)

By the end of the evaluation, you should have a report containing:

  • An executive summary: total number of issues identified, distribution by severity, the top 3 to 5 most critical issues.
  • The consolidated grid with all the issues, sorted by severity.
  • Annotated screenshots for major and catastrophic issues.
  • Prioritized recommendations: what to fix first?

For prioritization, I recommend a simple matrix: impact (issue severity) vs. estimated fix effort. Issues with high severity and low fix effort are the “quick wins” to tackle first.
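The impact-vs-effort matrix reduces to a simple two-key sort: severity descending, then effort ascending, which surfaces the quick wins first. The effort scores below are hypothetical estimates you would get from the development team:

```python
issues = [
    {"issue": "No search feedback",  "severity": 4, "effort": 1},  # quick win
    {"issue": "Confusing checkout",  "severity": 4, "effort": 5},
    {"issue": "Inconsistent colors", "severity": 1, "effort": 1},
]

# Quick wins first: highest severity, then lowest estimated fix effort.
backlog = sorted(issues, key=lambda i: (-i["severity"], i["effort"]))

for i in backlog:
    print(f'sev {i["severity"]} / effort {i["effort"]}: {i["issue"]}')
```

Here “No search feedback” comes out on top: it ties on severity with the checkout issue but costs far less to fix.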


Common mistakes to avoid

1. Running the evaluation alone. This is the most common mistake. A single evaluator only detects a third of the issues. If you’re alone, your audit will have insufficient coverage. Involve at least two other people.

2. Confusing heuristic evaluation with user testing. Heuristic evaluation rests on expert judgment, not on observing real users. The two methods have different strengths. Heuristic evaluation is fast and cheap. User testing reveals issues even experts can’t see.

3. Evaluating without a grid. Walking through an interface “by feel” and noting what seems off isn’t a heuristic evaluation. Without a reference grid, you risk missing entire categories of issues. Pick Nielsen or Bastien-Scapin and follow the grid.

4. Neglecting severity. Listing 150 issues without prioritizing them is counterproductive. The development team needs to know where to start. Always use the severity scale.

5. Not accounting for usage context. A violation that seems critical in the abstract may be minor if the affected feature is rarely used. Cross-check your results with usage data (analytics, user feedback) when available.

6. Forgetting mobile journeys. If your product is used on mobile, evaluate the mobile interface separately. Mobile ergonomic issues (tap target size, thumb navigation, load times) are often different from desktop ones.


When to use heuristic evaluation vs. user testing?

Criterion       | Heuristic evaluation                     | User testing
Who evaluates?  | UX experts                               | Real users
Duration        | 1 to 2 days                              | 1 to 3 weeks (with recruiting)
Cost            | Low (expert time)                        | Moderate to high (recruiting, incentives)
What you detect | Ergonomic principle violations           | Real behaviors, misunderstandings
What you miss   | Issues specific to the user context      | Principle violations not covered by the tested scenario
When to use     | Upfront, when you need a quick diagnosis | When you want to validate hypotheses with users

My recommendation: start with a heuristic evaluation to identify and fix the most obvious issues, then follow up with user tests to explore real behaviors. Heuristic evaluation lets you “clean up” the interface before testing, so sessions can focus on the real usage questions.


Going further

  • User testing: the complementary method to observe real behaviors
  • Exploratory interviews: to understand the usage context before auditing the interface
  • The experience map: to map the global user journey
  • Nielsen, J. (1993). Usability Engineering. Academic Press. The reference book on the 10 heuristics.
  • Bastien, J.M.C., Scapin, D.L. (1993). Ergonomic Criteria for the Evaluation of Human-Computer Interfaces. INRIA Report no. 156.
  • Nielsen’s 10 heuristics on nngroup.com

Want to go further?

I offer individual coaching to dig deeper and apply these topics to your context.

Book a session