The rulebook for designing AI experiences

Three of the world’s largest tech companies have published guidelines for responsible Human-AI Interaction. Here’s what they got right, and where the gaps are.

Ask ten people in the industry what responsible AI design means and you will get ten different answers. Ethics frameworks, trustworthy AI principles, responsible innovation checklists: the vocabulary keeps growing. But underneath all of that sits a more practical question. When someone is actually sitting in front of an AI-powered product, what makes the experience good? And who, if anyone, has written that down in a way that’s actually useful?

As it turns out, a few organisations have. Over the past several years, Microsoft, Google, and IBM have each invested in developing publicly available frameworks for Human-AI Interaction (HAI) design. Not just high-level ethics guidance, but concrete, pattern-based resources aimed at teams building real products. They don’t all agree on approach, and none of them covers everything. But together, they form something worth knowing about.

The question of how to design well with AI has been circling the industry for years, mostly at the level of broad principles and ethical commitments. What makes these three resources different is that they attempt to get specific: to translate intent into guidance that a designer can actually use in a sprint, a critique, or a product review. That gap between principle and practice is wider than it looks, and these resources are among the more serious attempts to close it.

The Microsoft HAX Toolkit homepage, showing four components: Guidelines for Human-AI Interaction, HAX Design Library, HAX Workbook, and HAX Playbook, each illustrated with a colourful graphic.
The HAX Toolkit breaks down into four practical components, designed to be used early in the product development process.

Microsoft: 18 guidelines and a library to go with them

Microsoft’s contribution is the HAX Toolkit, built around 18 guidelines for Human-AI Interaction. First published in a 2019 paper by Saleema Amershi and colleagues at Microsoft Research, the work has since become one of the most referenced resources in the field.

The HAX Design Library is where those principles come to life. Each one is paired with design patterns and real-world examples. The library is also filterable by product category (chatbot, voice assistant, health and wellness, and so on), AI type, and design goal. On that last dimension, options span transparency, personalisation, reliability, fairness, and appropriate reliance, so you can quickly find what’s most relevant to your specific context. It’s a genuinely useful tool rather than a document you read once and file away.

The HAX Design Library interface showing a filter panel on the left and three guideline cards: Guideline 1 “Make clear what the system can do”, Guideline 2 “Make clear how well the system can do what it can do”, and Guideline 3 “Time services based on context.”
The HAX Design Library lets teams browse all 18 guidelines, filter by category, and explore design patterns alongside real-world examples for each one.

The guidelines cover the full lifecycle of an interaction: setting expectations upfront, handling errors gracefully, and managing the longer-term relationship between user and system. A handful deal with that longer game directly, covering how a system should learn from behaviour, adapt cautiously, and notify users when its capabilities change.

For designers and leads working on AI features, the framing around appropriate reliance is one of the more important ideas in the toolkit. The goal isn’t simply to build trust. It’s to build the right amount of trust, calibrated to what the system can actually do. Overconfident AI experiences are increasingly a design problem, not just an engineering one.

A medical triage tool that presents its suggestions with the same confidence as a music recommendation engine is not just badly designed. It is potentially dangerous. The point of appropriate reliance isn’t to make people distrust AI, but to help them develop an accurate sense of when to defer to it and when to push back. That calibration looks different depending on the stakes, the domain, and who is sitting in front of the screen.
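
The contrast between a triage tool and a recommendation engine can be made concrete. The sketch below is purely illustrative: the function name, thresholds, and stake levels are invented for this example and come from none of the three frameworks. The idea is simply that how assertively a suggestion is presented should depend on both the model's confidence and the stakes of the domain.

```python
# Illustrative sketch: calibrating how assertively an AI suggestion
# is presented. All names and thresholds are hypothetical.

def presentation_mode(confidence: float, stakes: str) -> str:
    """Decide how to surface an AI suggestion.

    confidence: the model's self-reported confidence, 0.0 to 1.0
    stakes: "low" (e.g. a music recommendation) or "high" (e.g. medical triage)
    """
    if stakes == "high":
        # High-stakes domains never auto-apply; a human stays in the loop.
        return "require_review" if confidence < 0.9 else "suggest_with_evidence"
    # Low-stakes domains can act more freely.
    if confidence >= 0.7:
        return "apply_quietly"
    return "offer_as_option"

# The same confidence score should produce different behaviour
# depending on what is at stake:
print(presentation_mode(0.95, "high"))  # suggest_with_evidence
print(presentation_mode(0.95, "low"))   # apply_quietly
```

The point of the two branches is exactly the calibration argument above: the system's voice changes with the stakes, so users learn when deference is safe and when scrutiny is required.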

Google: starting with the right questions

Google’s People + AI Guidebook, produced by its PAIR (People + AI Research) team, takes a different approach. Rather than numbered guidelines, it’s structured around six thematic chapters:

  • User Needs + Defining Success
  • Mental Models
  • Feedback + Control
  • Explainability + Trust
  • Data Collection + Evaluation
  • Errors + Graceful Failure

The People + AI Guidebook homepage with a dark green banner reading “The People + AI Guidebook is a collection of practical guidance for designing human-centered AI products”, with navigation options for Chapters, Principles and Patterns, Workshops, and Case Studies.
Google’s People + AI Guidebook covers the full arc of AI product development, from defining user needs through to handling errors and building trust.

Each chapter comes with worksheets intended to turn the guidance into something actionable in a team setting.

The second edition, released in 2021, added a set of standalone design patterns organised around the questions teams tend to ask in practice: How should I use AI in my product? How do I onboard users to new AI features? How do I explain my AI system to users?

That shift from chapters to questions reflects a more honest understanding of how professionals actually work. It makes the resource easier to dip into at a specific moment rather than reading cover to cover.

The Principles and Patterns section of the PAIR Guidebook, showing 23 patterns filtered by question, including “How do I explain my AI system to users?”, “How do I onboard users to new AI features?”, and “How do I help users build and calibrate trust in my product?” The first two visible patterns are “Determine if AI adds value” and “Set the right expectations.”
The second edition of the PAIR Guidebook organises its 23 design patterns around the questions product teams are most likely to ask, making it easier to find relevant guidance at any stage of development.

One pattern stands out for anyone thinking about chatbots or conversational interfaces: explain for understanding, not completeness. The principle is that when surfacing AI reasoning to users, you should focus on what they need in order to move forward, not on exposing everything happening under the bonnet. People don’t want to be overwhelmed by technical rationale mid-task; they want enough to feel informed and in control. It sounds obvious when you say it. It’s harder to execute well.
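
One way to read that pattern in code terms, as a hypothetical sketch rather than anything from the Guidebook itself: given the full set of factors behind a recommendation, surface only the few most relevant to the user's next decision, and keep the rest behind an optional detail view. The function, field names, and weights below are all invented for illustration.

```python
# Hypothetical sketch of "explain for understanding, not completeness":
# show the user the few most relevant reasons, not the full rationale.

def user_facing_explanation(factors: dict[str, float], max_reasons: int = 2) -> str:
    """factors maps a human-readable reason to its relevance weight (0 to 1)."""
    top = sorted(factors.items(), key=lambda kv: kv[1], reverse=True)[:max_reasons]
    reasons = " and ".join(name for name, _ in top)
    # The full breakdown stays available, but behind an optional detail view.
    return f"Suggested because of {reasons}. See details for the full breakdown."

factors = {
    "your recent listening history": 0.8,
    "similar users' choices": 0.6,
    "time of day": 0.2,
    "device type": 0.1,
}
print(user_facing_explanation(factors))
```

The low-weight factors are not hidden dishonestly; they are simply deferred, which is the distinction the pattern draws between informing someone and overwhelming them.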

Traffic to the Guidebook jumped 560% between February and August 2023, as generative AI products flooded the market and teams suddenly needed something more grounded than philosophy. The PAIR team has since begun updating the resource specifically for generative AI, a project that still involves more questions than answers.

IBM: ethics as infrastructure, and a challenge to the field

IBM approaches this territory from two directions. Their design practice is grounded in a set of AI design ethics principles, intended as a shared foundation across all IBM products. These address areas including accountability, explainability, value alignment, fairness, and user data rights.

The AI Essentials Framework, which sits alongside it, is a team-facing tool built around five pillars: intent, data, understanding, reasoning, and knowledge.

A diagram from IBM Design for AI showing the five pillars of the AI Essentials Framework as a sequential flow: Intent, Data and Policy, Understanding, Reasoning, and Knowledge, each with a brief description beneath.
IBM’s AI Essentials Framework maps out five interdependent pillars, from clarifying business and user intent through to what the system ultimately knows and learns.

Their more provocative contribution came at CHI 2024, where a team from IBM Research presented six design principles for generative AI applications. The paper made a specific and well-argued point. Most existing HAI frameworks, including Microsoft’s and Google’s, were originally designed for AI that makes decisions. Think classifying, ranking, predicting. Generative AI works differently. Instead of reaching a conclusion, it produces something: a draft, an image, a block of code. The design challenge moves from helping users evaluate an output to helping them shape one.

The six principles break into two groups. Three revisit concerns that will be familiar to most designers, but reframe them specifically for generative contexts.

The first asks teams to design responsibly, recognising that generative results carry new risks around misinformation and bias. The second focuses on designing for mental models, helping users build an accurate understanding of what the system can and cannot do. The third returns to the idea of appropriate trust and reliance, which takes on added complexity when a system can produce confident-sounding responses that simply aren’t true.

The other three address characteristics that are genuinely new to these systems:

  • Designing for Generative Variability acknowledges that the same prompt can produce meaningfully different outputs each time, which runs counter to the consistency and predictability that traditional UX design has long prioritised.
  • Designing for Co-Creation focuses on giving users the controls they need to actively shape and steer the generative process, rather than simply receiving whatever the model decides to produce.
  • Designing for Imperfection asks designers to be transparent about results that may be plausible but inaccurate, incomplete, or biased, and to build in mechanisms for people to identify and address those flaws.
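
Those three principles can be illustrated with a toy sketch. Everything below is invented for this example: no real model is called, and generate_draft is a stand-in. The shape of the code is the point: several variants rather than one answer (variability), a knob the user can steer (co-creation), and an explicit caveat attached to every output (imperfection).

```python
# Toy sketch of the three generative principles. generate_draft is a
# hypothetical stand-in for a model call, not a real API.

import random

def generate_draft(prompt: str, creativity: float, rng: random.Random) -> dict:
    """Stand-in for a generative model call."""
    suffix = rng.choice(["a", "b", "c"]) if creativity > 0.5 else "a"
    return {
        "text": f"Draft ({suffix}) for: {prompt}",
        # Designing for Imperfection: every output carries its own caveat.
        "notice": "AI-generated. May be inaccurate or incomplete. Please review.",
    }

def generate_variants(prompt: str, creativity: float, n: int = 3) -> list[dict]:
    """Designing for Generative Variability: return several options.
    The creativity parameter is the co-creation control the user steers."""
    rng = random.Random(0)  # seeded so the sketch is reproducible
    return [generate_draft(prompt, creativity, rng) for _ in range(n)]

for variant in generate_variants("team update email", creativity=0.8):
    print(variant["text"], "|", variant["notice"])
```

Nothing here is sophisticated, and that is the argument: treating variability, steering, and imperfection as first-class parts of the interface is a design decision before it is an engineering one.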

Taken together, the principles read less like a checklist and more like a shift in how designers are asked to think about the human side of these systems. They are still relatively new and not yet as field-tested as the Microsoft or Google resources, but as a provocation they are well-argued and timely.

What they share, and where the gaps are

All three frameworks converge on a cluster of concerns:

  • Transparency about system capabilities and limitations
  • Support for user control and correction
  • Feedback loops that allow both user and system to improve
  • Some version of graceful failure when things don’t go to plan

These are, in a sense, the non-negotiables of HAI design, the things that keep showing up regardless of whose framework you’re reading.

The alignment itself is significant. They were developed independently, by different organisations, using different methodologies. The fact that they arrive at such similar foundations suggests those foundations are fairly robust. It also means that if your team is using any one of these resources as a starting point, you are unlikely to be missing something the others would have caught, at least on the core concerns. The divergences tend to show up at the edges.

Where they’re thinner is on the harder problems. Research comparing both the Microsoft guidelines and the Google PAIR patterns against the EU’s Ethics Guidelines for Trustworthy AI found some notable gaps. Diversity, non-discrimination, fairness, and environmental and social well-being are all relatively underserved. It’s not that these issues are absent, but they don’t carry the same weight as usability and trust. Whether that reflects the practical priorities of product teams or a genuine blind spot in the discipline is a question worth asking.

Illustration of a checklist with a magnifying glass. The top items — Product quality, Market readiness, Financial viability, Customer satisfaction, Operational efficiency, Legal compliance, and Brand consistency — are checked off in teal. At the bottom, three items in faded coral remain unchecked: Diversity + fairness, Environmental impact, and Social wellbeing. The magnifying glass hovers over the unchecked items, highlighting the gap.
Robust on usability. Thinner on the rest.

Part of the answer is probably structural. Frameworks developed by large technology companies tend to reflect the problems those companies are trying to solve, which often means usability, trust, and error recovery rather than broader social consequences. There is also a practical challenge: diversity, fairness, and environmental impact are harder to operationalise as design patterns. It is easier to write a guideline that says “make clear what the system can do” than one that says “ensure your system does not disproportionately fail certain groups of users.” That difficulty is not a reason to leave those concerns out, but it does help explain the gap.

There’s also the generative AI gap that IBM pointed to. Frameworks built for an earlier generation of AI don’t always stretch naturally to cover the particular demands of systems that write, draw, and generate. Nobody has quite worked out what that should look like yet, and until that changes, designers are largely filling that space with intuition and iteration.

So, should you be using these?

The honest answer is that these frameworks are most useful when treated as thinking tools rather than compliance documents. None of them will tell you exactly what to do in a specific design decision. What they will do is give you a vocabulary for the conversation, a set of dimensions to pressure-test against, and the grounding to push back when a product decision is optimising for capability at the expense of user comprehension.

In practice, that might mean running a heuristic review of a new AI feature against Microsoft’s 18 guidelines before it ships, or using the PAIR Guidebook’s worksheets to structure a design critique. For organisations starting from scratch with an AI product, IBM’s AI Essentials Framework gives a useful way to align on intent and data strategy before any design work begins. None of this needs to be ceremonial. Used lightly and honestly, they function more like a lens than a process: a way of asking questions you might not have thought to ask.
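
A lightweight version of that heuristic review could be as simple as a scored checklist. The guideline texts below are the first three of Microsoft's 18, as shown in the HAX Design Library; the 1-to-5 scoring convention, the threshold, and the function are invented for this sketch.

```python
# Minimal sketch of a heuristic review against a few of Microsoft's
# 18 guidelines. Guideline texts are real (guidelines 1-3); the scoring
# scheme and example scores are hypothetical.

GUIDELINES = {
    1: "Make clear what the system can do",
    2: "Make clear how well the system can do what it can do",
    3: "Time services based on context",
}

def review_summary(scores: dict[int, int], threshold: int = 3) -> list[str]:
    """scores maps guideline number to a 1-5 rating; flag anything below threshold."""
    flagged = []
    for number, rating in sorted(scores.items()):
        if rating < threshold:
            flagged.append(f"G{number}: {GUIDELINES[number]} (scored {rating}/5)")
    return flagged

# Example review of a hypothetical AI feature before it ships:
flags = review_summary({1: 4, 2: 2, 3: 5})
print(flags)  # only guideline 2 falls below the bar
```

Used this way the guidelines stay a lens rather than a gate: the output is a short list of conversations to have, not a pass/fail verdict.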

Illustration of a designer sitting at a desk, hand on chin in thought, looking at a monitor displaying an AI interface with charts and network diagrams. Three translucent framework panels float around the screen — one coral with icons of scales, a heart, and a globe; one teal with geometric shapes and connections; one with people icons and a lightbulb. The frameworks overlay the work as thinking tools rather than blocking it.
A lens, not a checklist.

For design leaders, there’s also a reasonable case for using them to build shared language across an organisation, particularly as more teams find themselves with AI features scattered across products, developed by different people, with varying degrees of intentionality about the experience they’re creating.

All three exist. They’re free, they’re relatively well-maintained, and the teams behind them have done serious work. That feels like a reasonable place to start, even if it’s also, clearly, not the end of the conversation.

Thanks for reading! 📖

If you enjoyed this, follow me on Medium for more on design, psychology and technology.

References & Credits

Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., Suh, J., Iqbal, S., Bennett, P. N., Inkpen, K., Teevan, J., Kikin-Gil, R., and Horvitz, E. (2019). Guidelines for Human-AI Interaction. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3290605.3300233

Google PAIR (2019, updated 2021). People + AI Guidebook. People + AI Research, Google. https://pair.withgoogle.com/guidebook

IBM Design (2022). AI Design Ethics. IBM. https://www.ibm.com/design/ai/ethics

IBM Design (2022). AI Essentials Framework. IBM. https://www.ibm.com/design/ai/team-essentials

Li, T., Vorvoreanu, M., DeBellis, D., and Amershi, S. (2023). Assessing Human-AI Interaction Early through Factorial Surveys: A Study on the Guidelines for Human-AI Interaction. ACM Transactions on Computer-Human Interaction, 30(5), 1–45. https://doi.org/10.1145/3511605

Microsoft (2023). HAX Toolkit: HAX Design Library. Microsoft. https://www.microsoft.com/en-us/haxtoolkit/library

Weisz, J. D., He, J., Muller, M., Hoefer, G., Miles, R., and Geyer, W. (2024). Design Principles for Generative AI Applications. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24). ACM. https://doi.org/10.1145/3613904.3642466


The rulebook for designing AI experiences was originally published in UX Collective on Medium.

 
