[AI for Business 2] - Shaping the human-AI partnership
How to explain AI, build trust, and augment the human user
On May 14th, I had the pleasure of presenting at ProductTank in Paris. I talked about the ways in which AI is shaking up how we work with software. We had a lively discussion with the audience, which crystallized a key theme: when implementing AI initiatives, we need to support users in becoming more proactive and knowledgeable in their interactions with AI systems, so that they can ultimately contribute to the design of these systems.
Today’s episode will focus on some of the key elements on this path, covering the explanation of AI systems, calibrated trust, and the co-creation of AI-driven workflows with users. It will be particularly useful for practitioners who are:
Establishing the use of AI tools in their company
Implementing custom AI initiatives in their company
Developing and commercializing AI products
Shaping and correcting mental models
Figure 1: From interfaces that fully define our mental models to free-style text interaction
The mental model of an AI interface is different from traditional software. Think of a traditional graphical interface such as Jira, SAP, or Hubspot, which directly defines how the product works. It consists of carefully arranged visual elements and controls that guide the user through possible workflows. User interactions are limited and deterministic.
Now, think of the minimalistic interface of an AI product like ChatGPT or, even more extreme, GPT-4o, which was announced this week. It interacts with us using language, opening up an infinite universe of possible interactions. The outputs of these products are highly variable and non-deterministic - slight changes in the input can completely change the AI response.
In addition, whenever we use language, we tend to intuitively assume a human on the other end, with all the amazing capabilities that are unique to our kind. However, AI models are much weaker than humans when it comes to complex cognitive tasks like reasoning, planning, and putting things into context. Many AI interfaces hide these limitations, leaving it to the user to explore them. In reality, not all users have the time, patience, and skill for an extended discovery trip, so AI systems need to provide better guidance to users. In the following, we will consider three components - explainability, trust, and co-creation - which can help us shape a human-AI partnership where users leverage the value of AI while understanding and neutralizing its shortcomings and gaps.
Explainability: cracking the black box of AI
Yesterday, a friend shared an automated chess program with me. While we definitely prefer to play against each other, chess bots come in handy when you are on a busy schedule and struggle to sync up. As we were playing around with the bots, we noticed some irregularities in their behavior - they would play amazingly well for a while and then suddenly make rather silly beginner mistakes. We started speculating about this behavior - were the bots trained to simulate (imperfect) human games? To win by all means? Or was there a didactic goal of helping human players improve their game? As we couldn’t find an explanation for the bots’ behavior - one that would also help us “excuse” the irregularities - we soon abandoned the product and set a time to play each other instead.
Opening the black box of an AI system and explaining how it works is a challenging exercise - especially if you have spent part of your career on the engineering side and know how complex these systems are under the hood. Fortunately, a full explanation is often not just impossible, but also unnecessary. In many cases, a partial explanation is enough to shape an initial “draft” of a mental model. From there, users can start exploring the system themselves and fill in the gaps based on their experience.
A partial explanation can contain the following components (a compact sketch of how they could be packaged follows the list):
What are the main capabilities and limitations of the system? For a chess bot, is it able to maintain a certain level of skill during the full game, or are there specific situations in which it starts to slack?
What are the data sources? Just as we often like to know the ingredients that went into a dish we are enjoying, knowing the data that was used for training “grounds” the model in the real world. For example, a chess bot might have been trained on synthetic data, on data from human games at different levels, or even on no human game data at all (as is the case for AlphaZero, which was trained purely through self-play reinforcement learning).
How does the model work? This is a tricky one - many users will ask you the question, but not many will understand the answer. Explaining how a model with millions or billions of parameters works is difficult. Try to use intuitive explanations and analogies with human behavior. For example, for a chess bot, you could specify that it was designed to adapt to the level of the player and gradually push them to improve by creating more challenging positions on the board.
How should users act on the AI outputs, and how can they work around its errors? We often find that this is the most valuable part of the explanation. For example, when a chess bot slacks off, it might be a good idea to play conservatively until the bot gets back to its full strength. After all, winning just because your opponent made an obvious mistake is not exactly the pinnacle of sophistication in chess.
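To make this more tangible, here is a minimal sketch of how these four components could be bundled into a lightweight “explanation card” that a product surfaces next to an AI feature. The structure, field names, and example values are illustrative assumptions, not taken from any specific product.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ExplanationCard:
    """A partial, user-facing explanation of an AI feature (illustrative structure)."""
    capabilities: list[str]       # what the system is good at
    limitations: list[str]        # known failure modes and gaps
    data_sources: list[str]       # what the model was trained on
    how_it_works: str             # intuitive analogy rather than a technical deep dive
    acting_on_outputs: list[str]  # guidance for handling outputs and errors

# Hypothetical card for the chess bot from the example above
chess_bot_card = ExplanationCard(
    capabilities=["Plays full games at a configurable skill level"],
    limitations=["Occasionally makes beginner-level mistakes mid-game"],
    data_sources=["Human games at different skill levels"],
    how_it_works="Adapts to your level and gradually creates more challenging positions.",
    acting_on_outputs=["If the bot blunders, keep playing solid moves until it recovers."],
)
```

Even a compact summary like this gives users a first draft of a mental model that they can refine through their own experience with the system.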
If you can cover these components in your explanation, you can achieve a decent degree of transparency. A (big) part of the AI black box will still remain black, but many users will be more willing to accept this once they start using the system and gradually build trust in its value.
Calibrating trust
For banks and other financial institutions, Anti-Money Laundering (AML) is a constant pursuit of a moving target. Criminals are always a step ahead in designing new fraud schemes. AI platforms such as LexisNexis and ComplyAdvantage allow banks to scale up the screening of financial transactions, also spotting new patterns in the data that would escape the human eye. As a basic feature, these systems process a large number of transactions and flag those that appear shady, qualifying them for so-called Suspicious Activity Reports (SARs).
Now, let’s face a hard fact about life with AI - any AI system worth the name will make errors. An AI for AML will flag some innocent transactions, and some truly harmful transactions will go unnoticed. In addition to “normal” accuracy errors, we need to factor in the dynamics of the space. Fraud schemes are constantly adapting to new AML capabilities, and an emerging fraud scheme has a good chance of going undetected by the system for some time. How can users leverage the value of an AML system while staying cautious and investigative about the errors it will make?
Figure 2: On the trust continuum, aim for calibrated trust
Users come to AI with different levels of trust. At one extreme, there are those who simply have no trust - this can happen because of bad experiences with AI products. It can also be due to fears or reservations, such as the idea that AI will “automate away” the job of the user or simply take over the world. These users will refuse the support offered by AI and fail to leverage its value. At the other extreme, we have overtrust - as the AI system is switched on, the critical section of the user’s brain is switched off. The user goes on autopilot and blindly accepts whatever the AI tells her. As you can imagine, overtrust can quickly turn into “no trust” once AI mistakes lead to really harmful decisions or actions.
In the golden middle, we find calibrated trust - the user works with the AI responsibly, staying alert to potential mistakes. When she sees an output that doesn’t look quite right, she will follow up with a deeper manual investigation and potentially reject the output. While this will not allow her to catch all the errors, it can easily bring the error level down to that of human workers while still saving significant time and cost.
How can we help the user build calibrated trust? There are three major components to consider (a small workflow sketch follows the list):
A sound mental model: An understanding of how an AI system works and how certain predictions were made by the AI is the basis for building trust.
Utility and performance: Even the neatest explanation and mental model of a system are not helpful if the system fails to deliver value. The AI system needs to have a decent accuracy that is significantly higher than a “random guess”, and should deliver on it reliably over time.
Impact of the AI’s predictions: Certain domains and decisions - for example, those related to health, finance, and safety - can have a high impact on businesses and people. In these cases, you have to work harder to build trust. Besides, different types of errors inside the same system can have dramatically different consequences. In AML, a “false alert” about an innocent transaction means that a human needs to put in some work to verify it. On the other hand, failing to detect a malicious transaction can result in many millions of dollars in fines for the financial institution.
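As a toy illustration of what calibrated trust can look like inside a workflow, the sketch below routes an AML model’s flags by confidence: strong signals are queued for an analyst to confirm as SAR candidates, borderline ones go into manual investigation, and weak ones are only logged. The thresholds and names are assumptions to be tuned together with the users, not part of any real AML product.

```python
# Toy routing of AI-flagged transactions; thresholds are illustrative assumptions.
def route_alert(transaction_id: str, suspicion_score: float) -> str:
    """Decide how much human attention an AI flag receives, based on the model's score."""
    if suspicion_score >= 0.90:
        # Strong signal: an analyst still confirms before a SAR is filed
        return f"{transaction_id}: queue as SAR candidate for analyst confirmation"
    elif suspicion_score >= 0.50:
        # Borderline signal: deeper manual investigation before any action
        return f"{transaction_id}: send to manual investigation"
    else:
        # Weak signal: no alert, but keep the score for periodic audits of the model
        return f"{transaction_id}: log only"

for tx, score in [("tx-1042", 0.95), ("tx-1043", 0.62), ("tx-1044", 0.12)]:
    print(route_alert(tx, score))
```

The design choice worth noting is that even the most confident AI flags keep a human in the loop, while the cheap cases are handled automatically - trust is calibrated per decision rather than granted to the system as a whole.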
With calibrated trust, you lay the foundation for co-creation with the user. In this mode, the user is no longer a “consumer” of the AI system's capabilities but gets a say in its design.
Empowering users to co-create
For decades, AI was the safe realm of research labs and universities. ChatGPT has changed the landscape by making AI accessible to the broader public. While big companies such as OpenAI and Google are quick to push out more and more powerful models, their user interfaces are often not integrated with human workflows. To achieve this integration, we need to work with users to figure out the optimal degree of automation as well as the touchpoints where they need support from the AI.
Let’s consider the different levels of automation:
Figure 3: Possible levels of automation in a workflow
We start with a completely human process where no AI is involved. As of today, this is the reality of most jobs and processes. From there, the degree of automation increases step by step:
Assisted AI: as we go about our work, AI pops up here and there to support us with our tasks. For example, spelling and grammar checkers such as Grammarly support us in writing correct English.
Augmented AI: the AI is perceived as an intelligent partner that thinks along and helps us accomplish a task. This is the reality of modern AI systems - think of ChatGPT as you iterate your way to the perfect sales e-mail, your next blog article, or the boring section in a legal contract. The model makes its suggestions, you dislike some of their aspects, and you provide further instructions for refinement.
Full automation: the interface is a simple “big red button” - once you press it, the AI goes off and works on the task on its own. Once ready, it comes back with an output. As a user, you have no control over the process - there is no way to influence the steps that lead to the output, and once it is formed, it is difficult to correct post hoc.
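To make the difference between augmented and fully automated interaction concrete, here is a minimal sketch. The generate() function is a hypothetical stand-in for whatever model call you use; it is not a real API.

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder for a model call (e.g., a chat completion)."""
    return f"[draft based on: {prompt}]"

def augmented_writing(task: str) -> str:
    """Augmented mode: the human reviews each draft and steers the next iteration."""
    draft = generate(task)
    while True:
        print(draft)
        feedback = input("Refine (or press Enter to accept): ")
        if not feedback:
            return draft  # the human decides when the output is good enough
        draft = generate(f"{task}\nRevise according to: {feedback}")

def fully_automated_writing(task: str) -> str:
    """Full automation: one 'big red button', no human touchpoints along the way."""
    return generate(task)
```

The point of the sketch is not the code itself but the shape of the loop: in augmented mode, every iteration is a touchpoint where the user can correct course; in full automation, those touchpoints disappear.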
At first sight, full automation seems to be the ideal to aim for. Just imagine sending your AI on a mission that was originally yours and going for some cocktails by the pool until it brings back the result. But in reality, full automation is not only unrealistic for most daily tasks - it is also not desirable. On the one hand, humans are privileged with certain skills that an AI will never attain, such as social skills, empathy, and an intuitive understanding of complex contexts. As AI becomes a commodity, these skills become all the more precious in the business context. On the other hand, it is in our psychology to want to perform certain tasks and processes ourselves. If you are writing a book, you hardly want to get it print-ready from the AI - rather, you are up for a more involved creative process. If you are creating a new corporate strategy for the year, you will hardly appreciate delegating the task to an AI, no matter how powerful it is. The personal responsibility and the stakes of the situation are just too high.
To determine the ideal degree of automation, you need to engage in close dialogue with your users, mapping feasibility against desirability. Identify the touchpoints where they struggle or get bogged down in routine, and try to smooth them out with AI. Ultimately, this will increase productivity while allowing users to focus on the application and development of their uniquely human skills.
That’s it for today. As always, please share this post with interested colleagues. Also, get in touch if you have feedback or questions, want to share your own insights on the discussed topics, or have another topic that you are curious to learn about in future episodes!
Best wishes from Cyprus
Janna
And some updates of the week:
Today, chapters 4-6 of my upcoming book The Art of AI Product Development are going into production. Check it out for a deep dive into value creation with AI!
At the Swiss Data Science Conference on May 30th, together with ETH Zurich and HSLU, we will be presenting approaches to mining Cleantech publications and patents for new innovations. Join or follow the updates here to learn how AI can support companies in innovation and sustainability.
Check out my article on the relationship between Large Language Models and user experience to learn how LLMs “hide” most of the functionality of an AI product, and how product managers and designers can align LLM capabilities with user experience.