Reckoning with generative AI’s uncanny valley

Generative AI has the power to surprise in a way that few other technologies can. Sometimes that’s a very good thing; other times, not so good. In theory, as generative AI improves, this issue should become less important. In reality, however, as generative AI becomes more “human,” it can begin to turn sinister and unsettling, plunging us into what robotics has long described as the “uncanny valley.”

It might be tempting to overlook this experience as something that can be corrected by bigger data sets or better training. However, insofar as it speaks to a disturbance in our mental model of the technology (e.g., “I don’t like what it did there”), it’s something that needs to be acknowledged and addressed.

Mental models and antipatterns

Mental models are an important concept in UX and product design, but they need to be more readily embraced by the AI community. At one level, mental models often go unnoticed because they are routine patterns of assumption about an AI system. This is something we discussed at length while putting together the latest volume of the Thoughtworks Technology Radar, a biannual report based on our experiences working with clients all over the world.

For instance, we called out complacency with AI-generated code and replacing pair programming with generative AI as two practices we believe practitioners must avoid as the popularity of AI coding assistants continues to grow. Both emerge from poor mental models that fail to acknowledge how this technology actually works and what its limitations are. The consequence is that the more convincing and “human” these tools become, the harder it is for us to keep those workings in view and to recognize the limitations of the “solutions” they provide.

Of course, for those deploying generative AI into the world, the risks are similar, perhaps even more pronounced. While the intent behind such tools is usually to create something convincing and usable, if such tools mislead, trick, or even merely unsettle users, their value evaporates. It’s no surprise that legislation such as the EU AI Act, which requires deepfake creators to label content as “AI generated,” is being passed to address these problems.

It’s worth pointing out that this isn’t just an issue for AI and robotics. Back in 2011, our colleague Martin Fowler wrote about how certain approaches to building cross-platform mobile applications can create an uncanny valley, “where things work mostly like… native controls but there are just enough tiny differences to throw users off.”

Specifically, Fowler wrote something we think is instructive: “different platforms have different ways they expect you to use them that alter the entire experience design.” The point here, applied to generative AI, is that different contexts and different use cases all come with different sets of assumptions and mental models that change at what point users might drop into the uncanny valley. These subtle differences change one’s experience or perception of a large language model’s (LLM) output.

For example, for the drug researcher who wants vast amounts of synthetic data, accuracy at a micro level may be unimportant; for the lawyer trying to grasp legal documentation, accuracy matters a lot. In fact, dropping into the uncanny valley might just be the signal to step back and reassess your expectations.

Shifting our perspective

The uncanny valley of generative AI might be troubling, even something we want to minimize, but it should also remind us of generative AI’s limitations—it should encourage us to rethink our perspective.

There have been some interesting attempts to do that across the industry. One that stands out comes from Ethan Mollick, a professor at the University of Pennsylvania, who argues that AI shouldn’t be understood as good software but instead as “pretty good people.”

Therefore, our expectations about what generative AI can do and where it’s effective must remain provisional and flexible. To a certain extent, this might be one way of overcoming the uncanny valley: by reflecting on our assumptions and expectations, we remove the technology’s power to disturb or confound them.

However, simply calling for a mindset shift isn’t enough. There are various practices and tools that can help. One example is the technique, which we identified in the latest Technology Radar, of getting structured outputs from LLMs. This can be done either by instructing a model to respond in a particular format when prompting or through fine-tuning. Tools like Instructor are making this easier, and the result is greater alignment between expectations and what the LLM will output. While there’s still a chance something unexpected or not quite right might happen, this technique goes some way toward addressing that.
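To make the idea of structured outputs concrete, here is a minimal sketch in plain Python. It does not use Instructor or any real model API: the Verdict schema, the allowed labels, and the call_model callable are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of stating an expected format, rejecting replies that break it, and asking again.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration; not taken from any library.
ALLOWED_LABELS = {"positive", "negative", "neutral"}


@dataclass
class Verdict:
    label: str
    confidence: float


def parse_verdict(raw: str) -> Verdict:
    """Parse a model reply, rejecting anything that breaks the agreed format."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("label")
    confidence = data.get("confidence")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return Verdict(label=label, confidence=float(confidence))


def ask_with_retries(call_model, prompt: str, max_attempts: int = 3) -> Verdict:
    """call_model is an assumed callable (prompt -> reply text) wrapping some LLM."""
    last_error = None
    for _ in range(max_attempts):
        reply = call_model(prompt)
        try:
            return parse_verdict(reply)
        except ValueError as err:
            last_error = err
            # Feed the validation error back so the next attempt can correct itself.
            prompt = f"{prompt}\n\nYour previous reply was invalid ({err}). Reply with JSON only."
    raise RuntimeError(f"no valid structured output after {max_attempts} attempts") from last_error
```

The design choice worth noticing is that the expectation lives in code rather than in the reader’s head: a reply that drifts from the format is caught and retried instead of silently surprising whoever consumes it.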

There are other techniques too, including retrieval-augmented generation as a way of better controlling the “context window.” Frameworks and tools such as Ragas and DeepEval, libraries that provide AI developers with metrics for faithfulness and relevance, can help evaluate and measure the success of such techniques.
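As a rough illustration of what a faithfulness metric tries to capture, the sketch below scores an answer by the share of its sentences whose words mostly appear in the retrieved context. This is a deliberately crude word-overlap stand-in, not how Ragas or DeepEval compute their metrics; the function name, the sentence splitting, and the 0.8 threshold are all assumptions made for the example.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def crude_faithfulness(answer: str, context: str, threshold: float = 0.8) -> float:
    """Fraction of answer sentences whose words mostly appear in the context.

    A toy proxy only: real evaluation libraries use far more robust checks.
    """
    context_words = _words(context)
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        words = _words(sentence)
        if not words:
            continue  # a sentence with no words counts as unsupported
        overlap = len(words & context_words) / len(words)
        if overlap >= threshold:
            supported += 1
    return supported / len(sentences)


context = "The Technology Radar is a biannual report published by Thoughtworks."
answer = "The Technology Radar is a biannual report. It is published by Microsoft every week."
score = crude_faithfulness(answer, context)  # first sentence supported, second not
```

Even a toy score like this turns a vague unease about an answer into a number that can be tracked, which is the point of measuring at all.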

Measurement is important, as are relevant guidelines and policies for LLMs, such as LLM guardrails. It’s also worth taking steps to better understand what’s actually happening inside these models. Completely unpacking these black boxes might be impossible, but tools like Langfuse can help. Doing so may go a long way toward reorienting the relationship with this technology, shifting mental models, and removing the possibility of falling into the uncanny valley.

An opportunity, not a flaw

These tools, part of a Cambrian explosion of generative AI tools, can help practitioners rethink generative AI and, hopefully, build better and more responsible products. However, for the wider world, this work will remain invisible. What’s important is exploring how we can evolve toolchains to better control and understand generative AI, because existing mental models and conceptions of generative AI are a fundamental design problem, not a marginal issue we can choose to ignore.

Ken Mugrage is the principal technologist in the office of the CTO at Thoughtworks. Srinivasan Raguraman is a technical principal at Thoughtworks based in Singapore.

This content was produced by Thoughtworks. It was not written by MIT Technology Review’s editorial staff.