Alexa goes down the conversational rabbit hole

Onstage at re:Mars this week, Amazon showcased a developing Alexa feature meant to mimic the flow of natural language. Conversation between two humans rarely follows some predefined structure. It goes to strange and unexpected places. One topic segues into another, as participants inject their lived experience.

In a demo, a conversation about trees turns to one about hiking and parks. Rohit Prasad, senior vice president and head scientist for Alexa, calls this phenomenon "conversation exploration" in the context of the company's AI. It's not a proper name for a proper feature, exactly. There isn't a switch that gets flipped to suddenly enable conversations overnight. Rather, it's part of an evolving notion of how Alexa can interact with users in a more human (or perhaps more humane) manner.

Smart assistants like Alexa have traditionally provided a much more simplistic question-and-response model. Ask Alexa the weather, and Alexa tells you the weather in a predetermined area. Ask her the A’s score (or, honestly, probably don’t), and Alexa tells you the A’s score. It’s a straightforward interaction, not dissimilar to typing a question into a search engine. But, again, real-world conversations rarely play out this way.

“There’s a whole range of questions Alexa gets, which are very much information bearing. When those questions happen, you can imagine they’re not point questions,” Prasad told TechCrunch in a conversation at the event. “They’re really about something the customer wants to learn more about. What’s on top of our minds right now is what’s happening with inflation. We get a ton of requests to Alexa like that, and it gives you that kind of exploration experience.”

Such conversational features, however, are the sort of thing a home assistant like Alexa has to ramp up to. Eight years after Amazon launched it, the assistant is still learning, collecting data and determining the best ways to interact with consumers. Even when a feature gets to the point where Amazon is ready to show it off on a keynote stage, it still requires tweaks.

“Alexa needs to be an expert on many topics,” explained Prasad. “That’s the big paradigm change, and that kind of expertise takes a while to attain. This will be a journey, and with our customers’ interactions, it won’t be like from day one Alexa will know everything. But these questions can evolve into more explorations where you end up doing something you didn’t think you were.”

Seeing the word “Empathy” in big bold letters on the stage behind Prasad was a head-turner — though not, perhaps, as much as what came next.

There are some straightforward scenarios where the concept of empathy could or should factor in during a conversation with humans and smart assistants alike. Take, for example, the ability to read social cues. It’s a skill we pick up through experience — the ability to read the sometimes-subtle language of faces and bodies. Emotional intelligence for Alexa is a notion Prasad has been discussing for years. That starts with changing the assistant’s tone to respond in a manner conveying happiness or disappointment.

The flip side is determining the emotion of a human speaker, a concept the company has been working to perfect for several years. That work has manifested itself in various ways, including the 2020 debut of the company's controversial wearable, Halo, which offers a feature called Tone that purports to "analyze energy and positivity in a customer's voice so they can understand how they sound to others and improve their communication and relationships."

“I think both empathy and affect are well-known ways of interacting, in terms of building relationships,” Prasad said. “Alexa can’t be tone-deaf to your emotional state. If you walked in and you’re not in a happy mood, it’s hard to say what you should do. Someone who knows you well will react in a different way. It’s a very high bar for the AI, but it’s something you can’t ignore.”

The executive notes that Alexa has already become a kind of companion for some users, particularly older ones. A more conversational approach would likely only enhance that phenomenon. In demos of Astro this week, the company frequently described the home robot as filling an almost pet-like role in the home. Such notions have their limitations, however.

“It shouldn’t hide the fact that it’s an AI,” Prasad added. “When it comes to the point [where] it’s indistinguishable — which we’re very far from — it should still be very transparent.”

A subsequent video demonstrated an impressive new voice synthesis technology that uses as little as a minute of audio to create a convincing approximation of a person speaking. In it, a grandmother's voice reads "The Wizard of Oz" to her grandson. The idea of memorializing loved ones through machine learning isn't entirely new. Companies like MyHeritage are using tech to animate images of deceased relatives, for example. But these scenarios invariably, and understandably, raise some hackles.

Prasad was quick to point out that the demo was more of a proof of concept, highlighting the underlying voice technologies.

"It was more about the technology," he explained. "We're a very customer-obsessed science company. We want our science to mean something to customers. Unlike a lot of things where generation and synthesis has been used without the right gates, this feels like one customers would love. We have to give them the right set of controls, including whose voice it is."

With that in mind, there's no timeline for such a feature, if indeed such a feature ever arrives on Alexa. However, the exec notes that the technology that would power it is very much up and running in Amazon's labs. And, again, if it does arrive, it would require some of the aforementioned transparency.

“Unlike deepfakes, if you’re transparent about what it’s being used for, there is a clear decision maker and the customer is in control of their data and what they want it to be used for, I think this is the right set of steps,” Prasad explained. “This was not about ‘dead grandma.’ The grandma is alive in this one, just to be very clear about it.”

Asked what Alexa might look like 10 to 15 years in the future, Prasad explained that it's all about choice, though less about imbuing Alexa with individual, unique personalities than about offering a flexible computing platform for users.

“It should be able to accomplish anything you want,” he said. “It’s not just through voice; it’s intelligence in the right moment, which is where ambient intelligence comes in. It should proactively help you in some cases and anticipate your need. This is where we take the conversational exploration further out. Anything you look for — imagine how much time you spend on booking a vacation [when you don’t] have a travel agent. Imagine how much time you spend buying that camera or TV you want. Anything that requires you to spend time searching should become much faster.”