Startup Synthesia’s AI-generated avatars are getting an update to make them even more realistic: They will soon have bodies that can move, and hands that gesticulate.
The new full-body avatars will be able to do things like sing and brandish a microphone while dancing, or move from behind a desk and walk across a room. They will be able to express more complex emotions than previously possible, like excitement, fear, or nervousness, says Victor Riparbelli, the company’s CEO. Synthesia intends to launch the new avatars toward the end of the year.
“It’s very impressive. No one else is able to do that,” says Jack Saunders, a researcher at the University of Bath, who was not involved in Synthesia’s work.
The full-body avatars he previewed are very good, he says, despite small errors such as hands “slicing” into each other at times. But “chances are you’re not really going to be looking that close to notice it,” Saunders says.
Synthesia launched its first version of hyperrealistic AI avatars, also known as deepfakes, in April. These avatars use large language models to match expressions and tone of voice to the sentiment of spoken text. Diffusion models, as used in image- and video-generating AI systems, create the avatar’s look. However, the avatars in this generation appear only from the torso up, which can detract from the otherwise impressive realism.
To create the full-body avatars, Synthesia is building an even bigger AI model. Users will have to go into a studio to record their body movements.
But before these full-body avatars become available, the company is launching another version of AI avatars that have hands and can be filmed from multiple angles. Their predecessors were available only in portrait mode and were visible only from the front.
Other startups, such as Hour One, have launched similar avatars with hands. Synthesia’s version, which I got to test in a research preview and which will launch in late July, has slightly more realistic hand movements and lip-synching.
Crucially, the coming update also makes it far easier to create your own personalized avatar. The company’s previous custom AI avatars required users to go into a studio to record their face and voice over the span of a couple of hours, as I reported in April.
This time, I recorded the material needed in just 10 minutes in the Synthesia office, using a digital camera, a lapel mike, and a laptop. But an even more basic setup, such as a laptop camera, would do. And while previously I had to record my facial movements and voice separately, this time the data was collected at the same time. The process also includes reading a script expressing consent to being recorded in this way, and reading out a randomly generated security passcode.
These changes allow more scale and give the AI models powering the avatars more capabilities with less data, says Riparbelli. The results are also much faster. While I had to wait a few weeks to get my studio-made avatar, the new homemade ones were available the next day.
Below, you can see my test of the new homemade avatars with hands.
The homemade avatars aren’t as expressive as the studio-made ones yet, and users can’t change the backgrounds of their avatars, says Alexandru Voica, Synthesia’s head of corporate affairs and policy. The hands are animated using an advanced form of looping technology, which repeats the same hand movements in a way that is responsive to the content of the script.
Hands are tricky for AI to do well, even more so than faces, Vittorio Ferrari, Synthesia’s director of science, told me in March. That’s because our mouths move in relatively small and predictable ways while we talk, making it possible to sync the deepfake version up with speech, but we move our hands in lots of different ways. On the flip side, while faces require close attention to detail because we tend to focus on them, hands can be less precise, Ferrari says.
Even if they’re imperfect, AI-generated hands and bodies add a lot to the illusion of realism, which poses serious risks at a time when deepfakes and online misinformation are proliferating. Synthesia has strict content moderation policies, carefully vetting both its customers and the sort of content they’re able to generate. For example, only accredited news outlets can generate content on news.
These new advancements in avatar technologies are another hammer blow to our ability to believe what we see online, says Saunders.
“People need to know you can’t trust anything,” he says. “Synthesia is doing this now, and another year down the line it will be better and other companies will be doing it.”