New AI-Powered Service Turns Portraits Into Talking Heads

A new service powered by artificial intelligence that can turn portraits into talking heads was announced Monday by D-ID.

Called Creative Reality Studio, the self-service application can turn the image of a face into a video, complete with speech.

The service is aimed at business content creators — learning and development units, human resource departments, marketers, advertisers, and sales teams — but anyone can try out the technology at the D-ID website.

[Video: Creative Reality Studio, by John P. Mello Jr.]

According to the company, the platform reduces the cost and hassle of creating corporate video content. Rather than a limited set of avatars, it offers an unlimited variety of presenters, including users’ own photos or any image they have the rights to use. D-ID gained some notoriety when its technology was used in Deep Nostalgia, an app pitched as a way to animate old portraits.

The company added that the technology enables customers and users to choose the identity of a presenter, including their ethnicity, gender, age, and even their language, accent, and intonation. “This offers greater representation and diversity, leading to a stronger sense of inclusion and belonging, driving further engagement and interaction with the businesses who use it,” it said in a news release.

“The use cases include empowering business content creators to seamlessly integrate video in digital spaces and presentations with the exclusive PowerPoint plug-in, generating more engaging content using customized corporate video narrators,” D-ID Marketing Vice President Matthew Kershaw told TechNewsWorld.

Impressive Services

The quality of these services is fairly impressive and keeps getting better, maintained Daniel Castro, vice president of the Information Technology and Innovation Foundation, a research and public policy organization in Washington, D.C.

“This service isn’t at the level where it’s fully replacing a presenter, but there is no reason not to expect it to get there relatively soon,” he told TechNewsWorld.

D-ID explained that the use of video by businesses has increased dramatically, and more of them are integrating it into their training, communications, and marketing strategies.

Accelerating this trend, it continued, are the rapidly evolving worlds of avatars and the metaverse, both of which demand a more creative, immersive, and interactive content approach from digital creators. Production budgets, however, can be prohibitively expensive and require significant allocations of time and talent.

“The service is an evolution of the avatars and emojis people use today, but it can be used over a longer discussion or presentation,” observed Ross Rubin, the principal analyst at Reticle Research, a consumer technology advisory firm in New York City.

“The idea is to save time, especially if you were going to read off a script,” he told TechNewsWorld. “It can be more engaging for an audience than just audio or looking at slides.”

Democratizing AI

D-ID CEO and Co-founder Gil Perry noted in a news release that the company’s technology, which until now has been limited to enterprise customers, has been used to generate 100 million videos.

“Now that we’re offering our self-service Creative Reality platform, the potential is huge,” he continued. “It enables both larger enterprises, smaller companies, and freelancers to produce personalized videos for a range of purposes at massive scale.”

Kershaw added that D-ID’s technology will further democratize creativity. “I say ‘further’ because, in fact, technology has already been democratizing the arts for decades,” he said.

“From the inception of synthesizers, samplers, and sequencers in music to Photoshop and Illustrator in photography and illustration, and Premiere and desktop editing and motion graphics in film production, the ability to create high-quality productions outside of specialist high-end studios has been happening since the 1980s,” he said. “This is just the latest episode in that long-running series.”

“It’s definitely a step forward towards democratizing AI,” agreed Avivah Litan, a security and privacy analyst with Gartner. “It’s got a lot of great use cases in education, healthcare, and retail,” she told TechNewsWorld. “It’s just a better way to communicate with people. We’re becoming a much more visual society. No one has time to read anything.”

Deepfake Concerns

With concern growing over the use of “deepfakes” to spread misinformation and raise social engineering to new heights, the potential for abuse always looms over new synthetic media products like D-ID’s.

“As with any technology, ours can be used for ill by bad actors, but our platform is aimed at legitimate businesses, who would have no interest in that kind of use,” Kershaw said.

“Furthermore,” he continued, “we are not deepfake. We don’t put someone else’s face on another person’s body, and we are not trying to make someone say something they didn’t say.”

“Within D-ID’s platform, we have put in multiple safeguards to make sure our technology isn’t used that way,” he added. “We do not replicate the voice of celebrities or without any person’s permission.”

The company also filters swear words and racist remarks and bars the platform from being used to create political videos.

“D-ID is putting guardrails on its platform, but we all know that guardrails are never perfect,” observed Litan.

“It’s a great tool for spreading misinformation because these social media sites aren’t prepared for deepfakes,” she said. “Even if the social media sites got good at identifying deepfakes, they’ll never get good enough. It’s like spam. Spam always gets through. This will get through, too, but the consequences will be worse.”

Need for Provenance

Detecting deepfakes is a losing proposition in the long run, Litan maintained. Even today, detection algorithms generally cannot identify more than 70% of deepfakes.

She added that determined adversaries will keep pace with deepfake detection by using generative adversarial networks so that detection rates will eventually drop to as low as 50%.

She predicts that in 2023, 20% of successful account takeover attacks will use deepfakes to socially engineer users to turn over sensitive data or move money into criminal accounts.

“Many safeguards need to be applied industry-wide, which is why we are also working with industry bodies and regulators to put legal safeguards in place that will make the industry, in general, more safe and reliable,” Kershaw said. “We think that, in particular, having an industry-wide system for invisibly watermarking content through the use of steganography would get rid of nearly all the potential issues.”

“You would be able to see a piece of media and by clicking on a button also see its provenance, where it came from and what it contained,” he noted. “Transparency is the solution.”
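To make the watermarking idea concrete, here is a minimal Python sketch of least-significant-bit steganography, one of the simplest ways to hide a provenance tag inside image data. It is an illustration only, not D-ID's method or any industry scheme. The function names `embed` and `extract`, the 16-bit length prefix, and the sample tag are all hypothetical, and the image is modeled as a plain list of pixel values from 0 to 255.

```python
def embed(pixels, message):
    """Hide a text tag in the lowest bit of each pixel value (hypothetical helper)."""
    data = message.encode("utf-8")
    # Prefix a 16-bit length so the reader knows how many bytes to recover.
    payload = len(data).to_bytes(2, "big") + data
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        marked[i] = (marked[i] & 0xFE) | bit
    return marked


def extract(pixels):
    """Recover the tag written by embed (hypothetical helper)."""
    def read_bytes(start, count):
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start + b * 8 + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length).decode("utf-8")


# Usage: tag a synthetic "image" and read the tag back.
image = [(i * 37) % 256 for i in range(400)]
tagged = embed(image, "source=example")
print(extract(tagged))
```

Each pixel value changes by at most 1, which is why the mark is invisible to a viewer. A scheme this simple would not survive recompression or resizing, so an industry-wide system would need a more robust encoding.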

“There are many ways to deal with fakes, but the most important is knowing the provenance and authenticity of media,” Castro added.

John P. Mello Jr.

John P. Mello Jr. has been an ECT News Network reporter since 2003. His areas of focus include cybersecurity, IT issues, privacy, e-commerce, social media, artificial intelligence, big data and consumer electronics. He has written and edited for numerous publications, including the Boston Business Journal, the Boston Phoenix, Megapixel.Net and Government Security News.
