CapRadio


Send in the clones: Using artificial intelligence to digitally replicate human voices

By Chloe Veltman | NPR
Monday, January 17, 2022

Reporter Chloe Veltman reacts to hearing her digital voice double, "Chloney," for the first time, with Speech Morphing chief linguist Mark Seligman. (Courtesy of Speech Morphing)

The science behind making machines talk just like humans is very complex, because our speech patterns are so nuanced.

"The voice is not easy to grasp," says Klaus Scherer, emeritus professor of the psychology of emotion at the University of Geneva. "To analyze the voice really requires quite a lot of knowledge about acoustics, vocal mechanisms and physiological aspects. So it is necessarily interdisciplinary, and quite demanding in terms of what you need to master in order to do anything of consequence."

So, not surprisingly, it has taken well over 200 years for synthetic voices to get from the first speaking machine, invented by Wolfgang von Kempelen around 1800 – a boxlike contraption that used bellows, pipes and a rubber mouth and nose to simulate a few recognizably human utterances, like mama and papa – to a Samuel L. Jackson voice clone delivering the weather report on Alexa today.

Talking machines like Siri, Google Assistant and Alexa, or a bank's automated customer service line, are now sounding quite human. Thanks to advances in artificial intelligence, or AI, we've reached a point where it's sometimes difficult to distinguish synthetic voices from real ones.

I wanted to find out what's involved in the process at the customer end. So I approached San Francisco Bay Area-based natural language speech synthesis company Speech Morphing about creating a clone – or "digital double" – of my own voice.

A reporter gets her voice cloned

Given the complexities of speech synthesis, it's quite a shock to find out just how easy it is to order one up. For a basic conversational build, all a customer has to do is record themselves saying a bunch of scripted lines for roughly an hour. And that's about it.

"We extract 10 to 15 minutes of net recordings for a basic build," says Speech Morphing founder and CEO Fathy Yassa.

The hundreds of phrases I record so that Speech Morphing can build my digital voice double seem very random: "Here the explosion of mirth drowned him out." "That's what Carnegie did." "I'd like to be buried under Yankee Stadium with JFK." And so on.

But they aren't as random as they appear. Yassa says the company chooses utterances that will produce a wide enough variety of sounds across a range of emotions – such as apologetic, enthusiastic, angry and so on – to feed a neural network-based AI training system. It essentially teaches itself the specific patterns of a person's speech.

Yassa says there are around 20 affects or tones to choose from, and some of these can be used interchangeably, or not at all. "Not every tone or affect is needed for every client," he says. "The choice depends on the target application and use cases. Banking is different from eBooks, is different from reporting and broadcast, is different from consumer."

At the end of the recording session, I send Speech Morphing the audio files. From there, the company breaks down and analyzes my utterances, and then builds the model for the AI to learn from. Yassa says the entire process takes less than a week.

He says the possibilities for the Chloe Veltman voice clone – or "Chloney" as I've affectionately come to call my robot self – are almost limitless.

"We can make you apologetic, we can make you promotional, we can make you act like you're in the theater," Yassa says. "We can make you sing, eventually, though we're not yet there."

A fast-growing industry

The global speech and voice recognition industry is worth tens of billions of dollars, and is growing fast. Its uses are evident. The technology has given actor Val Kilmer, who lost his voice owing to throat cancer a few years ago, the chance to reclaim something approaching his former vocal powers.

It's enabled film directors, audiobook creators and game designers to develop characters without the need to have live voice talent on hand, as in the movie Roadrunner, where an AI was trained on Anthony Bourdain's extensive archive of media appearances to create a digital double of the late chef and TV personality's voice.

As pitch-perfect as Bourdain's digital voice double might be, it's also caused controversy. Some people raised ethical concerns about putting words into Bourdain's mouth that he never actually said while he was alive.

A cloned version of Barack Obama's voice warning people about the dangers of fake news, created by actor and film director Jordan Peele, hammers the point home: Sometimes we have cause to be wary of machines that sound too much like us.

"We're entering an era in which our enemies can make it look like anyone is saying anything at any point in time," says the Obama deepfake in the video, produced in collaboration with BuzzFeed in 2018. "Even if they would never say those things."

When too human is too much

Sometimes, though, we don't necessarily want machines to sound too human, because it creeps us out.

If you're looking for a digital voice double to read an audiobook to kids, or act as a companion or helper for a senior, a more human-sounding voice might be the right way to go.

"Maybe not something that actually breathes, because that's a little bit creepy, but a little more human might be more approachable," says user experience and voice designer Amy Jiménez Márquez, who led the voice, multimodal and UX Amazon Alexa personality-experience design team for four years.

But for a machine that performs basic tasks, like, say, a voice-activated refrigerator? Maybe less human is best. "Having something a little more robotic and you can even create a tinny voice that sounds like an actual robot that is cute, that would be more appropriate for a refrigerator," Jiménez Márquez says.

The big reveal

At a demo session with Speech Morphing, I get to hear Chloney, my digital voice double.

Her voice comes at me through a pair of portable speakers connected to a laptop. The laptop displays a programming interface for typing in whatever text I want Chloney to say. The interface includes tools to make micro-adjustments to the pitch, speed and other vocal attributes that might need to be tweaked if Chloney's prosody doesn't come out sounding exactly right.

"Happy birthday to you. Happy birthday to you. Happy birthday, dear Chloney. Happy birthday to you," says Chloney.

Chloney can't sing "Happy Birthday" – at least for now. But she can read out news stories I didn't even report myself, like one ripped from an AP newswire about the COVID-19 pandemic. And she can even do it in Spanish.

Chloney sounds quite a lot like me. It's impressive, but it's also a little scary.

"My jaw is on the floor," says the original voice behind Chloney – that's me, Chloe – as I listen to what my digital voice double can do. "Let's hope she doesn't put me out of a job anytime soon."

Copyright 2022 NPR. To see more, visit https://www.npr.org.

CapRadio stations are licensed to California State University, Sacramento. © 2022, Capital Public Radio. All Rights Reserved. Privacy Policy | Website Feedback FCC Public Files: KXJZ KKTO KUOP KQNC KXPR KXSR KXJS. For assistance accessing our public files, please call 916-278-8900 or email us.