The federal government is trying to promote efforts to counter so-called audio deepfakes by awarding four organizations prizes for developing technologies to distinguish between authentic human speech and audio generated by artificial intelligence.
The prizes, awarded Monday by the Federal Trade Commission, come as the agency has issued warnings to consumers about scams using AI-generated voices and as concerns grow about the influence of deepfakes on this year's elections.
One winner, OriginStory, created by a team that includes researchers at Arizona State University, uses sensors to detect signs of human activity such as breathing, motion and heartbeat.
"We have this very unique speech production mechanism. And so by sensing it in parallel with the acoustic signal, you can verify that the speech is coming from a human," Visar Berisha, a professor at ASU's College of Health Solutions who was part of the team, told NPR in an interview.
OriginStory's technology leverages sensors in existing hardware, though those sensors are not available in all recording equipment: newer Android phones and iPhones don't have them. Berisha says the team is focusing on hardware that can already support the technology.
Another, DeFake, made by Ning Zhang, a researcher at Washington University in St. Louis, injects data into recordings of real voices so AI-generated voice clones don't sound like the real person.
Zhang says his technology was inspired by earlier tools developed by University of Chicago researchers that place hidden changes in images so that artificial intelligence algorithms that try to train themselves on the images cannot mimic them.
A third, AI Detect, uses AI to catch AI. It's made by startup Omni Speech, and CEO David Przygoda says the company's machine learning algorithm extracts features such as inflection from audio clips and uses them to teach its models to tell the difference between real and fake audio.
"They have a really hard time transitioning between different emotions, whereas a human being can be amused in one sentence and distraught in the next," says Przygoda.
In an FTC video, the company says its tool should be over 99.9% accurate, but Przygoda says it's not yet ready for independent testing, and the contest judges did not test the submissions under real-world conditions.
The three winners will split a cash prize of $35,000. Another company, Pindrop Security, received recognition for its work but will not receive cash because it's a larger organization.
In a statement, the company says the FTC's award "recognizes and validates that a decade of work Pindrop has put in to study the problem of voice clones and other deepfakes."
There are a number of existing commercial detection tools that rely on machine learning, but factors such as sound quality and media format can make them less reliable, and detectors need to constantly be trained on new deepfake generators to catch the audio they produce.
Berisha says this is a losing game in the long term, referencing the challenges that detectors of AI-generated text faced. OpenAI took down its own detector months after it launched, citing poor results.
"I think we're headed in exactly the same direction for voices. And so if AI-based detectors work now, I think they'll work less well six months from now and less well a year from now, two years from now. And at some point, they will stop working altogether."
This drove him to develop a process that authenticates human voices as words are being spoken.
Earlier this year, the Federal Communications Commission banned AI-enabled robocalls. Synthetic audio company ElevenLabs has a tool to detect its own product and is calling on other producers of synthetic voices to do the same. The European Union recently passed an "AI Act," and state legislatures in the U.S. are passing laws that regulate deepfakes. Researchers, detection software makers and the FTC say that technology like this can't be the only solution to the challenges posed by generative AI tools.
Copyright 2024 NPR. To see more, visit https://www.npr.org.
