I recently read an interesting Fortune magazine article by Jennifer Alsever that discusses the use of voice for secure authentication (read it at http://fortune.com/2018/01/06/artificial-intelligence-voice-profiling).
Could voice authentication challenge other biometrics? The article cites work in Artificial Intelligence (AI) being conducted at Carnegie Mellon University’s Language Technologies Institute under Professor Rita Singh, with whom I recently had the chance to speak.
Dr. Singh notes in an academic paper that the different sounds that make up continuous speech – strings of spoken words – are produced in rapid succession by changing the shape of the vocal tract through movement of the “articulators” (tongue, lips, jaw, etc.). Different shapes produce different resonance patterns, which are heard and interpreted as meaningful speech or speech-like sounds.
“While this amazing variety of sounds is produced relatively effortlessly by a speaker, it is driven by complex physiological and mental factors that influence the motion, configuration and airflow in the vocal tract,” Dr. Singh explains. “These influences are different for different types and combinations of sounds produced as we speak, and at micro-levels, these influences are different for every speaker. As a result, it is almost impossible for the voices of two different individuals to be exactly the same at all levels. This, and the fact that many of these influences are beyond the voluntary control of the speaker, make exact and complete mimicry impossible.”
Micro-articulometry is the name given to the technology used to deduce a speaker's profile parameters from these influences. It employs AI to discover micro-patterns, or micro-signatures, that occur in some combinations of spoken sounds.
“Micro-signatures are at durational scales that may be imperceptible to humans and unobservable in standard visual or other representations of speech,” Dr. Singh says.
Singh’s work suggests that the human voice can definitively identify a person and can potentially reveal the speaker's physical, physiological, medical, behavioral, psychological, demographic, environmental and other characteristics.
Putting It into Practice
In one application, the U.S. Coast Guard has worked successfully with Professor Singh since 2014 to combat fake distress calls, each of which can cost $5,000 to $15,000 per hour in response.
Advanced voice analysis can provide information not only about the physical characteristics of the caller, but also about the environment the call is coming from. It can identify serial callers, even from snippets of voice communication that are intentionally kept short to convey urgency.
Another potential application is combating “swatting”: placing bogus calls to law enforcement that report dangerous activity in order to prompt a strong deployment response. In late December 2017, Tyler Barriss of Los Angeles called authorities in Wichita, Kan., to report a made-up situation involving a weapon and a threat of fire. Police arrived on the scene and fatally shot an unsuspecting Andrew Finch as he reached toward his waistband. Barriss had also been charged in 2015 with making a bomb threat to a television station and had reportedly considered swatting FBI headquarters.
Editor’s Note: Barriss was sent to the Sedgwick County, Kansas, correctional facility to be held for arraignment. This facility received the Elliott A. Boxerbaum security design project of the year award for 2017 and was profiled in the December issue of SD&I. Read more at www.securityinfowatch.com/12383323.
In quite a different vein, Dr. Singh used her methods to analyze the voice in a 1991 phone interview by People magazine with a “publicist” identifying himself as John Miller, who praised one Donald Trump. She concluded that the voice was, in fact, Donald Trump himself, although her scientific confidence level is not a full 100 percent. Dr. Singh also noted that Mr. Trump may have an issue with his nose that affects his speech.
Viability as a Biometric
Can the bar be set high enough that it is virtually impossible for someone to impersonate another person’s voice well enough to satisfy an AI-based system? The research suggests that it likely can. Will voice then supplant other biometrics? It appears that multi-factor authentication will remain a requirement or, at the very least, that procedural safeguards will be needed to reduce the chance of a false positive, in which an impostor is accepted.
For example, if a system prompted, “Speak your name,” an impostor could play back a recording of the legitimate user saying that name. If the system then presented a random passphrase that had to be spoken within a short time, it is unlikely the impostor could produce a second recording of that passphrase.
In addition to requiring the speaker to repeat passphrases presented on the spot, the system could estimate some basic profile parameters of the speaker (e.g., age) from the recording and compare them against what would be expected of the enrolled person's voice at the time of authentication.
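To make the challenge-and-response idea concrete, here is a minimal Python sketch of the checks a system might run. It is purely hypothetical: the word list, the `Challenge` and `Response` types, the thresholds (a five-second window, a 0.9 match score, a ten-year age tolerance) and the upstream speech-to-text, voice-matching and age-estimation steps are all assumptions for illustration, not part of Dr. Singh's methods or any vendor's product.

```python
import secrets
import time
from dataclasses import dataclass

# Hypothetical word list; a real system would draw from a far larger vocabulary.
WORDS = ["amber", "falcon", "river", "copper", "meadow", "granite", "orbit", "lantern"]


@dataclass
class Challenge:
    passphrase: str
    issued_at: float  # seconds, from time.monotonic()


@dataclass
class Response:
    transcript: str       # assumed output of a speech-to-text step
    speaker_score: float  # assumed 0..1 match score from a voice-biometric model
    estimated_age: float  # assumed age estimate derived from the recording
    received_at: float    # seconds, same clock as Challenge.issued_at


def issue_challenge(n_words: int = 3) -> Challenge:
    """Pick a random passphrase the caller could not have pre-recorded."""
    phrase = " ".join(secrets.choice(WORDS) for _ in range(n_words))
    return Challenge(phrase, time.monotonic())


def verify(challenge: Challenge, response: Response, expected_age: float, *,
           max_delay: float = 5.0, min_score: float = 0.9,
           age_tolerance: float = 10.0) -> bool:
    # 1. The reply must arrive quickly, leaving no time to splice a recording.
    if response.received_at - challenge.issued_at > max_delay:
        return False
    # 2. The spoken words must match the random passphrase (case/spacing ignored).
    spoken = " ".join(response.transcript.lower().split())
    if spoken != challenge.passphrase:
        return False
    # 3. The voice must match the enrolled user's voiceprint closely enough.
    if response.speaker_score < min_score:
        return False
    # 4. A coarse profile check: estimated age must be plausible for the user.
    return abs(response.estimated_age - expected_age) <= age_tolerance
```

Each check targets a different weakness: the random words and short window defeat replay of a pre-recorded clip, the match score ties the voice to the enrolled person, and the age estimate is a rough sanity check of the kind of profile parameter described in the text.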
What if the person were present but under duress and forced to speak the passphrase? Depending on the analytics available, signs of duress detected in the voice could be used much as current duress codes and audio analytics are. A second credential, biometric or otherwise, could also be required.
What about cost? From a hardware standpoint, it is modest. The required elements are a microphone that captures speech clearly and processing power on the order of an Android device or a Raspberry Pi. No special reader or appliance is involved, and the technology could probably be built into an intercom station.
The science of voice profiling is still in its infancy. Near-term challenges include profiling accurately in the presence of noise and voice disguise. Further out, researchers must design and engineer the right AI techniques to discover information at micro-levels, develop accurate measurement techniques, and refine the AI algorithms. Today, its accuracy is comparable to that of fingerprint readers on smartphones.
I do believe the technology will eventually meet – and exceed – the requirements for accurate identification, even as researchers continue to characterize an entire persona.
Ray Coulombe is Founder and Managing Director of SecuritySpecifiers and RepsForSecurity.com. Contact him through LinkedIn at www.linkedin.com/in/raycoulombe or follow him on Twitter: @RayCoulombe.