In recent months, prospects and clients alike have been asking us about deepfakes. What are we doing about them? Have we detected any deepfakes in our system? Is there still any value in having voice biometric systems if it is easy to defeat them? And so on. There’s a lot of concern.
This article has been prepared to answer these and other related questions.
What are Deepfakes?
Simply put, “deepfakes” are typically pictures, audio recordings, or videos of people that have been generated by deep neural networks (DNNs). To create deepfakes, sophisticated DNN algorithms are trained on thousands of pictures, thousands of hours of speech, or thousands of hours of video clips. These algorithms “learn” the features that make a person unique, and a generalized DNN model is created. Subsequently, when these models are provided with more limited amounts of pictures, video, or audio from a specific person, they are able to replicate the unique features of that individual and generate a very realistic photo, audio clip, or video of that person.
Should You Be Concerned?
YES, and we are too! However, ours is a healthy and respectful concern for deepfake technology, not one of panic. You should not panic either, and you should certainly not consider doing away with your use of voice biometrics. There are many factors to consider, and we’ll dive into the important details throughout the remainder of this document.
But first, we’ll share summary rationale for continued use of voice biometrics:
The alternative of relying on passwords and challenge-response questions alone, and NOT using biometrics as an additional security factor, is far riskier.
A properly designed system using multi-factor authentication (MFA) will remain very difficult, if not impossible, for a deepfake to break into.
There are various artifacts within deepfakes that cannot be detected by the human ear, but that are detectable by anti-spoofing technology such as ours.
Most articles about deepfakes rely on researchers, authors, and bloggers “breaking into” their own accounts, not other people’s accounts -- something far more difficult.
There have been no documented instances of widespread or systemic deepfake voice attacks occurring (anywhere), as the amount of time, effort, and technical acumen required to mount a deepfake attack is significant.
There are many other techniques available to help further thwart deepfake attempts.
Deepfakes in the News
There have been substantial improvements in DNN technologies in the past year, with numerous press releases and articles highlighting the powerful capabilities of DNNs. Consider OpenAI and its ChatGPT product. While not directly relevant to the purpose of this document, a year ago few people knew about this generative AI technology, or even about OpenAI. Now, everyone knows about ChatGPT.
More relevant to this document, in the past several months companies like Google and Microsoft have announced that they’ve developed speech-related generative AI tools. Numerous press releases, blog posts, and articles quickly ensued. Consider this article about Microsoft’s VALL-E technology.
It’s a scary thought that someone’s voice might be closely replicated with just a few seconds of speech. So, voice biometric companies are naturally very concerned. The good news: voice biometric vendors have all known that these tools have existed for quite some time. And most of the responsible vendors have a variety of tools and techniques to address deepfakes. More on this later.
But deepfakes are not a “sudden” issue. The potential misuse of voice-related deepfake technology was highlighted almost two years ago in a story about a $35M wire fraud in Hong Kong. It was the first widely reported (and large) failure due to a deepfake – and there will no doubt be others reported in the future if adequate steps are not taken to protect against deepfakes.
Bypassing Voice Biometrics?
The $35M wire fraud described above used a sophisticated synthetic speech (deepfake) attack. However, this successful breach ultimately exploited procedural and human errors. And there was no apparent voice biometric anti-spoofing technology in place. This point is worth mentioning, as most voice biometric vendors offer some form of technology to detect synthetic speech.
But, is this technology foolproof? The honest answer is that no technology is 100% foolproof. And in fact, there have been numerous stories in the news over the past few months highlighting how someone has been able to bypass voice biometric authentication systems using off-the-shelf generative AI technology. From February 2023, consider this article from an investigative journalist who "broke into" his own account at Lloyds Bank.
Opportunity for Discussion
Some in our industry would argue that this is a sensational and irresponsible piece of journalism since there are issues with the methodology, and the overall message about voice biometrics is negative. However, we see this as an opportunity to discuss the issues and educate people on the myths and truths about deepfakes.
Article Truths or Myths?
In all fairness, there are some truthful elements in articles like these. But there are also some myths or mistruths being provided. Regardless, the investigative journalist's article should be viewed as a wake-up call for companies -- both voice biometric software vendors and the clients who use our systems. And while we don't like the negative message about voice biometrics, articles like these do provide us with guidance -- and motivation -- to continue enhancing our offerings. Relative to some of the misleading information that is out there, consider some of these points:
Self-Collusion. The recent articles posted by investigative journalists all share one trait: they all “broke into” their own accounts. This is far easier than breaking into the account of an unknown person. And to this point: we have not seen any articles by investigative journalists who have been able to successfully break into other people’s accounts using these tools.
Bypassed Security Factors. Most larger banks, such as Lloyds Bank (mentioned in the article), have a number of authentication factors working together behind the scenes – things like caller ID and other network identifiers, telephone/device IDs, knowledge-based authentication such as account numbers, etc. Collectively, these are critical components of multi-factor authentication that are frequently ignored in these articles.
Unmentioned Retry Attempts. Most of these articles also don’t cover how many times it took the journalists to create their own generated speech, how many attempts it took to log into their accounts, etc. We only see or hear about the last (successful) attempt. More advanced systems can track failed attempts and disable accounts with suspicious activity.
Anti-Spoofing NOT in Place. While most voice biometric vendors have anti-spoofing technology to offer clients, many clients don’t have these features fully set up and enabled. Our guess is that Lloyds Bank didn't have anti-spoofing systems fully up and running yet. And we suspect this is the case in other similar articles.
To this last point, why wouldn't anti-spoofing systems be in place? There are many reasons, the first of which is that they are relatively new (and deepfakes are a relatively new issue in the industry). Many vendors, us included, are still working to release upgrades to their systems. Also, some vendors charge extra for these tools, so some customers are likely trying to save costs. And other existing deployments may be exposed not for lack of money or interest, but because their anti-spoofing features are not (yet) fully operational – they may require significant system upgrades from legacy platforms, potential downtime, etc.
Deepfakes in Context
With some background of deepfakes provided, it's now a good time to look at the context or rationale behind deepfakes, when and where they are most likely to be attempted, etc. Below are several topics that we feel are worth mentioning.
Compliant vs. Non-Compliant
One key question to ask is whether your users are compliant or non-compliant. By this, we are referring to their willingness and motivations to use voice biometric authentication. Non-compliant users are likely to be those who are mandated to use voice biometrics – for example, those under parolee monitoring situations. Compliant users are those who are interested in the further protection of their accounts – for example, banking and brokerage users, healthcare account users, and similar scenarios. Non-compliant users have greater motivation to attempt deepfake attacks on their own accounts.
Automated vs. Interactive
If you have an IVR system (or conversational AI “bot”), you may be more susceptible to deepfakes. The reason is that these systems are automated (not monitored by humans) and use short-duration speech for authentication (which is easier to synthesize). Compare this to call centers, which are far more interactive in nature. Longer passages and conversations are extremely difficult to fake, as there will be significant response delays, unnatural responses, etc. A call center agent will quickly know they aren’t speaking with a real person.
Those in the banking world know that much bank fraud is “friendly fraud” – committed by family members or friends who have access to your home phone/network or cellphone, know or can find out your knowledge-based-authentication (KBA) responses, etc. Unfortunately, family and friends are also far more likely to be able to record your voice. Together, these people effectively have comprehensive insider knowledge about you and can bypass components of MFA more readily.
Beware of Social Media
It's important to mention social media, as it is enabling a new kind of friendly fraud. Simply put, complete strangers can now potentially access your face and voice, as pictures, videos, podcasts, and other recorded media are freely and openly posted to social media platforms.
While off-the-shelf generative AI tools such as those from ElevenLabs are making it easy to create deepfakes, it's also important to realize the amount of technical acumen required to break into systems protected by voice biometrics. You need to collect adequate speech samples from the target, you need to set up the deepfake software to create a realistic synthetic speech model for the target, you need to bypass all other security factors of the system you're trying to break into, and you need to inject the deepfake speech (quickly and accurately) into the live session.
Given this last point, we believe it is far more likely for deepfakes to be deployed in highly researched, isolated, and individualized scenarios. The article about the $35M wire fraud case is a good example. Also, there has been a significant recent uptick in deepfake scams involving "hostage" or "travel emergency" scenarios, which target concerned family members (especially the elderly): the deepfaked person appears to be in a very stressful situation, urgently needs money, etc.
For now, we believe deepfakes are most likely to be used in scenarios where family members are stranded while traveling and urgently need money. Or, to embarrass someone via social media posts of video and audio clips. Or, within hostage-type situations.
While the above-referenced article and points of consideration are far from comprehensive, we can immediately recommend best practices to help manage the threat of deepfakes. There are two key recommendations to make for all deployments:
1. Implement Multi-Factor Authentication (MFA). This is our #1 recommendation, and it is not new. It’s critical to have a layered approach to security. The investigative journalist’s article showed that the author clearly stacked the deck in his favor. However, it remains highly unlikely that a fraudster would be able to bypass multiple security factors, have recordings of the true customer, and be able to inject them into an IVR session without detection. To recap, minimal MFA requirements are:
Something You Know. This could be a password, shared secret, or knowledge-based-authentication question.
Something You Have. This could be an ID (physical or digital), a token, cellphone, or other identifying item you possess.
Something You Are. This is a biometric factor, or specifically, using a voiceprint in our case.
2. Implement Anti-Spoofing Measures. We have a sophisticated, DNN-based anti-spoofing engine to help detect synthetic speech (deepfakes). Given recent events in the news and our industry as a whole, we’ve decided to update our legacy platform to always perform anti-spoofing checks on every sample that is submitted to the system. We return a specific error code so that appropriate actions can be taken (per client specifications).
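To make this layered approach concrete, below is a minimal sketch of such a flow. The factor names, thresholds, and result codes (e.g. ERR_SPOOF_SUSPECTED) are hypothetical illustrations, not our actual API; the point is that the anti-spoofing check runs on every sample before the biometric decision, and that no single factor is trusted on its own:

```python
from dataclasses import dataclass

# Hypothetical result codes for illustration only; a real deployment would
# use vendor-specific codes and real verification back-ends.
OK = "OK"
ERR_SPOOF_SUSPECTED = "ERR_SPOOF_SUSPECTED"
ERR_FACTOR_FAILED = "ERR_FACTOR_FAILED"

@dataclass
class AuthRequest:
    knows_secret: bool        # something you know (e.g. KBA answer verified)
    has_trusted_device: bool  # something you have (e.g. registered phone/token)
    voice_match_score: float  # something you are (voiceprint similarity, 0..1)
    spoof_score: float        # anti-spoofing output (higher = more likely synthetic)

def authenticate(req: AuthRequest,
                 voice_threshold: float = 0.8,
                 spoof_threshold: float = 0.5) -> str:
    # Anti-spoofing runs on EVERY sample, before the biometric decision.
    if req.spoof_score >= spoof_threshold:
        return ERR_SPOOF_SUSPECTED
    # All three MFA factors must pass; no single factor is trusted alone.
    if not (req.knows_secret
            and req.has_trusted_device
            and req.voice_match_score >= voice_threshold):
        return ERR_FACTOR_FAILED
    return OK
```

In this sketch, a deepfake that matches the voiceprint perfectly is still rejected when the anti-spoofing score is high, and a caller who fails any one factor is rejected even with a genuine voice.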
Other Recommendations for IVR-Based Systems
Earlier we identified IVR systems, and particularly those with non-compliant users, as having a greater likelihood for deepfake attacks. With MFA and anti-spoofing in place, these additional recommendations may make sense for certain clients:
3. Implement Randomness. The IVR passphrase the investigative journalist used was a common, static passphrase provided by several competitors. Static passphrases have always been susceptible to recorded playback attacks, as the speech content is known ahead of time. Random passphrases or digits will force fraudsters to dynamically generate responses in a timely manner – a more difficult task. If you have a static passphrase, consider switching to our RandomPIN™ use case.
4. Use Outbound Calling. If you have an application-triggered authentication need (vs. using IVR as a front-end to your call center), consider outbound calls to a trusted (registered) phone instead of inbound calls to your IVR. This adds yet another factor on top of random prompts, strengthening authentication.
5. Implement Response Timers. This goes hand-in-hand with IVR sessions. Don’t allow your IVR dialogs to give callers too much time to respond. Fail them after a short duration and generate another (different) random prompt. If you are using our built-in IVR dialogs, we are already managing this for you.
6. Limit Retries. Don’t allow users many or unlimited attempts to complete their IVR session. We recommend no more than 3 total attempts per IVR session. Again, if you are using our built-in IVR dialogs, we are already managing this for you.
7. Implement Failure Detection. Failure detection is another feature we support on our platform. We can detect X failures in Y minutes within our system, something that can be useful if someone is specifically targeting an account and making multiple attempts to break in.
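Several of the recommendations above (randomness, retry limits, failure detection) lend themselves to small amounts of code. As an illustration only – the class and function names below are hypothetical, not our platform's actual implementation – here is a sketch of a random digit prompt (recommendation 3) and the “X failures in Y minutes” sliding-window detection (recommendation 7):

```python
import secrets
import time
from collections import defaultdict, deque
from typing import Optional

def random_digit_prompt(n_digits: int = 6) -> str:
    """Generate a fresh random digit string for the caller to speak (recommendation 3)."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))

class FailureDetector:
    """Flag an account after X failed attempts within Y seconds (recommendation 7)."""

    def __init__(self, max_failures: int = 3, window_seconds: float = 600.0):
        self.max_failures = max_failures
        self.window = window_seconds
        self._failures = defaultdict(deque)  # account_id -> timestamps of failures

    def record_failure(self, account_id: str, now: Optional[float] = None) -> bool:
        """Record one failed attempt; return True if the account should be flagged."""
        now = time.monotonic() if now is None else now
        window = self._failures[account_id]
        window.append(now)
        # Discard failures that have aged out of the sliding window.
        while window and now - window[0] > self.window:
            window.popleft()
        return len(window) >= self.max_failures
```

When record_failure returns True, an application might temporarily lock the account or route the caller to an agent for additional verification, per your own client specifications.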
Other Considerations in the Fight against Deepfakes
Although not explicitly stated, it’s clear that voice biometric vendors and their clients must work together to properly configure the voice authentication system(s) in place. What’s also clear is that deepfakes are not going anywhere soon. Below is a summary of key initiatives in response to deepfakes:
Voice Biometric Vendor Countermeasures
Those of us in the voice biometrics industry have always been developing techniques to catch fraudulent speech samples. We develop measures, fraudsters then develop countermeasures. We then develop counter-countermeasures, and so on. This is yet another chapter of this cycle, and one which we’ll likely be caught in for some time. Again though, a properly configured voice biometric system, within the context of a multi-factor authentication scheme, still provides the best defense against deepfakes.
Government Initiatives
Due to the many articles about deepfakes, and the use of generative artificial intelligence systems in general, there have been multiple calls by Congress and other government officials to study these threats and develop appropriate legislation. Google “Sam Altman on Capitol Hill” to gain more insights. He is the CEO and co-founder of OpenAI and is trying to raise awareness and develop responsible usage guidelines for this technology. There will no doubt be numerous new laws and regulations proposed in the near future relating to deepfake technologies.
Deepfake Vendor Initiatives
As the VALL-E articles show, Microsoft saw the potential for misuse of its technology and has limited how people can access and use it. Google’s conversational AI, “Bard”, has also been released with ethical limits. And ElevenLabs, the company behind the deepfake used by the investigative journalist we referenced earlier, has also changed the way it conducts business. Some vendors have also recently announced tools to detect deepfakes created by their own products, using “watermarking” or other markers they embed in the output.
These are promising initiatives, but clearly more needs to be done as these tools get better and better.
First and foremost, it’s important to recognize that deepfakes are a valid and growing concern for all of us. We treat malicious use of synthetic speech and voice conversion technologies very seriously. And for years now, our team has been dedicating significant resources to the research and study of these tools – both from the creation standpoint and the detection standpoint.
And while many of the recent news articles about deepfakes have headlines and content that are somewhat deceptive, these articles do help to keep vendors in our industry honest – and motivated – relative to developing increasingly better technologies for our clients.
And second, for Customer Not Present (CNP) applications, voice biometrics often remains the best (and only) “something you are” factor available. If you need to call a call center or IVR system, fingerprints won’t help, facial recognition systems won’t help, nor will other forms of biometrics. Voice biometric systems remain an easy and cost-effective part of your MFA strategy.
Abandoning voice biometrics because of isolated use cases is not the answer. It makes no sense at all to go back to using only User ID and Password or a KBA process. These have been proven to be weak security factors, easily hacked or discoverable via social engineering.
Voice biometrics are not 100% perfect (no factor is), but their use provides you with a statistically much greater level of confidence vs. not using it at all. This is true regardless of the potential presence of deepfakes.
We prepared this document to help our clients, prospects, and others understand deepfakes, what we recommend you do to protect your voice biometric systems from them, what we’re doing in our on-going efforts with anti-spoofing technology, and what others are doing to help. We’ve also tried to provide some context into which scenarios are most likely to see deepfake attacks.
Finally, we realize the topic of deepfakes is highly complex. Should you have any remaining questions, or would like to discuss your specific concerns and needs relative to deepfakes, please don't hesitate to contact us.