1. Field of the Invention
The invention relates to the field of speaker recognition, and more particularly to the field of using voice biometrics to identify or authenticate speakers using a mobile device.
2. Discussion of the State of the Art
Identification and authentication of mobile phone users is an issue of significant importance, particularly because of the dramatic variety and rapid rate of adoption of mobile applications. As just one example, as new types of location-based services proliferate that allow users and businesses to connect and transact while one or both of them are mobile, it becomes more and more difficult to ensure the safety of such transactions. Simple authentication techniques, for example use of basic passwords, while possibly adequate in a time when web usage was primarily carried out by users on personal computers generally associated with fixed locations, are clearly inadequate today, when most web usage occurs from a wide and expanding variety of mobile devices. For example, most mobile devices connect to the Internet in myriad different ways, many of which, such as connecting via an unsecured WiFi network at a coffee shop, are far less secure than connecting via a dedicated home broadband connection.
In the art, three main approaches have been used in securing interactions between users of computing devices and the various web-based services and content repositories they wish to access or use. These are shown, along with some examples, in FIG. 4. The first approach can be referred to as authentication based on “something the user knows”, or more formally as a first factor 403 based on a user's knowledge, such as a static password 404. The second approach can be referred to as authentication based on “something the user has”, or more formally as a second factor 410 based on a user's possessing something that is his alone, such as a one-time password (OTP) 410 or a secure identification token 411. The third approach can be referred to as authentication based on “something the user is”, or more formally as a third factor 420 based on a permanent biometric attribute of the user to be authenticated, such as a fingerprint 421. The axes in FIG. 4 represent increasing level of security along x axis 401 (that is, methods that are further to the right are generally more secure than those on their left), and a number of authentication factors along y axis 402. As shown in FIG. 4, the first factor 403 corresponding to what a user knows is the least secure of the three, with the second factor 410 corresponding to what a user has being more secure and the third factor 420 corresponding to what a user is being most secure. It is common in the art to combine two or three of these factors in various ways to achieve greater security. In general, an overall level of security is sought that is consistent with the value of the underlying activity and the damage that might occur if security measures for a given scenario were defeated by one or more malefactors.
For example, it is quite common in national defense, counterterrorism, and law enforcement applications for three-factor authentication systems to be used, sometimes even featuring more than one type of biometrics (for example, combining fingerprint and voiceprint identification).
FIG. 5 illustrates a typical example, known in the art, of knowledge-based authentication, which is implemented as an extension of existing simple-password authentication. A user 520 initiates some action via interface 532 on a computing device such as a laptop computer 510 or a smart phone 511. Accordingly, the computing device sends a request via interface 531 to server 500, which returns an authentication request to the computing device, which requires user 520 to enter some previously agreed knowledge credential. If the user 520 enters the appropriate credential, she is allowed to carry out the requested action. Examples of knowledge-based authentication include Bank of America's “SiteKey” function, HSBC's virtual keyboard, and the like. This approach improves only slightly on basic password-based authentication, since it is still a single-factor approach and is carried out “in-band”, that is, using the same interface as is used to carry out the requested action (a usual example is a web browser, and a typical application would be online banking using laptop 510 or mobile device 511).
FIG. 6 illustrates a somewhat improved authentication approach that uses out-of-band communication, known as server-generated one-time password (OTP) authentication. Again, user 520 requests some action to be taken using interface 611 on laptop computer 510. The request is forwarded to a server 500 such as a web server, which determines that the request is one that requires authentication of the user. Having previously stored information about user 520 (specifically, the user's mobile phone number in this example), server 500 sends a special code to the user's mobile device 511 in step 620. The user receives this special code in step 621 (typically a text-based code is displayed on the screen of the mobile device), and the user 520 then enters the special code in step 622 at laptop computer 510, which then sends the code to server 500 for authentication. This approach has the advantage of using two factors, one of which is carried out using a separate device (that is, out-of-band), and thus is stronger than the approach illustrated in FIG. 5.
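The server-generated OTP flow of FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation taken from any particular product: the class name `OtpServer`, the `send_sms` callable, the six-digit code length, and the five-minute validity window are all hypothetical choices that the description above does not specify.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 300  # assumed validity window; not specified in the text


class OtpServer:
    """Hypothetical stand-in for server 500 in FIG. 6."""

    def __init__(self, send_sms):
        self._send_sms = send_sms  # callable(phone_number, message); assumed transport
        self._phones = {}          # user id -> previously stored mobile number
        self._pending = {}         # user id -> (code, expiry timestamp)

    def register(self, user_id, phone_number):
        """Store the user's mobile number ahead of time."""
        self._phones[user_id] = phone_number

    def start_authentication(self, user_id, now=None):
        """Step 620: generate a random code and send it out-of-band."""
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending[user_id] = (code, now + CODE_TTL_SECONDS)
        self._send_sms(self._phones[user_id], code)

    def verify(self, user_id, submitted, now=None):
        """Check the code the user typed at the laptop (step 622)."""
        now = time.time() if now is None else now
        # pop() discards the code on the first attempt, so it is one-time.
        entry = self._pending.pop(user_id, None)
        if entry is None:
            return False
        code, expires = entry
        # compare_digest avoids leaking information through comparison timing.
        return now <= expires and hmac.compare_digest(
            code.encode(), submitted.encode()
        )
```

In this sketch the code is discarded after a single verification attempt, whether or not the attempt succeeds, which is one simple way to keep the password "one-time"; a deployed system might instead allow a small number of retries.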
FIG. 7 illustrates another common authentication approach used in the art, known as client-generated OTP. This approach is similar to conventional OTP tokens such as RSA™ SecurID, VeriSign™ VIP OTP, and the like. In this approach the user 520 again requests an action using interface 611 on laptop computer 510, the request then being sent on via interface 610 to server 500. Server 500 then sends an authentication request to the computer 510, which then requests a code from the user 520. The user 520 gets the code from her mobile device 511 via interface 620 (typically a special mobile application provided by the entity that operates server 500), and the user 520 enters the code on computer 510 in step 630 and the computer 510 sends the code on to the server 500 in step 631. Once the server 500 validates the code, the user 520 is authenticated and the server 500 performs the requested service. This approach is more secure than that shown in FIG. 6, as it is two-factor and does not depend on transmitting a one-time password on any public network, but it is still susceptible to man-in-the-middle attacks.
FIG. 8 shows yet another approach known in the art, referred to as out-of-band authentication. In this approach, user 520 requests an action on computer 510 via interface 811. The computer 510 then requests the action from server 500, which causes a phone call (or other out-of-band communication) to be initiated with the user's mobile device 511 via interface 820 (typically a mobile phone network). The user answers the call and, using interface 821, is requested to authenticate, for example using voice authentication. This approach is even more secure, since the authentication is separate from the browser on computer 510 and since a biometric factor is used. However, this approach is expensive since it requires phone calls to be made over public phone networks, and it is somewhat unwieldy from a usability perspective.
FIG. 9 illustrates an even newer approach to mobile authentication, which is referred to as in-band mobile OTP authentication. In this case everything happens through mobile device 900, 910, using specialized authentication applications provided by an entity desiring to engage in secure interaction with its users (for example, AOL™, PayPal™, and eBay™ provide applications along these lines). Looking at mobile device 900, a token application 910 is displayed that functions much as secure tokens have done for some time, providing a time-based unique code to use as an OTP (it can be made unique because it is generated by a hidden algorithm that takes as inputs a universal time and an identity of the device on which the application is running, which device is associated with a single user). Similarly, mobile device 910 shows a variation in which a VIP Access application 920 is provided that displays both a credential ID 921 and a security code 922 to a user. These applications are useful, but they have two main drawbacks. First, they tend to be useful only for the purposes provided for by the provider of the application (for instance, a corporate IT department), so a user would potentially have to have several such applications available on her mobile device. Second, the approach is only as secure as the user's custody of her mobile device; if the user misplaces her mobile device, security may not be as readily ensured (essentially, this is a two-factor approach based on what you have and what you know, but not based on what you are).
Another approach that has been used in the art is out-of-band mobile device-based authentication, which is essentially the use of a mobile device as a secure “what you have” authentication token. Several solutions are known in the art, such as those using iOS's APNS and Android's C2DM services. These can be used to provide a real-time out-of-band challenge and response mechanism on a mobile device. Upon performing a sensitive transaction or login, a user immediately receives a challenge pushed to her mobile device. She is then prompted with the full details of the proposed transaction, and is able to respond to approve or deny the transaction by simply pressing a button on her mobile phone. Smart phone push-oriented two-factor authentication is attractive because it is at once both more user-friendly and more secure than previous approaches.
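The push-based challenge and response described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how a response is protected, so the sketch assumes a secret key shared between the server and the registered device and uses an HMAC tag to bind the user's decision to one specific challenge. The function names and message layout are hypothetical, and the push transport itself (for example APNS or C2DM) is not modeled.

```python
import hashlib
import hmac
import json
import secrets


def make_challenge(transaction):
    """Server side: bundle the transaction details with a fresh random nonce."""
    return {"nonce": secrets.token_hex(16), "transaction": transaction}


def _message(challenge, decision):
    # Canonical byte encoding of what is being approved or denied.
    payload = {"challenge": challenge, "decision": decision}
    return json.dumps(payload, sort_keys=True).encode()


def respond(device_key, challenge, approve):
    """Device side: called when the user presses the approve or deny button."""
    decision = "approve" if approve else "deny"
    tag = hmac.new(device_key, _message(challenge, decision), hashlib.sha256).hexdigest()
    return {"nonce": challenge["nonce"], "decision": decision, "tag": tag}


def is_approved(device_key, challenge, response):
    """Server side: accept only an authentic approval of this exact challenge."""
    expected = hmac.new(
        device_key, _message(challenge, response["decision"]), hashlib.sha256
    ).hexdigest()
    return (
        response["nonce"] == challenge["nonce"]
        and hmac.compare_digest(expected, response["tag"])
        and response["decision"] == "approve"
    )
```

Because the tag covers the full transaction details as well as the nonce, a response captured for one transaction cannot be replayed to approve a different one in this sketch.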
Even though two-factor authentication provides significantly better security, organizations are discovering that as attacks increase in sophistication, two-factor authentication is simply not enough. There are many challenges with the various OTP devices on the market today. These challenges include the weakness of static passwords, form factors that are difficult to carry, and form factors that are insecure.
The initial purpose of OTP and USB tokens was to strengthen the static password by adding an additional one-time password that was harder to obtain. The rationale behind the two-factor authentication approach was that a user needed to have two different data elements, both secure, to access a secure region. Users usually choose their own static passwords, and most tend to choose a combination of numbers and characters that is easy to remember. Users may also write down their passwords in case they forget them. Together, these habits make static passwords easy for fraudsters to steal or guess. Once the static password is no longer a secure data element, the only real data element preventing unauthorized entrance to secure regions is the OTP. This makes it easy for fraudsters to access secure regions simply by stealing OTP tokens.
Another challenging issue is that the OTP and USB tokens are hardware devices that are not easy to carry. Most OTP and USB devices are in the form of tokens that are made to be a part of the key chain held by the end user. The market is leaning towards hardware that can be stored in the wallet and therefore this challenge may eventually be addressed. However, for now the majority of OTP tokens reside in a very clumsy form.
The last challenge that OTP tokens have is the fact that the token itself is not secure. All the tokens today are either time-based (the token changes the one-time password every x minutes or seconds) or event-based (the token changes the one-time password every time a button is pressed on the token). No security measure is taken when the one-time password appears; anyone holding the token can read it. This, theoretically, increases the chance that the token and static password could be stolen, compromising the security of the site.
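The distinction between event-based and time-based tokens can be illustrated with a short sketch. The text above does not say which algorithm any particular token uses, and commercial tokens often rely on proprietary schemes, so the code below substitutes the publicly standardized HOTP (RFC 4226) and TOTP (RFC 6238) algorithms as stand-ins. The 30-second step and six-digit length are conventional defaults assumed here, not values taken from the text.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Event-based OTP (RFC 4226): the counter advances on each button press."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select a 4-byte window.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def totp(secret: bytes, step_seconds: int = 30, now=None, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): the counter is the number of elapsed time steps."""
    now = time.time() if now is None else now
    return hotp(secret, int(now // step_seconds), digits)
```

Note that neither function checks who is asking for the code: whoever holds the secret (or the physical token that embeds it) obtains a valid password, which is the weakness described above.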
A better solution to the challenges above is to use another strong authentication method that addresses the “what you are” factor in a multi-factor authentication approach. The most common “what you are” solution is a biometric solution. The strong authentication market is reaching a point of understanding that the two elements of a two-factor authentication solution need to be “what you have” and “what you are” rather than “what you know” and “what you have”. The reason for this change is the understanding that “what you know” data elements are no longer secure. Static passwords are easily stolen, and gaining personal information regarding a certain individual is not a high barrier for fraudsters and identity thieves. This, and the fact that biometric authentication devices have become mature enough to process biometric authentication with a very low false positive rate and at reasonable cost, make biometric authentication a valid and promising solution in the market.
Evaluating multi-factor authentication solutions requires a look at three critical areas: the security and scalability of the technology, hurdles to user adoption, and the total cost (including internal costs) to deploy and support the system. Because of the cost and complexity of most biometric systems, use of biometric authentication is generally limited to ultra-high-security applications (e.g. the defense industry). Historically, biometric systems have been a mixed bag when it comes to availability, compatibility, and security. Training is a significant issue and logistics are perhaps more difficult than with any other two-factor solution. Deployment involves collecting the biometric data to compare against, which can be a daunting task for users and IT departments. In addition, most biometric authentication solutions rely on fingerprint readers, retinal scanners, or other biometric devices, which are attached to the PC or laptop. The cost and IT resources required to purchase, deploy, and maintain biometric readers often present an impractical challenge to surmount.
One approach to addressing these problems would be to use strong authentication such as through biometrics (that is, based on “what you are”) carried out directly on a mobile device. However, in the current art speaker recognition approaches are still too heavy (resource intensive) to run on even very capable mobile devices, so both voice print creation and comparison are typically performed on remote servers. This further means that audio collected on a mobile device must be transmitted through a data channel to a server, which creates a bandwidth problem as well as the risk of a man-in-the-middle attack. In fact, a successful man-in-the-middle attacker may send recorded voice signals to a speaker recognition server, and thus may be able to perform false authentication remotely. In general, codecs could be used to reduce the bandwidth required when sending voice signals, but in that case accuracy degradation would be expected.
What is needed in the art is a cost-effective voice biometric capability adapted for easy adoption and use on plural mobile devices per user. Such a capability must be capable of winning user trust, particularly in terms of being practically unbreakable. In addition, privacy concerns suggest an approach where there is no need for the centralized storage of large numbers of voice biometric prints, since breach of such a database would compromise potentially millions of voice biometric prints—a clearly undesirable situation since users cannot change their voices, and since voices can be duplicated (making voice biometrics potentially more vulnerable than fingerprint or retina biometrics). Furthermore, what is needed is a voice biometric capability that does not require much bandwidth to operate, and that is able to operate with acceptable accuracy on a wide range of mobile devices (which often suffer from limited memory or processing capacity relative to the demands of robust voice biometrics).