1. Field of the Invention
The present invention is related to voice identification and authentication systems and, more particularly, to providing reliable voice identification and authentication in Voice over Internet Protocol (VoIP) based telecommunications systems.
2. Background Description
State of the art telecommunication systems are digital and frequently use Internet Protocol (IP) based communications. Unlike an analog voice channel, which carries a continuous analog signal, an IP communications system segments audio data, encodes and packetizes the segments, and transmits the encoded IP packets between network entities in a connectionless transfer. Bearing in mind that the human ear has a range of no more than 20 Hertz (Hz) to 20 kHz, and that typical telecommunications channels may have a bandwidth of only hundreds of kHz, audio occupies a very small portion of a typical IP communication. Standards have been developed and promulgated for Voice over IP (VoIP) communications to ensure that typical IP networks compensate for transmission delays and address Quality of Service (QoS) issues. These standards specify small audio segments that are encoded as relatively small packets, and specify transmitting those small packets at a relatively high frequency, such that decoding and transmission delays are unnoticeable or, at least, tolerable.
For example, G729 is one such standard audio data compression algorithm for VoIP, wherein raw audio is typically segmented into 10 millisecond segments and each segment is compressed into an IP packet. RFC 3551 defines a net audio data stream for a G729 coder/decoder (codec) with an 8 kbit/sec data rate. See, e.g., www.apps.ietf.org/rfc/rfc3551.html#sec-4.2. While the popular Gxxx telecommunications codecs, such as G723 or G729, provide for efficient packet based voice communications, they may not provide adequate or even necessary support for the high quality voice data required by state of the art voice recognition.
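The segment size and data rate quoted above fix how much encoded audio each segment carries. The following is a minimal sketch of that arithmetic, assuming one 10 millisecond segment per packet and counting only the codec payload (no IP, UDP or RTP header overhead); the function and constant names are hypothetical and chosen only for illustration.

```python
# Illustrative arithmetic only: payload per segment for a constant bit rate codec.
# Assumes one segment per packet and ignores all packet header overhead.

G729_BIT_RATE_BPS = 8000  # 8 kbit/sec net audio data rate, as cited from RFC 3551
SEGMENT_MS = 10           # 10 millisecond segments, as described above


def payload_bytes(bit_rate_bps: int, segment_ms: int) -> float:
    """Encoded payload size, in bytes, of one audio segment."""
    bits = bit_rate_bps * segment_ms / 1000  # bits produced during one segment
    return bits / 8


def packets_per_second(segment_ms: int) -> float:
    """Number of segments (and, under the assumption above, packets) per second."""
    return 1000 / segment_ms


if __name__ == "__main__":
    print(payload_bytes(G729_BIT_RATE_BPS, SEGMENT_MS))  # bytes per segment
    print(packets_per_second(SEGMENT_MS))                # segments per second
```

Under these assumptions each segment carries 10 bytes of encoded audio and 100 segments are produced per second, which illustrates how little data per segment remains available to represent the speaker's voice.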
A growing number of applications use voice recognition for voice authentication. Typically, these voice authenticated systems store voice signatures, e.g., in a database, that are used to authenticate a caller. These systems may use voice identification and authentication to grant access to sensitive personal data, such as identifying and authenticating bank customers for remote banking. Once authenticated, customers may be granted access to respective bank accounts for remote home control, with banking systems responding, e.g., using voice commands. Protecting such sensitive personal data and resources against unauthorized access is important to protect the respective customer's property. Other state of the art applications of voice recognition include, for example, using high quality voice signatures for lawful voice signed agreements and voice recorded contracts. These voice identification and authentication applications require high quality voice data for reliable identification and authentication, at a quality not provided by standard telecommunications codecs. While traditional digital voice telecommunications codecs, such as G711 for example, or media based codecs (e.g., for music or video, such as MPEG) may transfer voice with high quality, i.e., quality sufficient to meet the authentication needs, VoIP telephony does not.
As noted hereinabove, the voice and audio in VoIP telephony are usually encoded and compressed to allow more efficient bandwidth usage. As further noted, this encoding and compression may still allow suitable conversational voice content, because that content only needs to be sufficient for a human at one end of a conversation to use any of many voice features to recognize his/her partner in a communication. These voice features may include, for example, the partner's language, grammar, sentence building, tones, accents and/or voice patterns. However, a machine uses fewer, mainly sound related, features to recognize a speaker's voice. These features may include tones, accents and voice patterns that may not be included or encompassed by the popular telecommunications codecs. Thus, the audio data provided in normal telecommunications conversations is of insufficient quality for voice recognition, which is required for reliable identification, authentication and signatures. On the other hand, authenticating using a high quality compact disc (CD) encoding or other media codecs, e.g., sending only the authentication data in an MPEG derivative (e.g., mp3), fails to provide much security, if any. Further, using high quality communications (i.e., sufficient for transferring reliable identification, authentication and signatures) has typically proven to be too costly and to use far too much bandwidth and channel resources.
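The bandwidth cost of higher quality encodings can be illustrated by comparing net bit rates. The following is a rough sketch, not a measurement: the G729 rate is the 8 kbit/sec figure cited above, while the G711 parameters (8 kHz sampling, 8 bits per sample, one channel) and the CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels) are commonly published values that do not appear in this description and are assumed here. All names in the code are hypothetical.

```python
# Rough comparison of net audio bit rates, ignoring packet and container overhead.
# The G711 and CD audio parameters are assumed, commonly published values.


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate in bits per second of an uncompressed or companded sample stream."""
    return sample_rate_hz * bits_per_sample * channels


BIT_RATES_BPS = {
    "G729": 8000,                          # cited above (RFC 3551)
    "G711": pcm_bit_rate(8000, 8, 1),      # assumption: 8 kHz, 8 bit, mono
    "CD audio": pcm_bit_rate(44100, 16, 2),  # assumption: 44.1 kHz, 16 bit, stereo
}


def bandwidth_ratio(name: str, baseline: str = "G729") -> float:
    """How many times more bandwidth `name` needs than `baseline`."""
    return BIT_RATES_BPS[name] / BIT_RATES_BPS[baseline]


if __name__ == "__main__":
    for codec in BIT_RATES_BPS:
        print(codec, BIT_RATES_BPS[codec], bandwidth_ratio(codec))
```

Under these assumptions, G711 needs 8 times the bandwidth of G729 and uncompressed CD audio needs roughly 176 times as much, which is consistent with the observation that carrying an entire conversation at authentication quality consumes far more channel resources than a compressed conversational stream.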
Thus, there is a need for satisfying the limits of narrowband voice communication systems, such as state of the art VoIP telephony systems that use high-compression codecs for conversations, while enabling voice identification, voice authentication and voice signature communications to systems and applications that require high quality voice data.