1. Technical Field
The present invention relates to automatic speech/speaker recognition (ASSR), and more particularly to an ASSR technique using a portable acoustic coupler or interface for remotely accessing an ASSR server over a communication channel such as a telephone line.
2. Discussion of Related Prior Art
In typical client/server or subscriber/service provider systems in which resources in the central server or service provider are accessed by numerous clients or subscribers, some form of client/subscriber authentication technique is used to verify that the client/subscriber is a valid system user. In many such systems, the authentication code, such as a PIN (personal identification number), keyword, or password, is communicated by keying in a code or communicating in text to an operator. In some systems, the authentication code is communicated by voice, i.e., the code is spoken to the server. The server (in this case, an ASSR server) recognizes the speech utterance and compares it against stored valid voice authentication codes to authenticate that the user is a valid client.
In ASSR server/client systems, it would be convenient if a client were able to remotely communicate his authentication code by voice over a communication channel such as a telephone line, so that any client could access the server wherever a telephone line is available. The telephone line may be a land line or cellular line. In the cellular case, access to the server is completely portable, i.e., available whenever and wherever a cellular phone and a cellular connection are available.
Operational difficulties associated with telephonic ASSR systems such as one described above include (1) loss of accuracy due to degradation of voice data when it is sent over telephone lines, and (2) the varied background noise characteristics at the user end depending upon the location of the telephone from which the user is calling, such as when a caller is calling from a street phone or when he is driving a car, etc. Both situations result in either data or signal integrity loss and thus severe reduction in the accuracy in recognition of the speech/speaker.
This loss of data and recognition accuracy problem can be reduced or eliminated if speech signal preprocessing (SSP) is performed at the client's end prior to the signal being sent over the telephone line to the server. SSP includes characterizing the acoustic features of the transmitting device, environment, speaker, and the communication channel. The SSP information is processed by the ASSR server to set references, select appropriate decode models and algorithms to recognize the speaker or decode the speech by modeling the channel transfer function and the background noise to reduce word error rate of the speech or to accurately perform speaker recognition. However, to perform SSP at the user's end, one would need SSP equipment including a computer having SSP software. Such SSP capability is generally absent in present standard telephones or network computers (NC).
Therefore, there is a need for a portable SSP device which is compact in size and light in weight for ease of transport, is capable of coupling to any telephone or data communication device, and includes capabilities for facilitating accurate speaker recognition when accessing the ASSR server over the communication channel and throughout the interaction with the server, and for accurate speech recognition communication between the portable SSP device and the ASSR server.
The illustrative embodiment of the present invention includes a portable SSP device, comprising a microphone for converting sound including speech, silence and background noise signals to analog signals; an analog to digital converter for converting the analog signals to digital signals; a digital signal processor (DSP) for generating from the digital signals feature vector data representing the speech and characterization data representing the silence and background noise signals; and a coupler for coupling to an acoustic or data communication device for communicating the signals representing the feature vector data over a communication channel for recognition of the speech by an ASSR server at a remote location. The coupler is preferably an acoustic coupler which converts the feature vector data to acoustic signals, and in such case the communication channel is also acoustic, like a telephone line. Alternatively, the coupler includes an appropriate interface, e.g., connector, ports and protocols, for coupling to a digital transmission device for transmission over a data communication channel.
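The DSP stage described above can be illustrated with a minimal sketch. It assumes the digitized samples are already available as a list of floats, and it uses log energy and zero-crossing rate as stand-in features with a fixed energy threshold to separate speech frames from silence and background noise frames. The feature type, frame sizes, threshold, and all function names are assumptions made for illustration; the specification does not prescribe them.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log of the mean squared amplitude; the small floor avoids log(0)."""
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-12)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def process(samples, frame_len=80, hop=40, threshold=-6.0):
    """Return (feature vectors for speech frames, noise characterization).

    Frames whose log energy exceeds the (assumed) threshold are treated as
    speech; the rest are summarized as silence/background noise.
    """
    features, noise_energies = [], []
    for frame in frame_signal(samples, frame_len, hop):
        e = log_energy(frame)
        if e > threshold:
            features.append((e, zero_crossing_rate(frame)))
        else:
            noise_energies.append(e)
    noise_floor = (sum(noise_energies) / len(noise_energies)
                   if noise_energies else None)
    return features, {"noise_frames": len(noise_energies),
                      "mean_log_energy": noise_floor}

# Synthetic input: 400 very quiet samples followed by a 400-sample 200 Hz tone
# sampled at an assumed 8 kHz rate.
quiet = [0.001 * math.sin(0.9 * n) for n in range(400)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
features, noise = process(quiet + loud)
```

In this sketch the feature vectors would be handed to the coupler for transmission, while the noise summary plays the role of the characterization data.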
The portable SSP device preferably includes an encryption device for encrypting the feature vector data, and a data compression device for compressing the feature vector data. The portable SSP device preferably includes means for receiving and processing return signals from the ASSR server and means for converting the return signals to digital return data for processing by the DSP. In such a preferred embodiment, the DSP further includes means for decompressing the digital return data and means for decrypting the digital return data.
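The compress-then-encrypt path, and its inverse for return data, might be sketched as follows. This is a round-trip illustration only: the 16-bit quantization, the use of zlib, and the SHA-256 counter-mode keystream are all assumptions, and the keystream construction is a toy placeholder, not a vetted cipher. The specification does not name particular compression or encryption algorithms.

```python
import hashlib
import struct
import zlib

def keystream(key, nonce, length):
    """Derive `length` bytes from SHA-256 in counter mode (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(out[:length])

def xor_bytes(data, stream):
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(data, stream))

def pack_features(vectors):
    """Quantize each coefficient to a signed 16-bit integer and serialize.

    Layout (assumed): 2-byte vector dimension, then the flattened values.
    """
    flat = [max(-32768, min(32767, round(x * 256))) for v in vectors for x in v]
    return struct.pack(">H%dh" % len(flat), len(vectors[0]), *flat)

def unpack_features(blob):
    """Inverse of pack_features."""
    (dim,) = struct.unpack_from(">H", blob)
    count = (len(blob) - 2) // 2
    flat = struct.unpack_from(">%dh" % count, blob, 2)
    return [[x / 256 for x in flat[i:i + dim]] for i in range(0, count, dim)]

def encode(vectors, key, nonce):
    """Device side: compress the packed features, then encrypt."""
    compressed = zlib.compress(pack_features(vectors))
    return xor_bytes(compressed, keystream(key, nonce, len(compressed)))

def decode(payload, key, nonce):
    """Receiving side: decrypt, then decompress and unpack."""
    compressed = xor_bytes(payload, keystream(key, nonce, len(payload)))
    return unpack_features(zlib.decompress(compressed))

# Hypothetical device key and per-message nonce.
key = b"device-unique-key"
nonce = b"message-0001"
vectors = [[0.5, -1.25, 2.0], [0.25, 0.0, -3.5]]
payload = encode(vectors, key, nonce)
restored = decode(payload, key, nonce)
```

Compression is applied before encryption because encrypted data is generally not compressible; the decode path mirrors the decrypt and decompress steps the DSP performs on return data.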
The portable SSP device further preferably includes means for facilitating estimation of the transfer function of the communication channel, including acoustic characteristics associated with the speaker, silence and background noise; preferably, by sending a set of estimation reference signals (or “characterization signals”) to the ASSR server connected to the channel at the remote location. The portable SSP device includes memory for storing data, including encryption key data or authentication data unique to that device.
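One way to picture channel estimation from reference signals is to send tones of known frequency and amplitude and compare what arrives with what was sent. The sketch below assumes sinusoidal reference signals, an 8 kHz sample rate, and a simulated two-tap averaging filter standing in for the real channel. None of these choices come from the specification, and for brevity the sketch combines the sending and measuring sides in one function.

```python
import math

def tone(freq, fs, n):
    """Unit-amplitude reference tone of n samples."""
    return [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]

def amplitude_at(signal, freq, fs):
    """Amplitude of the component at `freq`, by correlating with sine and cosine."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * k / fs) for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * k / fs) for k, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

def simulated_channel(signal):
    """Stand-in channel (assumption): y[k] = 0.5*x[k] + 0.5*x[k-1]."""
    prev, out = 0.0, []
    for s in signal:
        out.append(0.5 * s + 0.5 * prev)
        prev = s
    return out

def estimate_gains(freqs, fs=8000, n=800):
    """Estimate the channel's magnitude response at each reference frequency."""
    gains = {}
    for f in freqs:
        sent = tone(f, fs, n)
        received = simulated_channel(sent)
        gains[f] = amplitude_at(received, f, fs) / amplitude_at(sent, f, fs)
    return gains

gains = estimate_gains([500, 1000, 2000, 3000])
```

For this particular simulated channel the magnitude response is |cos(pi * f / fs)|, so the estimates can be checked against a known answer; a real channel would also require phase and noise estimates.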
Another illustrative embodiment of the present invention includes an ASSR system having a portable SSP device having a digital signal processor (DSP) for processing digitized speech spoken into a microphone and generating feature vector data representing the speech; and a coupler for coupling to a communication device connected to a communication channel such as a telephone line or a digital network connection and for converting the feature vector data to signals for communicating over the communication channel; and an automatic speech/speaker recognition (ASSR) server connected to the communication channel for receiving the signals transmitted from the portable SSP device via the communication channel and processing the received signals for recognition of the speech.
The ASSR server in the system includes stored models of enrollment or authentication data. The models are built during subscriber or client enrollment. The ASSR server also stores a set of vocabularies and other models, such as language models and Hidden Markov Models (HMM), for speech recognition. The ASSR server processes the signals received from the portable SSP device and compares the processed signals with the stored models.
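The enroll-then-compare flow on the server side can be sketched in a few lines. This sketch deliberately swaps the HMM and language models mentioned above for a much simpler stand-in: each enrolled speaker is represented by the mean of the enrollment feature vectors, and verification accepts a claim when the Euclidean distance to that mean falls under a threshold. The class name, method names, and threshold value are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class EnrollmentStore:
    """Holds one mean-vector model per enrolled speaker (simplified stand-in)."""

    def __init__(self, threshold):
        self.models = {}
        self.threshold = threshold

    def enroll(self, speaker_id, vectors):
        """Build and store the model during enrollment."""
        self.models[speaker_id] = mean_vector(vectors)

    def verify(self, claimed_id, vectors):
        """Accept only if the claimed speaker is enrolled and the features match."""
        model = self.models.get(claimed_id)
        if model is None:
            return False
        return distance(mean_vector(vectors), model) <= self.threshold

store = EnrollmentStore(threshold=0.5)
store.enroll("alice", [[1.0, 2.0], [1.2, 1.8]])
store.enroll("bob", [[4.0, 0.5], [4.2, 0.7]])

genuine = store.verify("alice", [[1.05, 1.95], [1.15, 1.85]])
impostor = store.verify("alice", [[4.0, 0.6]])
unknown = store.verify("carol", [[1.0, 2.0]])
```

Here `genuine` is accepted, while `impostor` and `unknown` are rejected. A production system would use statistical models and channel compensation rather than a single distance test.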
Advantageously, with remote speaker authentication capability, the system according to the illustrative embodiment of the present invention provides capabilities for remote smartcard or magnetic card activation/deactivation or password or PIN code change and reactivation.
Further, the remote speech recognition system according to the illustrative embodiment of the present invention provides ASSR capabilities with low-error, large-vocabulary speech recognition, even in adverse signaling or highly distorted communication environments.