The present invention relates to speech processing in general and to an application programming interface (API) to a speech processing system in a telephony environment in particular.
API: Application Programming Interface
Calibration: A preliminary algorithmic process that is sometimes required for the correct operation of the speech processing system. When required, it is usually performed once, at installation, before any training is done.
Calibration Formula (CF): The outcome of the calibration process. In some cases, a calibration formula is needed as an input to the next phases of the speech processing processes.
Calibration Set (CS): The input to the calibration process.
Calling Application: An application that uses the API in order to receive speech processing services from the speech processing system.
Persistent Object: Data used during the speech processing which needs to be stored, either internally or externally. For example, in the case of speaker verification, the persistent objects include the calibration set, the calibration formula, the voice signatures and the verification formulas.
Speech Processing: Any algorithmic processing on spoken data, including, but not limited to, speaker verification, speaker recognition, word recognition, speech compression and silence removal.
Telephony Environment: An environment in which conversations pass through a telephony medium, for example E1/T1 trunks, modems and analog telephone extensions, and including a network environment, for example, the Internet.
Training: The algorithmic process of learning from specific data in order to perform a particular task. In the case of speaker verification, the input to training is a collection of the speaker's audio segments known as the speaker's voice signature (VS). In the case of word recognition, the input to training is a "word signature".
Verification Formula (VF): The output of the training in the case of speaker verification. The verification formula is used in the algorithmic process of speaker verification.
Speech processing technologies are known in the art. Speech processing products are available, for example, from Nuance Communications of Menlo Park, Calif., USA and Lernout and Hauspie Speech Products N.V. of Belgium.
Generally, systems providing speech processing services are integrated into other applications, and therefore there is a need for an interface between a speech processing system 100 and a calling application 102, both shown in FIG. 1, to which reference is now made. The designers of calling application 102 use an application programming interface (API) 104 so that calling application 102 may receive services from speech processing system 100. If API 104 is well designed and is adopted by many different vendors of speech processing systems, then the designer of calling application 102 can change from one speech processing system to another without having to change calling application 102.
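By way of non-limiting illustration only, the vendor independence described above may be sketched as follows. The class names, the `remove_silence` operation and the two vendor systems are hypothetical and do not appear in any real product; the sketch merely shows a calling application that depends on the interface alone.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechProcessingAPI(ABC):
    """Hypothetical vendor-neutral interface, playing the role of API 104."""

    @abstractmethod
    def remove_silence(self, samples: List[int]) -> List[int]:
        """Return the samples with the silent ones dropped."""


class VendorASystem(SpeechProcessingAPI):
    # Hypothetical vendor: treats exact zeros as silence.
    def remove_silence(self, samples: List[int]) -> List[int]:
        return [s for s in samples if s != 0]


class VendorBSystem(SpeechProcessingAPI):
    # Hypothetical vendor: treats low-amplitude samples as silence.
    def remove_silence(self, samples: List[int]) -> List[int]:
        return [s for s in samples if abs(s) > 1]


def calling_application(system: SpeechProcessingAPI) -> int:
    """Plays the role of calling application 102; it sees only the interface."""
    audio = [0, 5, 0, -7, 1]
    return len(system.remove_silence(audio))
```

Either vendor system may be passed to `calling_application` without any change to the calling application itself.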
I/O Software, Inc. of Riverside, Calif. USA, has produced a biometric API (BAPI) for communication between software applications and biometric devices such as fingerprint scanners and smart cards encoded with fingerprint biometric information.
The Human Authentication API (HA-API) project is an initiative of the US Government's Department of Defense through the Biometric Consortium. The HA-API specification was prepared by National Registry Inc. of Tampa, Fla., USA.
The Speech Recognition API Committee created a speaker verification API (SVAPI). SVAPI enables the calling application to verify a claimed identity only after the speaker has finished speaking. There are a number of situations that SVAPI is unable to support. For example, it does not support online verification, i.e. verification that is performed while the speaker is speaking.
In a further example, SVAPI does not contain commands for handling the data that is required for training and verification, e.g. the voice signatures and the verification formulas. Rather, SVAPI assumes that the calling application is responsible for handling this data.
In another example, SVAPI does not allow the calling application to set policies relating to the speaker verification, such as the frequency of verification updates, the length of audio for verification and decision policies.
IBM Corporation of Armonk, N.Y. USA has produced the "Advanced Identification Services C API". It is intended to be more specific and detailed than HA-API, but more general than SVAPI.
There is provided in accordance with a preferred embodiment of the present invention an application programming interface (API) for enabling a calling application to instruct a speech processing system to perform operations including online audio acquisition and algorithmic speech processing operations. The API includes acquisition interface means for enabling the calling application to instruct the speech processing system to acquire online audio from an external communication channel, and processing interface means for enabling the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations on said acquired audio.
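As a minimal sketch only, the two interface means described above may be pictured as two methods of one object. The names `Channel`, `SpeechSystem`, `acquire` and `process` are assumptions made for illustration, and silence removal stands in for any algorithmic speech processing operation.

```python
class Channel:
    """Hypothetical stand-in for an external communication channel."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        # Return the next frame of samples, or None when the channel is drained.
        return self._frames.pop(0) if self._frames else None


class SpeechSystem:
    """Hypothetical speech processing system exposing both interface means."""

    def __init__(self):
        self.acquired = []

    def acquire(self, channel):
        # Acquisition interface: the system itself pulls audio from the channel.
        frame = channel.read()
        while frame is not None:
            self.acquired.extend(frame)
            frame = channel.read()
        return len(self.acquired)

    def process(self, operation):
        # Processing interface: run a named operation on the acquired audio.
        operations = {
            "silence_removal": lambda audio: [s for s in audio if s != 0],
        }
        return operations[operation](self.acquired)
```

In this sketch the calling application never touches the audio samples; it only issues the two instructions.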
According to another aspect of the present invention, the acquisition interface means and the processing interface means include object-oriented classes.
According to another aspect of the present invention, the external communication channel is selected from a group including a particular time slot of a telephone trunk, a particular telephone extension and an audio file of a remote audio storage.
According to another aspect of the present invention, the API further includes provision interface means for enabling the calling application to directly provide the speech processing system with provided audio.
According to another aspect of the present invention, the provision interface means include object-oriented classes.
According to another aspect of the present invention, the provided audio is any of a microphone recording and voice over Internet Protocol (VoIP) data.
According to another aspect of the present invention, the processing interface means also enables the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations on any of the acquired audio, the provided audio and the combination thereof.
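The distinction between acquired audio and provided audio may be sketched, under stated assumptions, as two buffers with a selector. The names `AudioBuffer`, `select` and `peak_amplitude` are hypothetical, and the peak-amplitude computation is a placeholder for a real speech processing operation.

```python
class AudioBuffer:
    """Hypothetical holder for acquired and provided audio."""

    def __init__(self):
        self.acquired = []  # gathered by the system from an external channel
        self.provided = []  # handed over directly by the calling application

    def acquire_from(self, channel_samples):
        self.acquired.extend(channel_samples)

    def provide(self, samples):
        self.provided.extend(samples)

    def select(self, source):
        # Choose which audio an operation should run on.
        if source == "acquired":
            return list(self.acquired)
        if source == "provided":
            return list(self.provided)
        if source == "combined":
            return self.acquired + self.provided
        raise ValueError("unknown audio source: %r" % (source,))


def peak_amplitude(samples):
    # Placeholder operation: largest absolute sample value, 0 for empty audio.
    return max((abs(s) for s in samples), default=0)
```

An operation may then be applied to `select("acquired")`, `select("provided")` or `select("combined")` as the calling application chooses.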
According to another aspect of the present invention, the processing interface means includes interface means for enabling the calling application to instruct, during acquisition of the acquired audio, the speech processing system to commence at least one of the algorithmic speech processing operations on the acquired audio.
According to another aspect of the present invention, the processing interface means include interface means for enabling the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations throughout a conversation with a speaker whose audio samples are contained in the acquired audio.
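Processing that commences during acquisition and continues throughout a conversation may be illustrated, purely as a sketch, by a session that updates a result on every newly acquired frame. `OnlineSession` and its methods are hypothetical names, and the running mean of frame energy is a placeholder for an intermediate result such as an online verification score.

```python
class OnlineSession:
    """Hypothetical session in which processing starts before acquisition ends."""

    def __init__(self):
        self.processing = False
        self.frames_seen = 0
        self.energy_sum = 0

    def start_processing(self):
        # The calling application may issue this while audio is still arriving.
        self.processing = True

    def on_frame(self, frame):
        # Called once per newly acquired frame for the whole conversation.
        if not self.processing:
            return None  # audio is being acquired, but nothing is computed yet
        self.frames_seen += 1
        self.energy_sum += sum(s * s for s in frame)
        # Intermediate result, available before the speaker has finished.
        return self.energy_sum / self.frames_seen
```

Each call to `on_frame` after `start_processing` yields an updated result, so the calling application need not wait for the end of the conversation.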
According to another aspect of the present invention, the speech processing system is capable of performing data management operations including creating, storing and retrieving data objects. The API further includes management interface means for enabling the calling application to instruct the speech processing system to perform at least one of the data management operations.
According to another aspect of the present invention, the management interface means include object-oriented classes.
According to another aspect of the present invention, the speech processing system has an internal data store and the management interface means include interface means for enabling the calling application to instruct the speech processing system to store data in the internal data store and to retrieve the data from the internal data store.
According to another aspect of the present invention, the calling application has access to an external data store and the management interface means includes interface means for enabling the calling application to instruct the speech processing system to store data in the external data store and to retrieve the data from the external data store.
According to another aspect of the present invention, the speech processing system has an internal data store, the calling application has access to an external data store and the API further includes at least one parameter for each run-time instance of one of the object-oriented classes, the parameter settable by the calling application, which determines whether the instance is stored in the internal data store or in the external data store.
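The per-instance storage parameter may be sketched as a field on each persistent object. All names below (`StoreLocation`, `PersistentObject`, `DataManager`) are assumptions for illustration, and plain dictionaries stand in for the internal and external data stores.

```python
from dataclasses import dataclass
from enum import Enum


class StoreLocation(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass
class PersistentObject:
    """Hypothetical persistent object, e.g. a voice signature."""
    key: str
    payload: bytes
    # Per-instance parameter, settable by the calling application.
    location: StoreLocation = StoreLocation.INTERNAL


class DataManager:
    """Hypothetical management interface; dicts stand in for real stores."""

    def __init__(self, external_store):
        self.internal_store = {}
        self.external_store = external_store  # owned by the calling application

    def _store_for(self, location):
        if location is StoreLocation.INTERNAL:
            return self.internal_store
        return self.external_store

    def store(self, obj):
        # The object's own parameter decides where it is kept.
        self._store_for(obj.location)[obj.key] = obj.payload

    def retrieve(self, key, location):
        return self._store_for(location)[key]
```

In this sketch two instances of the same class may be routed to different stores simply by setting `location` differently on each.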
According to another aspect of the present invention, the speech processing system is capable of performing calibration operations and the API further includes calibration interface means for enabling the calling application to instruct the speech processing system to perform at least one of the calibration operations.
According to another aspect of the present invention, the calibration interface means includes object-oriented classes.
According to another aspect of the present invention, the API further includes at least one parameter, settable by the calling application, which determines whether the calibration operations are performed automatically when there is enough input to the calibration operations.
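A minimal sketch of the automatic-calibration parameter follows. The `Calibrator` class, the `min_samples` threshold used to decide that there is "enough input", and the use of a simple mean as the calibration formula are all assumptions made for illustration only.

```python
class Calibrator:
    """Hypothetical calibration interface with an auto-calibrate parameter."""

    def __init__(self, min_samples, auto_calibrate=True):
        self.min_samples = min_samples        # assumed meaning of "enough input"
        self.auto_calibrate = auto_calibrate  # settable by the calling application
        self.calibration_set = []
        self.calibration_formula = None

    def add_to_calibration_set(self, sample):
        self.calibration_set.append(sample)
        enough = len(self.calibration_set) >= self.min_samples
        # Calibrate automatically, once, as soon as there is enough input.
        if self.auto_calibrate and enough and self.calibration_formula is None:
            self.calibrate()

    def calibrate(self):
        if not self.calibration_set:
            raise ValueError("calibration set is empty")
        # Placeholder formula: the mean of the calibration set.
        total = sum(self.calibration_set)
        self.calibration_formula = total / len(self.calibration_set)
        return self.calibration_formula
```

With `auto_calibrate=False`, the calling application retains control and calls `calibrate()` explicitly when it sees fit.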
According to another aspect of the present invention, the speech processing system has parameters that control its operation and the API further includes a mechanism for setting and retrieving the parameters at run-time.
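One possible form of a mechanism for setting and retrieving parameters at run-time is sketched below. The `ParameterStore` class and the parameter names used with it are hypothetical; the type check against the default value is a design choice of the sketch and is not taken from the text above.

```python
class ParameterStore:
    """Hypothetical run-time parameter mechanism."""

    def __init__(self, defaults):
        self._values = dict(defaults)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        # Reject names the system does not know about.
        if name not in self._values:
            raise KeyError(name)
        # Keep each parameter's type stable across updates.
        expected = type(self._values[name])
        if not isinstance(value, expected):
            raise TypeError("%s expects %s" % (name, expected.__name__))
        self._values[name] = value
```

A calling application might, for instance, construct the store with hypothetical defaults such as `{"min_audio_seconds": 5.0}` and later call `set("min_audio_seconds", 8.0)` without restarting the system.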
There is also provided in accordance with a preferred embodiment of the present invention, an application programming interface (API) for enabling a calling application to instruct a speaker verification system to perform operations including online audio acquisition and verification. The API includes acquisition interface means for enabling the calling application to instruct the speaker verification system to acquire online audio from an external communication channel, and verification interface means for enabling the calling application to instruct the speaker verification system to verify the acquired audio.
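For the speaker verification case, a sketch under stated assumptions is given below. `SpeakerVerifier`, the claimed-identity argument, the score threshold and the representation of a verification formula as a callable returning a score are all hypothetical.

```python
class SpeakerVerifier:
    """Hypothetical speaker verification system with two interface means."""

    def __init__(self, verification_formulas, threshold=0.5):
        # Assumed shape: speaker id -> callable that scores a list of samples.
        self.verification_formulas = verification_formulas
        self.threshold = threshold
        self.acquired = []

    def acquire(self, frames):
        # Acquisition interface: collect online audio frame by frame.
        for frame in frames:
            self.acquired.extend(frame)

    def verify(self, claimed_id):
        # Verification interface: score the acquired audio for the claimed identity.
        score = self.verification_formulas[claimed_id](self.acquired)
        return score >= self.threshold


def toy_formula(audio):
    # Placeholder verification formula: fraction of positive samples.
    return sum(1 for s in audio if s > 0) / len(audio) if audio else 0.0
```

The toy formula carries no acoustic meaning; it only shows where a trained verification formula would be applied.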
According to another aspect of the present invention, the acquisition interface means and the verification interface means include object-oriented classes.
There is also provided in accordance with a preferred embodiment of the present invention a method of providing an application programming interface (API) for enabling a calling application to instruct a speech processing system to perform operations including online audio acquisition and algorithmic speech processing operations. The method includes the steps of providing acquisition interface means for enabling the calling application to instruct the speech processing system to acquire online audio from an external communication channel, and providing processing interface means for enabling the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations on the acquired audio.
According to another aspect of the present invention, the acquisition interface means and the processing interface means include object-oriented classes.
There is also provided in accordance with a preferred embodiment of the present invention a method of providing an application programming interface (API) for enabling a calling application to instruct a speaker verification system to perform operations including online audio acquisition and verification. The method includes the steps of providing acquisition interface means for enabling the calling application to instruct the speaker verification system to acquire online audio from an external communication channel, and providing processing interface means for enabling the calling application to instruct the speaker verification system to verify said acquired audio.
According to another aspect of the present invention, the acquisition interface means and the processing interface means include object-oriented classes.
There is also provided in accordance with a preferred embodiment of the present invention a method of interfacing to a speech processing system capable of performing operations including online audio acquisition and algorithmic speech processing operations. The method includes the steps of instructing the speech processing system to acquire online audio from an external communication channel, and instructing the speech processing system to perform at least one of the algorithmic speech processing operations on the acquired audio.
There is also provided in accordance with a preferred embodiment of the present invention a method of interfacing to a speaker verification system capable of performing operations including online audio acquisition and verification. The method includes the steps of instructing the speaker verification system to acquire online audio from an external communication channel, and instructing the speaker verification system to verify the acquired audio.