1. Field of the Invention
The invention relates to speech recognition technology. More particularly, the invention relates to systems and methods for determining the likelihood that the decoding output of a speech recognition engine matches the speech input to the speech recognition engine.
2. Description of the Related Technology
Speech recognition, also referred to as voice recognition, generally pertains to the technology for converting voice data to text data. Typically, in speech recognition systems the task of analyzing speech in the form of audio data and converting it to a digital representation of the speech is performed by an element of the system referred to as a speech recognition engine. Traditionally, the speech recognition engine functionality has been implemented as hardware components, or by a combination of hardware components and software modules. More recently, software modules alone perform the functionality of speech recognition engines. The use of software has become ubiquitous in the implementation of speech recognition systems in general and more particularly in speech recognition engines.
Software application programs sometimes provide a set of routines, protocols, or tools for building software applications, commonly referred to as an application program interface (API), or sometimes as an application programmer interface. A well-designed API can make it easier to develop a program by providing the building blocks, which a programmer then puts together to invoke the modules of the application program.
The API typically refers to the method prescribed by a computer operating system or by an application program by which a programmer writing an application program can make requests of the operating system or another application. The API can be contrasted with a graphical user interface (GUI) or a command interface (both of which are direct user interfaces), in that the APIs are interfaces to operating systems or programs.
Most operating environments, Windows from Microsoft Corporation being one of the most prevalent, provide an API so that programmers can write applications consistent with the operating environment. Although APIs are designed for programmers, they ultimately benefit users because programs built on a common API tend to have similar interfaces, which makes it easier for users to learn new programs.
However, current speech recognition system APIs suffer from a number of deficiencies. Some are hardware dependent, requiring time-consuming and expensive modification of the API for each hardware platform on which the speech recognition system is executed. Others are speaker dependent, requiring extensive training for the system to become accustomed to a particular voice and accent. Additionally, current speech recognition systems do not allow dynamic creation and modification of concepts and grammars, thereby requiring time-consuming recompilation and reloading of the speech recognition system software. Some speech recognition systems do not utilize flexible phrase formats, e.g., normal, Backus-Naur Form (BNF), and phonetic formats. In addition, current speech recognition systems do not allow dynamic concepts with multiple phrases. Current speech recognition systems also do not have a voice channel model or grammar set model to allow multiple simultaneous decodes for each speech port using different combinations of grammar and voice samples.
Therefore, what is needed is a system and method for a speech recognition system API that solves the above deficiencies by providing flexible, modifiable, and easy-to-use capabilities, including, e.g., being hardware independent and speaker independent, allowing dynamic creation and modification of concepts and grammars and of concepts with multiple phrases, utilizing flexible phrase formats, and having a voice channel model or grammar set model that allows multiple simultaneous decodes for each speech port using different combinations of grammar and voice samples.
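By way of illustration only, the kind of interface called for above may be sketched as follows. The sketch is not taken from any existing speech recognition product. The names GrammarSet, VoiceChannel, add_phrase, remove_concept, and pending_decodes are hypothetical, and no actual decoding is performed. The sketch only shows a concept receiving multiple phrases at run time and the pairing of several grammars with several voice samples on one speech port.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GrammarSet:
    """Hypothetical in-memory grammar: maps a concept name to its phrases."""
    concepts: Dict[str, List[str]] = field(default_factory=dict)

    def add_phrase(self, concept: str, phrase: str) -> None:
        # Creates the concept on first use; nothing is recompiled or reloaded.
        self.concepts.setdefault(concept, []).append(phrase)

    def remove_concept(self, concept: str) -> None:
        # Removing a concept that does not exist is silently ignored.
        self.concepts.pop(concept, None)


@dataclass
class VoiceChannel:
    """Hypothetical holder for the voice samples received on one speech port."""
    port: int
    samples: List[bytes] = field(default_factory=list)


def pending_decodes(channel: VoiceChannel,
                    grammars: List[GrammarSet]) -> List[Tuple[int, int, int]]:
    """Enumerate (port, grammar index, sample index) for every pairing.

    A real engine would run one decode per pairing; this sketch only lists them.
    """
    return [(channel.port, g, s)
            for g in range(len(grammars))
            for s in range(len(channel.samples))]


# Usage: one concept with two phrases, then two grammars against two samples.
menu = GrammarSet()
menu.add_phrase("greeting", "hello")
menu.add_phrase("greeting", "good morning")
digits = GrammarSet()
digits.add_phrase("one", "one")

channel = VoiceChannel(port=3, samples=[b"sample-a", b"sample-b"])
jobs = pending_decodes(channel, [menu, digits])
```

In this sketch, each grammar set is an ordinary mutable object, so concepts and phrases can be added or removed while the program runs, and the voice channel keeps the samples separate from the grammars so that any combination of the two can be submitted for decoding.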