Historically, speech recognition technologies have been integrated into training systems and similar applications using very large, rigid speech models, because no software layer existed to incorporate speech into the training system more effectively.
In conventional speech recognition technologies implemented in training systems, the recognition software application, which is the part of the trainer core that consumes the recognized result, interprets the recognized text and takes appropriate actions based on what the speaker says. This conventional approach requires the application to have intimate knowledge of vocabulary definitions. This requirement pushes the consuming application into an unbounded cycle.
In essence, an entire grammar, including all possible words, numbers, phrases, etc. of a particular system, is loaded into the compiled software application of a speech recognizer. For example, for a system designed to assist in the training of pilots, all the possible commands, orders, and responses (e.g., aircraft vectors, heading coordinates) that are used are loaded into the compiled software. Then, when a speech pattern is received by the recognition system, it is compared to all of the possible grammar options that have been loaded into the system.
Many training systems, because of their specific environment, have to incorporate very large phraseologies (in essence a new vocabulary), which places a significant processing burden on the software and hardware of the training system. This burden is even greater with more complex training systems such as those used by the military. Military training systems have a unique vocabulary, including many acronyms that are specific to the organizations, exercises, times and places being simulated. Consequently, handling these large, unique vocabularies in simulations presents a huge challenge to speech recognition engines.
Therefore, there is a need for a recognition system that minimizes the size of the grammar and vocabulary required to be processed by the training software. This invention uses the framework of context-driven speech recognition to minimize the size of the grammar loaded at any point in the system when it is executing a recognition task against an audio stream.
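By way of illustration only, the contrast between loading an entire grammar and loading a context-specific grammar can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the context names, the phrases, and the functions `monolithic_grammar`, `active_grammar`, and `recognize` are all hypothetical, and simple substring matching on text stands in for a real recognition engine operating on an audio stream.

```python
from typing import Dict, List, Optional

# Hypothetical phrase lists keyed by training context. The context names and
# phrases are invented for illustration and do not come from the source text.
CONTEXT_GRAMMARS: Dict[str, List[str]] = {
    "taxi": ["hold short", "taxi to runway", "cross runway"],
    "departure": ["cleared for takeoff", "turn left heading", "climb and maintain"],
    "approach": ["cleared to land", "go around", "descend and maintain"],
}


def monolithic_grammar() -> List[str]:
    """Conventional approach: every phrase from every context is loaded at once."""
    return [phrase for phrases in CONTEXT_GRAMMARS.values() for phrase in phrases]


def active_grammar(context: str) -> List[str]:
    """Context-driven approach: only the current context's phrases are loaded."""
    return CONTEXT_GRAMMARS.get(context, [])


def recognize(utterance: str, grammar: List[str]) -> Optional[str]:
    """Stand-in for a recognizer: return the first grammar phrase found in the text."""
    text = utterance.lower()
    for phrase in grammar:
        if phrase in text:
            return phrase
    return None


if __name__ == "__main__":
    print(len(monolithic_grammar()), "phrases loaded without context")
    print(len(active_grammar("approach")), "phrases loaded for the approach context")
    print(recognize("Tower says cleared to land", active_grammar("approach")))
```

In this sketch the recognizer compares an utterance only against the phrases for the current context, so the number of candidates it must consider is bounded by the size of that context's grammar rather than by the size of the whole phraseology.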