1. Field
This disclosure relates to speech recognition systems and, more particularly, to methods for automating the tuning of speech recognition systems.
2. Background
Speech recognition systems typically translate spoken words into either text or command outputs. Although these systems have widespread applications, they generally fall into one of two categories.
The first category includes command-and-control applications. In these applications, the user speaks to an interface using command words and phrases contained in a grammar file. The interface may be any device that can receive audible signals, such as a telephone, a microphone, or another sensor. The speech recognizer translates the spoken commands into the command language of the particular application to perform specific tasks, such as navigating menus and accessing files.
The second category includes dictation systems. In these systems, the user dictates into the interface and the speech system produces the corresponding text as output. The user interface is generally a microphone connected to some kind of computing platform, although it is not limited to that configuration. Tasks include dictating email and composing documents. Note that speech recognizers targeting dictation applications may sometimes be used for command-and-control purposes.
In both types of systems, mechanisms for improving system performance are generally explicit. During use of these systems, the speech recognition process is not automatically tuned to how the system is actually used. A system may provide a mechanism for system designers or the end user to tune its behavior, but that tuning is performed separately from the use of the application.
For example, a command-and-control application may store audio for each interaction with the user. This stored audio may later be analyzed by an application designer and used to improve the data set used to train the speech recognizer. Some dictation packages include a separate application that allows the user to expand the system vocabulary or to train the system to recognize certain words or phrases. These tuning mechanisms are explicit and separate from the normal, intended use of the system.
These applications lack the ability to tune the system automatically without affecting the user. Such a capability would be useful for tuning these systems, and would also provide an inexpensive and efficient means of initializing them.