Speech recognition systems are currently used to conduct various forms of commerce over communication channels such as the telephone network and the internet. One example is a system used in conjunction with a stock brokerage. With this system, a caller can provide his or her account number, obtain a quotation for the price of a particular stock issue, and purchase or sell a particular number of shares at market price or at a particular target price, among other types of transactions. Natural language systems can also be used to respond to such things as requests for telephone directory assistance.
Speech recognition systems typically are configurable, within limits, as to the amount of processing power, memory, network bandwidth, and other system resources that they may consume. Often, memory, speed, and accuracy can be traded off against each other. For instance, one configuration of a system may use fewer CPU resources than another configuration, possibly at the cost of lower average speech recognition accuracy. Typical systems have default configurations that provide nominal accuracy and resource consumption across a wide range of recognition tasks. System configuration is often performed only once, or not at all, with parameters left at their default values, which fixes a particular resource/performance tradeoff for the deployment. Often the configuration parameters are determined only during the initial deployment of the system. Usually, a system developer calculates the parameters and changes them only during a manual reconfiguration.
Known systems determine and specify optimized recognition configuration parameters using an offline process. Speech waveform data is collected from the application and then orthographically transcribed to specify the “reference” word sequence for each utterance. This data is then passed through an offline simulation of the recognition system under a selection of configurations to evaluate candidate parameter values. Finally, the system developers decide which configuration to use. Typically, a different configuration is determined for each grammar or dialogue state in the system, which makes the process time-consuming.
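The offline process described above can be sketched as follows. This is only an illustrative sketch, not the procedure of any particular known system: it assumes that candidate configurations are scored by word error rate against the reference transcriptions and that the developer's decision reduces to picking the most accurate configuration within a CPU budget. The names `word_errors`, `word_error_rate`, `pick_configuration`, the `simulate` callback, and the `max_cpu_cost` parameter are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical types: a configuration is a dict of parameters, and an
# utterance is a (waveform, reference word sequence) pair.
Config = Dict[str, float]
Utterance = Tuple[object, Sequence[str]]


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Word-level edit distance: substitutions + insertions + deletions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Total word errors divided by total reference words."""
    total_errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    total_words = sum(len(ref) for ref, _ in pairs)
    return total_errors / total_words if total_words else 0.0


def pick_configuration(
    configs: Dict[str, Config],
    utterances: List[Utterance],
    simulate: Callable[[Config, object], Tuple[Sequence[str], float]],
    max_cpu_cost: float,
) -> Tuple[Optional[str], float]:
    """Return (name, WER) of the most accurate configuration whose mean
    CPU cost per utterance stays within max_cpu_cost.

    `simulate` stands in for the offline recognizer simulation (assumed):
    it returns the recognized word sequence and a CPU cost for one waveform.
    """
    best_name, best_wer = None, float("inf")
    for name, params in configs.items():
        pairs, costs = [], []
        for waveform, reference in utterances:
            hypothesis, cpu_cost = simulate(params, waveform)
            pairs.append((reference, hypothesis))
            costs.append(cpu_cost)
        mean_cost = sum(costs) / len(costs) if costs else 0.0
        if mean_cost > max_cpu_cost:
            continue  # too expensive for this deployment
        wer = word_error_rate(pairs)
        if wer < best_wer:
            best_name, best_wer = name, wer
    return best_name, best_wer
```

Under these assumptions, the function would be run once per grammar or dialogue state, each with its own set of transcribed utterances, which illustrates why the overall process grows with the number of states.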
A number of issues make this offline process a burden for the developer: (a) it takes time and resources to run the simulations, (b) it requires collecting and transcribing data before the system is deployed, (c) it does not take into account how the usage of the system may change over time, and (d) it may require special skills that may not be readily available. The overall effect of these issues is to add expense (resources) and delay to the deployment of the system.