A. Field of the Invention
The invention relates to a distributed pattern recognition training system and method.
B. Description of the Related Art
In recent years, speech recognition systems have become capable of recognizing very large vocabularies, exceeding 200,000 words in some cases. Training these speech recognition systems requires a large amount of data. Thousands of hours of spoken training data may be used to train the acoustic models for a large vocabulary speech recognizer, and billions of words of text may be used to train the language context models. Beyond the speech recognition itself, some applications of speech recognition also require large amounts of data. Training speech recognition post-analysis systems to determine semantics or other hidden features (for applications such as audio data mining or control of an interactive dialogue) may require even more data than the speech recognition itself.
Building higher-performance speech recognition and post-analysis systems will require even more data than is used in present systems. As the models become more sophisticated and more detailed, they require more data to train the larger number of parameters that determine the models. For an n-gram language model, for example, the number of possible n-grams is multiplied by the vocabulary size each time the value of n is increased by one. Similarly, the number of parameters in acoustic models grows by a multiplicative factor for each additional unit of context that is used.
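The growth described above can be illustrated with a short sketch. It assumes the simplest counting model, in which every sequence of n vocabulary words is a possible n-gram, so the count is the vocabulary size raised to the power n. The function name possible_ngrams and the constant VOCAB_SIZE are hypothetical names chosen for illustration; the value 200,000 is the vocabulary size mentioned earlier in this description.

```python
def possible_ngrams(vocab_size: int, n: int) -> int:
    """Count the distinct word sequences of length n over a vocabulary.

    Assumes every sequence is possible, so the count is vocab_size ** n.
    """
    if vocab_size < 1 or n < 1:
        raise ValueError("vocab_size and n must be positive")
    return vocab_size ** n


# Illustrative vocabulary size taken from the text above.
VOCAB_SIZE = 200_000

for n in range(1, 5):
    count = possible_ngrams(VOCAB_SIZE, n)
    print(f"n={n}: {count:.3e} possible n-grams")
```

Each step from n to n+1 multiplies the count by the vocabulary size, which is why the amount of training data needed to observe even a fraction of the possible n-grams rises so quickly.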
Better pattern recognition based on language analysis is also valuable for analysis of any large collection of text, whether or not the text results from speech recognition. Training models for this language analysis of general text runs into the same issues as the post-analysis of speech recognition: a large quantity of training data is needed to train increasingly sophisticated models. Pattern recognition is also useful for mining large collections of any type of data. Again, if this pattern recognition is based on models that look at the relationships between elementary events and variables, a large quantity of training data is needed in order to train the models for the large number of combinations.
Fortunately, an enormous quantity of data is potentially available. A large telephone call center may record several million hours of speech per month. The World Wide Web contains an estimated 40 terabytes or more of text, and is continuing to grow rapidly.
Unfortunately, most pattern recognition methods are not able to cope with such enormous quantities of data. Many pattern recognition techniques are first developed on small-sample academic problems and then, with great effort, are made scalable enough to handle real-world problems with thousands of data frames. Training higher-performance speech recognition and post-analysis systems that take advantage of the large quantity of data available will require methods capable of handling billions of frames of data.
Not only is there a very large quantity of data available, but new data is being produced continuously. For many applications, it is important to keep the vocabulary and language context models up to date. For many data mining applications, it is also important to keep the models up to date. The queries that the public is likely to make to a telephone help desk, for example, will change as new products are introduced. Other classification applications may require tracking current events in the news. New proper names will be introduced to the vocabulary on an ongoing basis for many applications. Both the acoustic models and the language context models must be updated to reflect these changes. However, this new data becomes available at many separate sites.
Thus, there is a desire to address one or more of the problems described above in conventional pattern recognition training methods and systems.