Computer-implemented recognition systems have been designed to perform a variety of recognition tasks. Such tasks include analysis of a video signal to identify humans captured in such signal, analysis of a video signal to identify a gesture performed by a human, analysis of a video signal to recognize an object therein, analysis of a handwriting sample to identify characters included in the handwriting sample, analysis of an audio signal to determine an identity of a speaker captured in the audio signal, analysis of an audio signal to recognize spoken words, analysis of an audio signal to recognize a language of a speaker in the audio signal, analysis of an audio signal to recognize an accent/dialect of a speaker in the audio signal, amongst other tasks.
Automatic speech recognition (ASR) systems are becoming increasingly ubiquitous. For example, mobile telephones are currently equipped with ASR systems that are configured to recognize spoken commands set forth by users thereof, thus allowing users to perform other tasks while setting forth voice commands to mobile telephones. Gaming consoles have also been equipped with ASR systems that are likewise configured to recognize certain spoken commands, thereby allowing users of such gaming consoles to interact with the gaming consoles without requiring use of a handheld game controller. Still further, customer service centers accessible by telephone employ relatively robust ASR systems to assist users in obtaining desired information. Accordingly, a user can access a customer service center by telephone and set forth one or more voice commands to obtain desired information (or to be directed to an operator who can assist the user in obtaining the information).
It is understood that performance of an ASR system is dependent upon the amount of labeled training data available for training the ASR system. For many languages, there is a relatively small amount of labeled training data currently available for training an ASR system, while for other languages there is a relatively large amount of labeled training data available. Therefore, for languages with little labeled training data, ASR systems are relatively poorly trained and thus inaccurate, and have difficulty with large vocabulary speech recognition (LVSR) tasks.