By translating voice data into text, speech recognition has played an important part in many Natural Language Processing (NLP) technologies. For instance, speech recognition has proven useful to technologies involving vehicles (e.g., in-car speech recognition systems), technologies involving health care, technologies involving the military and/or law enforcement, technologies involving telephony, and technologies that assist people with disabilities. Speech recognition systems are often trained and deployed to end-users.
The end-user deployment phase typically includes using a trained acoustic model to identify text in voice data provided by end-users. The training phase typically involves training an acoustic model in the speech recognition system to recognize text in voice data. The training phase often includes capturing voice data, transcribing the voice data into text, and storing pairs of voice data and text in transcription libraries. Capturing voice data in the training phase typically involves collecting different syllables, words, and/or phrases commonly used in speech. Depending on the context, these utterances may form the basis of commands to a computer system, requests to gather information, portions of dictations, or other end-user actions.
Conventionally, NLP systems captured voice data using teams of trained Natural Language Processing (NLP) trainers who were housed in a recording studio or other facility having audio recording equipment therein. The voice data capture process often involved providing the NLP trainers with a list of utterances, and recording the utterances using the audio recording equipment. Teams of trained transcribers in dedicated transcription facilities typically listened to the utterances, and manually transcribed the utterances into text.
Though useful, conventional NLP systems have problems accommodating the wide variety of utterances present in a given language. More specifically, different NLP end-users may provide different commands to perform similar tasks. As an example, it is often difficult to train an NLP system to recognize the different pronunciations, syntaxes, word orders, etc. that that different NLP end-users use when requesting an NLP system to perform a specific task. It would be desirable to provide systems and methods that accurately and cost-effectively collect and store the utterances NLP end-users are likely to use for a command context.