1. Technical Field
The present disclosure relates to speech recognition and more specifically to generating natural language models for use in speech recognition.
2. Introduction
To most users, an automatic speech recognizer (ASR) appears as a “black box” that accepts a speech signal as input, such as from a microphone, and outputs the corresponding textual transcription. Internally, however, the speech recognizer includes several components, such as the acoustic (feature extraction) front-end, the acoustic model, the language model, and various decoding algorithms. These components require training and/or calibration on large amounts of application-specific speech and textual data for the recognizer to provide competitive, state-of-the-art transcription accuracy. The training processes require expertise, computing infrastructure, and significant amounts of time.
Traditionally, speech recognition training is performed for clients via one of two methods. In the first method, a speech recognition company provides the entire speech recognition system (not just the recognizer) to the client. This approach raises intellectual property issues, such as licensing, trade secrets, patent rights, copyright, and so forth, for the speech recognition company, provider, or owner of the ASR engine. This approach is also expensive and does not scale well, because it forces engine developers either to maintain backward compatibility with the several versions delivered to different clients or to devote a special team of developers to each version.
In the second method, the client provides its own data and/or algorithm, and the speech recognition company trains the ASR models and evaluates the corresponding recognition accuracy. This is expensive for the speech recognition company. Further, the client exposes its intellectual property to the speech recognition company. This approach may raise concerns regarding the privacy of the client's potentially sensitive data or regarding unauthorized sharing of the speech data that the client has spent considerable time, effort, and money to develop.
Due to the limitations and intellectual property concerns of these approaches, a client and an ASR service provider may not cooperate at all, or may cooperate without the level of trust necessary to recognize speech at the quality that would otherwise be possible.