The invention relates to alignment of audio recordings with transcripts of the recordings.
Many current speech recognition systems include tools to perform “forced alignment” of transcripts to audio recordings, typically for the purpose of training (estimating parameters for) a speech recognizer. One such tool, called the Aligner, was part of the HTK (Hidden Markov Model Toolkit) distributed by Entropic Research Laboratories. The Carnegie Mellon Sphinx-II speech recognition system can also run in forced-alignment mode, as can the freely available Mississippi State speech recognizer.
The systems identified above force-fit the audio data to the transcript. Typically, some amount of manual alignment of the audio to the transcript is required before the automatic alignment process begins. The forced-alignment procedure assumes that the transcript is a perfect and complete transcript of all of the words spoken in the audio recording, and that there are no significant segments of the audio that contain noise instead of speech.
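By way of illustration only, the force-fitting behavior described above can be sketched as a small dynamic-programming routine. The sketch is not taken from any of the systems named above; the function name `forced_align` and the input `scores` (a hypothetical table of per-frame log-likelihoods, one column per transcript unit) are assumptions made for the example. It shows that every audio frame is assigned to some transcript unit, in transcript order, whether or not the frame actually contains the corresponding speech.

```python
from typing import List, Tuple


def forced_align(scores: List[List[float]]) -> List[Tuple[int, int]]:
    """Align T audio frames to N transcript units in order.

    scores[t][n] is a (hypothetical) log-likelihood of frame t under unit n.
    Returns an inclusive (start_frame, end_frame) span for each unit.
    Every frame is assigned to exactly one unit and every unit gets at
    least one frame, which is the "force-fit" assumption.
    """
    T = len(scores)
    N = len(scores[0])
    if T < N:
        raise ValueError("need at least one frame per transcript unit")

    neg_inf = float("-inf")
    best = [[neg_inf] * N for _ in range(T)]   # best path score ending at (t, n)
    stay = [[True] * N for _ in range(T)]      # True if frame t-1 was in the same unit
    best[0][0] = scores[0][0]                  # the path must start in the first unit

    for t in range(1, T):
        for n in range(N):
            same = best[t - 1][n]
            advance = best[t - 1][n - 1] if n > 0 else neg_inf
            if advance > same:
                best[t][n] = advance + scores[t][n]
                stay[t][n] = False
            else:
                best[t][n] = same + scores[t][n]

    # Trace back from the last frame, which must belong to the last unit.
    spans = [[0, 0] for _ in range(N)]
    n = N - 1
    spans[n][1] = T - 1
    for t in range(T - 1, 0, -1):
        if not stay[t][n]:
            spans[n][0] = t
            n -= 1
            spans[n][1] = t - 1
    return [(start, end) for start, end in spans]


# Frames 0-1 match unit 0 and frames 3-4 match unit 1; frame 2 matches
# neither (e.g. noise), yet it is still assigned to one of the two units.
example_scores = [[0.0, -5.0], [0.0, -5.0], [-9.0, -9.0], [-5.0, 0.0], [-5.0, 0.0]]
example_spans = forced_align(example_scores)
```

In this sketch the poorly matching middle frame cannot be left unassigned, which mirrors the limitation noted above: the procedure presumes that the transcript accounts for all of the audio.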