In recent years, large amounts of audio and their associated transcripts have become available. Digital archives of radio and television broadcasts, lectures, speeches, and similar recordings have become a useful source of information for speech researchers.
Aligning speech with its transcript is a task that arises in speech recognition systems. Given properly trained acoustic models, a dictionary that maps words to phonemes, and a word-level transcript, an algorithm such as the Viterbi algorithm can align the speech to the transcript. However, long audio can be difficult to decode with such algorithms. Additionally, transcripts with high error rates can mislead the decoding algorithm.
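The following is a minimal sketch of Viterbi forced alignment. It assumes the transcript has already been expanded through the dictionary into a left-to-right chain of states (here, one state per phoneme for simplicity). It also assumes the acoustic model supplies a log-likelihood for every frame under every state. Transition probabilities are treated as uniform and omitted. The function name `forced_align` and its input layout are illustrative assumptions, not an interface from the source.

```python
import math
from typing import List, Sequence


def forced_align(log_probs: Sequence[Sequence[float]]) -> List[int]:
    """Viterbi forced alignment over a strict left-to-right state chain.

    Assumption (hypothetical layout): log_probs[t][s] is the acoustic
    log-likelihood of frame t under state s, with states in transcript order.
    Transition probabilities are assumed uniform and therefore ignored.
    Returns the state index assigned to each frame.
    """
    num_frames = len(log_probs)
    num_states = len(log_probs[0])
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")

    neg_inf = -math.inf
    # score[t][s]: best log score of any path that is in state s at frame t
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    # back[t][s]: state occupied at frame t-1 on that best path
    back = [[0] * num_states for _ in range(num_frames)]

    score[0][0] = log_probs[0][0]  # the alignment must start in the first state
    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]  # remain in the same state
            advance = score[t - 1][s - 1] if s > 0 else neg_inf  # move forward one state
            if advance > stay:
                score[t][s], back[t][s] = advance, s - 1
            else:
                score[t][s], back[t][s] = stay, s
            score[t][s] += log_probs[t][s]

    # The alignment must end in the last state; follow back pointers to frame 0.
    path = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Usage: 5 frames aligned to a 3-state transcript (made-up scores).
example = [
    [-0.1, -3.0, -3.0],
    [-0.2, -2.0, -3.0],
    [-3.0, -0.1, -3.0],
    [-3.0, -3.0, -0.2],
    [-3.0, -3.0, -0.1],
]
print(forced_align(example))  # [0, 0, 1, 2, 2]
```

In this sketch, the `score` and `back` tables both have `num_frames × num_states` entries. Memory therefore grows with the product of audio length and transcript length, which is one reason a single Viterbi pass over very long recordings becomes expensive. The sketch also forces every frame onto the given state chain. If the transcript contains errors, the decoder has no way to skip or insert words and will still produce an alignment, even though that alignment may be wrong.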