Evaluation of the quality of a transcription of audio data produced using automated techniques ultimately relies on comparison to a manual transcription. Speech recognition systems can be evaluated using a set of manually transcribed utterances, which serve as the reference against which the evaluation is made. The automated transcription output by a speech recognition system yields a word sequence for each utterance in the audio data. These automated transcriptions of the utterances are aligned to the manual transcriptions of the same utterances. This alignment may be performed using Levenshtein's algorithm, disclosed in Levenshtein V.I., “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady 10: 707-710 (1966), which is hereby incorporated by reference in its entirety. After the automated and manual transcriptions are aligned, the number of correct words, incorrect words, and/or substitutions can be counted. The number of inserted words and/or deleted words in the automated transcription can also be computed with respect to the manual transcription. These figures are used to compute measures such as the word error rate (WER) or the precision and recall (P/R) of the transcription system.
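By way of a non-limiting illustration, the alignment and counting described above might be sketched as follows. This is a minimal sketch, not a definitive implementation. The function names `align_counts` and `transcription_metrics` are hypothetical. The metric definitions are assumptions based on common usage and are not taken from the description above: WER is taken as substitutions plus deletions plus insertions, divided by the number of words in the manual transcription; precision as correct words divided by words in the automated transcription; and recall as correct words divided by words in the manual transcription. Where several alignments have the same minimum cost, the counts depend on the tie-breaking order, which here prefers a match or substitution over a deletion, and a deletion over an insertion.

```python
from typing import Dict, List, Tuple


def align_counts(reference: List[str], hypothesis: List[str]) -> Tuple[int, int, int, int]:
    """Align two word sequences by minimum edit distance.

    Returns (correct, substitutions, deletions, insertions), where deletions
    are reference words missing from the hypothesis and insertions are extra
    hypothesis words. Hypothetical helper, for illustration only.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        cost[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Trace back from the end of both sequences, counting each edit type.
    correct = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    subs += 1
                else:
                    correct += 1
                i -= 1
                j -= 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return correct, subs, dels, ins


def transcription_metrics(reference: List[str], hypothesis: List[str]) -> Dict[str, float]:
    """Compute WER, precision, and recall under the assumed definitions above."""
    correct, subs, dels, ins = align_counts(reference, hypothesis)
    ref_len = correct + subs + dels  # words in the manual transcription
    hyp_len = correct + subs + ins   # words in the automated transcription
    return {
        "wer": (subs + dels + ins) / ref_len if ref_len else 0.0,
        "precision": correct / hyp_len if hyp_len else 0.0,
        "recall": correct / ref_len if ref_len else 0.0,
    }


if __name__ == "__main__":
    manual = "the cat sat on the mat".split()
    automated = "the cat sit on mat".split()
    print(align_counts(manual, automated))  # (4, 1, 1, 0)
    print(transcription_metrics(manual, automated))
```

In this example, the manual transcription has six words and the automated transcription has one substituted word ("sit" for "sat") and one deleted word ("the"), so under the assumed definitions the WER is 2/6, the precision is 4/5, and the recall is 4/6.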