1. Technical Field
The present invention relates generally to automatic speech recognition (ASR) and, more particularly, to recognition of spoken alphabet and alpha-numeric strings using knowledge-based strategies applied to a list of hypothesized recognition results.
2. Description of the Related Art
ASR is used for various recognition tasks, including recognizing digit strings spoken by telephone callers. These digit strings typically represent credit card numbers, telephone numbers, bank account numbers, social security numbers and personal identification numbers (PINs).
Speech recognition is an imperfect art. Achieving high accuracy is difficult because of many sources of variability, including differences in microphones, speech accents, and speaker abilities. Recognizing spoken digit strings is particularly difficult because individual digits are short in duration, have a high degree of inter-digit acoustic confusability, and are often co-articulated with adjacent digits. When digit-string (or alphabet or alpha-numeric) recognition is performed over a telephone network, the task is even more difficult, owing to the noise and bandwidth limitations imposed on the speech signal. Because a string of spoken digits is recognized correctly only if every digit in it is recognized correctly, recognizing strings at high accuracy requires extremely high per-digit accuracy, in excess of 99%. State-of-the-art over-the-telephone digit recognition targets a per-digit accuracy of only about 98%. Alpha-numeric recognition over the telephone is even more difficult, with state-of-the-art recognition accuracy of around 75% per character.
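The relationship between per-digit accuracy and whole-string accuracy can be illustrated with a short calculation. The sketch below is illustrative only and is not part of the described technique. It assumes, for simplicity, that recognition errors on individual symbols are statistically independent, which the text above does not assert and which co-articulation makes optimistic in practice. The function name `string_accuracy` and the string lengths shown are hypothetical choices made for this example.

```python
def string_accuracy(per_symbol_accuracy: float, length: int) -> float:
    """Estimate the probability that every symbol in a string is correct.

    Assumes independent per-symbol errors (an illustrative simplification),
    so the string-level accuracy is the per-symbol accuracy raised to the
    power of the string length.
    """
    if not 0.0 <= per_symbol_accuracy <= 1.0:
        raise ValueError("per_symbol_accuracy must lie in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return per_symbol_accuracy ** length


if __name__ == "__main__":
    # Per-symbol accuracies taken from the figures quoted in the text;
    # the string lengths are hypothetical examples.
    cases = (
        ("digits, 98% per digit", 0.98),
        ("digits, 99% per digit", 0.99),
        ("alpha-numerics, 75% per character", 0.75),
    )
    for label, accuracy in cases:
        for length in (4, 10, 16):
            estimate = string_accuracy(accuracy, length)
            print(f"{label}, length {length}: {estimate:.1%}")
```

Under this independence assumption, a 98% per-digit accuracy yields only about 72% accuracy on a 16-digit string, which is why per-digit accuracies above 99% are needed for reliable string recognition.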
There is thus a need for a more accurate digit recognition technique, particularly for recognizing spoken digit strings over a telephone network.