Large enterprises receive numerous telephone calls, each of which must be routed in accordance with the caller's instructions. Calls typically are routed by a human operator or an automated call routing system (commonly referred to as an “automated attendant” or “autoattendant”). Human operators typically route calls accurately and efficiently, but at a relatively high cost. Autoattendant systems, on the other hand, are cheaper to implement, but tend to be less accurate and efficient than human operators.
Traditionally, autoattendants play an announcement to the caller and prompt the caller to make one of multiple selections using a voice response unit. For example, the caller may be prompted to dial the extension of the party being called. The caller also may be given other options, such as leaving a voice message or accessing a directory of names if the extension of the called party is not known. Some early automated telephone directories required the caller to spell the name of the called party using a telephone dual-tone multifrequency (DTMF) keypad. Most recent autoattendant systems are voice-enabled, allowing callers to be routed to a desired call destination simply by speaking the name of the call destination. In these systems, an autoattendant answers an incoming call and asks the caller to speak the name of the party or department being called. The autoattendant includes a speaker-independent speech recognition engine that identifies a received speech signal and translates it into name data. The autoattendant obtains the telephone number corresponding to the translated name data from a telephone number directory and routes the call to that telephone number.
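The keypad-spelling lookup described above can be illustrated with a minimal sketch. It assumes the standard telephone keypad letter layout; the directory contents, the extensions, and the function names `name_to_digits` and `lookup_by_keypad` are hypothetical and are not taken from any particular system.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}

# Hypothetical directory mapping a name to an extension.
DIRECTORY = {"SMITH": "4021", "SNOW": "4377", "JONES": "4100"}


def name_to_digits(name):
    """Convert a name to the digit string a caller would key in to spell it."""
    return "".join(
        LETTER_TO_DIGIT[ch] for ch in name.upper() if ch in LETTER_TO_DIGIT
    )


def lookup_by_keypad(digits, directory=DIRECTORY):
    """Return sorted (name, extension) pairs whose keypad spelling starts with digits."""
    return sorted(
        (name, ext)
        for name, ext in directory.items()
        if name_to_digits(name).startswith(digits)
    )


if __name__ == "__main__":
    print(lookup_by_keypad("76"))   # two names still match
    print(lookup_by_keypad("764"))  # one name remains
```

Because each key stands for three or four letters, a short digit string may match several names; in this sketch the caller would keep keying digits until a single entry remains.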
Some autoattendant systems require the user to spell the identifier for a requested data item, such as a person's name. Some of these systems attempt to determine the identifier being spelled before the caller has said all of its characters. Such autoattendant systems may employ algorithms for disambiguating characters that often are mistaken for one another. For example, speech recognizers typically confuse the letters B, C, D, E, G, P, T, V, and Z with one another. One discrete-spoken spelling system prompts the caller to say one letter at a time so that the system can know how many letters were spoken and can identify and process each spoken letter separately. In addition, the system keeps track of all possible letter sequences while the caller continues to spell the requested identifier. The system compares each letter sequence with a list of allowable words and identifies the spelled identifier as soon as the list is reduced to a single identifier.
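The discrete-spoken approach can be sketched as follows. This is a simplified illustration, not the method of any specific system: rather than enumerating every possible letter sequence, it prunes the list of allowable words directly, which yields the same surviving candidates. The word list, the function names, and the assumption that any letter in the confusable set may stand for any other letter in that set are all hypothetical.

```python
# Letters that recognizers commonly confuse with one another (from the text above).
CONFUSABLE = frozenset("BCDEGPTVZ")


def alternatives(letter):
    """Return the set of letters the caller may actually have spoken."""
    letter = letter.upper()
    return CONFUSABLE if letter in CONFUSABLE else frozenset(letter)


def identify_early(recognized_letters, allowed_words):
    """Process one recognized letter at a time.

    Returns (word, letters_consumed) as soon as one candidate remains,
    or (None, letters_consumed) if the input ends or no candidate survives.
    """
    candidates = [word.upper() for word in allowed_words]
    consumed = 0
    for position, letter in enumerate(recognized_letters):
        consumed = position + 1
        options = alternatives(letter)
        # Keep only words whose letter at this position could have been spoken.
        candidates = [
            word for word in candidates
            if len(word) > position and word[position] in options
        ]
        if len(candidates) == 1:
            return candidates[0], consumed
        if not candidates:
            break
    return None, consumed


# Hypothetical list of allowable identifiers.
WORDS = ["DAVIS", "BAKER", "PARKER", "TAYLOR", "MARTIN"]

if __name__ == "__main__":
    # The caller spells B-A-K-E-R, but the recognizer hears "D" for "B".
    print(identify_early(["D", "A", "K"], WORDS))
```

In this example the misrecognized first letter still leaves BAKER in the candidate list, and the third letter reduces the list to that single word before the caller finishes spelling.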
In another approach, a speech recognition system recognizes a word based on a continuous spelling of the word by a user. The system includes a speech recognition engine with a decoder running in forward mode such that the recognition engine continuously outputs an updated string of hypothesized letters based on the letters uttered by the user. The system further includes a spelling engine for comparing each string of hypothesized letters to a vocabulary list of words. The spelling engine returns a best match for the string of hypothesized letters. The system also includes an early identification unit for presenting the user with the best matching word possibly before the user has completed spelling the desired word.
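The interaction between the spelling engine and the early identification unit can be sketched under several assumptions. The decoder is not modeled; its successive outputs are supplied as a list of hypothesized letter strings. Matching uses edit distance between the hypothesis and the equally long prefix of each vocabulary word, and a word is presented once it beats the runner-up by a margin. The distance measure, the margin rule, the vocabulary, and the function names are illustrative choices, not details given in the text.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def early_identify(hypotheses, vocabulary, min_margin=1):
    """Return (word, hypothesis) for the first hypothesis with a clear best match.

    A match is "clear" when its distance is lower than the runner-up's
    by at least min_margin. Returns None if no hypothesis qualifies.
    """
    for hypothesis in hypotheses:
        scored = sorted(
            (edit_distance(hypothesis, word[:len(hypothesis)]), word)
            for word in vocabulary
        )
        if len(scored) == 1:
            return scored[0][1], hypothesis
        if len(scored) > 1 and scored[1][0] - scored[0][0] >= min_margin:
            return scored[0][1], hypothesis
    return None


# Hypothetical vocabulary list.
VOCAB = ["JOHNSON", "JOHNSTON", "JACKSON", "JONES"]

if __name__ == "__main__":
    # Successive decoder outputs while the user spells J-O-N-E-S.
    print(early_identify(["J", "JO", "JON", "JONE"], VOCAB))
```

A larger `min_margin` delays the presentation of the best match in exchange for fewer premature guesses, which is the basic trade-off in early identification.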