Speech recognition devices have been developed with varying degrees of success. There is great variability in how different speakers pronounce words as well as variability in how an individual speaker pronounces words from one time to another. Current speech recognition technology has not yet been developed to the point of accommodating such variabilities to the extent that a normal human listener can. For example, speech which is dictated into a mini-cassette recorder and then transcribed by a typist will typically have far fewer errors than if the same text is dictated directly to a current-technology speech recognition computer program.
Several methods have been developed to assist current-technology devices in accommodating these variabilities, primarily through the use of training. For example, the user may be asked to speak each new word at least once prior to using it. Alternatively, the speaker may be asked to read a list of frequently used words to the device, or to monitor the recognized text and correct errors. All of these methods allow the recognition device to "learn" by adapting to the speaker's variability and, in some cases, to variability between speakers. Nevertheless, it frequently occurs that the best approach for an unrecognized, difficult, or new word is for the speaker to spell it.
In other applications a speaker may select items from a list by saying the name of a letter associated with each item.
However, many of the letter names sound very similar and may be confused with one another, even by human listeners. It is therefore known in the art to spell a word phonetically, that is, to say a commonly understood word for each letter in the word to be spelled. For example, one may phonetically spell the word "key" by saying "kilo echo yankee." One may also use a phonetic alphabet when selecting items from a list, saying a word from the phonetic alphabet rather than the corresponding letter name.
A list of such words, one for each letter, arranged in alphabetical order is commonly known as a phonetic alphabet. Table 1 below lists an example of a phonetic alphabet.
TABLE 1

A   Alpha       N   November
B   Bravo       O   Oscar
C   Charlie     P   Papa
D   Delta       Q   Quebec
E   Echo        R   Romeo
F   Fox-trot    S   Sierra
G   Golf        T   Tango
H   Hotel       U   Uniform
I   India       V   Victor
J   Juliet      W   Whiskey
K   Kilo        X   Xray
L   Lima        Y   Yankee
M   Mike        Z   Zulu
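By way of illustration only, the letter-to-word correspondence of Table 1 may be sketched as a simple lookup. The following Python fragment is a minimal sketch and is not drawn from any existing product; the names PHONETIC_ALPHABET, spell_phonetically, and decode_phonetic are hypothetical and are introduced here solely for the example. It assumes that the input has already been reduced to text and does not address the acoustic recognition of the spoken words.

```python
# Hypothetical sketch of the Table 1 mapping; all names are assumptions
# made for illustration and do not come from the original text.
PHONETIC_ALPHABET = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta",
    "E": "Echo", "F": "Fox-trot", "G": "Golf", "H": "Hotel",
    "I": "India", "J": "Juliet", "K": "Kilo", "L": "Lima",
    "M": "Mike", "N": "November", "O": "Oscar", "P": "Papa",
    "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "Xray",
    "Y": "Yankee", "Z": "Zulu",
}

# Reverse lookup: lower-cased phonetic word -> letter.
_WORD_TO_LETTER = {word.lower(): letter
                   for letter, word in PHONETIC_ALPHABET.items()}


def spell_phonetically(word):
    """Return the phonetic word for each letter of `word`.

    Characters that are not in the table (digits, punctuation) are skipped.
    """
    return [PHONETIC_ALPHABET[ch] for ch in word.upper()
            if ch in PHONETIC_ALPHABET]


def decode_phonetic(words):
    """Return the letters spelled by a sequence of phonetic words.

    Raises KeyError if a word is not in the table.
    """
    return "".join(_WORD_TO_LETTER[w.lower()] for w in words)


print(" ".join(spell_phonetically("key")).lower())   # kilo echo yankee
print(decode_phonetic(["kilo", "echo", "yankee"]))   # KEY
```

The first function corresponds to a speaker spelling a word phonetically; the second corresponds to recovering the letters once each phonetic word has been recognized.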
Various such alphabets have been developed and used over the years with human listeners, primarily by the military, for clearly communicating over sometimes noisy or unreliable radio or telephone links. The developers of speech recognition devices and programs have likewise incorporated a phonetic spelling feature in their products for word spelling.
Other uses for phonetic spelling in association with speech recognition devices include communicating voice commands to a voice activated device, as described by Basore et al. in U.S. Pat. No. 5,752,232, and retrieving information from a directory, e.g. a telephone directory, in response to a phonetically spelled word, as described by Dubnowski et al. in U.S. Pat. No. 4,164,025.
Phonetic spelling may also be used to generate an audio output to a human listener in an audio response unit such as described by Barnett et al. in U.S. Pat. No. 4,653,100 and Silverman in U.S. Pat. No. 5,890,117. No speech recognition is involved in this use of phonetic spelling, which is the reverse process, namely speech generation.
In order to use a phonetic spelling feature, the speaker must have knowledge of the phonetic alphabet. This knowledge is easily learned in a military environment where, for example, each signal corps soldier is taught the phonetic alphabet as part of his signal corps training. Ordinary users of speech recognition software have not been so trained and therefore keep a printed or handwritten list of the phonetic alphabet near their devices for use as necessary. Even so, it is awkward and slow for the ordinary user to visually search through the list for each phonetic word needed to phonetically spell a new word. As indicated above, the need to spell a word occurs more frequently when using current-technology speech recognition devices than when dictating to a human transcriber, because the devices accommodate variations in pronunciation less well, which further compounds the awkwardness.