1. Field of the Invention
This invention relates to speech recognition, and in particular, to methods for selecting entries from lists of entries in a speech recognition system.
2. Related Art
Speech recognition is being used as a user input to control the operation of a variety of systems. For example, navigation systems allow a user to speak requests for navigation information, such as directions to a destination. Telecommunications devices such as telephones and cell phones use speech recognition for functions such as name dialing. Some audio/video systems use speech recognition for audio or video player control. Speech recognition systems typically operate by matching voice patterns with information relevant to the application. For example, in a navigation system, the information may include city names, street names, proper names, addresses, or music titles. The information relevant to the application is typically stored as a list of entries in a data structure. The data structures are typically stored in memory in the system employing the speech recognition.
The volume of information relevant to the application that is matched with the voice patterns is typically quite large. In operation, the speech recognition function must often select an entry from a large list of entries, which may require a large amount of memory for processing. Many systems that employ speech recognition may only have a moderate amount of memory available for processing.
Speech recognition may be implemented in systems with a moderate amount of memory and processor resources using a two-step approach. In a first step, a phoneme sequence or string is recognized by a speech recognition module. The accuracy of phoneme recognition is usually not acceptable on its own, and many substitutions, insertions and deletions of phonemes occur in the process. The recognized speech input, such as the phoneme string, is then compared with a possibly large list of phonetically transcribed entries to determine a shorter candidate list of best matching items. The candidate list may be supplied to a speech recognizer as a new vocabulary for a second recognition pass. Such an approach saves computational resources since the recognition performed in the first step is less demanding and the computationally expensive second step is only performed with a small subset of the large list of entries.
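The matching stage of the two-step approach may be illustrated by the following minimal sketch. It assumes that the first step has already produced a phoneme string and uses a plain Levenshtein edit distance as the similarity measure, which is only one possible choice for tolerating substitutions, insertions and deletions. The function names, the phoneme symbols and the four-entry list are hypothetical and serve only as an example.

```python
from heapq import nsmallest


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidate_list(recognized, entries, size):
    """Return the names of the `size` entries closest to the recognized phonemes."""
    scored = ((edit_distance(recognized, phones), name)
              for name, phones in entries.items())
    return [name for _, name in nsmallest(size, scored)]


# Hypothetical phonetically transcribed list of entries (illustrative symbols).
entries = {
    "Berlin": ["b", "E", "r", "l", "i:", "n"],
    "Bremen": ["b", "r", "e:", "m", "@", "n"],
    "Bonn": ["b", "O", "n"],
    "Ulm": ["U", "l", "m"],
}

# Assumed output of the first step: "Berlin" with the phoneme "r" deleted.
recognized = ["b", "E", "l", "i:", "n"]

# Short candidate list that would become the vocabulary of the second pass.
candidates = candidate_list(recognized, entries, size=2)
print(candidates)
```

In this sketch only the entries in `candidates` would be passed to the more expensive second recognition step, while the comparison itself still visits every entry of the full list once.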
The computational effort required in cases involving very large lists may still be quite large. In a navigation system that uses speech-driven control, the user, or driver/speaker, may utter a combination of words to provide the information that identifies the destination, such as a city combined with a street in the city of destination. To illustrate, there are about three million city-street combinations in Germany, which would require a very large list of entries. When the recognition step is to be carried out on such a large list, a matching step as described above would require memory and run-time resources that may preclude incorporating the function in an embedded system in a vehicle. Such large lists may also exist in other fields of application, for example when a voice-controlled selection of the name of an artist or of a song by an artist is to be incorporated into a product.
There exists a need for methods capable of performing speech recognition on very large lists of entries containing information relevant to the application.