Voice-driven local business search (VDLBS) is an increasingly popular application of speech recognition to mobile telephony services. With VDLBS, a user provides a desired location (e.g., city/state) and a business name. The most common traditional voice-driven search is 411-style automated directory assistance, implemented as a speech-only, two-exchange dialog between the user and an automatic speech recognition (ASR) system. An exemplary 411-style voice-driven search dialog proceeds as follows: the ASR system first prompts the user to speak a location as a city and state; the user provides the requested city and state; the ASR system then prompts the user for a listing; the user provides the requested listing information; and finally, the system returns one or more matching listings.
In traditional voice-driven search, such as in the foregoing example, the ASR system uses one grammar, or language model, to recognize the city and state information, and subsequently uses separate location-specific grammars to recognize listings in each state. This type of voice-driven search provides relatively good recognition accuracy.
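The two-exchange flow described above can be illustrated with a minimal sketch. It is not an ASR implementation: the `recognize` function is a hypothetical stand-in that fuzzy-matches text against a list of phrases, and the grammar contents, function names, and matching cutoff are all assumptions made for illustration. What it shows is the structure of the dialog, in which the recognized location selects the grammar used for the listing.

```python
from difflib import get_close_matches

# Hypothetical toy data: one grammar for locations, plus one listing
# grammar per location. Real grammars would be far larger.
LOCATION_GRAMMAR = ["austin texas", "boston massachusetts"]
LISTING_GRAMMARS = {
    "austin texas": ["joe's pizza", "lone star hardware"],
    "boston massachusetts": ["harbor cafe", "joe's plumbing"],
}


def recognize(utterance, grammar):
    """Stand-in for an ASR pass: return the closest grammar phrase, or None."""
    matches = get_close_matches(utterance.lower(), grammar, n=1, cutoff=0.6)
    return matches[0] if matches else None


def two_exchange_search(location_utterance, listing_utterance):
    """Run both exchanges; return (location, listing), with None for a miss."""
    # Exchange 1: recognize the city and state with the location grammar.
    location = recognize(location_utterance, LOCATION_GRAMMAR)
    if location is None:
        # Only the location would have to be repeated at this point.
        return None, None
    # Exchange 2: recognize the listing with the grammar for that location.
    listing = recognize(listing_utterance, LISTING_GRAMMARS[location])
    return location, listing
```

Because the second pass only considers listings for the recognized location, a listing that exists only in another location's grammar cannot be returned by mistake in this sketch.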
Advancements in ASR systems and search technologies have made one-exchange voice-driven search feasible. In this approach, the ASR system typically uses a large stochastic language model that lets the user specify a location together with a listing name and/or listing category in a single utterance, and then submits the speech recognition results to a term frequency-inverse document frequency (TF-IDF) based search engine. The user thus has the flexibility to provide all searchable information at one time.
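To make the search step concrete, the following is a minimal sketch of TF-IDF ranking over a handful of listings. The listing data, the function names, the whitespace tokenizer, the smoothed IDF formula, and the use of cosine similarity are all assumptions chosen for illustration; the search engine referred to above may weight and score terms differently.

```python
import math
from collections import Counter

# Hypothetical toy index: each listing is one "document" combining
# its name, category, and location.
LISTINGS = [
    "joe's pizza restaurant austin texas",
    "lone star hardware store austin texas",
    "harbor cafe restaurant boston massachusetts",
]


def tokenize(text):
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    """Term frequency times IDF; terms absent from the index are ignored."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(recognized_text, docs=LISTINGS):
    """Rank listings against the recognized utterance, best match first."""
    idf = build_idf(docs)
    query = tfidf_vector(recognized_text, idf)
    scored = [(cosine(query, tfidf_vector(doc, idf)), doc) for doc in docs]
    return sorted(scored, reverse=True)
```

In this sketch, a recognized utterance such as "pizza in austin texas" is scored against every listing at once, so location and listing terms contribute to a single ranking rather than being handled in separate steps.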
Evaluations of this approach have found that, in one-exchange VDLBS, listing names are recognized with much lower accuracy (e.g., in some instances below 55%) than locations (e.g., in some instances above 90%). When a location or listing name is not recognized in a one-exchange VDLBS interaction, the user has to repeat both the location and the listing name, possibly several times. In a two-exchange interaction, by contrast, only the piece of information that was not recognized has to be repeated. In effect, the advancements allowing one-exchange searching provide more dialog flexibility at the expense of recognition accuracy.
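The repetition cost can be illustrated with a rough back-of-the-envelope model. The sketch below assumes that location and listing recognition succeed independently, that every retry has the same success probability, and that the same per-item accuracies apply to both dialog styles. None of these assumptions comes from the evaluations mentioned above, and the function names are hypothetical.

```python
def expected_items_one_exchange(p_location, p_listing):
    """Expected pieces of information spoken in a one-exchange dialog.

    Each attempt carries two items (location and listing) and succeeds
    only if both are recognized, so attempts follow a geometric
    distribution with success probability p_location * p_listing.
    """
    p_both = p_location * p_listing
    return 2.0 / p_both


def expected_items_two_exchange(p_location, p_listing):
    """Expected pieces of information spoken in a two-exchange dialog.

    Each item is retried on its own until it is recognized.
    """
    return 1.0 / p_location + 1.0 / p_listing
```

With the illustrative accuracies of 0.90 for locations and 0.55 for listing names, this model gives roughly 4.04 spoken items for the one-exchange dialog versus roughly 2.93 for the two-exchange dialog. Since the two-exchange approach is described as having relatively good accuracy, reusing the one-exchange figures for it is, if anything, pessimistic for the two-exchange case.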
Due to the reduced recognition accuracy and the resulting need to repeat both the location and the listing name, users may become frustrated and be less willing to adopt, or to become repeat users of, one-exchange VDLBS applications. In contrast, a two-exchange interaction requires separate utterances up front, but requires only one piece of information to be repeated in the event of a misrecognition. This presents a problem for system developers, who must trade recognition accuracy for interaction flexibility, or vice versa.