Directory assistance services typically help users find telephone numbers (or other information) for business or residential listings. In recent years, automatic speech recognition systems have been deployed in directory assistance services. In such systems, the user of the directory assistance service simply speaks a listing, and the automatic speech recognition system recognizes the spoken words. The recognized words are then compared against a set of listings to identify the listing sought by the user, and the information associated with that listing is provided to the user.
One of the most important tasks in an automatic directory assistance service is predicting how people will refer to a listing. An accurate prediction helps the automatic speech recognizer recognize the user's input speech more accurately, and helps the search component find the desired listing, and thus its telephone number (or other information), with better accuracy.
Ideally, a statistical language model (LM) for predicting the listing that the user has spoken would be built from manually transcribed records of actual calls. However, building a good quality LM requires a very large amount of training data, and manually collecting and transcribing domain-specific speech data is costly and time consuming. It is often impossible, especially in the early stages of system development.
Alternatively, some current systems predict how users will refer to a given listing by employing humans to listen to actual recordings of users of the directory assistance service and to manually author rules that reflect what those users said, in the hope that the rules will also generalize to unseen listings. This approach is very costly and time consuming, and the resulting rules usually either have low coverage or overgeneralize. There are more than 18 million business listings in the United States alone. Therefore, a system that relies on manually written rules cannot easily scale up.
Still other systems build a statistical language model using only the data from the actual directory listings (e.g., using only the actual business names and residential names as they appear in the directory assistance listing). This is even more problematic. It is known that approximately 56 percent of users, when using directory assistance, do not recite the actual listed names. Instead, users often omit words, or substitute different words for those in the actual listing. As a result, a language model built from the listed names alone performs poorly when the directory assistance system is deployed in a real world environment.
A few examples may be helpful. When using directory assistance to find a restaurant, a user may say “Kung Ho Restaurant”, while the actual business listing in the directory assistance database is “Kung Ho Cuisine of China”. If the language model is trained based only on the listed names in the actual directory assistance listing, the bigram probability P(Restaurant|Ho) is very low. Therefore, the automatic speech recognition system may favor another restaurant, such as “Kung Kung Restaurant”, if “Kung Kung Restaurant” is in the actual business listing database. This is an instance in which the user substituted a word corresponding to the category of the listing (Restaurant) for an actual portion of the listing (Cuisine of China). The Kung Ho Cuisine of China restaurant would likely be listed under “Restaurants” in the directory assistance database.
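The effect described in the restaurant example can be illustrated with a minimal sketch. It assumes a toy directory containing only the two listings named above and a bigram model with add-one smoothing; the function names, the lowercasing, and the choice of smoothing are illustrative assumptions, not a description of any particular deployed system.

```python
from collections import Counter

def train_bigram_counts(listings):
    """Count contexts and bigrams over listings padded with <s> and </s>."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for listing in listings:
        tokens = ["<s>"] + listing.lower().split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts, vocab

def bigram_prob(word, prev, context_counts, bigram_counts, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

# Hypothetical toy directory: only the listed names are used for training.
listings = ["Kung Ho Cuisine of China", "Kung Kung Restaurant"]
context_counts, bigram_counts, vocab = train_bigram_counts(listings)

# "restaurant" never follows "ho" in the listed names, so its probability
# comes from smoothing alone.
p_restaurant_given_ho = bigram_prob(
    "restaurant", "ho", context_counts, bigram_counts, vocab)
# "restaurant" does follow "kung" in the second listing.
p_restaurant_given_kung = bigram_prob(
    "restaurant", "kung", context_counts, bigram_counts, vocab)

print(p_restaurant_given_ho, p_restaurant_given_kung)
```

In this sketch P(Restaurant|Ho) is lower than P(Restaurant|Kung), which is the bias that can lead a recognizer to favor “Kung Kung Restaurant” when the user says “Kung Ho Restaurant”.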
In another example, a user of directory assistance wishing to obtain a telephone number for “Microsoft Corporation” might simply say “Microsoft”. However, in a language model generated from the actual listings, the probability P(</s>|Microsoft), where </s> is the sentence end symbol, is very low. Therefore, the automatic speech recognition system may pick another listing, such as “Micro Song”, as the output. This is a case in which the user has simply omitted one of the words in the actual listing (here, the word “Corporation”).
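The omitted-word case can be sketched in the same spirit. The example below assumes a toy directory with the two listings named above and uses an unsmoothed relative-frequency estimate, so the probability comes out as exactly zero; a deployed model would normally apply smoothing, which would make it small rather than zero. The function name and the data are hypothetical.

```python
END = "</s>"

def end_probability(word, listings):
    """Unsmoothed estimate of P(</s> | word) from the listed names alone."""
    occurrences = 0  # times `word` appears in any listing
    endings = 0      # times `word` is the last word of a listing
    for listing in listings:
        tokens = listing.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            if prev == word:
                occurrences += 1
                if nxt == END:
                    endings += 1
    return endings / occurrences if occurrences else 0.0

# Hypothetical toy directory: only the listed names are used for training.
listings = ["Microsoft Corporation", "Micro Song"]

# "microsoft" is always followed by "corporation" in the listed names,
# so the model never sees a listing that ends right after it.
p_end_after_microsoft = end_probability("microsoft", listings)
p_end_after_corporation = end_probability("corporation", listings)

print(p_end_after_microsoft, p_end_after_corporation)
```

Because the listed names never end at “Microsoft”, a model trained on them alone gives the one-word utterance little or no support, even though that is how many callers would phrase the request.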
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.