The present application is related to a co-pending application which was filed concurrently herewith under the title "SYSTEM, METHOD AND COMPUTER PROGRAM PRODUCT FOR A DISTRIBUTED SPEECH RECOGNITION TUNING PLATFORM" which is incorporated herein by reference in its entirety.
The present invention relates to speech recognition, and more particularly to large-scale speech recognition.
Techniques for accomplishing automatic speech recognition (ASR) are well known. Among known ASR techniques are those that use grammars. A grammar is a representation of the language or phrases expected to be used or spoken in a given context. In one sense, then, ASR grammars typically constrain the speech recognizer to a vocabulary that is a subset of the universe of potentially-spoken words; and grammars may include subgrammars. An ASR grammar rule can then be used to represent the set of "phrases" or combinations of words from one or more grammars or subgrammars that may be expected in a given context. "Grammar" may also refer generally to a statistical language model (where a model represents phrases), such as those used in language understanding systems.
Products and services that utilize some form of automatic speech recognition ("ASR") methodology have been recently introduced commercially. For example, AT&T has developed a grammar-based ASR engine called WATSON that enables development of complex ASR services. Desirable attributes of complex ASR services that would utilize such ASR technology include high accuracy in recognition; robustness to enable recognition where speakers have differing accents or dialects, and/or in the presence of background noise; ability to handle large vocabularies; and natural language understanding. To achieve these attributes for complex ASR services, ASR techniques and engines typically require computer-based systems having significant processing capability. In addition to WATSON, numerous ASR services are available which are typically based on personal computer (PC) technology.
One application of ASR techniques is the voice entry of addresses, e.g. street names, cities, etc., for the purpose of receiving directions. One example of such an application is disclosed in U.S. Pat. No. 6,108,631. That invention relates to an input system for at least location and/or street names, including an input device, a data source arrangement which contains at least one list of locations and/or streets, and a control device which is arranged to search for location or street names, entered via the input device, in a list of locations or streets in the data source arrangement. In order to simplify the input of location and/or street names, the data source arrangement contains not only a first list of locations and/or streets with alphabetically sorted location and/or street names, but also a second list of locations and/or streets with location and/or street names sorted on the basis of a frequency criterion. A speech input system of the input device conducts input in the form of speech to the control device. The control device is arranged to perform a sequential search for a location or street name, entered in the form of speech, starting from the beginning of the second list of locations and/or streets.
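The two-list arrangement described above can be illustrated with a minimal sketch. This is not the implementation of the cited patent: the function names, the usage counts used as the frequency criterion, and the text comparison standing in for acoustic matching are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def build_lists(usage_counts: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (alphabetical list, frequency-sorted list) of names.

    usage_counts is a hypothetical frequency criterion: how often each
    name has been requested.
    """
    alphabetical = sorted(usage_counts)
    # Most frequently requested names first; ties broken alphabetically.
    by_frequency = sorted(usage_counts, key=lambda n: (-usage_counts[n], n))
    return alphabetical, by_frequency


def sequential_search(
    spoken: str,
    by_frequency: List[str],
    matches: Callable[[str, str], bool],
) -> Optional[str]:
    """Scan the frequency-sorted list from its start; return the first match."""
    for name in by_frequency:
        if matches(spoken, name):
            return name
    return None


def same_text(spoken: str, name: str) -> bool:
    # Placeholder for acoustic matching: case-insensitive text comparison.
    return spoken.strip().lower() == name.lower()
```

Searching the frequency-sorted list first means that commonly requested names are reached after few comparisons, while the alphabetical list remains available for other forms of input.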
Such prior art direction services supply to a traveler automatically developed step-by-step directions for travel from a starting point to a destination. Typically these directions are a series of steps which detail, for the entire route, a) the particular series of streets or highways to be traveled, b) the nature and location of the entrances and exits to/from the streets and highways, e.g., turns to be made and exits to be taken, and c) optionally, travel distances and landmarks.
One difficulty that arises when attempting to identify and differentiate between the plethora of streets is accurately identifying the street name corresponding to an utterance of a user. This problem is exacerbated by the prevalent reuse of names, the varied pronunciations thereof, and the sheer number of street names in existence.
There is therefore a need for an improved technique of recognizing street names and the like for providing driving directions.
A system, method and computer program product are afforded for providing voice-enabled driving directions. Initially, an utterance representative of a destination address is received. Thereafter, the utterance is transcribed utilizing a speech recognition process. An origin address is then determined. A database is subsequently queried for generating driving directions based on the destination address and the origin address.
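The sequence of steps summarized above may be sketched as follows. The sketch assumes that the speech recognition process, the origin determination, and the directions database are supplied as callables; the names `voice_enabled_directions`, `transcribe`, `determine_origin`, and `query_directions` are hypothetical and do not appear in the application.

```python
from typing import Callable, List


def voice_enabled_directions(
    utterance: bytes,
    transcribe: Callable[[bytes], str],
    determine_origin: Callable[[], str],
    query_directions: Callable[[str, str], List[str]],
) -> List[str]:
    """Run the four summarized steps in order and return the directions."""
    # Steps 1 and 2: the received utterance is transcribed into a
    # destination address by the speech recognition process.
    destination = transcribe(utterance)
    # Step 3: the origin address is determined (by any means).
    origin = determine_origin()
    # Step 4: the database is queried with both addresses.
    return query_directions(origin, destination)
```

Passing the three stages in as callables keeps the sketch independent of any particular recognizer or routing database.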
In one embodiment of the present invention, the origin address may be determined utilizing the speech recognition process. Further, the speech recognition process may include querying one of a plurality of databases based on the origin address. Such database that is queried by the speech recognition process may include grammars representative of addresses local to the origin address.
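By way of illustration only, selecting a grammar database according to the origin address might look like the sketch below. It assumes that each database is reduced to a set of street names keyed by locality and that the recognizer yields text hypotheses ordered best first; `select_local_grammar` and `pick_hypothesis` are hypothetical names.

```python
from typing import Dict, List, Optional, Set


def select_local_grammar(
    origin_locality: str,
    grammars_by_locality: Dict[str, Set[str]],
) -> Set[str]:
    """Return the grammar local to the origin, or an empty set if none exists."""
    return grammars_by_locality.get(origin_locality, set())


def pick_hypothesis(hypotheses: List[str], grammar: Set[str]) -> Optional[str]:
    """Return the best-ranked hypothesis that the local grammar allows."""
    for text in hypotheses:  # hypotheses are assumed ordered best first
        if text in grammar:
            return text
    return None
```

Restricting recognition to addresses local to the origin reduces the number of candidate street names that must be distinguished from one another.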
In another aspect of the present invention, the addresses may include street names. Further, the utterance may be received utilizing a network.