1. Field of the Invention
The present invention relates generally to an improved voice-based keyword search system, and in particular to a search algorithm that uses confidence levels of keywords spoken by a caller to identify the keyword-indexed search items which best match the spoken keywords.
2. Description of the Related Art
A search engine is an information retrieval system designed to locate information stored on a computer system. When a user requests information that meets certain criteria or keywords, the search engine searches for and returns to the user a list of the items it determines to be most relevant to the criteria or keywords entered. This list is often sorted by some measure of relevance of the results. Different search engines use different methods of determining the relevance of web sites, but most use some sort of quantitative method that determines the relevance of a site based on how many times the keywords appear in that particular site.
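The keyword-count notion of relevance described above can be illustrated with a minimal sketch. It is an assumption-laden toy, not the method of any particular search engine: the function names `relevance` and `rank`, the sample sites, and the rule of simply summing keyword occurrences are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def relevance(document: str, keywords: list[str]) -> int:
    """Score a document by how many times the keywords appear in it.

    Assumption: relevance is the plain sum of keyword occurrences,
    compared case-insensitively on whole words.
    """
    word_counts = Counter(re.findall(r"[a-z0-9]+", document.lower()))
    return sum(word_counts[keyword.lower()] for keyword in keywords)


def rank(documents: dict[str, str], keywords: list[str]) -> list[tuple[str, int]]:
    """Return (name, score) pairs sorted from most to least relevant.

    Documents that contain none of the keywords are left out.
    """
    scored = [(name, relevance(text, keywords)) for name, text in documents.items()]
    matching = [pair for pair in scored if pair[1] > 0]
    return sorted(matching, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, invented for this sketch.
docs = {
    "site_a": "Boston hotels and Boston restaurants",
    "site_b": "Austin hotels",
    "site_c": "Weather report",
}
results = rank(docs, ["boston", "hotels"])
print(results)  # [('site_a', 3), ('site_b', 1)]
```

Here "site_a" ranks first because the keywords appear in it three times in total, while "site_c" is omitted because it contains neither keyword.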
One way a user may enter keywords or criteria into a search engine is through the use of a speech recognition device. Speech recognition devices are generally known in the art of voice technologies. With such a device, a caller speaks into a microphone or telephone, and the speech recognition device receives the speech utterance and recognizes the words and phrases spoken by the caller. The speech recognition device may recognize the words or phrases by matching statistical features of the speech utterance to known speech features of words in a vocabulary (e.g., as represented in a grammar). A speech feature is normally represented as a feature vector, which is a purely mathematical representation mapped from the utterance waveform. Human speech is statistical in nature in that when a person says the same word several times, the waveforms of the utterances will not be identical. However, the waveforms should be similar, which results in speech feature vectors that lie close to the vector representing the word in the speech feature space. While a speech recognition device may identify the word or phrase which was most likely spoken by the caller, the statistical nature of speech recognition may also produce false results. Because the speech feature vectors are not identical every time, there is a possibility that the feature vector of an utterance lies closer to the feature vector of another word, which causes a misrecognition. In other words, the likely word or phrase identified by the speech recognition engine using statistics may not be the word or phrase actually spoken by the caller. For example, if a caller said the word “Boston”, the speech recognition engine may recognize it as “Austin”, since both words are acoustically close in speech features.
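The way feature-vector matching can lead to a misrecognition may be sketched as a nearest-vector lookup. This is a simplified illustration only: real recognizers use high-dimensional features and statistical models, whereas the two-dimensional vectors in `VOCABULARY`, the use of Euclidean distance, and the function name `recognize` are all assumptions invented for this example.

```python
import math

# Hypothetical reference feature vectors, one per vocabulary word.
# "boston" and "austin" are deliberately placed close together to
# mimic two words that are acoustically similar.
VOCABULARY = {
    "boston": (1.0, 0.0),
    "austin": (0.8, 0.3),
    "denver": (-1.0, 0.5),
}


def recognize(features: tuple[float, float]) -> str:
    """Return the vocabulary word whose reference vector is nearest.

    Assumption: closeness is measured by Euclidean distance.
    """
    return min(VOCABULARY, key=lambda word: math.dist(features, VOCABULARY[word]))


# Two hypothetical utterances of the same word "Boston". The waveforms,
# and therefore the feature vectors, differ from one utterance to the next.
clean_utterance = (0.98, 0.02)
noisy_utterance = (0.85, 0.22)

print(recognize(clean_utterance))  # boston
print(recognize(noisy_utterance))  # austin (a misrecognition)
```

The second utterance was also meant as "Boston", but its feature vector happens to fall nearer to the reference vector for "Austin", so the nearest-vector rule returns the wrong word.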