Searching vast collections of documents for a particular document of interest has become commonplace in computing environments. In particular, a large number of search services perform searches on web pages found on the Internet. To perform these text-based searches, search services typically construct an inverted index that has a separate entry for each word found in the documents covered by the search service. Each entry typically lists all of the documents in which the word can be found and the positions within those documents where it occurs. Many of these search services use the position information to determine whether a document contains words in a particular order and/or within a particular distance of each other. This order and distance information can then be used to rank the documents against an input query, with documents that contain the words of the query in the same order as the query being ranked higher than other documents.
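By way of illustration only, the positional inverted index described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular search service: the function names (`build_index`, `docs_with_all_words`, `has_phrase`, `rank`), the whitespace tokenization, and the two-level ranking rule (documents containing the query words consecutively and in order come first) are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Build a positional inverted index: word -> {doc_id: [positions]}.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def docs_with_all_words(index, words):
    """Return the set of documents that contain every word in `words`."""
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


def has_phrase(index, doc_id, words):
    """Return True if `words` occur consecutively, in order, in the document."""
    for start in index.get(words[0], {}).get(doc_id, []):
        if all(
            start + offset in index.get(word, {}).get(doc_id, [])
            for offset, word in enumerate(words)
        ):
            return True
    return False


def rank(index, query):
    """Rank matching documents; in-order phrase matches sort first.

    Assumption: ties are broken by document identifier.
    """
    words = query.lower().split()
    candidates = docs_with_all_words(index, words)
    return sorted(
        candidates,
        key=lambda doc_id: (not has_phrase(index, doc_id, words), doc_id),
    )


docs = {
    "d1": "recognition of speech in audio",
    "d2": "speech recognition of web audio",
    "d3": "text level indexing",
}
index = build_index(docs)
ranking = rank(index, "speech recognition")
```

In this example both `d1` and `d2` contain the two query words, but only `d2` contains them in the same order as the query, so `d2` is ranked ahead of `d1`; `d3` contains neither word and is not returned.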
With more and more audio content (including audio tracks of videos) appearing on the web, and with the trend towards on-demand video, the desire or need to search audio tracks available on the web and through on-demand distribution channels is also growing stronger. One approach uses speech-to-text (speech recognition) technology to transcribe the audio to text and then applies text-level indexing to the transcript, but this approach frequently does not yield good accuracy. The poor accuracy can reflect the poor acoustic quality of web audio, domains that differ greatly from those used to train the speech recognition system, and/or complicated background environments. These factors can result in a very high recognition error rate for an automatic speech recognition (ASR) system.
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.