A search engine is a system that retrieves information from a database. Here, a database can be any type of repository containing electronic documents, for instance the Web, mailing archives, file repositories, etc. Documents can contain text, images, audio and video data. Most search engines index only the textual part of documents.
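As an illustration of text-only indexing, the following minimal sketch builds an inverted index that maps each word to the documents containing it. The names `build_index` and `search`, the document layout, and the sample data are hypothetical and chosen only for this example; they do not describe any particular search engine.

```python
from collections import defaultdict

def build_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        # Only the "text" field is indexed; other media are ignored.
        for word in doc.get("text", "").lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical repository: two text documents and one audio file.
documents = {
    "memo.txt": {"text": "Quarterly sales report"},
    "call.wav": {"audio": b"\x00\x01"},  # no text field, so nothing is indexed
    "mail-17": {"text": "Sales meeting moved to Friday"},
}

index = build_index(documents)
hits = search(index, "sales")
print(sorted(hits))
```

In this sketch the audio file can never be returned by a query, because it contributes no words to the index.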
A speech recognition engine automatically converts spoken words from an audio stream into electronic text. The result of the operation is called a “transcription”. There are two types of speech recognition systems: those that are speaker-dependent (trained and optimized for specific speakers) and those that are speaker-independent (needing no training for specific speakers).
Speech recognition engines generally use language models. Language models are probability distributions over sequences of words. These models define the probability of the next word given the sequence of words that precedes it. Both speaker-dependent and speaker-independent systems can have language models. Some speech recognition software provides tools for training the language model with user-supplied training data. These systems modify their pre-determined language model with new probabilities estimated from the additional training text supplied by the user of the software. For instance, a system can be packaged with a “US-English” language model, which captures the statistics of English as used by the general US population. While this language model is adequate to transcribe speech in English when nothing else is known about the content to be converted, a specific speaker or group of people (for instance, people working for the same organization) may need a language model better optimized to reflect their particular use of the English language. For instance, technical words and the names of people, products and models are unlikely to be properly recognized by a general language model.
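The adaptation of a language model with additional training text can be sketched with a toy bigram model, in which the probability of the next word depends only on the single preceding word. The class `BigramModel`, the `weight` parameter, the product name "acme9000" and the sample sentences are assumptions made for this example; real speech recognition engines use larger contexts, smoothing, and their own adaptation procedures.

```python
from collections import Counter, defaultdict

class BigramModel:
    """Toy bigram language model: P(next word | previous word)."""

    def __init__(self):
        # counts[prev][nxt] = weighted number of times nxt followed prev
        self.counts = defaultdict(Counter)

    def train(self, text, weight=1.0):
        """Add the bigram counts of `text`, scaled by `weight`."""
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += weight

    def prob(self, prev, nxt):
        """Relative-frequency estimate of P(nxt | prev); 0.0 if prev is unseen."""
        followers = self.counts.get(prev)
        if not followers:
            return 0.0
        return followers[nxt] / sum(followers.values())

# "General" model estimated from generic text (hypothetical sample).
model = BigramModel()
model.train("please send the report to the manager "
            "please send the file to the team")

before = model.prob("the", "acme9000")  # product name unseen in generic text

# Adaptation: user-supplied text, weighted more heavily than the generic data.
model.train("ship the acme9000 today", weight=3.0)
after = model.prob("the", "acme9000")

print(before, after)
```

Before adaptation the product name has zero probability after "the", so a recognizer relying on this model would favor other words; after adaptation it becomes a likely candidate. Weighting the user text more heavily is one simple design choice among several for combining the two sources.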
These systems also use dictionaries that define the set of word candidates. On certain systems, the dictionary can also be modified by the user of the speech recognition system.
Improvements are desired to make searching of voice files easier, faster and more accurate.