A search engine is a system that retrieves information from a database. Here, a database can be any type of repository containing electronic documents, for instance: the Web, mailing archives, file repositories, etc. Documents can contain text, images, audio and video data. Most search engines only index the textual part of documents.
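As an illustration of text-only indexing, the following is a minimal sketch of an inverted index, one common way to index the textual part of documents. It is not taken from any particular search engine: the function names `build_index` and `search`, the document identifiers, and the sample texts are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical repository mixing the kinds of sources named above.
documents = {
    "mail-1": "Quarterly report attached",
    "web-1": "Annual report on speech recognition",
    "file-1": "Speech transcription notes",
}
index = build_index(documents)
matches = search(index, "speech report")
```

In this sketch only the text of each document is indexed; images, audio and video would be invisible to the query unless they were first converted to text.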
A speech recognition engine automatically converts spoken words from an audio stream into computer text. The result of this operation is called a “transcription”. There are two types of speech recognition systems: those that are speaker-dependent (trained and optimized to capture the speech of a specific speaker) and those that are speaker-independent (needing no training for a specific speaker).
Speech recognition engines generally use language models. A language model is a probability distribution over sequences of words; it captures the probability of the next word in a sequence given the words that precede it. Both speaker-dependent and speaker-independent systems may have language models. Some speech recognition software allows its language model to be trained using training text. Such systems modify their predetermined language model with new probabilities estimated from the additional training text supplied by the user of the software. For instance, a system can be packaged with a “U.S.-English” language model, which captures the statistics of English as it is used by the general U.S. population.
Speech recognition engines also use dictionaries that define the set of word candidates. On some systems, the dictionary can also be modified by the user of the speech recognition system.
Modifying the dictionary and training the language model allow a user to tune the speech recognition engine for a specific domain. For instance, a call center using a speech recognition system to archive and analyze customer requests may want to adapt the language model to reflect the greater use of terms related to its product line, in order to improve the accuracy of the transcription.
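The domain adaptation described above can be sketched with a toy bigram model. This is a simplified illustration under stated assumptions, not the method of any particular speech recognition product: the functions `bigram_probs` and `adapt`, the interpolation weight, and the sample texts are all hypothetical, and linear interpolation is only one of several ways to combine a general model with domain text.

```python
from collections import Counter, defaultdict


def bigram_probs(text):
    """Estimate P(next word | previous word) by relative frequency."""
    words = text.lower().split()
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        pair_counts[prev][nxt] += 1
    return {
        prev: {w: c / sum(nexts.values()) for w, c in nexts.items()}
        for prev, nexts in pair_counts.items()
    }


def adapt(base, domain, weight=0.5):
    """Interpolate two models: weight * domain + (1 - weight) * base."""
    adapted = {}
    for prev in set(base) | set(domain):
        b = base.get(prev, {})
        d = domain.get(prev, {})
        if not b or not d:
            # Only one model has seen this context: keep its distribution.
            adapted[prev] = dict(b or d)
            continue
        adapted[prev] = {
            w: weight * d.get(w, 0.0) + (1 - weight) * b.get(w, 0.0)
            for w in set(b) | set(d)
        }
    return adapted


# Hypothetical "general" text standing in for the packaged language model.
general_text = "reset the clock reset the password reset the clock"
# Hypothetical training text supplied by a call center selling routers.
domain_text = "reset the router restart the router reset the modem"

base = bigram_probs(general_text)
adapted = adapt(base, bigram_probs(domain_text), weight=0.5)

# The dictionary (set of word candidates) is extended with the domain terms.
dictionary = set(general_text.split()) | set(domain_text.split())
```

In this toy example the general model assigns no probability to “router” after “the”, whereas the adapted model assigns it about 0.33, and the extended dictionary now contains the product terms as word candidates.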