A common method for accessing indexed content stored in a database is for a computer user to type one or more keywords as input to a search engine. In numerous situations, however, typing is not the most convenient or effective means of entering a query, whereas voice provides a natural and efficient interface. Speech recognition or detection software must either accurately recognize the spoken words and perform the query indicated by the spoken input or, alternatively, find a list of the words that most closely match the input acoustically (a list of best candidate words, or N-best list). The user can then pick the proper word from the list and search the database using the selected word without typing more words.
Most speech recognition or detection systems rely, with varying degrees of success, on dialog-based interaction, in which the user is prompted to answer a series of questions. The dialog is usually modeled with finite state grammars. Because of their deterministic nature, finite state grammars are fairly rigid to use. Such a traditional system has to be designed so that every word and sentence the user is expected to speak is stored in the system.
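The rigidity described above can be illustrated with a minimal sketch of a dialog modeled as a finite state grammar. The state names, prompts, and vocabulary below (`DIALOG`, `ask_media`, `ask_genre`, `run_dialog`) are hypothetical and chosen only for illustration; they are not taken from any particular system. Each state accepts only the words stored for it, so any utterance outside the stored vocabulary has no transition and the dialog fails.

```python
# Hypothetical dialog modeled as a finite state grammar (illustrative only).
# Each state has a prompt and a fixed table mapping accepted words to the next state.
DIALOG = {
    "ask_media": {
        "prompt": "Music or movies?",
        "accepts": {"music": "ask_genre", "movies": "ask_genre"},
    },
    "ask_genre": {
        "prompt": "Which genre?",
        "accepts": {"jazz": "done", "rock": "done", "comedy": "done"},
    },
}


def run_dialog(utterances, start="ask_media", final="done"):
    """Walk the grammar over the recognized utterances.

    Returns (reached_final_state, accepted_words).
    """
    state = start
    accepted = []
    for spoken in utterances:
        if state == final:
            break
        word = spoken.strip().lower()
        next_state = DIALOG[state]["accepts"].get(word)
        if next_state is None:
            # Out-of-grammar input: the deterministic model has no transition.
            return False, accepted
        accepted.append(word)
        state = next_state
    return state == final, accepted


if __name__ == "__main__":
    print(run_dialog(["music", "jazz"]))
    print(run_dialog(["music", "some jazz please"]))
```

In this sketch, the second call fails at the genre prompt because the phrase "some jazz please" was never stored in the grammar, even though it contains an accepted word.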
Methods for computing N-best lists are widely used in speech recognition and detection systems. These traditional methods usually decode the speech input and generate an N-best list of the best candidates (words or sentences) using several basic sources of knowledge, such as word pronunciations, language models, or statistical grammars. The N-best list may then be re-ordered using additional sources of information, such as natural language processing.
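The two stages described above, generating an N-best list and then re-ordering it, can be sketched as follows. This is a simplified illustration under stated assumptions, not a real decoder: the hypotheses in `HYPS`, their log-probability scores, and the `domain_bonus` function standing in for an additional knowledge source are all hypothetical.

```python
import heapq

# Hypothetical decoder output: (text, acoustic log-probability, language-model log-probability).
HYPS = [
    ("recognize speech", -8.0, -4.0),
    ("wreck a nice beach", -7.0, -4.5),
    ("wreck an ice peach", -9.0, -6.0),
    ("recognise peach", -10.0, -8.0),
]


def first_pass_score(hyp, lm_weight=1.0):
    """Combine the basic knowledge sources into a single score (higher is better)."""
    _text, acoustic_logp, lm_logp = hyp
    return acoustic_logp + lm_weight * lm_logp


def n_best(hyps, n):
    """Return the n highest-scoring hypotheses, best first."""
    return heapq.nlargest(n, hyps, key=first_pass_score)


def rerank(nbest, extra_score, weight=1.0):
    """Re-order an N-best list by adding a weighted score from an extra source."""
    return sorted(
        nbest,
        key=lambda h: first_pass_score(h) + weight * extra_score(h[0]),
        reverse=True,
    )


def domain_bonus(text):
    """Hypothetical stand-in for an additional source such as natural language processing."""
    return 2.0 if "speech" in text else 0.0


if __name__ == "__main__":
    top3 = n_best(HYPS, 3)
    print([h[0] for h in top3])
    print([h[0] for h in rerank(top3, domain_bonus)])
```

With these assumed scores, the first pass ranks "wreck a nice beach" above "recognize speech", and the extra source reverses that order in the re-ranked list.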