Typically, when conducting a search to automatically retrieve spoken documents from a large repository of audio files, a user enters either a text or spoken query. The most common way to retrieve the spoken documents is to use speech recognition software to transcribe all the audio files in the repository. Once the audio files have been converted to text, standard algorithms for indexing the corresponding textual documents are applied. In response to a subsequent search query, the indexed documents are searched and the most relevant documents are returned to the user.
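The transcribe-index-search pipeline described above can be sketched as a small inverted index over transcripts. This is a minimal illustration, not a description of any particular system: the transcripts are assumed to have already been produced by a speech recognizer, and the file names, the function names `build_index` and `search`, and the simple term-overlap ranking are all hypothetical.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the set of document ids whose transcript contains it."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query words they contain."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical transcripts standing in for speech recognizer output.
transcripts = {
    "a.wav": "the weather today is sunny",
    "b.wav": "stock prices fell today",
}
index = build_index(transcripts)
results = search(index, "weather today")
```

A practical system would typically replace the overlap count with a weighted relevance score, but the data flow is the same: transcribe once, index the text, then answer queries from the index.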
Over the years, multiple techniques have been developed for combating the recognition errors and “out-of-vocabulary” (OOV) problems associated with indexing spoken documents from transcribed audio files. One technique for combating recognition errors is to index multiple hypotheses, or N-best lists, to recover deleted or substituted query terms.
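As a rough sketch of N-best indexing, the example below indexes every word from every hypothesis for a document, so that a term missing from the top hypothesis can still be found through a lower-ranked one. The hypotheses, the file name, and the function name `index_nbest` are assumptions made for illustration; hypothesis scores are ignored here for brevity.

```python
from collections import defaultdict

def index_nbest(nbest_by_doc):
    """Index the words of all N-best hypotheses for each document."""
    index = defaultdict(set)
    for doc_id, hypotheses in nbest_by_doc.items():
        for hypothesis in hypotheses:
            for word in hypothesis.lower().split():
                index[word].add(doc_id)
    return index

# Hypothetical 2-best list for a single audio file.
nbest_by_doc = {
    "call1.wav": ["wreck a nice beach", "recognize speech"],
}
nbest_index = index_nbest(nbest_by_doc)
```

Here a query for "speech" still reaches `call1.wav` even though the word does not appear in the first hypothesis.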
To combat OOV word problems, phoneme recognition can be used rather than word recognition. A phoneme is the smallest phonetic unit in a language that is capable of conveying a distinction in meaning, such as the m of mat and the b of bat in English. The transcribed audio files and search query term(s) are converted into phonemes rather than words. This may be accomplished by first generating a word transcription using speech recognition and then, for each word, either looking up the phoneme pronunciation in a dictionary or generating the phoneme pronunciation using a rule-based algorithm. Alternatively, phoneme recognition can be used to convert audio directly to phoneme transcriptions. Several phoneme hypotheses may be generated for each audio segment.
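The word-to-phoneme conversion can be sketched as a dictionary lookup with a rule-based fallback. Everything named here is an assumption for illustration: `LEXICON` is a toy pronunciation dictionary using ARPAbet-style symbols, and `LETTER_RULES` is a deliberately naive one-letter-to-one-phoneme rule set, far simpler than a real letter-to-sound algorithm.

```python
# Toy pronunciation dictionary (hypothetical entries, ARPAbet-style symbols).
LEXICON = {
    "mat": ["M", "AE", "T"],
    "bat": ["B", "AE", "T"],
}

# Naive fallback rules: one letter maps to one phoneme (illustrative only).
LETTER_RULES = {"a": "AE", "b": "B", "m": "M", "s": "S", "t": "T"}

def word_to_phonemes(word):
    """Look the word up in the dictionary, else apply the fallback rules."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    # Letters without a rule are skipped in this simplified fallback.
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]

def text_to_phonemes(text):
    """Convert a word transcription into a flat phoneme sequence."""
    phonemes = []
    for word in text.split():
        phonemes.extend(word_to_phonemes(word))
    return phonemes
```

Because the same conversion is applied to both the transcripts and the query, a query word that is absent from the recognizer vocabulary can still be matched at the phoneme level.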
Phoneme indexing techniques may be improved by indexing sequences of phonemes, by using phonetic confusion matrices to expand the search query and the document representation, and by combining word and phoneme models.
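Two of these improvements, indexing phoneme sequences and expanding the query with confusable phonemes, can be sketched together. The sketch makes several simplifying assumptions: `CONFUSABLE` is a hypothetical stand-in for a confusion matrix that lists alternatives without probabilities, only the query is expanded (not the documents), and at most one phoneme is substituted per variant.

```python
from collections import defaultdict

# Hypothetical confusable pairs; a real confusion matrix would carry probabilities.
CONFUSABLE = {"M": ["N"], "B": ["P"]}

def phoneme_ngrams(phonemes, n=3):
    """Return all contiguous phoneme sequences of length n."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def build_ngram_index(docs, n=3):
    """Map each phoneme n-gram to the documents that contain it."""
    index = defaultdict(set)
    for doc_id, phonemes in docs.items():
        for gram in phoneme_ngrams(phonemes, n):
            index[gram].add(doc_id)
    return index

def expand_query(phonemes):
    """Return the query plus variants with one phoneme swapped for a confusable one."""
    phonemes = list(phonemes)
    variants = [phonemes]
    for i, phoneme in enumerate(phonemes):
        for alt in CONFUSABLE.get(phoneme, []):
            variants.append(phonemes[:i] + [alt] + phonemes[i + 1:])
    return variants

def search_phonemes(index, query, n=3):
    """Return documents matching any n-gram of the query or its variants."""
    hits = set()
    for variant in expand_query(query):
        for gram in phoneme_ngrams(variant, n):
            hits |= index.get(gram, set())
    return hits
```

With this expansion, a query for M AE T can still retrieve a document in which the recognizer produced N AE T.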
Word searching and retrieval are also used in other settings, for example spell checking. Typically, a user types characters on a keyboard to create an electronic document. The document may contain a number of spelling errors. To eliminate these spelling errors, a spell correction program compares the words in the document with words in a dictionary. If a word in the document does not correspond to any of the words in the dictionary, an indication is provided to the user that the word may be misspelled. Further, spell correction programs may provide a list of suggested words to replace the misspelled word.
Conventional spell correction algorithms are based on how a user confuses one character with another while typing. The algorithm checks character closeness based on a typical “QWERTY” keyboard. The algorithm tries to generate an in-dictionary word by replacing, adding, or deleting a character, or by transposing two characters. This is typically done either by using an edit distance to measure the distance between each new hypothesis and the entered word, or by using lists of common mistakes.
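The candidate-generation step can be sketched as follows: produce every string one edit away from the entered word (one deletion, adjacent transposition, replacement, or insertion) and keep those found in the dictionary. This is a simplified illustration under stated assumptions: the function names `edits1` and `suggest` and the tiny dictionary are hypothetical, only lowercase ASCII letters are considered, and keyboard adjacency is not modeled.

```python
import string

def edits1(word):
    """Return all strings exactly one edit operation away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, dictionary):
    """Return the word itself if known, else in-dictionary words one edit away."""
    if word in dictionary:
        return [word]
    return sorted(edits1(word) & dictionary)

# Hypothetical dictionary for illustration.
dictionary = {"the", "then", "they", "cat"}
suggestions = suggest("teh", dictionary)
```

A keyboard-aware variant could rank replacement candidates by the physical distance between keys, which this sketch leaves out.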