Speech processing systems analyze audio streams and can produce outputs such as a transcription or a lattice indicating occurrences of phonemes, words, or phrases in the audio stream. A transcription is generally a linear sequence of units (e.g., words or phonemes). A lattice generally indicates alternative units, each spanning an associated interval of the audio, so that alternative transcription hypotheses can be determined from the lattice. Systems often generate multiple transcription hypotheses (e.g., for each word or phrase spoken in the audio stream), usually with a confidence level attached to each hypothesis.
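As an illustration only, the relationship between a lattice and the transcription hypotheses derived from it can be sketched as follows. The `Arc` type, the use of integer frame indices for intervals, the example words, and the scoring of a path as the product of its arc confidences are all assumptions made for this sketch; they are not taken from any particular speech processing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One lattice entry: a unit spanning an interval, with a confidence."""
    start: int         # first frame of the interval (hypothetical frame index)
    end: int           # frame at which the interval ends
    unit: str          # a word here; could equally be a phoneme
    confidence: float  # assumed to lie in (0, 1]


def hypotheses(arcs, t_start, t_end):
    """Enumerate every chain of contiguous arcs covering t_start..t_end.

    Each hypothesis is scored as the product of its arc confidences
    (an assumption of this sketch), and results are sorted best first.
    """
    results = []

    def extend(time, units, score):
        if time == t_end:
            results.append((" ".join(units), score))
            return
        for arc in arcs:
            if arc.start == time:
                extend(arc.end, units + [arc.unit], score * arc.confidence)

    extend(t_start, [], 1.0)
    return sorted(results, key=lambda r: r[1], reverse=True)


# A toy lattice with alternative units over overlapping intervals.
arcs = [
    Arc(0, 4, "recognize", 0.6),
    Arc(0, 2, "wreck", 0.4),
    Arc(2, 3, "a", 0.5),
    Arc(3, 4, "nice", 0.5),
    Arc(4, 6, "speech", 0.7),
    Arc(4, 6, "beach", 0.3),
]

ranked = hypotheses(arcs, 0, 6)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

In this toy lattice the single best path corresponds to a linear transcription, while the remaining paths are the alternative hypotheses that a transcription alone would discard.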
Speech processing systems generally operate on a closed set of units, such as a closed word vocabulary or phoneme set. Such a system processes an input and produces a transcript or lattice expressed only in terms of the units in the closed set known to the system. In practice, however, new words, acronyms, names, and other terms enter the language continuously, and if these terms are not added to the vocabulary of the system, the system will not identify them even when they are spoken in the audio stream. The vocabulary of a speech processing system may also be limited and may not include words specific to a particular field or application (e.g., product names or technical terms). These field-specific words will likewise not be identified unless the system's lexicon is augmented to include them. To search for new terms not originally present in the lexicon, the audio stream can be reprocessed by the speech processing system with the new terms added to its vocabulary. However, because speech processing is slow and computationally expensive, reprocessing the audio in this way is generally impractical.
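The closed-set limitation can be illustrated with a minimal sketch. The vocabulary, the transcript, and the term "zephyrphone" are hypothetical, and the helper functions `in_vocabulary` and `find_term` are introduced here for illustration only; they do not correspond to any real recognizer's interface.

```python
# Hypothetical closed word vocabulary known to the recognizer.
VOCABULARY = {"the", "new", "phone", "is", "available", "now"}

# Hypothetical word-level output; by construction it contains only
# units drawn from VOCABULARY.
TRANSCRIPT = "the new phone is available now"


def in_vocabulary(term, vocabulary):
    """True only if every word of the term belongs to the closed set."""
    words = term.lower().split()
    return bool(words) and all(w in vocabulary for w in words)


def find_term(transcript, term):
    """Return the word positions at which the term occurs in the transcript."""
    words = transcript.lower().split()
    target = term.lower().split()
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]


for query in ("new phone", "zephyrphone"):
    print(query, in_vocabulary(query, VOCABULARY), find_term(TRANSCRIPT, query))
```

A term outside the closed set can never appear in the word-level output, so searching that output for it finds nothing regardless of what was actually spoken.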