Speech recognition systems, or automatic speech recognizers (ASRs), have become increasingly important as more and more computer-based devices use speech recognition to receive commands from a user in order to perform some action as well as to convert speech into text for dictation applications or even hold conversations with a user where information is exchanged in one or both directions. Such systems may be speaker-dependent, where the system is trained by having the user repeat words, or speaker-independent where anyone may provide immediately recognized words. Some systems also may be configured to understand a fixed set of single word commands, such as for operating a mobile phone that understands the terms call or answer, or for simple data entry phone calls for example. Other ASRs use a natural language understanding (NLU) module that understands grammar and definitions of words to recognize a word from the context of the utterance (the words or sentences that were spoken) for more complex conversations or exchanges of information. For integrating an Automatic Speech Recognizer (ASR) with a Natural Language Understanding (NLU) module in a conversational system, confidence measures and/or alternative results are often required. One popular way to generate this data is to create a word lattice, i.e., a network of likely word hypotheses. The generation of word lattices, however, may slow down the speech recognition process resulting in a relatively inefficient process.
Also, word lattices are often built in a second step from state or phone lattices which were generated on the fly during speech decoding. Since state and phone lattices can become relatively large, and usually significantly larger than word lattices, this approach requires a large amount of RAM. A more efficient system is desirable.