Advances in speech processing technology have led to improved speech recognition performance, which, in turn, has enabled widespread use of speech recognition in applications that run on multiple platforms. Speech recognition systems convert input audio, including speech, to recognized text. During recognition, audio data is typically divided into a sequence of discrete time vectors (e.g., 10 ms segments) called “frames.” This sequence of frames is converted into a sequence of words by a decoding process that selects statistical models of possible word acoustics and aligns them with the input frames. These statistical word models are typically composed of sub-word unit models (e.g., phoneme or syllable models). Each sub-word unit model consumes one or more frames of audio data.
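The framing and alignment described above can be illustrated with a minimal sketch. It is not taken from any particular recognizer: the function names `split_into_frames` and `group_alignment`, the 16 kHz sample rate, and the example phoneme labels are all assumptions made for illustration. For simplicity the sketch cuts raw samples into non-overlapping 10 ms frames, whereas a practical system would typically compute a feature vector per frame, often from overlapping windows. The per-frame labels stand in for the output of a decoder, which is not implemented here.

```python
from itertools import groupby


def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split audio samples into consecutive, non-overlapping frames.

    Hypothetical helper: a trailing partial frame is dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def group_alignment(frame_labels):
    """Collapse per-frame sub-word unit labels into runs.

    Returns a list of (unit, first_frame_index, frame_count) tuples, showing
    that each sub-word unit consumes one or more consecutive frames.
    """
    runs = []
    start = 0
    for unit, group in groupby(frame_labels):
        count = len(list(group))
        runs.append((unit, start, count))
        start += count
    return runs


if __name__ == "__main__":
    # One second of dummy audio at an assumed 16 kHz sample rate.
    samples = list(range(16000))
    frames = split_into_frames(samples, 16000)
    print(len(frames), len(frames[0]))

    # Assumed decoder output: one phoneme label per frame for the word "cat".
    print(group_alignment(["k", "k", "ae", "ae", "ae", "t"]))
```

At 16 kHz, a 10 ms frame holds 160 samples, so one second of audio yields 100 frames. In the alignment example, the unit "ae" spans three frames while "t" spans only one, reflecting that units may consume differing numbers of frames.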