Audio-based speech recognition services such as Dragon Dictation, Siri, and SILVIA can transcribe audio signals containing voice data that represents speech into text to be rendered on a display. Image-based speech recognition services, on the other hand, transcribe speech into words by recognizing visual cues such as lip motion. In one such approach, a local binary pattern (LBP) is extracted from a series of images in a video of lip motion and recognized as text by comparison to a database. However, extracting the LBP from video can consume considerable processing and memory resources. Both types of speech recognition service are referred to herein as speech-to-text services.
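To illustrate why LBP extraction is costly, the following is a minimal sketch of the basic 3x3 LBP operator applied to every frame of a grayscale video. It assumes frames are given as lists of rows of integer intensities. The function names (`lbp_code`, `lbp_histogram`, `video_lbp_features`) and the bit ordering are hypothetical choices for this sketch. The approach described above may use a different LBP variant, and the database comparison step is not shown.

```python
from typing import List

# Neighbor offsets (row, col), clockwise from the top-left pixel.
# This bit ordering is an assumption made for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], r: int, c: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set this bit when the neighbor is at least as bright as the center.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """Return a 256-bin histogram of LBP codes over all interior pixels of one frame."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


def video_lbp_features(frames: List[List[List[int]]]) -> List[List[int]]:
    """Return one LBP histogram per frame.

    The work grows with (number of frames) x (pixels per frame) x 8 comparisons,
    which is why extracting LBP features from video is resource-intensive.
    """
    return [lbp_histogram(frame) for frame in frames]
```

For example, `lbp_code([[6, 1, 6], [1, 5, 1], [6, 1, 6]], 1, 1)` sets only the four corner bits (0, 2, 4, and 6) and returns 85. Even at this simple level, a 30-frame-per-second video at 640x480 requires roughly 74 million neighbor comparisons per second of footage.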