Generally, speech recognition for dictation works in two steps. First, speech recognition converts a speech audio signal into a verbatim transcript, which is a sequence of words (tokens). Second, formatting renders the tokens to form a written document. Examples of formatting include:
    spacing: How many spaces before/after punctuation?
    numbers: “twenty three”→“23”, “twenty-three”, or “XXIII”
    spelling variation: “dialing” or “dialling”
    abbreviations: “Incorporated” or “Inc.”
    units of measure: “ten miles” or “10 mi.”
    dates: “October thirteenth two thousand nine”→“Oct 13, 2009”, “10/13/09”, “2009-10-13”, . . . .
    addresses: “one wayside road Burlington Massachusetts”→“1 Wayside Rd. Burlington, MA”
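The number example above can be illustrated with a minimal sketch. This is not the system described here, only an illustrative two-word number formatter; all names in it are hypothetical.

```python
# Illustrative sketch: render spoken number words from a verbatim
# transcript as digits, one example of the "formatting" step.
ONES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
    "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
    "eighteen": 18, "nineteen": 19,
}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def format_number(tokens):
    """Render e.g. ['twenty', 'three'] as '23'."""
    if len(tokens) == 2 and tokens[0] in TENS and tokens[1] in ONES:
        return str(TENS[tokens[0]] + ONES[tokens[1]])
    if len(tokens) == 1 and tokens[0] in ONES:
        return str(ONES[tokens[0]])
    if len(tokens) == 1 and tokens[0] in TENS:
        return str(TENS[tokens[0]])
    return " ".join(tokens)  # fall back to the verbatim transcript

print(format_number(["twenty", "three"]))  # -> 23
```

A full formatter covers many more expression types (dates, units, addresses), but each follows the same pattern: recognize a spoken expression in the token sequence and render it in written form.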
Typically, the user can customize the formatted output by setting various options. These options may be set by the user through, for example, a Graphical User Interface (GUI). FIG. 1 shows exemplary screen shots of a GUI used for setting formatting options. The formatter itself is typically based on a context-free grammar that recognizes formatted expressions such as numbers and dates. The productions of the grammar are associated with executable code that generates the formatted output, taking into account the option settings.
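The grammar-with-actions design can be sketched as follows. This is a simplified illustration under assumed names (Options, Production, format_tokens are all hypothetical), not the actual formatter: each production pairs a recognizer for a formatted expression with executable code that consults the user's option settings.

```python
# Hedged sketch: grammar productions paired with executable actions
# that generate formatted output according to user option settings.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Options:
    number_style: str = "digits"  # "digits" or "words" (illustrative option)

@dataclass
class Production:
    matches: Callable[[List[str]], bool]          # recognizes an expression
    action: Callable[[List[str], Options], str]   # generates formatted output

def act_number(tokens, opts):
    tens = {"twenty": 20, "thirty": 30, "forty": 40}
    ones = {"one": 1, "two": 2, "three": 3, "four": 4}
    if opts.number_style == "digits":
        return str(tens[tokens[0]] + ones[tokens[1]])
    return "-".join(tokens)  # e.g. "twenty-three"

number_rule = Production(
    matches=lambda t: len(t) == 2
                      and t[0] in ("twenty", "thirty", "forty")
                      and t[1] in ("one", "two", "three", "four"),
    action=act_number,
)

def format_tokens(tokens, opts, rules):
    """Apply the first production that recognizes the token sequence."""
    for rule in rules:
        if rule.matches(tokens):
            return rule.action(tokens, opts)
    return " ".join(tokens)  # no rule fired: keep the verbatim transcript

print(format_tokens(["twenty", "three"], Options("digits"), [number_rule]))
print(format_tokens(["twenty", "three"], Options("words"), [number_rule]))
```

The same token sequence thus yields “23” or “twenty-three” depending solely on the option setting, which is why mismatched settings produce repeated formatting errors for the user.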
If the system output does not match user expectations, users correct the recognized and formatted text. Users do not know whether a given error lies in recognition or in formatting. Recognition improves from user feedback, since language and acoustic models are adapted on data provided by the user and on corrections. Formatting, on the other hand, completely ignores user data and corrections. Since non-technical users rarely explore the GUI to change formatting option settings, they keep correcting the same formatting errors repeatedly, leading to user frustration.