Human-machine interfaces are subject to varying amounts of error and uncertainty, so applying a post-processing or correction algorithm is critical. The excellent performance humans show when interpreting a spoken, gestured, typed, handwritten, or otherwise transmitted message is mostly due to our error-recovery ability, which in turn relies on the lexical, syntactic, semantic, pragmatic, and discourse constraints of the language.
Among the different levels at which language can be modeled, the lowest one is the word level, involving lexical constraints on the sequence of characters inside each word. The next one is the sentence level, which takes into account syntactic and semantic constraints on the sequence of words or word categories inside a sentence (or inside a field, for instance in a form-processing application). Word- and sentence-level models typically apply dictionary search methods, n-grams, edit-distance-based techniques, Hidden Markov Models, and other character or word-category transition models. The higher levels consider a wider context and require specific a priori knowledge of the application domain.
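As an illustration of a word-level model, the following minimal sketch estimates a character bigram model from a small word list and scores candidate strings with it. The function names, the boundary markers `^` and `$`, the add-one smoothing, and the toy training list are all assumptions made for this example, not part of any particular system.

```python
import math
from collections import Counter

def train_char_bigrams(words):
    """Count character bigrams, with '^' and '$' marking word boundaries."""
    pair_counts = Counter()      # counts of (previous char, current char)
    context_counts = Counter()   # counts of the previous char alone
    alphabet = {"$"}             # symbols that can follow a context
    for word in words:
        padded = "^" + word + "$"
        alphabet.update(word)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts, alphabet

def log_prob(word, model):
    """Log-probability of a word under the bigram model (add-one smoothing).

    Characters never seen in training still get a small non-zero score,
    so this is a ranking heuristic rather than a normalized distribution.
    """
    pair_counts, context_counts, alphabet = model
    vocab_size = len(alphabet)
    padded = "^" + word + "$"
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        numerator = pair_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total

# Hypothetical training list for a "country" field.
model = train_char_bigrams(["spain", "france", "italy", "germany", "portugal"])
```

With this toy model, a string seen in training such as `spain` scores higher than a corrupted variant such as `sqain`, which is the kind of lexical constraint a word-level model is meant to capture.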
The goal of a symbol-input post-processing method is to maximize the likelihood that the strings received as input hypotheses are correct, in the sense that they are compatible with the constraints imposed by the task (language). These constraints constitute the language model and can be as simple as a small set of valid words (e.g. the possible values of the “country” field in a form) or as complex as an unconstrained sentence in a natural language.
In practice, the simplest method to handle correction is to use a lexicon to validate the known words and ask an operator to verify or input by hand the unknown words. Specific techniques can be used to carry out approximate search in the lexicon.
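The lexicon-based approach can be sketched as follows, assuming the Levenshtein edit distance as the approximate-search criterion. The function names, the `max_distance` threshold, and the toy lexicon are hypothetical; returning `None` stands in for handing the word to an operator.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]

def correct(word, lexicon, max_distance=1):
    """Return the word if valid, its nearest lexicon entry if close enough,
    or None to signal that an operator should verify it."""
    if word in lexicon:
        return word
    # Ties are broken alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda entry: (edit_distance(word, entry), entry),
               default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best
    return None

# Hypothetical lexicon for a "country" field.
COUNTRIES = {"spain", "france", "italy"}
```

For example, `correct("frence", COUNTRIES)` returns `"france"`, while `correct("xyz", COUNTRIES)` returns `None`. The linear scan over the lexicon is only practical for small vocabularies; larger lexicons would need one of the specific approximate-search techniques.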
Other methods are based on n-grams or on finite-state machines, where a candidate string is parsed and the sequence of transitions with the lowest cost (highest probability) defines the output string. The Viterbi algorithm, widely used in different fields, is the classical way to find the maximum-likelihood path through a finite-state machine and to perform error-correcting parsing on a regular grammar.
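A minimal sketch of Viterbi decoding is given here, assuming a model in which each state is an intended character, transition costs encode the language constraints, and emission costs encode the recognizer's confusions. Costs are negative log-probabilities, so the lowest-cost path is the most probable one. The two-symbol alphabet and all probability values are invented for illustration.

```python
import math

def viterbi(observations, states, start_cost, trans_cost, emit_cost):
    """Return (total_cost, state_path) for the lowest-cost state sequence."""
    if not observations:
        return 0.0, []
    # best[s] = (cost of the cheapest path ending in state s, that path)
    best = {
        s: (start_cost[s] + emit_cost[s][observations[0]], [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Cheapest way to reach s from any predecessor state p.
            cost, path = min(
                ((best[p][0] + trans_cost[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (cost + emit_cost[s][obs], path + [s])
        best = new_best
    return min(best.values(), key=lambda item: item[0])

def neg_log(p):
    """Convert a probability into an additive cost."""
    return -math.log(p)

# Toy model (assumed values): the language favors alternating symbols,
# and the recognizer reads each symbol correctly 80% of the time.
STATES = ["a", "b"]
START = {s: neg_log(0.5) for s in STATES}
TRANS = {p: {s: neg_log(0.1 if p == s else 0.9) for s in STATES} for p in STATES}
EMIT = {s: {o: neg_log(0.8 if s == o else 0.2) for o in STATES} for s in STATES}

cost, path = viterbi("abb", STATES, START, TRANS, EMIT)
```

In this toy setting the observed string `abb` is decoded as the path `a`, `b`, `a`: the strong preference for alternation outweighs the recognizer's reading of the last symbol. Storing full paths in the table keeps the sketch short; an implementation for long inputs would normally keep back-pointers instead.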