Many problems in information extraction can be reduced to segmenting or labeling sequences, including part-of-speech tagging (in natural language applications), phoneme tagging (in speech applications), and sequence alignment (in bioinformatics applications). Hidden state Markov models are widely used for solving such problems. A hidden state corresponds to a label for each observation in an input sequence, and the Markov assumption specifies that the state corresponding to time step (or location) n is independent of the states corresponding to time steps prior to n−1, given the state at time step n−1. Two such models are linear-chain Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs). Both have been widely used for problems involving semi-structured input sequences because of their simplicity and effectiveness.
Traditionally, the Viterbi algorithm is used for decoding such models. This algorithm requires a forward pass over the input sequence to compute probabilities/scores, followed by a reverse pass to recover the optimal state/label sequence. Therefore, all the data must be seen before any of the hidden states can be inferred, and hence the algorithm cannot be directly applied to real-time/reactive applications, or to applications with strong latency and/or memory constraints.
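The two passes described above can be illustrated with a minimal sketch of Viterbi decoding for a linear-chain HMM. The function name `viterbi`, the helper `logs`, and the two-state toy tables are hypothetical and chosen only for illustration; they do not come from any particular system. Scores are kept in log space to avoid numerical underflow, which is a common choice rather than a requirement.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the highest-scoring state sequence for a linear-chain HMM."""
    if not observations:
        return []
    # Forward pass: best log-score of any path ending in each state.
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []  # one dict per time step after the first
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Markov assumption: only the previous state matters.
            best_prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = (scores[best_prev] + log_trans[best_prev][s]
                             + log_emit[s][obs])
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Reverse pass: follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def logs(table):
    """Convert a (possibly nested) table of probabilities to log space."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# Hypothetical toy model: two hidden states, three observation symbols.
STATES = ["Healthy", "Fever"]
LOG_START = logs({"Healthy": 0.6, "Fever": 0.4})
LOG_TRANS = logs({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
                  "Fever": {"Healthy": 0.4, "Fever": 0.6}})
LOG_EMIT = logs({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
                 "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

print(viterbi(["normal", "cold", "dizzy"], STATES,
              LOG_START, LOG_TRANS, LOG_EMIT))
# prints ['Healthy', 'Healthy', 'Fever']
```

Note that the first label in the returned path is only fixed once the reverse pass has run, which in this sketch happens after the last observation has been scored. This is the latency and memory cost referred to above: the back-pointers for every time step are held until the end of the sequence.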
Thus, models and/or classifiers for labeling sequence data are typically based either on local information (in which case they are fast, but not very accurate) or on global information (in which case they are more accurate, but have higher latency/memory requirements). Consider, for example, a software application assistant which tries to determine user intent based on a sequence of user actions. One method guesses what the user is trying to do based on the current user action alone, while another method waits for the user to finish the task and then guesses the user's intention based on the entire sequence. The first method yields fast but inaccurate results; the second yields accurate results that are slow and costly and that require knowledge of the complete sequence data. Online applications such as those found on the Internet and/or intranets generally require fast and highly accurate results to entice users to use their services. When functions cannot provide both characteristics, they are often left out of applications to avoid user dissatisfaction, leaving the applications with less functionality than desired.