Assigning labels to sequential data is a common problem in applications such as speech recognition, computational linguistics, computer vision, and robotics. For example, in part-of-speech tagging, the task is to tag a sequence of words by considering the grammatical structure of the language, e.g., verb-verb-noun-noun-verb-adjective is a very unlikely grammatical sequence in English, while noun-verb-adverb is not. Similarly, in speech recognition, words or phonemes obey the pronunciation rules of the underlying language, particularly with respect to their sequential order.
Likewise, one can assign letters and numbers to a sequence of hand-written characters by exploiting the structure enforced by the grammar of the underlying language. In these examples, sequential patterns are important and can be exploited to extract information from large data sets.
Two common models for solving such problems are hidden Markov models (HMMs) and conditional random fields (CRFs). Although these models are very powerful, different types of data require application-specific modifications, resulting in various extensions of the models.
For example, a semi-Markovian CRF is a more general solution for segmentation problems: it allows non-Markovian state transitions within segments of data and assigns labels directly to segments, instead of to individual samples.
Another method describes non-parametric prior probabilities for systems with state persistence, to prevent unrealistic state transitions. That method provides state persistence and also allows the transition probabilities to be trained in an infinite state space. In all of the above examples, the basic task of the final state sequence inference is to estimate a precise state sequence.
However, in many applications, that is not a necessary goal. Instead, the goal is to estimate some deterministic function of the state sequence. In particular, the goal is to track the state transitions without accounting for the dwell time in each state.
In an example application, the movement of a person is tracked, where the exact transitions between states such as “sitting (s),” “jumping (j),” “walking (w),” and “running (r)” are ambiguous and unimportant, but detecting the unique sequence of states, in the order in which they occurred, is important.
For example, a ground-truth state sequence of human movement is y={s, s, j, j, j, w, w, r, r}, and the corresponding input sequence of data is x={x1, x2, . . . , x9}. The goal is to accurately predict the output of a deterministic function compress, where compress(y)={s, j, w, r}. That is, consecutive duplicate states are removed in the compressed sequence.
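A minimal Python sketch of the compress function follows, assuming states are represented as single-character strings. It collapses each run of identical consecutive states into one state rather than deleting every repeated label; this reading is consistent with the treatment of y″ in this section, where a state that reappears later must be kept.

```python
from itertools import groupby
from typing import List, Sequence


def compress(y: Sequence[str]) -> List[str]:
    """Collapse each maximal run of identical consecutive states into one state."""
    # groupby yields one (key, group) pair per run of equal adjacent elements,
    # so keeping only the keys discards the dwell time of each state.
    return [state for state, _run in groupby(y)]


y = ["s", "s", "j", "j", "j", "w", "w", "r", "r"]
print(compress(y))  # ['s', 'j', 'w', 'r']
```

Python's itertools.groupby groups only adjacent equal elements, which is exactly the run-collapsing behavior needed here; a set-based deduplication would be wrong because it would also discard states that recur later in the sequence.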
Moreover, when a predicted state sequence y′={s, s, j, j, j, j, w, r, r} is obtained by converting the first occurrence of the state ‘w’ to ‘j’ exactly at the transition from j to w, it is an error for conventional applications, but it is not an error in an application with compressed state sequence inference, because compress(y)=compress(y′).
Conversely, when a predicted sequence is y″={s, s, j, j, w, j, w, r, r}, it is a fatal error for this application, because compress(y″)={s, j, w, j, w, r} differs from compress(y), even though y″ differs from y only by swapping two adjacent states. Moreover, state transition ambiguity is not the only feature of the problem: the length of the compressed output is also unknown and arbitrary, e.g., it is not known in advance how many unique actions occurred, in order of appearance, during the person's movement.
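The contrast between y′ and y″ can be checked with a small sketch that compares a per-position (Hamming) error count with exact agreement of the compressed sequences. The helper names hamming and compress are illustrative, not taken from the original method, and compress again assumes that only consecutive duplicates are removed.

```python
from itertools import groupby
from typing import List, Sequence


def compress(y: Sequence[str]) -> List[str]:
    # Keep one state per maximal run of identical consecutive states.
    return [state for state, _ in groupby(y)]


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    # Number of time steps at which two equal-length sequences disagree.
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(p != q for p, q in zip(a, b))


y = list("ssjjjwwrr")    # ground truth
y1 = list("ssjjjjwrr")   # y′: first 'w' relabeled 'j' at the j-to-w boundary
y2 = list("ssjjwjwrr")   # y″: two adjacent states swapped

print(hamming(y, y1), compress(y1) == compress(y))  # 1 True
print(hamming(y, y2), compress(y2) == compress(y))  # 2 False
print(compress(y2))  # ['s', 'j', 'w', 'j', 'w', 'r']
```

A per-position error count rates y′ and y″ as nearly equally good, while the compressed comparison accepts y′ and rejects y″, which is the distinction this application cares about.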
There are several other problems that require such special treatment in state sequence inference, including state counting processes, in which one is interested in counting unique states in a sequence without considering the dwell times. To the best of our knowledge, this problem is largely unaddressed in machine learning applications.
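As one hypothetical illustration of a state counting task (the exact counting objective is not specified here), the number of separate episodes of each state can be obtained by counting labels in the compressed sequence instead of in the raw sequence, which removes the influence of dwell times. The function name count_episodes and the walk/run sequence are assumptions for illustration.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, Sequence


def count_episodes(y: Sequence[str]) -> Dict[str, int]:
    """Count how many separate runs (episodes) of each state occur in y."""
    # Collapse consecutive repeats first, so each run contributes exactly one count.
    compressed = [state for state, _ in groupby(y)]
    return dict(Counter(compressed))


y = list("wwrrrwwwrr")    # hypothetical walk/run sequence
print(dict(Counter(y)))   # {'w': 5, 'r': 5}: depends on dwell times
print(count_episodes(y))  # {'w': 2, 'r': 2}: two walking and two running episodes
```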
Compressed sequences have been described for a video-interpretation application, in which exact state transitions are ambiguous and only distinct states are important. That method is applicable only to a very limited domain, and probabilistic models cannot be used because of the very large number of states.
In another video-interpretation application, a simple transition-cost model is used, wherein a transition to the same state is assumed to have no cost, whereas all other possible transitions are assumed to have the same cost K. This is very similar to training a probabilistic sequential model that has zero weight for all transitions to the same state and the same weight K for all other transitions, which is completely unrealistic in many applications because the sequential modeling of state transitions is destroyed.
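To make the last point concrete, a sketch of that uniform cost model shows that the total cost of a path depends only on how many state changes it contains, not on which states follow which, so two different orderings of the same activities are indistinguishable. The function names, the state set, and the value K=1.0 are assumptions for illustration.

```python
from typing import Dict, Sequence, Tuple

Cost = Dict[Tuple[str, str], float]


def uniform_transition_cost(states: Sequence[str], K: float) -> Cost:
    # Self-transitions are free; every change of state costs the same K.
    return {(a, b): (0.0 if a == b else K) for a in states for b in states}


def path_cost(path: Sequence[str], cost: Cost) -> float:
    # Sum of transition costs between consecutive time steps.
    return sum(cost[(a, b)] for a, b in zip(path, path[1:]))


states = ["s", "j", "w", "r"]
cost = uniform_transition_cost(states, K=1.0)
ordered = list("ssjjjwwrr")    # sit, jump, walk, run
reordered = list("ssrrrwwjj")  # sit, run, walk, jump
print(path_cost(ordered, cost), path_cost(reordered, cost))  # 3.0 3.0
```

Both paths contain three state changes and therefore receive the same cost, regardless of whether the order of activities is plausible.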
FIG. 1 shows a conventional compressed state sequence inference method. Given the input sequence of data x={x1, x2, . . . , xT} 101 and previously trained HMM/CRF parameters {λj, μk} 106, an HMM/CRF decoding method 102 predicts a complete state sequence y={y1, y2, . . . , yT} 103 that corresponds to the data 101. Here, the indices 1, . . . , T correspond to time steps. Then, a deterministic compress function ƒ 104 is applied to the complete state sequence 103 to determine a compressed sequence of unique states s=ƒ(y)={s1, s2, . . . , sc} 105. In the compressed state sequence, all consecutive duplicate states in the complete state sequence 103 are removed.
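The conventional two-stage pipeline of FIG. 1 can be sketched as Viterbi decoding of a discrete HMM followed by the compress function ƒ. The two states, the observation symbols, and all probabilities below are hypothetical stand-ins for the trained parameters {λj, μk} 106; a CRF decoder would score transitions and observations differently, but the decode-then-compress structure would be the same.

```python
import math
from itertools import groupby
from typing import Dict, List, Sequence

Probs = Dict[str, float]


def viterbi(obs: Sequence[str], states: Sequence[str], start: Probs,
            trans: Dict[str, Probs], emit: Dict[str, Probs]) -> List[str]:
    """Most likely state sequence of a discrete HMM, computed in log space."""
    log = math.log
    # delta[s]: best log-probability of any state path ending in s at the current step
    delta = {s: log(start[s]) + log(emit[s][obs[0]]) for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for s in states:
            # Best predecessor of s given the scores at the previous step
            best = max(states, key=lambda p: prev[p] + log(trans[p][s]))
            delta[s] = prev[best] + log(trans[best][s]) + log(emit[s][o])
            bp[s] = best
        backptrs.append(bp)
    # Backtrack from the highest-scoring final state
    path = [max(states, key=lambda s: delta[s])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]


def compress(y: Sequence[str]) -> List[str]:
    # Deterministic function f: keep one state per run of identical consecutive states.
    return [state for state, _ in groupby(y)]


# Hypothetical two-state model: s = sitting, w = walking;
# observations are coarse motion readings, "still" or "move".
states = ["s", "w"]
start = {"s": 0.6, "w": 0.4}
trans = {"s": {"s": 0.8, "w": 0.2}, "w": {"s": 0.2, "w": 0.8}}
emit = {"s": {"still": 0.9, "move": 0.1}, "w": {"still": 0.2, "move": 0.8}}

x = ["still", "still", "move", "move", "move", "still", "still"]
y = viterbi(x, states, start, trans, emit)  # complete state sequence
print(y)            # ['s', 's', 'w', 'w', 'w', 's', 's']
print(compress(y))  # ['s', 'w', 's']
```

In this sketch the decoder optimizes the complete sequence y, and ƒ is applied only afterwards, so the decoding step itself is not aware that only the compressed output matters.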