There has long been considerable interest in detecting and classifying the movement of individuals for time and motion studies, for security, and for general monitoring purposes. Most such monitoring systems require a priori knowledge in order to detect preselected or predetermined behaviors. In these systems, motion captured by a video camera is matched against histograms or other types of templates. However, all of these systems require some idea of both the normative behavior and the non-normative behavior to be detected in order to construct the histograms or templates.
As a result, there is wide interest in learning normative models of activity from vision through the use of hidden Markov models. However, their use has so far produced unsatisfactory results because of exceedingly low accuracy.
Note that in an article entitled "Generation of semantic regions from image sequences," J. H. Fernyhough, A. G. Cohn, and D. C. Hogg have shown how to learn characteristic motion maps for pedestrian plazas, which are themselves representations of non-parametric distributions over collections of pedestrian trajectories. Their account can be found in European Conference on Computer Vision, 1996, volume 2, pages 475-484. Their models lack some of the important properties of the subject invention, notably concision and accurate recovery of the essential structure of the signal.
The literature of structure-learning in HMMs is, to date, based entirely on generate-and-test algorithms. These algorithms work by selecting a single state to be merged or split, then retraining the model to see if any advantage has been gained. Though these efforts use a variety of heuristic techniques and priors to avoid failures, much of the computation is squandered, and reported run-times range from hours to days. Andreas Stolcke and Stephen Omohundro detail a merging algorithm in "Best-first model merging for hidden Markov model induction," International Computer Science Institute Technical Report 94-003, University of California Berkeley, April 1994. Shiro Ikeda details a splitting algorithm in "Construction of Phoneme Models--Model Search of Hidden Markov Models," Proceedings of the International Workshop on Intelligent Signal Processing and Communication Systems, Sendai, October 1993.
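The generate-and-test pattern described above can be illustrated with a minimal sketch. It is not the algorithm of either cited report: the merge rule (averaging the two states' rows and summing their incoming probability), the toy three-state model, and the names `log_likelihood`, `merge_states`, and `best_merge` are all assumptions made for illustration, and the retraining step that real generate-and-test methods perform after each candidate change is omitted for brevity. Each candidate is simply scored with the forward algorithm.

```python
import math
from itertools import combinations

def log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]
    return total

def merge_states(pi, A, B, i, j):
    """Merge state j into state i (i != j). Assumed rule: average the two
    states' outgoing rows and emissions, and sum their incoming probability."""
    keep = [s for s in range(len(pi)) if s != j]
    def fold(row):  # add column j's probability mass onto column i
        return [row[s] + (row[j] if s == i else 0.0) for s in keep]
    def avg(r1, r2):
        return [(a + b) / 2.0 for a, b in zip(r1, r2)]
    new_pi = fold(pi)
    new_A = [fold(avg(A[i], A[j]) if s == i else A[s]) for s in keep]
    new_B = [avg(B[i], B[j]) if s == i else list(B[s]) for s in keep]
    return new_pi, new_A, new_B

def best_merge(pi, A, B, obs):
    """Generate every pairwise merge, test each one, and keep the best.
    Returns ((score, i, j, model), number_of_candidates_tested)."""
    candidates = []
    for i, j in combinations(range(len(pi)), 2):
        model = merge_states(pi, A, B, i, j)
        candidates.append((log_likelihood(*model, obs), i, j, model))
    return max(candidates, key=lambda c: c[0]), len(candidates)

# Toy model (hypothetical): states 0 and 1 are exact duplicates.
pi = [0.4, 0.4, 0.2]
A = [[0.4, 0.4, 0.2],
     [0.4, 0.4, 0.2],
     [0.1, 0.1, 0.8]]
B = [[0.9, 0.1],
     [0.9, 0.1],
     [0.1, 0.9]]
obs = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

base = log_likelihood(pi, A, B, obs)
best, tested = best_merge(pi, A, B, obs)
print("candidates tested:", tested)
print("best merge:", (best[1], best[2]), "log-likelihood change:", best[0] - base)
```

Even in this toy case every pair of states has to be built and scored before a single merge is accepted, and only one of the candidates is kept. With retraining added after each candidate, as the cited methods require, the wasted work grows accordingly.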
Hidden Markov models are widely used for modeling and classifying signals. The Baum-Welch algorithm efficiently estimates maximum likelihood (ML) parameters, but the user is obliged to specify the graphical structure of the model. Typically the user makes several guesses at the state count and transition topology; testing each guess is computationally intensive.
The process is tedious but necessary, since structure is the primary determinant of a model's selectivity and speed of computation.
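The guess-and-train workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it trains a fully connected discrete HMM with a compact scaled Baum-Welch routine for each guessed state count, and the function name `baum_welch`, the random initialization, the iteration count, and the toy observation sequence are all hypothetical choices.

```python
import math
import random

def baum_welch(obs, n_states, n_symbols, iters=10, seed=0):
    """Scaled Baum-Welch for a discrete, fully connected HMM.
    Returns the log-likelihood measured at the start of each iteration."""
    rng = random.Random(seed)
    def rand_dist(k):
        w = [rng.random() + 0.1 for _ in range(k)]
        total = sum(w)
        return [x / total for x in w]
    N, T = n_states, len(obs)
    pi = rand_dist(N)
    A = [rand_dist(N) for _ in range(N)]
    B = [rand_dist(n_symbols) for _ in range(N)]
    history = []
    for _ in range(iters):
        # E-step: forward pass, normalizing at every time step.
        alpha, scale = [], []
        for t in range(T):
            if t == 0:
                a = [pi[i] * B[i][obs[0]] for i in range(N)]
            else:
                a = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                     for j in range(N)]
            c = sum(a)
            scale.append(c)
            alpha.append([x / c for x in a])
        history.append(sum(math.log(c) for c in scale))
        # E-step: backward pass, reusing the forward scale factors.
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                           for j in range(N)) / scale[t + 1]
                       for i in range(N)]
        # State posteriors and expected transition counts.
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        xi_sum = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       / scale[t + 1] for t in range(T - 1))
                   for j in range(N)] for i in range(N)]
        # M-step: re-estimate all parameters from the expected counts.
        pi = gamma[0][:]
        for i in range(N):
            out = sum(gamma[t][i] for t in range(T - 1))
            A[i] = [xi_sum[i][j] / out for j in range(N)]
            occ = sum(gamma[t][i] for t in range(T))
            B[i] = [sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ
                    for k in range(n_symbols)]
    return history

# Toy observation sequence over three symbols (hypothetical data).
obs = [0, 0, 1, 0, 1, 1, 2, 1, 2, 2, 0, 2] * 3

# Each guessed state count requires a complete training run.
histories = {n: baum_welch(obs, n, 3) for n in (2, 3, 4)}
for n, h in histories.items():
    print(n, "states: log-likelihood", h[0], "->", h[-1])
```

Every guessed state count costs a full training run, and a guess at the transition topology would multiply the number of runs again. Note also that raw likelihood tends to favor larger models, so a practical comparison would need held-out data or a complexity penalty, neither of which is shown here.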