The embodiments described herein relate generally to data analysis, and more particularly, to analyzing time series data by modeling transitional patterns between events.
In recent years, installations of large camera networks and wide availability of digital video cameras have generated large volumes of video data that may be processed and analyzed to retrieve useful information. As many videos involve human activity and behavior, a central task in video analytics is to effectively and efficiently extract complex and highly varying human-centric events from the videos. Event recognition processes are designed to achieve two goals: (i) localization of temporal segments in a video containing salient events (i.e., determining when something happened), and (ii) classification of the localized events into relevant categories (i.e., determining what happened). Further analysis may be conducted on the extracted events. For example, suspicious behavior in video surveillance may be identified.
At least some known video event analysis systems treat event localization and classification as separate problems. However, these two problems are interrelated. Specifically, better event localization improves subsequent classification, and reliable event classification may be used to achieve more precise localization. Methods for unifying the localization and classification problems may be organized into two categories: (i) generative approaches that use dynamic Bayesian models (such as hidden Markov models and switching linear dynamical systems), and (ii) discriminative approaches that use max-margin classifiers.
At least some known video event analysis systems only consider monolithic or persistent events. For example, a system may focus on the identification of action states, such as walking or having folded arms. Such methods ignore regular transitional patterns that often occur between events of interest. For example, if a person starts with their arms down in a resting position and ends touching their nose, a transitional pattern occurs in between, in which the arms move upward. Although independent detection of such transitional patterns may be difficult using generative or discriminative approaches, the consecutive motion flow between action states is unique and recognizable, and may provide more reliable cues to localize and classify persistent events. However, at least some known video event analysis systems ignore or are unable to detect such transitional patterns.