Anomalous event detection from video data is a complex problem that has been the subject of investigation over the past decade. The number of surveillance cameras in the public domain is large, and as a result it is often practically impossible to manually monitor the video streams from all the cameras. There is therefore a pressing need for automatic, or substantially automatic, and/or scalable detection of abnormal events from video streams. There is also a pressing need to be able to do this in real time, in substantially real time, or in an acceptable time period.
The literature on abnormal event detection can be generally classified into two main streams: model-based and statistical-based approaches.
In the model-based approach, image features associated with behaviours, such as the trajectory or shape of moving objects, are extracted from video data. Then, typically, a supervised learning step is performed to learn the behaviour model given the observed features. This model can then be used at the inference stage to decide whether the current scene is abnormal. Parametric and nonparametric spatio-temporal template methods have been proposed, as has a boosting-based classifier for the inference step. A large body of model-based methods also consider the trajectories of moving objects as image features for behaviour analysis.
The second approach is based on statistical analysis of the whole scene rather than focusing on individual targets. This approach makes use of low-level information about the images in conjunction with statistical methods for event detection. The main advantages of this approach are simplicity, since low-level features can be easily computed, and reliability, since their computation is invariant to environmental constraints. As a result, statistical methods may be more robust and reliable, and can be deployed in real-world applications.
Examples of low-level image features are SIFT, interest points, and salient and region detectors. The learning methods for low-level image features include the use of support vector machines (SVM), probabilistic latent semantic analysis (PLSA) and kernel learning. Examples include using an ensemble of image patches to learn the irregularities in the video; modelling exemplar activities in the video using principal component analysis; and using a volumetric-feature-based optical flow representation for activity recognition in video.
The existing literature on the statistical-based approach has shown some success even in real-world scenarios. However, these methods are restricted to abnormal events associated with the main characteristics of the scene, which may also be called the principal behaviour. One example is constructing a prototype-segment co-occurrence matrix whilst looking for significant aggregate changes from multiple pre-defined locations.
One of the challenging problems of practical interest for public surveillance is the detection of anomalous behaviour from one or a few individuals in the presence of a large number of people behaving normally. Current statistical-based methods are unable to detect these types of anomalies. For example, current methods cannot detect a loitering person in a crowd.
In addition, the problem of detecting anomalies in data streams captured by large-scale sensor networks has received much interest over the past decade. As large-scale networks become prevalent, there is an increasing need to develop approaches that can address the challenges arising from the collection of large amounts of data. The problem affects a wide range of applications as the data captured by sensor networks could constitute multimedia content from the web, video from surveillance camera networks, satellite imagery or typical network traffic.
Several approaches to detecting different types of anomalies have been developed in past years for either databases or data streams. However, these techniques generally share an important assumption: that the complete data is available.
As network size increases, it becomes increasingly difficult to acquire all the data streams for processing. Hence, in large-scale networks, the complete data may not always be available at the fusion point for detecting anomalies, because of either low bandwidth or large geometrical distances between sensors.
A number of proposals have been introduced to address the challenges in acquiring information about networks and to circumvent the physical bandwidth constraints. One notable approach is decentralization. For example, one proposed decentralization method for streaming data has the sensors send information to the fusion point only if the observed value falls outside the normal range, which is typically a pre-defined window. If a sensor does not send any data, the fusion point assumes a nominal value.
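The decentralized reporting scheme described above can be illustrated with a minimal Python sketch. It is not taken from any cited method: the names `Sensor`, `report` and `fuse`, the window bounds, and the nominal value of 5.0 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sensor:
    """Hypothetical sensor with a pre-defined normal window [low, high]."""
    sensor_id: str
    low: float
    high: float

    def report(self, value: float) -> Optional[float]:
        """Return the value only if it falls outside the normal window."""
        if value < self.low or value > self.high:
            return value
        return None  # inside the window: nothing is transmitted


def fuse(sensor_ids: List[str], messages: Dict[str, float],
         nominal: float) -> Dict[str, float]:
    """Fusion point: a sensor that sent nothing is assumed to be nominal."""
    return {sid: messages.get(sid, nominal) for sid in sensor_ids}


# Illustrative readings; only sensor "b" is outside its window.
sensors = [Sensor("a", 0.0, 10.0), Sensor("b", 0.0, 10.0), Sensor("c", 0.0, 10.0)]
readings = {"a": 4.2, "b": 17.5, "c": 9.9}

messages: Dict[str, float] = {}
for s in sensors:
    sent = s.report(readings[s.sensor_id])
    if sent is not None:
        messages[s.sensor_id] = sent

view = fuse([s.sensor_id for s in sensors], messages, nominal=5.0)
print(messages)
print(view)
```

In this sketch only one of the three readings is transmitted, which is the bandwidth saving the scheme aims for. The cost is that the fusion point sees the assumed nominal value, not the true reading, for every silent sensor.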
Another technique is column sampling, which is typically only suitable for static database applications. In this selective sampling approach, an empirical distribution over the columns of the data matrix is constructed, and a small number of columns are selected by sampling from that distribution.
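As a rough illustration of column sampling, the following Python sketch builds a distribution over columns and draws a few of them. The text does not say how the empirical distribution is formed; weighting each column by its squared Euclidean norm is an assumption made here because it is a common choice. The function names and the example matrix are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]  # row-major: matrix[i][j] is row i, column j


def column_distribution(matrix: Matrix) -> List[float]:
    """Probability of each column, proportional to its squared norm (assumed)."""
    n_cols = len(matrix[0])
    weights = [sum(row[j] ** 2 for row in matrix) for j in range(n_cols)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all-zero matrix: no distribution to sample from")
    return [w / total for w in weights]


def sample_columns(matrix: Matrix, k: int,
                   seed: Optional[int] = None) -> Tuple[List[int], Matrix]:
    """Draw k column indices (with replacement) and return the chosen columns."""
    rng = random.Random(seed)
    probs = column_distribution(matrix)
    idx = rng.choices(range(len(probs)), weights=probs, k=k)
    cols = [[row[j] for row in matrix] for j in idx]
    return idx, cols


# Illustrative 2x3 matrix; the middle column is zero and carries no weight.
data = [[1.0, 0.0, 2.0],
        [1.0, 0.0, 2.0]]
probs = column_distribution(data)
idx, cols = sample_columns(data, k=4, seed=0)
print(probs)
print(idx)
```

Note that the distribution is computed from every entry of the matrix, which is why this kind of sampling suits a static database better than a stream whose columns are not all available in advance.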
Another concern for sensor network applications is the scalability of the computational framework, which must answer queries with low latency. The computational complexity of processing the data stream is a function of the length L and the dimension N of the input data. In most cases, the computational complexity is quadratic in either L or N. In some cases, it can be linear in L, but this requires an iterative estimation procedure, for example expectation maximization (EM), to reach convergence. The EM method also scales linearly with the sample size, but on average it converges very slowly and is typically not suitable for anomaly detection in high-speed data streams.
With the exception of spectral approaches, the anomaly detection methods mentioned above do not address the issue of high dimensionality of the input data. Spectral approaches are motivated by the assumption that the input data has a low-dimensional intrinsic structure. It should be noted that certain spectral methods may work well for small-scale problems when the full dataset is available, but are not applicable to large-scale problems when the full data matrix is not available.
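To make the idea of a spectral approach concrete, the following Python sketch scores a point by its distance from a one-dimensional principal subspace fitted to the data. It is one simple instance of the family and not a specific method from the literature discussed above. Using a single principal direction, estimating it by power iteration, and the names `top_direction` and `residual_score` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def top_direction(rows: List[List[float]],
                  iters: int = 200) -> Tuple[List[float], List[float]]:
    """Mean and leading principal direction of the rows, by power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iters):
        # w = X^T (X v): one multiplication by the unnormalised covariance
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def residual_score(x: List[float], mean: List[float], v: List[float]) -> float:
    """Distance from x to the line through mean along v (the anomaly score)."""
    c = [x[j] - mean[j] for j in range(len(x))]
    t = sum(c[j] * v[j] for j in range(len(x)))
    return math.sqrt(sum((c[j] - t * v[j]) ** 2 for j in range(len(x))))


# Illustrative data lying exactly on the line y = 2x.
train = [[float(t), 2.0 * t] for t in range(-5, 6)]
mean, v = top_direction(train)

on_line = residual_score([3.0, 6.0], mean, v)    # consistent with the structure
off_line = residual_score([3.0, -1.0], mean, v)  # departs from the structure
print(on_line, off_line)
```

The fitting step reads every row of the training data on each iteration, which reflects the limitation noted above: the sketch presupposes that the full data matrix is available.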