Surveillance systems typically employ video cameras or other sensors to collect spatio-temporal data. In the simplest systems, that data is displayed for contemporaneous screening by security personnel and/or recorded for later reference after a security breach. In those systems, the task of detecting objects of interest is performed by a human observer. A significant advance occurs when the system itself is able to perform object detection, either partially or completely.
In a typical outdoor surveillance system, for example, one may be interested in detecting objects such as humans, vehicles, animals, etc., that move through the environment. Different objects might pose different threats or warrant different levels of alarm. For example, an animal in the scene might be perfectly normal, but a human or a vehicle in the scene might be cause for an alarm and might require the immediate attention of a security guard. In addition to legitimate activity in the scene (humans, vehicles, animals, etc.), the environment might be susceptible to significant lighting changes, background motion such as moving foliage, and camera jitter due to a strong wind. An effective object-of-interest detection algorithm should be able to discern objects of interest in a highly dynamic environment in a timely fashion. A powerful surveillance system ideally has the capability to detect objects of interest while scoring high on the following benchmarks: (1) accuracy; (2) computational efficiency; and (3) flexibility. An accurate system detects objects of interest with high probability while achieving a very low false alarm rate. A computationally efficient system has a workload manageable enough that it can deliver its detection results and raise alarms without missing any relevant activity in the environment, guiding a human operator's attention to the activity as it is happening. A flexible system can handle multiple input modalities and multiple operating conditions seamlessly.
There are presently a variety of algorithms and strategies for automatic object detection in a spatio-temporal signal. Most of those detection methods cater to a subset of the operating conditions underlying the intended applications of the algorithm. It is known, for example, to employ a classification mechanism that maps a given spatial region of a spatio-temporal signal to one of a finite set of object types. Algorithms differ in how they process or search the spatio-temporal signal prior to the classification stage. Some algorithms employ a focus-of-attention mechanism to limit the extent of the search to less than the signal's full spatial extent.
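By way of illustration only, the relationship between a classification mechanism and a focus-of-attention mechanism can be sketched as follows. The sketch assumes square sliding windows as the candidate spatial regions and an illustrative finite set of object types; the names `sliding_windows`, `overlaps`, `detect`, and `OBJECT_TYPES` are hypothetical and are not taken from any particular prior-art system.

```python
from typing import Callable, Iterable, List, Tuple

# A spatial region as (row, col, height, width). Hypothetical representation.
Region = Tuple[int, int, int, int]

# Illustrative finite set of object types; an assumption for this sketch.
OBJECT_TYPES = ("human", "vehicle", "animal", "background")


def sliding_windows(height: int, width: int, win: int, stride: int) -> List[Region]:
    """Enumerate square candidate regions over the full spatial extent."""
    return [
        (r, c, win, win)
        for r in range(0, height - win + 1, stride)
        for c in range(0, width - win + 1, stride)
    ]


def overlaps(a: Region, b: Region) -> bool:
    """Return True if two axis-aligned regions share any area."""
    ar, ac, ah, aw = a
    br, bc, bh, bw = b
    return ar < br + bh and br < ar + ah and ac < bc + bw and bc < ac + aw


def detect(
    regions: Iterable[Region], classify: Callable[[Region], str]
) -> List[Tuple[Region, str]]:
    """Map each region to an object type and keep the non-background ones."""
    results = []
    for region in regions:
        label = classify(region)
        if label not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {label!r}")
        if label != "background":
            results.append((region, label))
    return results


# Exhaustive search: every window in an 8x8 signal is classified.
all_windows = sliding_windows(height=8, width=8, win=4, stride=2)

# Focus of attention: only windows overlapping a flagged area are classified.
attention_area: Region = (4, 4, 4, 4)
focused = [w for w in all_windows if overlaps(w, attention_area)]
```

In this toy configuration the focus-of-attention step reduces the number of classifier invocations from nine windows to four, which is the kind of saving such a mechanism is intended to provide.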
A focus-of-attention mechanism that identifies possible regions that could contain an object of interest is often referred to as foreground/background separation. Foreground/background separation in the prior art fundamentally relies on some notion of an outlier. This notion is typically quantified in terms of some probability threshold. Almost all the existing algorithms either rely on outlier metrics completely (memory-based algorithms) or rely on conceptual classification mechanisms completely (memory-less algorithms). The latter tend to be computationally overwhelming and not suitable for real-time applications. The former tend to be fast, but not as robust. Using sophisticated models for outlier detection, such as multi-modal distributions, accounts for the dynamic and periodic nature of the background component, but it does not explicitly account for the statistics of the foreground component. Furthermore, outlier-based techniques are not sensitive to subtle differences between the chromatic signatures of objects of interest and those of the environment. Overall, then, existing techniques suffer from one or both of (1) poor detection performance, i.e., high rates of false positives and false negatives; and (2) a high computational cost.
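By way of illustration only, an outlier-based (memory-based) foreground/background separation of the general kind described above can be sketched as follows. The sketch assumes a single running Gaussian per pixel over grayscale intensity, with a pixel flagged as foreground when it deviates from the background mean by more than `k` standard deviations; the names `PixelModel`, `init_model`, and `separate_foreground` and the parameters `k`, `alpha`, and `min_var` are hypothetical. It is a simplified stand-in, not a multi-modal model and not the method of any particular prior-art system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PixelModel:
    """Running mean and variance of one pixel's intensity (hypothetical helper)."""
    mean: float
    var: float


def init_model(frame: List[List[float]], initial_var: float = 25.0) -> List[List[PixelModel]]:
    """Seed the background memory from a first frame assumed to be background."""
    return [[PixelModel(mean=v, var=initial_var) for v in row] for row in frame]


def separate_foreground(
    model: List[List[PixelModel]],
    frame: List[List[float]],
    k: float = 3.0,       # outlier threshold in standard deviations (assumed)
    alpha: float = 0.05,  # learning rate of the background memory (assumed)
    min_var: float = 4.0, # variance floor so the model never becomes degenerate
) -> List[List[int]]:
    """Return a binary mask; 1 marks pixels flagged as outliers (foreground)."""
    mask = []
    for model_row, frame_row in zip(model, frame):
        mask_row = []
        for px, value in zip(model_row, frame_row):
            diff = value - px.mean
            is_outlier = diff * diff > (k * k) * px.var
            mask_row.append(1 if is_outlier else 0)
            if not is_outlier:
                # Only pixels judged to be background update the memory.
                px.mean += alpha * diff
                px.var = max(min_var, (1 - alpha) * px.var + alpha * diff * diff)
        mask.append(mask_row)
    return mask


# Usage: a flat background with one strongly deviating pixel.
background = [[100.0] * 3 for _ in range(3)]
model = init_model(background)
frame = [row[:] for row in background]
frame[1][1] = 200.0
mask = separate_foreground(model, frame)
```

Note that the decision in this sketch depends only on the background statistics: nothing in it models the foreground component, and a foreground object whose intensity is close to the background mean would not be flagged.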