Recently, a great deal of interest has been generated in automated surveillance systems that utilize a wide field-of-view (FOV) to detect anomalies in a panoramic view or scene. Such systems typically employ an attention algorithm (i.e., saliency) as a first pass that analyzes the entire scene and extracts regions of interest, which are typically advanced to some other system for analysis and/or identification. The attention algorithm is also often used as a seed to a “surprise” algorithm that finds salient regions of interest that change and, therefore, could indicate an anomaly in the FOV. The attention and surprise algorithms must be very accurate, as any event that is not detected by these algorithms cannot be analyzed by subsequent algorithms in the system. However, since attention/surprise must be computed for every pixel in the imagery from the wide FOV, the attention/surprise algorithm requires a large amount of computational resources and often requires dedicated hardware to run. Depending on the hardware used, this requirement may either exceed the available resources of the hardware or require larger and more complex hardware (e.g., multiple field-programmable gate arrays (FPGAs)), resulting in bulky and high-power systems.
While a number of researchers have shown interest in systems that compute the saliency of a scene, there is currently no prior art that specifically addresses low Size, Weight and Power (SWaP) processing of wide FOV imagery that employs multiple cameras. A system built from existing prior art in the general area of attention would consist of the trivial solution of running all cameras in the FOV and processing the continuous stream of frames using a conventional feature-based or object-based attention algorithm. Examples of such algorithms include the feature-based work of Itti and Koch (see Literature Reference No. 3) and Navalpakkam and Itti (see Literature Reference Nos. 6 and 7) and the object-based work of Khosla and Huber (see U.S. patent application Ser. No. 12/214,259), Draper and Lionelle (see Literature Reference No. 1), and Orabona et al. (see Literature Reference No. 8). With respect to saliency, these systems run a saliency algorithm on the frames in a video stream and return a given number of possible targets based on their saliency in that frame.
The pure surprise algorithms (both feature and object-based) are incomplete because they yield poor results when applied to video imagery of a natural scene. Artifacts from ambient lighting and weather often produce dynamic features that can throw off a saliency algorithm and cause it to think that “everything is salient”. Mathematically, it may be the case that everything in the scene is salient, but when a system is tasked with a specific purpose, such as surveillance, one is only interested in legitimate short-term anomalies that are likely to be targets. Therefore, simple saliency systems cannot provide the service that the current invention does.
An alternative approach to “pure surprise” is to use a full surprise algorithm. Full surprise algorithms employ a great deal of additional processing on the features in each frame of the video and create statistical models that describe the scene. If anything unexpected happens, the surprise algorithm is able to return the location of the event. The closest known prior art to this invention is the surprise algorithm of Itti and Baldi (see Literature Reference No. 2). The work of Itti and Baldi employs a Bayesian framework and uses the features that contribute to the saliency map to construct a prior distribution for the features in the scene. The current saliency map is used as the seed for a “posterior” distribution. This algorithm uses the Kullback-Leibler (KL) distance between the prior and posterior as the measure of surprise. Because it takes the entire history of the scene into account, it exhibits a much lower false alarm rate than that of a system that exclusively uses saliency.
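The idea of measuring surprise as the KL distance between a prior and a posterior can be illustrated with a minimal sketch. The sketch assumes a single scalar feature value with a Gaussian prior and Gaussian observation noise, which is a simplification chosen for brevity and is not the statistical model of Itti and Baldi. The function names `kl_gaussian` and `gaussian_surprise`, the parameter names, and the choice to measure the distance from the posterior to the prior are all assumptions made for illustration.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL distance KL(p || q) between two univariate Gaussians, in nats."""
    return (0.5 * math.log(var_q / var_p)
            + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q)
            - 0.5)


def gaussian_surprise(prior_mu, prior_var, observation, noise_var):
    """Update a Gaussian prior with one observation and score the surprise.

    Assumed model (hypothetical, for illustration only): the feature value
    has a Gaussian prior and is observed with Gaussian noise of known
    variance, so the posterior is also Gaussian.
    Returns (surprise, posterior_mu, posterior_var).
    """
    # Conjugate update: precisions add, means are precision-weighted.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
    post_mu = post_var * (prior_mu / prior_var + observation / noise_var)
    # Surprise is how far the observation moved the belief about the scene.
    surprise = kl_gaussian(post_mu, post_var, prior_mu, prior_var)
    return surprise, post_mu, post_var


# An observation near the prior mean scores lower than one far from it.
expected, _, _ = gaussian_surprise(0.5, 1.0, 0.5, 1.0)
unexpected, _, _ = gaussian_surprise(0.5, 1.0, 5.0, 1.0)
print(expected < unexpected)
```

In a video setting, the returned posterior would serve as the prior for the next frame, which is how the history of the scene enters the score. Repeating this update for every feature at every pixel suggests why such an approach is computationally expensive.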
However, as one might expect from the description of the algorithm, the Itti and Baldi surprise algorithm is very complicated and computationally expensive. It was designed to run on very high-end computer hardware, and even then cannot currently run in real time on high-resolution video imagery. The computer hardware it runs on is very bulky and power-consuming, which prevents its use on a mobile platform. Furthermore, the complexity of the algorithm largely prevents it from being ported to low-power hardware, which is essential for deployment on a mobile platform.
In addition to the above, there is a plethora of non-saliency based methods that model the background and then use changes in this model to detect “change” regions. Such methods perform poorly because they do not use saliency as a basis of attention. Previous work by D. Khosla, C. Moore, D. Huber, and S. Chelian on object-based attention and recognition has clearly shown that finding regions of interest via attention has very high detection and low false alarm rates and performs better than simple “change” detection methods (see Literature Reference No. 5).
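For context, a background-model “change” detector of the kind referred to above can be sketched as follows. This is a generic running-average scheme and does not reproduce any particular method from the cited literature. The function names, the blending factor `alpha`, and the `threshold` value are hypothetical, and frames are assumed to be lists of rows of grayscale intensities in the range 0 to 1.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def change_mask(background, frame, threshold=0.2):
    """Flag pixels whose intensity differs from the background model."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# A 2x2 scene in which one pixel brightens between frames.
background = [[0.0, 0.0], [0.0, 0.0]]
frame = [[0.0, 1.0], [0.0, 0.0]]
mask = change_mask(background, frame)
background = update_background(background, frame)
```

A detector of this kind flags any pixel whose intensity departs from the model, regardless of whether the change is salient, so lighting and weather artifacts are flagged along with legitimate targets.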
Previous attempts to make the basic saliency algorithm more robust have only resulted in a larger computational burden. The transition from feature-based to object-based saliency algorithms increased the computational load on the system without any dramatic improvement in detection and false alarm rates. To accomplish any type of solid improvement in detection and false alarm rates, systems that employ other surprise algorithms were developed; these are very complicated and require large amounts of computer hardware in order to operate in real time. Since real-time operation is critical to any surveillance operation, this requirement cannot be scaled back. Furthermore, the complexity of competing surprise algorithms makes them difficult to port to low-power hardware for implementation on a mobile surveillance platform.
Thus, a continuing need exists for a system that identifies regions of interest in wide FOV imagery by finding “surprising” or changing saliency regions, and that operates in real time at a reduced computational cost compared to the prior art.