(1) Field of Invention
The present invention relates to an anomaly detection system and, more particularly, to a system for detecting anomalies in a series of images by optimizing rapid serial visual presentation (RSVP) from user-specific neural brain signals.
(2) Description of Related Art
Anomaly detection systems can be used to identify anomalies, or patterns that differ from an established normal behavior, in sets of data. Several techniques exist for identifying anomalies within a dataset. One such technique, known as rapid serial visual presentation (RSVP), involves measuring the brain activity of a user monitoring a series of images for anomalies. RSVP measures the brain activity of a human subject while the subject watches a rapid stream of images in order to find incongruities and inconsistencies in the images (i.e., “targets”). The RSVP protocol has recently been used as a powerful tool for high-throughput filtering of images into simple “target” and “non-target” categories as described by Thorpe et al. in “Speed of Processing in the Human Visual System” in Nature, vol. 381, pp. 520-522, 1996 (hereinafter referred to as the Thorpe reference), which is hereby incorporated by reference as though fully set forth herein. This involves displaying a series of small images (e.g., at 256-by-256 pixel resolution), called “chips,” to a human subject at a very high frame rate (e.g., 10 Hertz) and measuring the electrical activity of the subject's brain using electroencephalography (EEG). Image transitions of high contrast can induce false alarm signals in the subject's brain, reducing the effectiveness of the experiment.
During an RSVP experiment, the images presented to the human subject are randomized. While this is often acceptable when presenting a subject with a sequence of images taken from satellite imagery, it poses problems when land-based imagery is employed. Artifacts of depth, such as lighting, scale, and texture changes, as well as topography variations (e.g., ground versus sky), provide a great deal more image variance. High contrasts between the features of quickly presented image chips cause undesired “surprise” EEG signals, which lead to false positives in the recording of the subject's neural brain signals (i.e., electroencephalography, or EEG). A surprise EEG signal occurs when two contrasting non-target images are placed in immediate succession.
Prior art exists to transform the images in an RSVP set to be similar to each other across certain perceptual factors. For example, images can be nonlinearly corrected via gamma transform to match their mean luminance in order to minimize the “jarring” effect as described by Gerson et al. in “Cortically Coupled Computer Vision for Rapid Image Search” in IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2): 174-179, 2006 (hereinafter referred to as the Gerson reference), which is hereby incorporated by reference as though fully set forth herein. However, these methods exhibit limited success, and the image sequence presented by RSVP is still highly “jarring” to the user.
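By way of illustration only, the gamma-based luminance matching mentioned above can be sketched as follows. This is a minimal sketch and not the procedure of the Gerson reference: it assumes luminance values normalized to the range [0, 1], and the function names (mean_luminance, match_gamma) and the bisection search are hypothetical choices made for this example.

```python
def mean_luminance(pixels, gamma=1.0):
    """Mean of normalized luminance values after applying v -> v ** gamma."""
    return sum(v ** gamma for v in pixels) / len(pixels)


def match_gamma(pixels, target_mean, lo=0.05, hi=20.0, iters=60):
    """Bisection search for a gamma that brings the mean luminance to target_mean.

    Assumes pixel values lie in [0, 1]. For such values the mean does not
    increase as gamma grows, so bisection over [lo, hi] is applicable.
    The search bounds are illustrative assumptions.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_luminance(pixels, mid) > target_mean:
            lo = mid  # image still too bright: a larger gamma is needed
        else:
            hi = mid  # image too dark (or matched): a smaller gamma is needed
    return (lo + hi) / 2.0


# Example: darken a small "chip" so that its mean luminance becomes 0.3.
chip = [0.2, 0.4, 0.6, 0.8]
gamma = match_gamma(chip, target_mean=0.3)
corrected = [v ** gamma for v in chip]
```

Matching the mean luminance in this way equalizes only one perceptual factor; differences in texture, scale, and scene content between successive chips remain.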
The next-best solution to this problem is in the field of content-based image retrieval (CBIR), which permits image searching based on features automatically extracted from the images themselves as described by Smeulders et al. in “Content-Based Image Retrieval at the End of the Early Years” in IEEE Transactions on PAMI, 22(12): 1349-1380, 2000 (hereinafter referred to as the Smeulders reference), which is hereby incorporated by reference as though fully set forth herein. The CBIR field has been motivated by the need to efficiently manage large image databases and run image retrievals without exhaustive searches of the image archive each time. A CBIR system compares the features of a selected image with the features of the other images in the set and returns the most similar images. Typically, this is done by computing, for each image, a vector containing the values of a number of attributes and computing the distance between image feature vectors. Many different features and combinations have been used in CBIR systems. Color retrieval yields the best results, in that the computed color-similarity results resemble those derived by a human visual system as described by Rogowitz et al. in “Perceptual Image Similarity” in Proceedings of Society of Photo-Optical Instrumentation Engineers (SPIE), 3299: 576-590, 1998 (hereinafter referred to as the Rogowitz reference), which is hereby incorporated by reference as though fully set forth herein. Other features include texture, shape, and bio-inspired features, for example. The best image matches are typically returned and displayed to the user in order of decreasing similarity (i.e., smallest computed distance first).
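For illustration, the feature-vector comparison described above can be sketched as follows. This is a minimal sketch under stated assumptions: the Euclidean metric, the toy two-element feature vectors, and the names euclidean and rank_by_similarity are hypothetical and are not taken from any cited CBIR system.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query, database, k=3):
    """Return the k (name, distance) pairs closest to the query vector.

    database maps an image name to its feature vector. Results are sorted
    with the smallest distance (most similar image) first.
    """
    scored = [(name, euclidean(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical feature vectors (e.g., two color-histogram statistics per image).
database = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [3.0, 4.0]}
matches = rank_by_similarity([0.0, 0.0], database, k=2)
```

Note that such a query returns a ranked list relative to one seed image; it does not by itself produce an ordering of the entire image set.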
While CBIR could be naively applied to image ordering for the problem of EEG experimentation using RSVP, this would pose a number of difficulties that would make it inferior. For a block of images to be ordered for RSVP, one could determine the feature set of each and load them into the CBIR database. Starting from an arbitrary image, one could find the closest match, then the closest match to that matched image, and so on, until all images have been queued. This procedure is equivalent to using the “nearest neighbor” heuristic for solving the travelling salesman problem (TSP), an NP-hard problem in combinatorial optimization. However, this algorithm does not guarantee the optimal result, and can actually provide the worst possible result depending on the dataset and the first image selected as described by Gutin et al. in “Traveling Salesman Should Not be Greedy: Domination Analysis of Greedy-Type Heuristics for the TSP” in Discrete Applied Mathematics, 117: 81-86, 2002 (hereinafter referred to as the Gutin reference), which is hereby incorporated by reference as though fully set forth herein.
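The greedy chaining procedure described above can be sketched as follows. This is an illustrative sketch only: the one-dimensional toy features, the Euclidean metric, and the function names (nearest_neighbor_order, path_length) are assumptions made for this example.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_order(features, start=0):
    """Greedy ordering: repeatedly append the unvisited image whose feature
    vector is closest to the most recently queued image."""
    unvisited = set(range(len(features))) - {start}
    order = [start]
    while unvisited:
        last = features[order[-1]]
        # Ties are broken by the lower index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (distance(last, features[i]), i))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_length(features, order):
    """Sum of distances between consecutive images in the ordering."""
    return sum(distance(features[a], features[b])
               for a, b in zip(order, order[1:]))


# Toy one-dimensional features: the result depends on the starting image.
features = [[0.0], [1.0], [3.0], [7.0]]
from_first = nearest_neighbor_order(features, start=0)  # [0, 1, 2, 3], length 7
from_third = nearest_neighbor_order(features, start=2)  # [2, 1, 0, 3], length 10
```

In this toy case, starting from the third image yields a sequence with a total inter-image distance of 10, versus 7 when starting from the first image, which illustrates the sensitivity of the greedy heuristic to the initial selection.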
The prior art for user relevance feedback (i.e., supervised learning) in CBIR systems primarily focuses on whether the images returned by the algorithm are similar to a seed image as presented by Morrison et al. in “Semantic Clustering of Images Using Patterns of Relevance Feedback” in Proceedings of the 6th International Workshop on Content-based Multimedia Indexing (CBMI 2008), London, UK, 2008 (hereinafter referred to as the Morrison reference), which is hereby incorporated by reference as though fully set forth herein. This involves running the computer algorithm to find a candidate match for an image, and then allowing the user to answer in the affirmative or negative regarding the similarity of the image. CBIR systems do not address the issue of image sequencing or determining the relative similarity of images that may, in fact, be very similar to one another. The CBIR prior art has no notion of ordering of the images. Each of the prior methods discussed above exhibits limitations that make it incomplete. This is because they generally do not directly address the problem of ordering images specifically for the RSVP method and consequently produce results that are unacceptable for the application.
In addition to optimizing RSVP for image ordering, the technique can also be used to optimize search and detection performance for items of interest (IOI) in images (static RSVP) and videos (video RSVP). Prior art exists which describes bio-inspired visual attention mechanisms for static RSVP. The first is a system that computes pure saliency on the frames of a video stream and reports possible targets based on those results. Systems using feature-based saliency have been proposed by Itti and Koch in “A saliency-based search mechanism for overt and covert shifts of visual attention” in Vision Research, 40: 1489-1506, 2000, and Navalpakkam and Itti in both “Modeling the Influence of Task on Attention” in Vision Research, 45: 205-231, 2005 and “An integrated model of top-down and bottom-up attention for optimal object detection” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1-7, 2006.
Secondly, object-based approaches have been proposed by Khosla et al. in “Bio-Inspired Visual Attention and Object Recognition” in Proc. SPIE Defense, Security, and Sensing, 6560, 656003, 2007, Draper and Lionelle in “Evaluation of Selective Attention under Similarity Transforms” in Workshop on Performance and Attention in Computer Vision, Graz, Austria, April 2003, and Orabona et al. in “Object-based Visual Attention: A Model for a Behaving Robot” in 3rd International Workshop on Attention and Performance in Computational Vision (in CVPR 2005), San Diego, Calif., 2005. These systems run a saliency algorithm on the frames in a video stream and return a given number of possible targets based on their saliency in that frame. The pure saliency algorithms (both feature- and object-based) can yield poor results when applied to video imagery of a natural scene. Artifacts from ambient lighting and weather often produce dynamic features that can throw off a saliency algorithm and cause it to conclude that “everything is salient”. Mathematically, it may be the case that everything in the scene is salient. However, when a system is tasked with a specific purpose, such as surveillance, one is only interested in legitimate short-term anomalies that are likely to be targets. Therefore, simple saliency systems cannot provide the service that the current invention does.
The alternative approach is to use a full surprise algorithm. These algorithms employ a great deal of additional processing on the features in each frame of the video and create statistical models that describe the scene. If anything unexpected happens, the surprise algorithm is able to return the location of the event. The closest known prior art is the surprise algorithm of Itti and Baldi in “Bayesian Surprise Attracts Human Attention” in Vision Research 49: 1295-1306, 2008. This work employs a Bayesian framework and features that contribute to the saliency map to construct a prior distribution for the features in the scene. The current saliency map is used as the seed for a “posterior” distribution. The algorithm uses the Kullback-Leibler (KL) distance between the prior and posterior as the measure of surprise. Because it takes the entire history of the scene into account, it exhibits a much lower false alarm rate than that of a system that exclusively uses saliency. However, as one might expect from the description of the algorithm, the Itti and Baldi surprise algorithm is very complicated and computationally expensive. It was designed to run on very high-end computer hardware and, even then, cannot currently run in real time on high-resolution video imagery. The computer hardware it runs on is very bulky and power-consuming, which prevents its use on a mobile platform. Furthermore, the complexity of the algorithm largely prevents it from being ported to low-power hardware, which is essential for deployment on a mobile platform. In addition to the above, there is a plethora of non-saliency based methods that model the background and then use changes in this model to detect “change” regions. The prior art references cited above are hereby incorporated by reference as though fully set forth herein.
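As a simplified illustration of the KL-based surprise measure described above, the following sketch computes the KL distance between a prior and a posterior over a small discrete set of hypotheses. It is not the Itti and Baldi implementation: the discrete hypothesis set, the direction of the divergence (posterior relative to prior), the base-2 logarithm, and the function names are all assumptions made for this example.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over hypotheses given a prior and per-hypothesis likelihoods."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_divergence(p, q):
    """KL divergence of p relative to q, in bits (terms with p_i = 0 add zero)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surprise(prior, likelihood):
    """Surprise of an observation: KL divergence of the posterior from the prior.

    Returns (surprise_in_bits, posterior). Measuring the posterior relative
    to the prior is an assumption of this sketch.
    """
    posterior = bayes_update(prior, likelihood)
    return kl_divergence(posterior, prior), posterior


# An observation equally likely under both hypotheses leaves the belief unchanged.
uninformative, _ = surprise([0.5, 0.5], [1.0, 1.0])
# An observation that strongly favors the first hypothesis shifts the belief.
informative, posterior = surprise([0.5, 0.5], [0.9, 0.1])
```

An observation that does not change the belief distribution yields zero surprise, while one that shifts the belief yields a positive value. In a sequential setting, the posterior would serve as the prior for the next frame, which is how the history of the scene enters the measure.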
Thus, a continuing need exists for an automated system for optimizing RSVP that addresses the issue of image sequencing or determining the relative similarity of images that may be very similar to one another based on user feedback. Additionally, a need exists for a system for optimizing search and detection performance for IOI in videos that use RSVP-based EEGs.