The detection of irregularities in data, e.g., visual data such as images and video sequences, has many practical applications. The detection of suspicious behaviors or unusual objects, for example, is important for surveillance and monitoring. Identifying spatial saliency in images is useful for quality control and automatic visual inspection. Detecting behavioral saliency in video is useful for drawing a viewer's attention to particular areas of interest in the video.
One of the main problems in automating the detection of irregularities is that the notion of "irregular" or "suspicious" depends on a context-based definition of "regular" or "valid". For example, in a library where fifty people are quietly reading or browsing for books, the behavior of one man cheering wildly is "irregular". However, in a football stadium filled with hundreds of wildly cheering fans, it is the behavior of a person quietly reading in the stands that is irregular.
Thus, while a casual human observer would effortlessly reach the conclusions described hereinabove regarding the regularity of reading or cheering in these two situations, a serious impediment to detecting irregularities in data by automatic means is that it is impossible to explicitly define all possible valid configurations for a given context. Prior-art attempts to overcome this impediment have taken a variety of approaches for the various applications of irregularity detection in images and video sequences.
Previous approaches to the automatic recognition of suspicious behaviors or activities can broadly be classified into two classes: rule-based methods (as taught by Ivanov et al., "Recognition of multi-agent interaction in video surveillance", ICCV, 1999) and statistical methods without predefined rules (as taught by Stauffer et al., "Learning patterns of activity using real-time tracking", PAMI, 2000, and Zhong et al., "Detecting unusual activity in video", CVPR, 2004). The statistical methods may be considered preferable, since they do not assume a predefined set of rules for all valid configurations. Instead, they try to learn the notion of regularity automatically from the data, and thereby infer what is suspicious. Nevertheless, the representations employed in these previous methods have been either very restrictive (e.g., the trajectories of moving objects in Stauffer et al.) or too global (e.g., a single small descriptor vector for an entire frame in Zhong et al.).
Previous approaches to detecting image saliency (e.g., as taught by Itti et al., "A model of saliency-based visual attention for rapid scene analysis", PAMI, 1998) proposed measuring the degree of dissimilarity between an image location and its immediate surrounding region. Thus, for example, image regions that exhibit large changes in contrast are detected as salient. The notion of "visual attention" is derived from the same reasoning. However, saliency cannot necessarily be determined from the immediately surrounding image regions alone. For example, a single yellow spot on a black sheet of paper may be salient. If, however, many yellow spots are spread all over the black paper, then a single spot will no longer draw our attention, even though it still induces a large change in contrast relative to its immediate surroundings.
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.