With the increased demand for security and safety, video-based surveillance systems are being employed in a variety of rural and urban locations. A vast amount of video footage, for example, can be collected and analyzed for traffic violations, accidents, crime, terrorism, vandalism, and other suspicious activities. Because manual analysis of such large volumes of data is prohibitively costly, a pressing need exists for developing effective software tools that can aid in the automatic or semi-automatic interpretation and analysis of video data for surveillance, law enforcement, and traffic control and management.
Video-based anomaly detection refers to the problem of identifying patterns in data that do not conform to expected behavior, and which may warrant special attention or action. The detection of anomalies in the transportation domain can include, for example, traffic violations, unsafe driver/pedestrian behavior, accidents, etc.
FIGS. 1-2 illustrate pictorial views of exemplary transportation related anomalies captured from, for example, video-monitoring cameras. In the scenario depicted in FIG. 1, unattended baggage 100 is shown and identified by a circle. In the example shown in FIG. 2, a vehicle is depicted approaching a pedestrian 130. Both the vehicle and pedestrian 130 are shown surrounded by a circle.
A number of anomalies can be generated by the atypical trajectory or behavior of a single object, whereas collective anomalies are caused by the joint observation of multiple objects. For example, in the area of transportation, accidents at traffic intersections result from joint rather than merely individual object behavior. It is also possible that individual object behaviors are not anomalous when studied in isolation, but in combination produce an anomalous event. For example, a vehicle that comes to a stop at a pedestrian crossing before proceeding is not anomalous by itself, but the stop may be the result of the car colliding with, or coming into very close proximity to, the crossing pedestrian.
FIG. 3 illustrates a schematic view of trajectory classification utilizing a prior art sparse reconstruction model 100. In the example shown in FIG. 3, a first training class 120 is shown with respect to a second training class 140. A test trajectory 110 can be employed with respect to the training classes 120 and 140. Example data 130, 135 is shown in FIG. 3 with respect to the first training class 120.
Several approaches have been proposed to detect traffic-related anomalies based on object tracking techniques. In one prior art approach, such as shown in FIG. 3, nominal vehicle paths or trajectories can be derived and deviations therefrom can be searched for in live traffic video data. The vehicle can be tracked and its path compared against the nominal classes during a test or evaluation phase. A statistically significant deviation from all classes indicates an anomalous path.
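The path comparison described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the method of any particular prior art system: a trajectory is assumed to be a fixed-length flat list of coordinates, each nominal class is summarized by its mean path, and "statistically significant deviation" is assumed to mean a distance exceeding the class mean distance plus `z_max` standard deviations. The names `fit_nominal_class`, `is_anomalous_path`, and `z_max` are hypothetical.

```python
import math
import statistics


def fit_nominal_class(paths):
    """Summarize one nominal class as (mean path, mean distance, std of distance)."""
    # Coordinate-wise mean of all training paths in the class.
    center = [statistics.fmean(values) for values in zip(*paths)]
    # Euclidean distance of each training path to the mean path.
    dists = [math.dist(p, center) for p in paths]
    return center, statistics.fmean(dists), statistics.pstdev(dists)


def is_anomalous_path(path, nominal_classes, z_max=3.0):
    """Return True only if the path deviates significantly from ALL classes."""
    for center, mu, sigma in nominal_classes:
        # Assumed significance rule: mean distance plus z_max standard deviations.
        limit = mu + z_max * max(sigma, 1e-9)
        if math.dist(path, center) <= limit:
            return False  # consistent with at least one nominal class
    return True


# Synthetic example: paths are [x1, y1, x2, y2, x3, y3, x4, y4].
straight = [
    [0, 0.0, 1, 0.0, 2, 0.0, 3, 0.0],
    [0, 0.1, 1, 0.1, 2, 0.1, 3, 0.1],
    [0, -0.1, 1, -0.1, 2, -0.1, 3, -0.1],
]
left_turn = [
    [0, 0.0, 1, 0.0, 2, 1.0, 2, 2.0],
    [0, 0.1, 1, 0.1, 2, 1.1, 2, 2.1],
    [0, -0.1, 1, -0.1, 2, 0.9, 2, 1.9],
]
classes = [fit_nominal_class(straight), fit_nominal_class(left_turn)]

nominal_test = [0, 0.05, 1, 0.05, 2, 0.05, 3, 0.05]
unusual_test = [0, 0.0, 1, 0.0, 2, -1.0, 3, -2.0]
print(is_anomalous_path(nominal_test, classes))
print(is_anomalous_path(unusual_test, classes))
```

A path is flagged only when it is far from every nominal class, which mirrors the "deviation from all classes" criterion in the text.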
Another approach involves the use of a sparse reconstruction model to solve the classification problem and, subsequently, to perform anomaly detection. For example, normal and/or usual events in video footage can be extracted and categorized into a set of nominal event classes in a training step. The categorization is based on a set of n-dimensional feature vectors extracted from the video data and can be performed manually or automatically. Any numerical feature descriptor can be used to encode the event. In transportation video, as mentioned earlier, object trajectories are often chosen as the feature descriptor. However, other descriptors such as spatiotemporal volume can also be used. The only hypothesis behind the sparsity model is that any new event of a kind previously encountered in the training dictionary construction process can be explained as a sparse linear combination of samples within one of the nominal classes in the dictionary.
Specifically, the training samples from the i-th class of the dictionary can be arranged as columns of a matrix $A_i \in \mathbb{R}^{n \times T}$, where T denotes the number of training samples per class. A dictionary $A \in \mathbb{R}^{n \times KT}$ with respect to the training samples from all K classes can then be formed as follows: $A = [A_1, A_2, \ldots, A_K]$. Given sufficient training samples from the m-th trajectory class, a test trajectory $y \in \mathbb{R}^n$ from the same class is conjectured to lie approximately in the linear span of those training samples. Any input trajectory feature vector may hence be represented by a sparse linear combination of the set of all training trajectory samples, as shown below in equation (1):
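$$y = A\alpha = [A_1, A_2, \ldots, A_K] \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{bmatrix} \qquad (1)$$

where each $\alpha_i \in \mathbb{R}^T$ is the coefficient sub-vector for the T training samples of the i-th class. Typically, for a given trajectory y, only one of the $\alpha_i$'s is active (corresponding to the class/event that y is generated from); thus the coefficient vector $\alpha \in \mathbb{R}^{KT}$ is modeled as being sparse and is recovered by solving the following optimization problem: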
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - A\alpha\|_2 < \varepsilon \qquad (2)$$

where the underlying objective is to minimize the number of non-zero elements in α, that is, its $\ell_0$ norm. It is well known from the compressed sensing literature that utilizing the $\ell_0$ norm leads to an NP-hard (non-deterministic polynomial-time hard) problem. Thus, the $\ell_1$ norm is employed in equation (2) as an effective approximation. A residual error between the test trajectory and each class behavior pattern can be computed as shown in equation (3) to determine the class to which the test trajectory belongs, namely the class with the smallest residual:

$$r_i(y) = \|y - A_i \hat{\alpha}_i\|_2, \quad i = 1, 2, \ldots, K \qquad (3)$$
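The classification step of equations (2) and (3) can be sketched as follows. This is a hedged illustration rather than the implementation of any cited approach: it assumes NumPy, and it swaps the constrained problem (2) for its unconstrained Lagrangian (lasso) form, minimizing 0.5·‖y − Aα‖₂² + λ‖α‖₁ with iterative soft thresholding (ISTA). The names `ista_l1`, `classify_by_residual`, `lam`, and `n_iter`, as well as the tiny synthetic dictionary, are hypothetical.

```python
import numpy as np


def ista_l1(A, y, lam=0.05, n_iter=500):
    """Approximately minimize 0.5*||y - A a||_2^2 + lam*||a||_1 using ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of A^T A.
    L = np.linalg.norm(A, 2) ** 2
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - y)          # gradient of the quadratic term
        z = alpha - grad / L                  # gradient step
        # Soft thresholding is the proximal operator of the l1 penalty.
        alpha = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return alpha


def classify_by_residual(A, y, class_sizes, lam=0.05):
    """Return (class index, per-class residuals, alpha) following equation (3)."""
    alpha = ista_l1(A, y, lam)
    residuals = []
    start = 0
    for size in class_sizes:
        A_i = A[:, start:start + size]        # training samples of class i
        alpha_i = alpha[start:start + size]   # coefficients of class i
        residuals.append(float(np.linalg.norm(y - A_i @ alpha_i)))
        start += size
    return int(np.argmin(residuals)), residuals, alpha


# Synthetic dictionary: K = 2 classes, T = 3 samples each, n = 4 features.
A = np.array([
    [1.0, 0.0, 0.1, 0.0], [1.0, 0.0, 0.0, 0.1], [1.0, 0.0, -0.1, 0.0],  # class 0
    [0.0, 1.0, 0.1, 0.0], [0.0, 1.0, 0.0, 0.1], [0.0, 1.0, -0.1, 0.0],  # class 1
]).T
y = 0.5 * A[:, 3] + 0.5 * A[:, 4]             # test vector built from class 1

label, residuals, alpha = classify_by_residual(A, y, class_sizes=[3, 3])
print(label, residuals)
```

The penalty weight `lam` plays the role that the tolerance ε plays in equation (2); a constrained solver could be substituted without changing the residual-based decision.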
If anomalies have been predefined into their own class, then the classification task also accomplishes anomaly detection. Alternatively, if all training classes correspond to only normal events, then anomalies can be identified via outlier detection. To this end, an index of sparsity can be defined and utilized to measure the sparsity of the reconstructed α:
$$\mathrm{SCI}(\alpha) = \frac{K \cdot \max_i \|\delta_i(\alpha)\|_1 / \|\alpha\|_1 - 1}{K - 1} \in [0, 1] \qquad (4)$$

where $\delta_i(\alpha)$ is the characteristic function that selects the coefficients $\alpha_i$ with respect to the i-th class. The normal samples are likely to exhibit a high level of sparsity, and conversely, anomalous samples likely produce a low sparsity index. A threshold on SCI(α) determines whether or not the sample is anomalous.
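The index of equation (4) and the thresholding step can be sketched in a few lines. This is an illustrative sketch only: the function names, the `class_sizes` argument describing how α is partitioned by class, the threshold value of 0.5, and the handling of an all-zero α are assumptions that do not come from the text.

```python
def sparsity_concentration_index(alpha, class_sizes):
    """SCI of equation (4); alpha is the concatenation [alpha_1, ..., alpha_K]."""
    K = len(class_sizes)
    if K < 2:
        raise ValueError("SCI needs at least two classes")
    if sum(class_sizes) != len(alpha):
        raise ValueError("class_sizes must partition alpha")
    total_l1 = sum(abs(a) for a in alpha)
    if total_l1 == 0:
        return 0.0  # assumption: an all-zero alpha shows no concentration
    # l1 norm of the coefficients selected by delta_i for each class i.
    class_l1 = []
    start = 0
    for size in class_sizes:
        class_l1.append(sum(abs(a) for a in alpha[start:start + size]))
        start += size
    return (K * max(class_l1) / total_l1 - 1.0) / (K - 1.0)


def is_anomalous(alpha, class_sizes, threshold=0.5):
    """Flag a sample whose SCI falls below a (hypothetical) threshold."""
    return sparsity_concentration_index(alpha, class_sizes) < threshold


# Coefficients concentrated in one class versus spread evenly over three.
concentrated = [0, 0, 0, 2, 1, 0, 0, 0, 0]
spread = [1, 0, 0, 1, 0, 0, 1, 0, 0]
print(sparsity_concentration_index(concentrated, [3, 3, 3]))  # 1.0
print(sparsity_concentration_index(spread, [3, 3, 3]))        # 0.0
```

An index of 1 means all coefficient energy lies in a single class, while 0 means it is spread evenly across the K classes, which is why low values suggest an anomaly.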
Such a sparsity-based framework for classification and anomaly detection is robust against various distortions, notably occlusion, and is robust with respect to the particular features chosen, provided the sparse representation is computed correctly.
The aforementioned approach does not take into account joint anomalies involving multiple objects and does not capture the interactions required to detect these types of multi-object anomalies. To address this issue, a joint sparsity model can be employed to detect anomalies involving the co-occurrence of two or more events. The joint sparsity model solves for the sparse coefficients via an optimization problem. An example of the optimization problem and the joint sparsity model is discussed in U.S. patent application Ser. No. 13/476,239 entitled “Method and System for Automatically Detecting Multi-Object Anomalies Utilizing Joint Sparse Reconstruction Model,” which is incorporated herein by reference in its entirety. The optimization problem can be expressed as, for example:

$$\text{minimize} \ \|J(H \circ S)\|_{\text{row},0} \quad \text{subject to} \quad \|Y - AS\|_F < \varepsilon \qquad (5)$$
The aforementioned sparsity models have been shown to outperform many prior art anomaly detection techniques. However, these models use only one predefined event representation. It is not difficult to see that multiple diverse event feature representations could contain correlated yet complementary information about the same event, whether normal or anomalous. For example, a vehicle traversing a traffic intersection may be observed from three cameras with very different viewpoints, resulting in a triplet of motion trajectories describing that event. Alternatively, one may define multiple feature descriptors for an event, such as vehicle trajectory and spatiotemporal volume, to obtain a richer description of events.
Based on the foregoing, it is believed that a need exists for improved methods and systems for automatically detecting multi-object anomalies at a traffic intersection utilizing a simultaneous structured sparsity model, as will be described in greater detail herein.