With the increased demand for security and safety, video-based surveillance systems are being utilized in a variety of rural and urban locations. A vast amount of video footage, for example, can be collected and analyzed for traffic violations, accidents, crime, terrorism, vandalism, and other suspicious activities. Because manual analysis of such large volumes of data is prohibitively costly, a pressing need exists for developing effective software tools that can aid in the automatic or semi-automatic interpretation and analysis of video data for surveillance, law enforcement, and traffic control and management.
Video-based anomaly detection refers to the problem of identifying patterns in data that do not conform to expected behavior and that may warrant special attention or action. In the transportation domain, detected anomalies can include, for example, traffic violations, unsafe driver/pedestrian behavior, and accidents. FIGS. 1-2 illustrate pictorial views of exemplary transportation-related anomalies captured, for example, by video monitoring cameras. In the scenario depicted in FIG. 1, unattended baggage 100 is shown and identified by a circle. In the scenario shown in FIG. 2, a vehicle is shown approaching a pedestrian 130; both the vehicle and the pedestrian 130 are marked by a circle.
Many common anomalies arise from a single object. Joint anomalies, on the other hand, involve two or more objects. In the area of transportation, for example, accidents at traffic intersections depend on joint, and not just individual, object behavior. It is also possible that individual object behaviors are not anomalous when studied in isolation but produce an anomalous event in combination. For example, a vehicle that comes to a stop at a pedestrian crossing before proceeding may appear normal on its own, yet the stop could be the result of the vehicle coming into very close proximity with a crossing pedestrian or another vehicle.
Several approaches have been proposed to detect traffic-related anomalies based on object tracking. In one prior art approach, nominal vehicle paths are derived, and deviations from them are searched for in live traffic video data. During a test or evaluation phase, a vehicle is tracked and its path is compared against the nominal classes; a statistically significant deviation from all classes indicates an anomalous path. A problem with such an approach is that it is difficult to detect abnormal patterns in realistic scenarios involving multiple object trajectories in the presence of occlusions, clutter, and other background noise.
Another prior art approach uses a sparse reconstruction model for anomaly detection. For example, normal or usual events in video footage can be extracted and categorized into a set of nominal event classes in a training step to form a training dictionary. The categorization is based on a set of n-dimensional feature vectors extracted from the video data and can be performed manually or automatically. Parametric representations of vehicle trajectories can be chosen as the feature vectors. The hypothesis underlying sparse reconstruction is that any test video sample representing a nominal event can be well explained by a sparse linear combination of samples within one of the nominal classes in the training dictionary. An anomalous event, on the other hand, cannot be adequately reconstructed using a sparse linear combination of training dictionary samples. Anomaly detection is thus accomplished by evaluating a sparsity measure, or equivalently, an outlier rejection measure of the reconstruction.
Specifically, consider the case of single-object anomaly detection. The training samples from the $i$-th class can be arranged as the columns of a matrix $A_i \in \mathbb{R}^{n \times T}$, wherein $T$ is the number of training samples in a given class. A dictionary $A \in \mathbb{R}^{n \times KT}$ containing the training samples from all $K$ classes can then be formed as $A = [A_1, A_2, \ldots, A_K]$. A test sample $y \in \mathbb{R}^n$ from the $m$-th class is conjectured to lie approximately in the linear span of the training samples belonging to the $m$-th trajectory class, and may hence be represented as a sparse linear combination of all the training samples in $A$ in which only the coefficients associated with the $m$-th class are nonzero, as shown below in equation (1):
$$y = A\alpha = [A_1, A_2, \ldots, A_K] \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_K \end{bmatrix} \tag{1}$$
wherein each $\alpha_i \in \mathbb{R}^T$. Typically, for a given trajectory $y$, only one of the $\alpha_i$'s is active (the one corresponding to the event class from which $y$ was generated); thus, the coefficient vector $\alpha \in \mathbb{R}^{KT}$ is modeled as sparse and is recovered by solving the following optimization problem:
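To make the block structure of equation (1) concrete, the following minimal sketch assembles a dictionary from per-class matrices and synthesizes a test sample whose coefficient vector is nonzero only in one class block. It assumes NumPy, uses random matrices as stand-ins for real trajectory feature vectors, and indexes classes from 0; the helper names `build_dictionary` and `block_sparse_alpha` are hypothetical.

```python
import numpy as np

def build_dictionary(class_matrices):
    """Concatenate per-class training matrices A_i (each n x T) into A = [A_1, ..., A_K]."""
    n, T = class_matrices[0].shape
    for A_i in class_matrices:
        if A_i.shape != (n, T):
            raise ValueError("every class must contribute an n x T matrix")
    return np.hstack(class_matrices)  # shape (n, K*T)

def block_sparse_alpha(K, T, m, alpha_m):
    """Coefficient vector in R^{KT} whose only nonzero block is alpha_m for class m (0-based)."""
    alpha = np.zeros(K * T)
    alpha[m * T:(m + 1) * T] = alpha_m
    return alpha

# Toy example: random columns stand in for trajectory feature vectors (assumption).
rng = np.random.default_rng(0)
n, T, K = 8, 4, 3
A_list = [rng.standard_normal((n, T)) for _ in range(K)]
A = build_dictionary(A_list)
alpha = block_sparse_alpha(K, T, m=1, alpha_m=np.array([0.5, 0.0, -1.0, 2.0]))
y = A @ alpha  # equation (1): only the class-1 block contributes
```

Because every block of `alpha` except the class-1 block is zero, `y` coincides with `A_list[1] @ alpha[T:2*T]`, which is exactly the "one active $\alpha_i$" situation described above.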
$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - A\alpha\|_2 < \varepsilon \tag{2}$$
Ideally, the objective would be to minimize the number of nonzero elements in $\alpha$, i.e., its $\ell_0$ norm. It is well known from the compressed sensing literature, however, that using the $\ell_0$ norm leads to an NP-hard (non-deterministic polynomial-time hard) problem; the $\ell_1$ norm is therefore employed in equation (2) as an effective convex approximation. A residual error between the test trajectory and each class behavior pattern can then be computed as shown in equation (3) to determine the class to which the test trajectory belongs, where $\hat{\alpha}_i$ denotes the block of $\hat{\alpha}$ associated with the $i$-th class:
$$r_i(y) = \|y - A_i \hat{\alpha}_i\|_2, \quad i = 1, 2, \ldots, K \tag{3}$$
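As an illustration of equations (2) and (3), the sketch below recovers a sparse coefficient vector and computes the per-class residuals. Note a swapped-in technique: instead of solving the constrained problem (2) directly, it solves the closely related penalized (Lasso) form $\min_\alpha \tfrac{1}{2}\|y - A\alpha\|_2^2 + \lambda\|\alpha\|_1$ with the iterative shrinkage-thresholding algorithm (ISTA), where the hypothetical parameter `lam` plays a role analogous to $\varepsilon$. NumPy, random unit-norm dictionary columns, and 0-based class indices are all assumptions of this sketch.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_l1(A, y, lam=0.01, n_iter=2000):
    """Approximately solve min_a 0.5*||y - A a||_2^2 + lam*||a||_1 with ISTA.

    This penalized form stands in for the constrained problem (2).
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = largest eigenvalue of A^T A
    alpha = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ alpha - y)  # gradient of the quadratic data term
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha

def class_residuals(A, y, alpha_hat, K, T):
    """r_i(y) = ||y - A_i alpha_hat_i||_2 for each class i, as in equation (3)."""
    return np.array([
        np.linalg.norm(y - A[:, i * T:(i + 1) * T] @ alpha_hat[i * T:(i + 1) * T])
        for i in range(K)
    ])

# Toy data: random unit-norm columns stand in for trajectory features (assumption).
rng = np.random.default_rng(0)
n, T, K = 30, 5, 3
A = rng.standard_normal((n, K * T))
A /= np.linalg.norm(A, axis=0)  # normalize each column to unit l2 norm
y = A[:, T:2 * T] @ np.array([1.0, -0.5, 0.8, 0.0, 0.3])  # sample drawn from class 1
alpha_hat = ista_l1(A, y)
r = class_residuals(A, y, alpha_hat, K, T)
```

Since `y` lies in the span of the class-1 columns, the recovered coefficients concentrate in that block and `r[1]` is the smallest residual. ISTA is used here only because it is short and dependency-free beyond NumPy; a dedicated convex solver could equally be used for the constrained form (2).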
The test trajectory is assigned to the class with the minimum residual error. If anomalies have been predefined as their own class, the classification task also accomplishes anomaly detection. Alternatively, if all training classes correspond to nominal events, anomalies can be identified via outlier detection. To this end, an outlier rejection measure can be defined and used to quantify the sparsity of the reconstructed coefficient vector $\alpha$:
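The minimum-residual decision rule, together with the case in which anomalies are predefined as their own class, can be sketched in a few lines. The function name `classify_by_residual`, the 0-based class indices, and the residual values below are hypothetical and chosen only for illustration.

```python
def classify_by_residual(residuals, anomaly_classes=frozenset()):
    """Assign the sample to the class with the smallest residual r_i(y).

    Returns (class_index, is_anomaly); is_anomaly is True only when the winning
    class is one of the predefined anomaly classes (hypothetical labelling).
    """
    best = min(range(len(residuals)), key=lambda i: residuals[i])
    return best, best in anomaly_classes

# Hypothetical residuals for K = 3 classes, with class 2 designated an anomaly class.
print(classify_by_residual([2.1, 0.05, 1.9], anomaly_classes={2}))  # (1, False)
print(classify_by_residual([2.1, 1.3, 0.2], anomaly_classes={2}))   # (2, True)
```

When every training class is nominal, `anomaly_classes` is empty and this rule alone never flags an anomaly; that case requires the outlier rejection measure instead.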
$$\mathrm{SCI}(\alpha) = \frac{K \cdot \max_i \|\delta_i(\alpha)\|_1 / \|\alpha\|_1 - 1}{K - 1} \in [0, 1] \tag{4}$$
wherein $\delta_i: \mathbb{R}^{KT} \to \mathbb{R}^{KT}$ is the characteristic function that selects the coefficients $\alpha_i$ associated with the $i$-th class (setting all other coefficients to zero). Nominal samples are likely to exhibit a high measure and, conversely, anomalous samples are likely to produce a low measure. A threshold on $\mathrm{SCI}(\alpha)$ determines whether or not the sample is anomalous. Such a sparsity-based framework for classification and anomaly detection is robust against various distortions, notably occlusion, and is robust with respect to the particular features chosen, provided the sparse representation is computed correctly.
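Equation (4) and the threshold test can be sketched as follows, assuming NumPy and the block layout $\alpha = [\alpha_1; \ldots; \alpha_K]$ with each block of length $T$ stored contiguously. The threshold value `tau = 0.5` and the function names are hypothetical; in practice the threshold would be tuned on data.

```python
import numpy as np

def sci(alpha, K, T):
    """Sparsity concentration index of equation (4) for alpha = [alpha_1; ...; alpha_K]."""
    alpha = np.asarray(alpha, dtype=float)
    total = np.abs(alpha).sum()  # ||alpha||_1
    if total == 0.0:
        raise ValueError("SCI is undefined for an all-zero coefficient vector")
    # Row i of the reshaped array is block alpha_i, so each row sum is ||delta_i(alpha)||_1.
    block_l1 = np.abs(alpha.reshape(K, T)).sum(axis=1)
    return float((K * block_l1.max() / total - 1.0) / (K - 1))

def is_anomalous(alpha, K, T, tau=0.5):
    """Flag the sample as an outlier when its SCI falls below the hypothetical threshold tau."""
    return sci(alpha, K, T) < tau

K, T = 3, 2
concentrated = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0]  # all l1 mass in class 1
spread = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]         # l1 mass split evenly across classes
print(sci(concentrated, K, T), is_anomalous(concentrated, K, T))  # 1.0 False
print(sci(spread, K, T), is_anomalous(spread, K, T))              # 0.0 True
```

The two extremes show why the index lies in $[0, 1]$: it equals 1 when all coefficient mass falls in a single class and 0 when the mass is spread evenly over all $K$ classes.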
One notable shortcoming of the aforementioned formulation is that it may fail to detect joint anomalies involving multiple objects, since it does not capture the object interactions that characterize these multi-object anomalies.
Based on the foregoing, it is believed that a need exists for an improved system and method for automatically detecting multi-object anomalies at a traffic intersection, as will be described in greater detail herein.