1. Technical Field
The present invention relates to video processing and more particularly to a system and method for detecting events using composite alarm definitions.
2. Description of the Related Art
Object tracking is an important and challenging problem with wide-ranging applications such as video surveillance, gathering statistics from videos, and traffic flow monitoring. Although many methods have been proposed in the literature to detect and track moving objects, tracking alone is insufficient for most applications. Detection and tracking results should be analyzed to detect events of interest and generate alerts. For example, in a surveillance scenario, the main interest is detecting instances of events such as people entering a prohibited region, people entering through an exit-only door, people parking a car but not entering the building, etc.
Several approaches have been taken to tackle the event detection problem with single camera systems. However, a single camera does not have a large enough field of view, i.e., it does not provide enough coverage. Thus, many applications need multiple cameras. In addition, event definitions should not be pre-defined and hard-coded into a system, and the event definitions should not be limited in number. Instead, there should be a generic event definition scheme that uses a custom language to express a variety of high-complexity, semantically higher-level events. This scheme should not require the end user to have technical knowledge of that language.
Thus, event definition itself introduces several challenges. Depending on the application and the actual user, there will be a need to define events of higher complexity, especially in multi-camera systems. In addition, defined events should be saved and entered into the system in an efficient way so that the existing tracking engines and the overall system run in the same manner without being affected by different event definitions.
Even in a basic scenario where, for example, one needs to detect the entry of multiple objects into a defined region in one camera view, there are multiple parameters that can be customized by a user. Useful parameters include the number of objects, the location and shape of the region, and the minimum and maximum sizes of the detected objects. The size can be important, e.g., a user might be interested in detecting a car or a person in the specified region but not a cat. There should be a user-friendly interface that can retrieve this information from the user and then save it in a well-structured format for the detection of the actual event in a video sequence.
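As an illustration only, the parameters listed above could be captured in a small structured record and saved as JSON. The following is a minimal sketch under stated assumptions: the class name `RegionEntryEvent`, its field names, the use of a polygon for the region, and the use of pixel area as the size measure are all hypothetical and are not taken from any particular system.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Sequence, Tuple


def point_in_polygon(x: float, y: float, polygon: Sequence[Sequence[float]]) -> bool:
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


@dataclass
class RegionEntryEvent:
    """Hypothetical record for a 'objects enter a region' event definition."""
    camera_id: str
    region: List[Tuple[float, float]]  # polygon vertices in image coordinates
    min_objects: int                   # how many qualifying objects are required
    min_size: float                    # smallest object area of interest (pixels)
    max_size: float                    # largest object area of interest (pixels)

    def matches(self, detections: Sequence[Tuple[float, float, float]]) -> bool:
        """detections: (x, y, size) per tracked object in the current frame."""
        count = sum(
            1
            for (x, y, size) in detections
            if self.min_size <= size <= self.max_size
            and point_in_polygon(x, y, self.region)
        )
        return count >= self.min_objects

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "RegionEntryEvent":
        return cls(**json.loads(text))


# Example definition: at least one person- or car-sized object in a square region.
event = RegionEntryEvent(
    camera_id="cam1",
    region=[(0, 0), (100, 0), (100, 100), (0, 100)],
    min_objects=1,
    min_size=1000,
    max_size=50000,
)
person = (50, 50, 5000)    # inside the region, large enough
cat = (20, 20, 200)        # inside the region, too small
outside = (150, 50, 5000)  # large enough, outside the region
restored = RegionEntryEvent.from_json(event.to_json())
```

In this sketch the size bounds filter out the cat while still counting the person, and the definition survives a save and reload without the detection code needing to change.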
However, the problem is usually more complicated than detecting motion on a single camera view. In real world scenarios, the events of interest are higher in complexity and semantic level. As an example, three event definitions of increasing complexity are: 1) detecting motion in a selected region in a single camera view; 2) detecting motion in the first selected region and then detecting motion in another region in at least T seconds on the same camera view; 3) detecting motion in a selected region in the first camera view and then detecting motion in a selected region in the other camera view.
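The three example definitions can be sketched as a primitive event plus a sequencing operator that combines two primitives. This is a hedged illustration, not a definitive scheme: the names `MotionEvent`, `Sequence`, `primitive_fires`, and `sequence_fires` are hypothetical, and the timing constraint is exposed as optional lower and upper bounds on the gap between the two detections, which is an assumption about how T would be applied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One logged detection: (timestamp in seconds, camera id, region id).
Observation = Tuple[float, str, str]


@dataclass(frozen=True)
class MotionEvent:
    """Primitive event: motion in a selected region of one camera view."""
    camera_id: str
    region_id: str


@dataclass(frozen=True)
class Sequence:
    """Composite event: `first` occurs, then `second` occurs afterwards."""
    first: MotionEvent
    second: MotionEvent
    min_gap: float = 0.0             # assumed lower bound on the gap (seconds)
    max_gap: Optional[float] = None  # assumed upper bound; None means unbounded


def times_of(event: MotionEvent, log: List[Observation]) -> List[float]:
    """Timestamps at which the primitive event was observed."""
    return [
        t for (t, cam, reg) in log
        if cam == event.camera_id and reg == event.region_id
    ]


def primitive_fires(event: MotionEvent, log: List[Observation]) -> bool:
    return len(times_of(event, log)) > 0


def sequence_fires(seq: Sequence, log: List[Observation]) -> bool:
    """True if some occurrence of `second` follows `first` within the bounds."""
    for t1 in times_of(seq.first, log):
        for t2 in times_of(seq.second, log):
            gap = t2 - t1
            if gap < 0 or gap < seq.min_gap:
                continue
            if seq.max_gap is not None and gap > seq.max_gap:
                continue
            return True
    return False


log: List[Observation] = [
    (0.0, "cam1", "A"),
    (3.0, "cam1", "B"),
    (12.0, "cam2", "C"),
]

# 1) Motion in one region of one camera view.
single = MotionEvent("cam1", "A")
# 2) Region A, then region B on the same camera, with a timing bound (T = 2 s here).
same_camera = Sequence(MotionEvent("cam1", "A"), MotionEvent("cam1", "B"), min_gap=2.0)
# 3) A region in the first camera view, then a region in the other camera view.
cross_camera = Sequence(MotionEvent("cam1", "A"), MotionEvent("cam2", "C"))
```

Because the composite is built from the same primitive records in all three cases, moving from a single-view event to a multi-camera one only changes the definition, not the evaluation code.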