Visual object detection and tracking for fixed surveillance cameras is a fundamental function of video analytics and plays a critical role in many intelligent video applications, including visual event/behavior detection, video content extraction, video-content-guided video compression, and video-content-based forensic search. As cameras become less expensive and are installed more widely, this function becomes more important than ever and is expected to offer higher performance.
A challenge for object detection is to accurately detect objects under various scenarios and conditions, such as normal lighting, low lighting, day-time, night-time, and the presence of reflections and/or shadows. Typically, parameters, including detection sensitivity, are manually adjusted and delicately tuned to fit the scene environment and lighting conditions. If conditions change, the preset parameters may become invalid and detection performance may degrade. For example, parameters set for normal lighting conditions may not apply to low lighting cases, and thus objects may not be detected.
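The effect of a preset parameter becoming invalid can be illustrated with a minimal sketch. It assumes a simple background-differencing detector with a single fixed sensitivity threshold; the function names, pixel values, threshold, and the 25% brightness factor are all hypothetical and chosen only for illustration, not taken from any particular system.

```python
def detect_foreground(frame, background, threshold):
    """Return indices of pixels whose absolute difference from the
    background exceeds the (hypothetical) sensitivity threshold."""
    return [
        i
        for i, (pixel, bg) in enumerate(zip(frame, background))
        if abs(pixel - bg) > threshold
    ]


def scale_brightness(pixels, gain):
    """Simulate a global lighting change by scaling every pixel."""
    return [round(p * gain) for p in pixels]


# A one-dimensional "image": flat background with an object at indices 3..5.
background_normal = [100] * 8
frame_normal = [100, 100, 100, 160, 160, 160, 100, 100]

# Assumed sensitivity, tuned by hand for normal lighting.
THRESHOLD = 30

normal_hits = detect_foreground(frame_normal, background_normal, THRESHOLD)

# The same scene at an assumed 25% brightness (low lighting).
background_low = scale_brightness(background_normal, 0.25)
frame_low = scale_brightness(frame_normal, 0.25)

low_hits = detect_foreground(frame_low, background_low, THRESHOLD)

print("normal lighting:", normal_hits)
print("low lighting:   ", low_hits)
```

In this sketch the object differs from the background by 60 gray levels under normal lighting, which exceeds the threshold, but by only 15 gray levels under low lighting, so the same preset threshold no longer detects it.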
Another challenge in object detection and tracking is over-segmentation of an object, i.e., a single physical subject is split into multiple visual parts. As a result, multiple tracks (trajectories) are produced for the single physical subject, and the tracks appear fragile and/or may fluctuate over time, thus providing erroneous information when these tracks are used in raising alarms or for forensic search. For instance, when a person walks in a scene, the person's body parts (e.g., head, torso, hands, and legs) should be detected as a single image blob and then tracked as a whole entity over time. Body parts, however, are sometimes segmented separately, and each segment may be tracked some of the time, merged and split at other times, and/or appear and disappear frequently. This can be confusing and annoying when the tracks are visualized on a display, and further processing of the tracks may lead to incorrect outcomes (e.g., wrong object type classification or wrong event/behavior detection).