Multi-object detection is one of the most important components in various computer vision applications, such as surveillance and sports video analysis. Thanks to impressive progress in object detection (better feature extraction methods such as Histogram of Oriented Gradients, and fast cascade classifiers), tracking-by-detection systems have become attractive in recent years. However, if accuracy is to be maintained, both detection and tracking become slower as the video resolution increases. Most existing systems cannot run at full frame rate, especially on high-definition or high-frame-rate video.
A paper published in “IEEE Transactions on Pattern Analysis and Machine Intelligence” (Michael D. Breitenstein, Fabian Reichlin, et al. Online Multi-Person Tracking-By-Detection From A Single, Uncalibrated Camera. Submitted January 2010, revised October 2010) describes a tracking-by-detection method that is mainly composed of a detector and a data association unit and that processes every frame in the image sequence of a video, i.e. a method of frame-by-frame detection.
In the method of this paper, for the frame at timestamp t, the human detector (sliding-window based, feature-based, etc.) scans the whole image to produce detection results, and the data association unit then decides which detection result should guide which trajectory from the tracking results of the previous frame (t−1). Approaches to the data association problem can become complex in pursuit of high accuracy. For example, the reference paper relies on a greedy algorithm and a scoring function that considers detector confidence, human position, motion and appearance.
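As an illustration only, greedy data association of this kind can be sketched as follows. This is not the scoring function of the reference paper: the weights, the Gaussian position term, the histogram-intersection appearance term, the constant-velocity motion prediction, and all names (score, greedy_associate, min_score, sigma) are assumptions chosen to keep the sketch small.

```python
import math

def score(track, det, w_conf=1.0, w_pos=1.0, w_app=1.0, sigma=50.0):
    """Hypothetical matching score between one trajectory and one detection."""
    # Motion: predict the track position at frame t from frame (t-1),
    # assuming constant velocity (an assumption of this sketch).
    px = track["pos"][0] + track["vel"][0]
    py = track["pos"][1] + track["vel"][1]
    dist = math.hypot(det["pos"][0] - px, det["pos"][1] - py)
    # Position: Gaussian falloff with distance from the predicted position.
    pos_term = math.exp(-(dist * dist) / (2.0 * sigma * sigma))
    # Appearance: histogram intersection of normalized color histograms.
    app_term = sum(min(a, b) for a, b in zip(track["hist"], det["hist"]))
    return w_conf * det["conf"] + w_pos * pos_term + w_app * app_term

def greedy_associate(tracks, detections, min_score=1.0):
    """Repeatedly take the best-scoring (track, detection) pair still free."""
    pairs = sorted(
        ((score(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for s, ti, di in pairs:
        if s < min_score:
            break  # all remaining pairs score even lower
        if ti in used_tracks or di in used_dets:
            continue  # each track and each detection is used at most once
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches

# Made-up example data: two trajectories from frame (t-1), three detections at t.
tracks = [
    {"pos": (100, 100), "vel": (5, 0), "hist": [0.5, 0.5]},
    {"pos": (300, 200), "vel": (0, 0), "hist": [1.0, 0.0]},
]
detections = [
    {"pos": (301, 199), "conf": 0.9, "hist": [0.9, 0.1]},
    {"pos": (106, 100), "conf": 0.8, "hist": [0.5, 0.5]},
    {"pos": (600, 50), "conf": 0.2, "hist": [0.0, 1.0]},
]
matches = greedy_associate(tracks, detections)  # maps track index to detection index
```

With this made-up data, trajectory 0 is paired with detection 1 and trajectory 1 with detection 0, while the low-confidence third detection stays unmatched. A greedy pass of this kind is simple and fast but does not guarantee a globally optimal assignment.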
However, the method of the reference paper has one main problem: frame-by-frame detection over the whole image heavily slows processing, regardless of whether the detector is sliding-window based or feature-based.
For a single frame, the larger the search region for detection, the slower the detection. In addition, there is motion coherence between neighbouring frames, so running detection on every frame is largely redundant. Both factors heavily slow down tracking and keep it far from real-time processing.
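The dependence of detection cost on the size of the search region can be illustrated with a rough count of sliding-window positions. The sketch below assumes a single-scale detector with a 64×128 pixel window and an 8-pixel stride; these values, the example region sizes, and the function name window_count are assumptions made for illustration and are not taken from the reference paper.

```python
def window_count(width, height, win_w=64, win_h=128, stride=8):
    """Number of sliding-window positions in a width x height search region
    at a single scale (window size and stride are illustrative assumptions)."""
    if width < win_w or height < win_h:
        return 0  # the window does not fit into the region
    cols = (width - win_w) // stride + 1
    rows = (height - win_h) // stride + 1
    return cols * rows

vga = window_count(640, 480)        # 73 * 45 = 3285 windows
full_hd = window_count(1920, 1080)  # 233 * 120 = 27960 windows
local = window_count(128, 256)      # 9 * 17 = 153 windows
```

Under these assumptions, a full 1920×1080 frame requires roughly 8.5 times as many window evaluations as a 640×480 frame, whereas a 128×256 region requires only 153. A multi-scale detector multiplies each count by the number of scales searched.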
Based on the above, there is a need in the art for a fast multi-object tracking system that achieves high tracking speed for multiple objects in video without losing accuracy.