(1) Field of Invention
The present invention relates to a system for multi-object detection and recognition in cluttered scenes and, more particularly, to a system for multi-object detection and recognition in cluttered scenes using exclusive non-maxima suppression (eNMS).
(2) Description of Related Art
Object cluttering is a notorious problem in object detection and recognition. Conventional classifiers can usually identify whether an object is present within an image patch, but cannot tell how many objects are present. An exhaustive sliding window search across the whole image might find multiple objects in cluttered scenarios, but it is usually computationally intensive and slow. Attention-based approaches often cannot separate multiple object instances in cluttered scenarios because the bounding boxes they find are usually not tight. Other popular methods use object tracking to separate and distinguish cluttered objects. However, correctly initializing multiple object tracking in cluttered scenarios is a practical challenge, and most tracking approaches cannot deal with static objects.
Non-maxima suppression (NMS) is widely used in object detection; however, it is typically applied across the entire image to suppress excessively overlapping detections generated by an exhaustive sliding window search. Such exhaustive searching across the entire image is usually computationally intensive and very time-consuming.
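For illustration only, conventional NMS as it is commonly applied to overlapping detections can be sketched as follows. This is a minimal sketch of generic greedy suppression, not the exclusive NMS (eNMS) of the present invention. The use of intersection over union as the overlap measure, the 0.5 threshold, and the names `iou` and `greedy_nms` are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, overlap_threshold=0.5):
    """Return indices of detections kept after greedy suppression.

    overlap_threshold is an assumed value; the text only says that
    detections with "too much overlap" are suppressed.
    """
    # Visit detections from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a detection only if it does not overlap too much with
        # any higher-scoring detection that was already kept.
        if all(iou(boxes[i], boxes[k]) <= overlap_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is separate.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this example the second box is suppressed by the higher-scoring first box, while the third box is kept because it overlaps neither.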
Breitenstein et al. use a detection-by-tracking methodology which requires explicit object tracking, as described in “Robust tracking-by-detection using a detector confidence particle filter”, in Proc. of ICCV, 2009, pp. 1515-1522, which is hereby incorporated by reference as though fully set forth herein. There is also research using the silhouette of cluttered people to separate individual persons, as described by Haritaoglu et al. in “Hydra: multiple people detection and tracking using silhouettes”, in Proc. of ICIAP, 1999, pp. 280-285, which is hereby incorporated by reference as though fully set forth herein. Another recent work uses a sliding window search approach and additional thermal imagery for detecting vehicles or people from UAV imagery, which is described by Gaszczak et al. in “Real-time people and vehicle detection from UAV imagery”, in Proceedings of the SPIE, Volume 7878, article i.d. 78780B, 2011, which is hereby incorporated by reference as though fully set forth herein.
In general, the aforementioned object detection approaches are based on traditional classification with global sliding windows or on a detection-by-tracking methodology. Although multiple object instances may be implicitly detected by searching a large number of sliding windows across the whole image, such approaches are too time-consuming. Tracking-based systems, on the other hand, often suffer from the difficulty of achieving robust tracking.
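To give a rough sense of why a global sliding window search is costly, the number of windows evaluated at a single scale can be estimated as below. The image size, window size, and stride are illustrative assumptions, not values taken from the cited works, and `count_windows` is a hypothetical helper.

```python
def count_windows(image_w, image_h, win_w, win_h, stride):
    """Number of window positions for one scale of a sliding window search."""
    if win_w > image_w or win_h > image_h:
        return 0
    across = (image_w - win_w) // stride + 1  # positions along the width
    down = (image_h - win_h) // stride + 1    # positions along the height
    return across * down


# Assumed example: 640x480 image, 64x128 window, stride of 8 pixels.
n = count_windows(640, 480, 64, 128, 8)
print(n)  # 3285 classifier evaluations at this single scale
```

Each additional scale searched adds its own set of windows, and the classifier must be run on every one of them.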
Thus, a continuing need exists for a method that is computationally efficient and allows for detection and recognition of multiple objects of the same class within selected image portions.