(1) Field of Invention
The present invention relates to a system for object detection and, more particularly, to a system for object detection via multi-scale attentional mechanisms.
(2) Description of Related Art
Traditional computer vision approaches to high-level functions, such as recognition on static images, generally involve exhaustive computations. Exhaustive scan methods use a trained classifier to search for instances of the target object class. These methods are able to detect objects in static images, but have two major shortcomings. First, exhaustive search and application of a classifier make these methods too slow to run in real time or near-real time. Second, they depend on a training stage that requires significant additional offline processing time and must be repeated for each new target class. The training stage also requires large amounts of human-annotated training data, which may not otherwise be required by the rest of the system. Results are heavily dependent on the training data, making these methods blind to novel object classes. In other words, the methods are unable to detect objects that are dissimilar to the training examples.
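As a non-limiting illustration of why the exhaustive scan described above is costly, the following sketch counts how many classifier evaluations a multi-scale sliding-window search would perform on a single image. The function name and the window size, stride, and scale step are hypothetical values chosen for illustration only; they are not taken from any particular prior art method.

```python
def count_scan_windows(image_w, image_h, window=64, stride=8, scale_step=1.25):
    """Count classifier evaluations in an exhaustive multi-scale scan.

    All parameters are illustrative assumptions: a square window of
    `window` pixels is slid with step `stride` over the image, and the
    image is repeatedly downscaled by `scale_step` until the window no
    longer fits.
    """
    total = 0
    scale = 1.0
    while True:
        # Image dimensions at the current pyramid level.
        w = int(image_w / scale)
        h = int(image_h / scale)
        if w < window or h < window:
            break
        # Number of window positions along each axis at this level.
        cols = (w - window) // stride + 1
        rows = (h - window) // stride + 1
        total += cols * rows
        scale *= scale_step
    return total


if __name__ == "__main__":
    # Each counted window corresponds to one classifier evaluation.
    print(count_scan_windows(640, 480))
```

Under these assumptions, halving the stride roughly quadruples the number of evaluations at each scale, which suggests why such methods may struggle to run in real time.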
Another class of methods used for object detection is based on motion detection. These methods estimate frame-to-frame differences and therefore require video sequence input, which is not always available. For methods such as optical flow, motion estimation can suffer when the frame rate is low, and such methods can also take longer to process than feature-based methods. Feature-based motion estimation typically uses planar homography to estimate global motion, but requires the often incorrect assumption that features are located on a flat surface. Motion detection is an effective way to detect moving objects, but it is incapable of detecting stationary objects and cannot be applied to static images. Saliency algorithms are biologically inspired methods that generate output similar in spirit to the output maps of motion detection and exhaustive classification, but are much faster.
Thus, a continuing need exists for a system that uses color and size cues to detect objects from static images and merge proto-objects, allowing the system to detect generic objects without training examples.