The process of finding and localizing instances of objects of particular classes (e.g., car, pedestrian, cyclist) in an image is generally referred to as object detection. Object detection is a challenging task with significant applications, such as autonomous driving. Most existing state-of-the-art detectors are learning-based and formulate object detection as a classification problem.
A common approach is to slide a window over an image, construct a feature vector from the contents of each window, and then classify the feature vector as either object or background. Typically, the sliding window has a predefined fixed size. A detection model, such as a boosted classifier, can be trained on feature representations of the fixed-size windows. To detect objects of different scales, the image is re-scaled to many levels to form an image pyramid or scale space. Applying the detection model trained for the canonical scale to a resized image is equivalent to detecting objects at a different scale in the original image.
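As a minimal sketch of the sliding-window and image-pyramid search described above, the following enumerates the candidate windows geometrically, without touching pixel data. The function names, the 64x128 window, the stride, and the scale step are illustrative assumptions, not values taken from any particular detector; a real detector would also extract a feature vector from each window and pass it to the trained classifier.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in original-image pixels


def pyramid_levels(width: int, height: int, window: Tuple[int, int],
                   scale_step: float) -> Iterator[Tuple[float, int, int]]:
    """Yield (scale, level_width, level_height) while the window still fits."""
    scale = 1.0
    while True:
        level_w = int(width / scale)
        level_h = int(height / scale)
        if level_w < window[0] or level_h < window[1]:
            break
        yield scale, level_w, level_h
        scale *= scale_step


def sliding_windows(width: int, height: int,
                    window: Tuple[int, int] = (64, 128),  # assumed canonical size
                    stride: int = 8,                       # assumed step in pixels
                    scale_step: float = 1.2) -> Iterator[Box]:
    """Yield every candidate box, mapped back to original-image coordinates."""
    win_w, win_h = window
    for scale, level_w, level_h in pyramid_levels(width, height, window, scale_step):
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                # A fixed-size window on a level shrunk by `scale` covers a
                # region `scale` times larger in the original image.
                yield (round(x * scale), round(y * scale),
                       round(win_w * scale), round(win_h * scale))
```

Each yielded box is the region of the original image that the fixed-size window covers at the corresponding pyramid level, which is why running one canonical-scale model on every level amounts to detection at multiple scales.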
During detection, the trained detector searches a very large number of possible locations and scales at which objects might occur. Exploring such a large search space is a daunting task, especially for resource-limited embedded systems or mobile devices.
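To give a rough sense of the size of this search space, the sketch below counts the windows an exhaustive search would evaluate. All parameters (a 640x480 image, a 64x128 window, a 4-pixel stride, a scale step of 1.1) are assumptions chosen for illustration only. With these values, the full-resolution level alone contributes 145 x 89 = 12,905 windows, and every further pyramid level adds more.

```python
from typing import Tuple


def count_windows(width: int, height: int,
                  window: Tuple[int, int] = (64, 128),  # assumed canonical size
                  stride: int = 4,                       # assumed step in pixels
                  scale_step: float = 1.1) -> int:
    """Count the windows evaluated by an exhaustive multi-scale search."""
    win_w, win_h = window
    total = 0
    scale = 1.0
    while True:
        level_w = int(width / scale)
        level_h = int(height / scale)
        if level_w < win_w or level_h < win_h:
            break  # the window no longer fits at this pyramid level
        cols = (level_w - win_w) // stride + 1
        rows = (level_h - win_h) // stride + 1
        total += cols * rows
        scale *= scale_step
    return total


if __name__ == "__main__":
    print(count_windows(640, 480))
```

Because each window requires a feature extraction and a classifier evaluation, the total cost grows in proportion to this count, which is what makes exhaustive search expensive on constrained hardware.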
It would be desirable to implement an efficient two-stage object detection scheme for an embedded device.