Object detection, which locates the position and size of objects in digital images, is an important technology used in numerous applications. In digital cameras, the ability to detect faces can improve automatic camera control functions such as exposure, focus, color balance, and flash control. Video cameras can also use object detection to control various recording modes and qualities. Object detection is also a prerequisite for more advanced features, such as smile-triggered shutter control, eye-blink avoidance, and object recognition.
One conventional object detection method is implemented as a binary pattern-classification task. In such a task, the content of a given part of an image is transformed into features, and a classifier trained on example objects then determines whether that region of the image is an object or a non-object. Objects can be faces or other features, while non-objects can be background patterns. A window-sliding technique is often employed, in which the classifier is applied to portions of the image at all locations and scales, labeling each portion as either an object or a non-object. The portions classified are usually square or rectangular.
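The window-sliding technique described above can be illustrated with a minimal sketch at a single scale. The function names `sliding_windows` and `detect`, the `stride` parameter, and the `is_object` callback are hypothetical names introduced only for illustration; the classifier itself is left as a placeholder supplied by the caller.

```python
from typing import Callable, Iterator, List, Tuple

Window = Tuple[int, int, int]  # (x, y, size) of a square window


def sliding_windows(width: int, height: int, size: int, stride: int) -> Iterator[Window]:
    """Yield every square window of the given size that fits inside the image."""
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            yield (x, y, size)


def detect(image: List[List[int]], size: int, stride: int,
           is_object: Callable[[List[List[int]]], bool]) -> List[Window]:
    """Classify each window as object or non-object; return the object windows."""
    height, width = len(image), len(image[0])
    hits = []
    for x, y, s in sliding_windows(width, height, size, stride):
        # Cut out the pixels covered by the current window.
        patch = [row[x:x + s] for row in image[y:y + s]]
        if is_object(patch):  # binary decision: object or non-object
            hits.append((x, y, s))
    return hits
```

In this sketch, covering a range of object sizes would require repeating the scan at several window sizes or image scales.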
A commonly used conventional approach for object detection is based on the Viola-Jones method. The general structure of the Viola-Jones method is composed of a hierarchy of layers. At the lowest hierarchy layer, a window associated with an image area under examination is used to determine whether or not the image area contains an object. The resolution of the window is coarse enough to ignore detail that does not meaningfully contribute to the decision, yet fine enough to resolve differences in broad object features. For face detection in digital images, broad object features may include the eyes, nose, or mouth. Over this window, features or measures are computed, and classification is performed using these features as inputs. The classifier output is a binary decision value declaring “Object” or “Not-Object”.
To perform object detection, the window is scanned spatially to cover all locations of the input image. To detect objects over a range of possible sizes, the scanning process is repeated over a range of scales. The scaling may be accomplished in either of two ways. In the first method, the window size and the associated feature computations are scaled through the range of desired scales while the input image is left intact. In the second method, the window size is kept fixed and the original input image is scaled down, forming a series of downsampled images that cover the scale range. Whether to scale the scanning window or the input image is an implementation choice.
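The two scaling methods can be compared with a short sketch that only enumerates the scales each method would visit. The function names, the `base` window size, and the geometric `factor` between successive scales are assumptions made for illustration; the text above does not specify how scales are spaced.

```python
from typing import List, Tuple


def window_sizes(base: int, factor: float, max_size: int) -> List[int]:
    """First method: grow the window and leave the input image intact."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    sizes = []
    size = float(base)
    while round(size) <= max_size:
        sizes.append(round(size))
        size *= factor
    return sizes


def pyramid_sizes(width: int, height: int, base: int, factor: float) -> List[Tuple[int, int]]:
    """Second method: keep the window fixed and downsample the input image."""
    if factor <= 1.0:
        raise ValueError("factor must be greater than 1")
    levels = []
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < base or h < base:  # the fixed window no longer fits
            break
        levels.append((w, h))
        scale *= factor
    return levels
```

For the same scale factor, both functions step through the same relative object sizes; they differ in whether the feature computation or the image data is rescaled.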
The classifier in the Viola-Jones method is composed of layers of sub-classifiers arranged in a hierarchical tree-like structure. At the lowest level, a weak classifier is formed using one or more Haar features computed in the operating window. A Haar feature is a weighted combination of pixel sums taken over rectangular regions.
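As an illustration, a Haar feature can be evaluated as a weighted combination of rectangle sums. The sketch below obtains each rectangle sum from an integral image (summed-area table), a technique commonly paired with the Viola-Jones method; its use here, along with the function names and the `(x, y, w, h, weight)` tuple layout, is an assumption for illustration rather than something stated above.

```python
from typing import List, Sequence, Tuple

HaarRect = Tuple[int, int, int, int, float]  # (x, y, w, h, weight)


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return a table where ii[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of pixels in the rectangle at (x, y) of size w by h, using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_feature(ii: List[List[int]], rects: Sequence[HaarRect]) -> float:
    """Weighted combination of rectangle sums."""
    return sum(weight * rect_sum(ii, x, y, w, h) for x, y, w, h, weight in rects)
```

With the integral image, each rectangle sum costs four table lookups regardless of the rectangle's size.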
A particular realization of the Viola-Jones classifier can be described by parameters that define its structure and decision processing. The overall parameter set is derived through training for the particular detection task at hand, and serves as the blueprint for executing the detection task. The parameters may include the number of rectangles that make up the Haar feature for each weak classifier, the rectangle weights in the summation of the Haar feature, the rectangle coordinates in the operating window, and the weak classifier decision thresholds and decision output weights.
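One possible way to organize such a parameter set is sketched below. The class and field names, the direction of the threshold comparison, and the idea of summing weak classifier outputs against a `stage_threshold` are assumptions made for illustration; an actual trained classifier may structure its decisions differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Rect:
    x: int          # rectangle coordinates in the operating window
    y: int
    w: int
    h: int
    weight: float   # rectangle weight in the Haar feature summation


@dataclass
class WeakClassifier:
    rects: List[Rect]     # the rectangles that make up the Haar feature
    threshold: float      # decision threshold
    pass_weight: float    # output weight when the feature reaches the threshold
    fail_weight: float    # output weight otherwise


def feature_value(window: Sequence[Sequence[int]], rects: Sequence[Rect]) -> float:
    """Weighted sum of pixel sums over each rectangle (direct summation for clarity)."""
    total = 0.0
    for r in rects:
        pixel_sum = sum(window[y][x]
                        for y in range(r.y, r.y + r.h)
                        for x in range(r.x, r.x + r.w))
        total += r.weight * pixel_sum
    return total


def weak_output(window: Sequence[Sequence[int]], wc: WeakClassifier) -> float:
    """Compare the feature with the threshold and emit the matching output weight."""
    if feature_value(window, wc.rects) >= wc.threshold:
        return wc.pass_weight
    return wc.fail_weight


def stage_decision(window: Sequence[Sequence[int]],
                   weak_classifiers: Sequence[WeakClassifier],
                   stage_threshold: float) -> bool:
    """Hypothetical stage: sum the weak outputs and compare against a stage threshold."""
    return sum(weak_output(window, wc) for wc in weak_classifiers) >= stage_threshold
```

Every field in these records is a value that must be available to the detector at run time, which is why the size of the parameter set matters when it is fetched from memory.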
Classifiers can be described by a set of programmable instructions which are loaded from external memory. Loading classifier instructions from external memory allows classifiers to be tuned, changed, and upgraded.
The bandwidth required to fetch classifier parameters from an external memory can prevent object detection tasks from running quickly and efficiently enough for certain applications, such as real-time video applications.
It would be desirable to implement an object detection processor that operates efficiently enough for use in real-time video applications.