Computerized object detection involves detecting or identifying objects within digital images. An image is typically represented as a number of pixels, usually organized as a grid, where each pixel has its own value. In the case of a black-and-white (grayscale) image, each pixel may have a single value, substantially corresponding to the intensity or brightness of the pixel within the image. In the case of a color image, each pixel may have values for different color components, such as red, green, and blue components; hue, saturation, and value components; or other types of color components. The values for the different color components of a given pixel together make up the color of that pixel.
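The pixel representations described above can be illustrated with a minimal sketch. The tiny images, the 0 to 255 value range, and the helper name `rgb_to_hsv_pixel` are assumptions made for illustration only; the conversion itself uses the `colorsys` module from the Python standard library.

```python
import colorsys

# A grayscale image: a grid (list of rows) with one intensity per pixel.
# The 0..255 range is an assumption; other ranges are equally possible.
gray = [
    [0, 128, 255],
    [64, 64, 64],
]

# A color image: the same grid, but each pixel holds (red, green, blue).
rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
]

def rgb_to_hsv_pixel(pixel):
    """Convert one 8-bit RGB pixel to (hue, saturation, value), each in 0..1."""
    r, g, b = (component / 255.0 for component in pixel)
    return colorsys.rgb_to_hsv(r, g, b)

# The same image expressed with hue, saturation, and value components.
hsv = [[rgb_to_hsv_pixel(pixel) for pixel in row] for row in rgb]
```

In this sketch the pure red pixel maps to a hue of 0 with full saturation and value, while the white pixel has zero saturation, which shows that the same color can be described by different sets of components.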
Computerized object detection has proven to be a difficult technical problem. Unlike the human eye and the human brain, computers have difficulty in quickly identifying what objects are present within an image. The objects may be, for instance, vehicles, people, or faces. Object detection is thus the process of determining whether a given object is present within a given image, and is further the process of determining where within the image the object is located.
One process for detecting objects within an image is a two-stage approach. In the first stage, an image is analyzed to determine potential candidates, or potential areas within the image, in which a given object may be located. In the second stage, each of these potential candidates or areas is analyzed in more detail, to determine whether any of them actually contains the object. The first stage is therefore desirably performed relatively quickly (for each location), since the entire image has to be analyzed. The second stage can then be performed more slowly, since the potential candidates or areas of the image that may contain the object have been significantly reduced in number in the first stage.
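The two-stage approach can be sketched as follows. This is only an illustration under stated assumptions: the fast first-stage test (mean intensity of a window), the slower second-stage test (sum of absolute differences against a template), the thresholds, and all function names are hypothetical choices, not methods taken from the text.

```python
def windows(image, height, width):
    """Yield (row, col, patch) for every height-by-width window of a 2D list."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - height + 1):
        for c in range(cols - width + 1):
            yield r, c, [line[c:c + width] for line in image[r:r + height]]

def cheap_score(patch):
    """Stage 1 (hypothetical): mean intensity, a fast but crude test."""
    return sum(sum(line) for line in patch) / (len(patch) * len(patch[0]))

def careful_score(patch, template):
    """Stage 2 (hypothetical): sum of absolute differences; 0 is an exact match."""
    return sum(abs(p - t)
               for p_line, t_line in zip(patch, template)
               for p, t in zip(p_line, t_line))

def detect(image, template, cheap_min, careful_max):
    height, width = len(template), len(template[0])
    # Stage 1: scan the entire image quickly and keep only plausible areas.
    kept = [(r, c, patch) for r, c, patch in windows(image, height, width)
            if cheap_score(patch) >= cheap_min]
    # Stage 2: examine only the surviving candidates in more detail.
    hits = [(r, c) for r, c, patch in kept
            if careful_score(patch, template) <= careful_max]
    return [(r, c) for r, c, _ in kept], hits

image = [
    [0, 0, 0, 0, 0],
    [0, 9, 0, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 0, 0, 0],
]
template = [[9, 0], [0, 9]]
candidates, hits = detect(image, template, cheap_min=4.5, careful_max=0)
```

Here the first stage reduces the twelve possible window positions to six candidates, and the second stage confirms only the window at row 1, column 1. The first-stage test is deliberately crude, so it admits several false candidates that the slower test then rejects.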
A conventional approach to analyzing an image to determine potential candidates or areas in which a given object may be located is to employ linear filter banks. The linear filter banks may be matched filters, or templates, representing the objects themselves, or may represent more basic shapes that correspond to the objects. Gabor filter banks are an example of the latter. Linear filter banks are complex data structures, however, such that analyzing an image with them to determine potential candidates or areas in which a given object may be located can be a time-consuming process.
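As a rough illustration of such a filter bank, the following sketch builds a small bank of real-valued Gabor kernels at several orientations and applies each one to a patch of the same size. The kernel size, the parameter values, and the function names are assumptions chosen for the example; the kernel uses the commonly cited form of a Gaussian envelope multiplied by a cosine carrier.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=1.0, psi=0.0):
    """Real Gabor kernel of odd side length `size`, oriented at angle `theta`."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma * sigma))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def gabor_bank(size=7, sigma=2.0, wavelength=4.0, orientations=4):
    """A bank of kernels at evenly spaced orientations in [0, pi)."""
    return [gabor_kernel(size, sigma, k * math.pi / orientations, wavelength)
            for k in range(orientations)]

def filter_response(patch, kernel):
    """Linear response of one kernel to one patch of the same size."""
    return sum(p * k
               for p_row, k_row in zip(patch, kernel)
               for p, k in zip(p_row, k_row))

bank = gabor_bank()
# A 7x7 patch of vertical stripes with a 4-pixel period (illustrative input).
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(-3, 4)]
           for _ in range(7)]
responses = [filter_response(stripes, kernel) for kernel in bank]
```

With these assumed parameters, the kernel at orientation 0 gives the strongest response to the striped patch. Every kernel must be evaluated at every image location, which gives a sense of why applying a full bank can be time-consuming.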
Therefore, to determine whether an image contains the object represented by one or more linear filter banks, a simple correlation function is typically used, as it is essentially the only way to employ linear filter banks in a time-efficient manner. In particular, the Fast Fourier Transform (FFT) is usually employed to compute the correlation function. However, correlation is inflexible, and can provide unreliable evaluation results that are sensitive to misalignment, noise, and outliers. As a result, the first stage of the object detection process may yield an inordinate number of potential candidates or areas in which an object may be located within an image, or may miss the actual area in which the object is located.
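The use of the FFT to compute correlation, and the sensitivity of correlation to outliers, can be illustrated with a one-dimensional sketch. It relies on the correlation theorem, under which circular cross-correlation equals the inverse transform of one spectrum multiplied by the complex conjugate of the other. The small radix-2 FFT, the example signals, and all function names are assumptions for illustration; a practical system would more likely use an optimized FFT library and two-dimensional images.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def ifft(spectrum):
    """Inverse FFT computed via the conjugate trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]

def circular_correlation(signal, template):
    """scores[k] = sum over n of signal[(n + k) % N] * template[n]."""
    padded = list(template) + [0.0] * (len(signal) - len(template))
    s_hat, t_hat = fft(signal), fft(padded)
    product = [s * t.conjugate() for s, t in zip(s_hat, t_hat)]
    return [v.real for v in ifft(product)]

def best_offset(scores):
    """Index of the highest correlation score."""
    return max(range(len(scores)), key=lambda k: scores[k])

template = [1, 2, 1]
clean = [0, 0, 1, 2, 1, 0, 0, 0]   # the template appears at offset 2
spiked = [0, 0, 1, 2, 1, 0, 9, 0]  # same signal plus one bright outlier

clean_peak = best_offset(circular_correlation(clean, template))
spiked_peak = best_offset(circular_correlation(spiked, template))
```

In the clean signal the highest score falls at offset 2, where the template actually occurs. After a single bright outlier is added, the highest score moves to offset 5, next to the outlier, even though nothing there resembles the template. This is one way an unnormalized correlation score can both produce spurious candidates and outrank the true location.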
For these and other reasons, therefore, there is a need for the present invention.