1. Field of Invention
The present patent document is directed towards systems and methods for object detection. More particularly, the present patent document is directed towards systems and methods that provide adaptive threshold values for different input images.
2. Description of the Related Art
Object detection from images can be important to many applications, such as manufacturing, surveillance, robotics, and security. Predominant approaches for object detection usually scan the images with sliding windows at various scales to identify the locations and scales of the contained object. To determine whether a local window includes the object of interest, discriminative approaches extract image features in a local window and construct classifiers for detection. The local features are then used by classification algorithms, such as AdaBoost or support vector machines (SVMs), to identify the object. In typical object detection systems, a detector outputs a continuous response value, and that value is compared with a threshold value to make a final classification decision. That is, whether the object of interest is deemed to be in a local window of the image is based upon whether or not the response value is greater than the threshold value. Although different values may be evaluated to find an acceptable threshold value, a single value is used globally for all input images.
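By way of illustration only, the sliding-window scan with a single global threshold described above might be sketched as follows. The function names, the window representation, and the toy scoring function (mean intensity, standing in for a trained classifier such as AdaBoost or an SVM) are assumptions made for this sketch and are not taken from any particular detector.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Image = Sequence[Sequence[float]]   # hypothetical image type: rows of pixel intensities
Window = Tuple[int, int, int]       # hypothetical window descriptor: (x, y, size)


def sliding_windows(image: Image, sizes: Sequence[int], stride: int) -> Iterator[Window]:
    """Yield square windows at every requested scale and position."""
    height, width = len(image), len(image[0])
    for size in sizes:
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, size)


def mean_intensity(image: Image, window: Window) -> float:
    """Toy detector response: a stand-in for a trained classifier's output."""
    x, y, size = window
    total = sum(image[r][c] for r in range(y, y + size) for c in range(x, x + size))
    return total / (size * size)


def detect(image: Image,
           score_fn: Callable[[Image, Window], float],
           threshold: float,
           sizes: Sequence[int] = (2,),
           stride: int = 1) -> List[Window]:
    """Keep each window whose response value is greater than the threshold."""
    return [w for w in sliding_windows(image, sizes, stride)
            if score_fn(image, w) > threshold]


# A 4x4 image with a bright 2x2 block in the middle.
toy_image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
strict = detect(toy_image, mean_intensity, threshold=0.9)
loose = detect(toy_image, mean_intensity, threshold=0.4)
```

In this toy case the higher threshold keeps only the window that exactly covers the bright block, while the lower threshold also keeps the partially overlapping windows. The same threshold value is applied to every window of every image, which is the behavior the remainder of this section addresses.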
Consider the receiver operating characteristic (ROC) curve depicted in FIG. 1. FIG. 1 shows an ROC curve 105 that balances performance metrics of recall (on the x-axis) and precision (on the y-axis). Recall represents a measure of the ability of a detector to detect all of the objects of interest in an image or images. Precision represents a measure of the ability of a detector to correctly detect only the objects of interest in an image or images. Thus, a detector may have high recall by selecting a large number of image patches, but a significant number of those patches may be false positives. A detector may have high precision, meaning that the detected image patches contain few, if any, false positives, but such a detector may improperly exclude image patches that should have been included (i.e., the detector results have a high false negative rate). Thus, a balance is typically struck that allows for a compromise between precision and recall. For example, a recall rate 110 is selected, and the point 115 at which that rate 110 intersects the ROC curve 105 determines the threshold value. Note that the threshold value must lie upon the ROC curve. As noted previously, this single threshold value is used for any arbitrary image.
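For illustration only, the trade-off described above can be made concrete with a short sketch that computes precision and recall at a given threshold and then selects the highest threshold that still reaches a chosen recall rate. The function names, the midpoint-based candidate thresholds, and the sample scores and labels are assumptions made for this sketch; they are not data from FIG. 1.

```python
from typing import List, Sequence, Tuple


def precision_recall(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when a patch is detected if its score > threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if s > threshold and l)
    fp = sum(1 for s, l in zip(scores, labels) if s > threshold and not l)
    fn = sum(1 for s, l in zip(scores, labels) if s <= threshold and l)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is detected
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def candidate_thresholds(scores: Sequence[float]) -> List[float]:
    """Midpoints between distinct scores, plus one value below the minimum."""
    distinct = sorted(set(scores))
    mids = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return [distinct[0] - 1.0] + mids


def threshold_for_recall(scores: Sequence[float], labels: Sequence[bool],
                         target_recall: float) -> Tuple[float, float, float]:
    """Return (threshold, precision, recall) for the highest candidate
    threshold whose recall is at least target_recall."""
    if not any(labels):
        raise ValueError("at least one positive label is required")
    for t in sorted(candidate_thresholds(scores), reverse=True):
        precision, recall = precision_recall(scores, labels, t)
        if recall >= target_recall:
            return t, precision, recall
    raise ValueError("target recall cannot be reached")


# Hypothetical detector responses and ground-truth labels for six patches.
sample_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
sample_labels = [True, True, False, True, False, False]

moderate = threshold_for_recall(sample_scores, sample_labels, target_recall=0.6)
full = threshold_for_recall(sample_scores, sample_labels, target_recall=1.0)
```

With these sample values, requiring full recall forces a lower threshold and admits a false positive, whereas a more modest recall target permits a higher threshold with no false positives. Whichever value is chosen, it is then fixed and applied to all input images.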
However, using a single threshold value regardless of the input image can be problematic. Detection is among the most challenging vision tasks due in part to the great variety of appearances and shapes of objects, variability of environments, and variability in image quality. Having a single threshold value regardless of the input image can produce less than optimal detection results. Accordingly, systems and methods are needed that can provide flexibility when trying to detect an object or item in an image.