When attempting to detect an object in an image, it is important to consider factors that aid in improving the ability to detect the object. Both the scale of an object of interest and its contextual information are significant factors to consider in object detection. Traditionally, like objects were treated or modeled in a like manner. While such approaches appear logical, they fail to consider that objects at different sizes can have qualitatively very different appearances. Thus, systems and methods of object detection can have better detection rates by modeling or representing like objects of different size or resolution differently. Furthermore, context information, such as pixel location, object height, width, etc., also provides important features that can aid object detection.
As previously noted, prior methods for detecting objects within an image using classifiers were limited because these methods did not adequately consider contextual information or account for the various scales of objects. The present patent document provides embodiments of multi-resolution feature descriptors that may also be combined with context feature descriptors to improve object detection, such as (by way of example and not limitation) human detection. In embodiments, multi-resolution feature descriptors, context feature descriptors, or both may be used in training detection systems and in detection systems.
Multi-Scale Feature(s)
FIG. 1 depicts a multi-scale methodology according to embodiments of the present invention. As illustrated in FIG. 1, in embodiments, an image patch along with a context region is obtained (105) from an input image. In the current patent document, systems and methods shall be described in terms of an image patch from an input image, but one skilled in the art shall recognize that the systems and methods may be applied to a plurality of image patches from an image or images. In embodiments, the image patch may be obtained from a scan window image patch. The scan window image patch may be obtained by applying an existing detector, such as AdaBoost or SVM, to an input image or images to obtain one or more scan window image patches. In embodiments, the initial scan window image patch is expanded to double its size by including a context region. For example, the length (l) and width (w) of the initial scan window may each be doubled so that the context region, with the initial image patch centered within the context region, has dimensions of 2l×2w. In embodiments, if the initial scan window is at or near an edge of the image so that a full context region cannot be formed from original image pixels, pixels may be added to the missing region or regions. In embodiments, the missing pixels may be black, may be white, may have the same average brightness as a local region, may be an extension of a line or portion of pixel values at an edge of the missing region, may be filled with some portion of the image or with one or more values computed from the image or a portion thereof, may be some combination thereof, or may have some other configuration. It shall be noted that other context region sizes, shapes, and configurations may be used, including having no additional context region in one or more context region portions or directions.
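By way of illustration and not limitation, the following Python sketch shows one possible way to form the doubled context region around a scan window. The function name `extract_with_context`, the list-of-rows image representation, and the use of a single constant `fill` value for missing pixels are assumptions made for this example only; constant fill is just one of the padding options mentioned above.

```python
def extract_with_context(image, top, left, height, width, fill=0):
    """Return (patch, context) for a scan window whose top-left is (top, left).

    `image` is a list of rows of grayscale values. The context region is
    twice the window's height and width and shares its center (exactly so
    when height and width are even). Pixels outside the image get `fill`.
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        # Constant padding for coordinates that fall outside the image.
        if 0 <= r < rows and 0 <= c < cols:
            return image[r][c]
        return fill

    def crop(r0, c0, h, w):
        return [[pixel(r, c) for c in range(c0, c0 + w)]
                for r in range(r0, r0 + h)]

    patch = crop(top, left, height, width)
    context = crop(top - height // 2, left - width // 2, 2 * height, 2 * width)
    return patch, context


# A 6x6 test image whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(6)]
# A 2x2 window in the top-left corner: part of its context lies off-image.
patch, context = extract_with_context(image, top=0, left=0,
                                      height=2, width=2, fill=-1)
```

In this sketch the off-image portion of the context region is filled with the value -1 so that padded pixels are easy to tell apart from image pixels; an implementation might instead replicate edge pixels or use a local average.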
Returning to FIG. 1, a high-resolution feature is computed (110) using the image patch sized to a high-resolution size. FIG. 2 presents a method for forming a high-resolution feature according to embodiments of the present invention. A mid-resolution feature is also computed (115) using the image patch and the context region sized to a mid-resolution size. FIG. 3 presents a method for forming a mid-resolution feature according to embodiments of the present invention. A low-resolution feature is computed (120) using the image patch and the context region together sized to a low-resolution size. FIG. 4 presents a method for forming a low-resolution feature according to embodiments of the present invention. It shall be noted that no particular order is necessary for forming the high-, mid-, and low-resolution features. Rather, these features may be formed in different orders or even formed concurrently. Finally, a multi-scale feature may be generated (125) for the image patch using its high-, mid-, and low-resolution features. Also, as explained in more detail below, embodiments of the multi-scale feature may include a perspective context feature or features.
High-Resolution Feature
FIG. 2 presents a method 200 for forming a high-resolution feature according to embodiments of the present invention. As illustrated in FIG. 2, in embodiments, an image patch (A) 205 is obtained from an input image. The image patch may be obtained in like manner as described with reference to FIG. 1. In embodiments, the image patch 205 is resized to a normalized high-resolution size (e.g., 64×128 pixels, although other sizes may be used). The normalized image patch 205-HR is used to compute (225) a high-resolution feature 210. In embodiments, the feature may be a histogram of gradient (HoG) feature descriptor, which is well known to those of skill in the art. It shall be noted that other methods may be used to generate a feature from the normalized image patch. For example, possible alternatives include, but are not limited to, LBP (Local Binary Pattern), Shapelet, Haar, Channel Features, etc. It shall also be noted that in alternative embodiments, one or more features may be extracted from the context region and combined with the feature from the image patch in forming a high-resolution feature. Examples of ways to combine the features include, but are not limited to, adding, concatenating, subtracting, averaging, and the like.
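By way of illustration and not limitation, the following Python sketch resizes an image patch to a normalized size and computes a gradient-orientation histogram from it. It is a deliberately simplified stand-in for a full HoG descriptor (a single cell, no block normalization), and the names `resize_nearest`, `orientation_histogram`, and `high_resolution_feature`, the nearest-neighbor resampling, the nine orientation bins, and the reading of 64×128 as width×height are all assumptions of this example.

```python
import math


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a list-of-rows grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def orientation_histogram(img, bins=9):
    """L2-normalized histogram of unsigned gradient orientations.

    Each interior pixel votes with its gradient magnitude into the bin of
    its orientation in [0, pi). This is one cell of a HoG-like descriptor.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = img[r][c + 1] - img[r][c - 1]   # central differences
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            hist[b] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def high_resolution_feature(patch, width=64, height=128):
    """Normalize the patch to width x height, then describe it."""
    return orientation_histogram(resize_nearest(patch, height, width))


# An 8x8 patch with a vertical edge: all gradients point horizontally.
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
feature = high_resolution_feature(edge)
```

For the vertical-edge patch in the usage lines, all gradient energy falls into the first orientation bin, which is the behavior one would expect from a gradient-orientation descriptor.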
Mid-Resolution Feature(s)
FIG. 3 presents a method 300 for forming a mid-resolution feature or features according to embodiments of the present invention. As illustrated in FIG. 3, in embodiments, an image patch (A) 205 and context region 305 are obtained from an input image. The image patch and context region may be obtained (325) as described previously. It shall be noted that, in embodiments, the step of obtaining the image patch and context region may be done once in preparation for generating the multi-scale feature or features. In embodiments, the image patch 205 is resized (330) to a normalized mid-resolution size (e.g., 32×64 pixels, although other sizes may be used).
In embodiments, the context region 305 is divided to form three portions. As depicted in the embodiment in FIG. 3, the context region is divided into regions 1, 2, and 3, and the like-numbered regions are combined into three patches 310-1, 310-2, and 310-3. It shall be noted that the context region may be divided into different portions or regions, and may be combined in different ways. In embodiments, each portion 310-x has or is normalized to the mid-resolution size. One skilled in the art shall recognize that the resizing, when needed, may be done in different order or different configurations. For example, in embodiments, the context region with the image patch may together be resized to twice the mid-resolution size (e.g., 64×128 pixels). Thus, each group of regions (310-1, 310-2, and 310-3) forms a patch of the same size as the center scan window image patch 205-MR (32×64 pixels, in this example).
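By way of illustration and not limitation, the following Python sketch divides a context region into the center image patch and three groups, each of which forms a patch of the same size as the center. The specific layout used here (strips above and below the center as group 1, strips to the left and right as group 2, and the four corners as group 3) is an assumption of this example, since the division itself is defined by FIG. 3 rather than by the text; the function name `split_context` is likewise hypothetical.

```python
def split_context(context):
    """Split a 2h x 2w context region into (center, group1, group2, group3).

    Assumed layout (hypothetical):
      group1: strips above and below the center, stacked vertically
      group2: strips left and right of the center, placed side by side
      group3: the four corners, tiled 2 x 2
    Every returned patch is h x w. Height and width must be multiples of 4.
    """
    H, W = len(context), len(context[0])
    if H % 4 or W % 4:
        raise ValueError("context height and width must be multiples of 4")
    h, w = H // 2, W // 2
    r0, r1 = h // 2, h // 2 + h      # rows spanned by the center patch
    c0, c1 = w // 2, w // 2 + w      # columns spanned by the center patch

    def crop(rs, re, cs, ce):
        return [row[cs:ce] for row in context[rs:re]]

    def hcat(a, b):
        # Place two equally tall blocks side by side.
        return [ra + rb for ra, rb in zip(a, b)]

    center = crop(r0, r1, c0, c1)
    group1 = crop(0, r0, c0, c1) + crop(r1, H, c0, c1)
    group2 = hcat(crop(r0, r1, 0, c0), crop(r0, r1, c1, W))
    group3 = (hcat(crop(0, r0, 0, c0), crop(0, r0, c1, W)) +
              hcat(crop(r1, H, 0, c0), crop(r1, H, c1, W)))
    return center, group1, group2, group3


# An 8x4 context region whose values encode position (row * 4 + col).
context = [[r * 4 + c for c in range(4)] for r in range(8)]
center, group1, group2, group3 = split_context(context)
```

With this layout every pixel of the context region lands in exactly one of the four patches, and each patch has half the height and half the width of the context region.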
A mid-resolution feature 315-x is computed (335) for each of the four normalized regions (205-MR, 310-1, 310-2, and 310-3). In embodiments, the feature may be a histogram of gradient (HoG) feature descriptor, although other methods may be used to generate a feature from the normalized regions.
In embodiments, the mid-resolution features 315-x may be combined (340) into one or more mid-resolution feature descriptors. In embodiments, the mid-resolution feature of the image patch 315-A may be one mid-resolution feature descriptor and the mid-resolution feature descriptors 315-1, 315-2, and 315-3 of the context portions may be combined (340) into another mid-resolution feature. In alternative embodiments, these two features may be combined into a single feature. One skilled in the art shall recognize that features may be combined in a number of ways, such as addition, concatenation, subtraction, averaging, etc. For example, in embodiments, the features may be combined by stacking the normalized histograms into a single HoG vector.
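By way of illustration and not limitation, the following Python sketch shows two of the combination options mentioned above: stacking normalized histograms into a single vector, and averaging. The helper names `l2_normalize`, `stack`, and `average`, the choice of L2 normalization, and the toy two-bin histograms are assumptions of this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (all-zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def stack(features):
    """Concatenate L2-normalized feature vectors into one vector."""
    out = []
    for f in features:
        out.extend(l2_normalize(f))
    return out


def average(features):
    """Element-wise mean of equally long feature vectors."""
    n = len(features)
    return [sum(vals) / n for vals in zip(*features)]


# Toy two-bin histograms standing in for features 315-A, 315-1, 315-2, 315-3.
f_a, f_1, f_2, f_3 = [3.0, 4.0], [0.0, 2.0], [1.0, 0.0], [6.0, 8.0]

patch_descriptor = l2_normalize(f_a)          # image patch alone
context_descriptor = stack([f_1, f_2, f_3])   # three context portions stacked
single_descriptor = stack([f_a, f_1, f_2, f_3])
context_mean = average([f_1, f_2, f_3])
```

Stacking preserves each portion's histogram separately at the cost of a longer vector, whereas averaging keeps the vector length fixed but discards which portion contributed which gradient energy.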
Low-Resolution Feature
FIG. 4 presents a method 400 for forming a low-resolution feature according to embodiments of the present invention. As illustrated in FIG. 4, in embodiments, an image patch (A) 205 and context region 305 are obtained from an input image. The image patch and context region may be obtained (420) as previously described. Also as previously noted, the step of obtaining the image patch and context region may be done once in preparation for generating the multi-scale feature or features. In embodiments, the context region together with the image patch 405 is resized (425) to a low-resolution size (e.g., 32×64 pixels, although other sizes may be used). A low-resolution feature 415 is computed (430) from the normalized low-resolution context region and image patch 410-LR. In embodiments, the feature may be a histogram of gradient (HoG) feature descriptor, although other methods may be used to generate a feature from the normalized regions.
Perspective Context Features
It is not uncommon for objects of interest in images to appear at certain regions and with certain sizes. Information such as location and size of the object in an image can provide important perspective context of the camera. Embodiments of the present invention may generate and use this contextual information to aid in object recognition.
FIG. 5 depicts a method for generating a perspective context feature according to embodiments of the present invention. As depicted in FIG. 5, given an image patch from an image, a perspective context feature of the image patch is generated. In embodiments, the perspective context feature comprises the location (x,y) of the image patch within the image and the size (w,h) of the image patch. The location (x,y) may represent the position of a set reference point of an image patch, such as its center pixel location, a corner pixel location, or the like. The size (w,h) of the image patch may represent the width and height of the image patch. In embodiments, the perspective context descriptor may be in the form of (x,y,w,h). One skilled in the art shall recognize that the perspective context descriptor may be in different forms and configurations.
Returning to FIG. 5, the perspective context descriptor of an image patch may be combined (510) with one or more of the high-, mid-, and low-resolution features. In embodiments, the perspective context descriptor is combined with the high-, mid-, and low-resolution features to form a single feature, a multi-scale and perspective feature, for an image patch. In embodiments, the multi-scale and perspective feature may be formed by stacking all of the features into a single vector.
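By way of illustration and not limitation, the following Python sketch builds a perspective context descriptor of the form (x,y,w,h) and stacks it with high-, mid-, and low-resolution features into a single vector. The function names, the `reference` parameter, and the toy feature values are assumptions of this example; in particular, no scaling of the descriptor relative to the other features is applied here, although an implementation may choose to do so.

```python
def perspective_descriptor(left, top, width, height, reference="center"):
    """Return [x, y, w, h] for an image patch.

    (left, top) is the patch's top-left pixel. `reference` selects which
    point of the patch is reported as its location (x, y).
    """
    if reference == "center":
        x, y = left + width / 2.0, top + height / 2.0
    elif reference == "corner":
        x, y = float(left), float(top)
    else:
        raise ValueError("reference must be 'center' or 'corner'")
    return [x, y, float(width), float(height)]


def multi_scale_perspective_feature(high, mid, low, perspective):
    """Stack all features into one vector, in a fixed order."""
    return list(high) + list(mid) + list(low) + list(perspective)


# Toy feature vectors standing in for real high-, mid-, and low-resolution features.
high, mid, low = [0.1, 0.2], [0.3], [0.4]
perspective = perspective_descriptor(left=10, top=20, width=32, height=64)
feature = multi_scale_perspective_feature(high, mid, low, perspective)
```

Keeping the stacking order fixed matters because a classifier trained on the stacked vector associates each position in the vector with a particular feature component.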
Multi-Scale and Perspective Feature Combinations
It shall be noted that any of the high-resolution, mid-resolution, low-resolution, and perspective features, or any constituent parts thereof, may be combined into features. For example, in embodiments, a high-resolution feature and a low-resolution feature may be combined to form a feature. Or, by way of another example, the high-resolution feature may be combined with the perspective feature. In yet another example, the mid-resolution feature of just the image patch may be combined with the low-resolution and perspective features to form a feature. One skilled in the art shall recognize that any number of permutations and combinations of the features may be used to form composite features.
Training and Detecting using Multi-Scale and Perspective Features
Detection algorithms usually comprise two parts: (1) training, in which classifiers are trained, and (2) classification, which uses the trained classifiers for detection. In embodiments, the multi-scale and perspective features may be used in both training a detection model and in detecting the objects of interest in images.
Training using Multi-Scale and Perspective Context Features
It shall be noted that the high-resolution, mid-resolution, low-resolution, and perspective features may be combined and used in different combinations, which affects the training process. By way of example and not limitation, two possible combinations are presented below with respect to FIGS. 6 and 7. FIG. 6 depicts classifier training in which the high-resolution, mid-resolution, low-resolution, and perspective features are all combined into a single feature and a classifier is trained for that feature. FIG. 7 depicts classifier training in which a classifier is trained for each of the high-resolution, mid-resolution, low-resolution, and perspective features.
FIG. 6 depicts a process flow for training a detection system using multi-scale and perspective features according to embodiments of the present invention. In embodiments, the process commences by obtaining (605) a set of training image patches that include labels indicating whether an image patch does or does not include the object of interest and perspective context information. In embodiments, the set of labeled training image patches may be obtained from input images using an existing detector or detectors, such as AdaBoost or SVM, and the ground truth information. Alternatively, the training image patches may be obtained from third-party training image sets.
In embodiments, for each image patch, a multi-scale with perspective context feature is generated (610). The multi-scale with perspective context feature may be obtained as explained previously with reference to FIGS. 1-5. In embodiments, the multi-scale with perspective context feature for an image patch forms a vector. In embodiments, a classifier training phase comprises using (615) the image patches and multi-scale with perspective context features to train a classifier, as is well known to those of skill in the art. The trained classifier may be used to detect objects of interest in subsequent input images.
FIG. 7 depicts a process flow for training a detection system using multi-scale and perspective features according to embodiments of the present invention. In embodiments, the process commences by obtaining (705) a set of training image patches that include labels indicating whether an image patch does or does not include the object of interest and that include perspective context information. In embodiments, the set of labeled training image patches may be obtained as discussed above with respect to FIG. 6.
In embodiments, for each image patch, high-resolution, mid-resolution, low-resolution, and perspective context features are generated (710). The high-resolution, mid-resolution, low-resolution, and perspective context feature may be obtained as explained previously with reference to FIGS. 2-5. In embodiments, these features and/or any combination of two or more of these features may form features used to train corresponding classifiers for use in object detection. In embodiments, each feature used for training a classifier and for detection forms a vector. In embodiments, a classifier training phase comprises using (715) the image patches and the features to train a classifier for each feature. Training of classifiers using features is well known to those of skill in the art. The trained classifiers may then be used to detect objects of interest in other input images.
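By way of illustration and not limitation, the following Python sketch trains one classifier per feature from labeled training patches, in the manner of FIG. 7; passing a single stacked feature instead would correspond to the single-classifier arrangement of FIG. 6. A plain perceptron is used here only as a simple, self-contained stand-in for whatever classifier an embodiment employs, and the function names, hyperparameters, and toy feature values are assumptions of this example.

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Train a linear classifier. Labels are +1 (object) or -1 (no object).

    Returns (weights, bias). This is a stand-in learner, not a method
    prescribed by the text.
    """
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:               # misclassified: nudge the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(classifier, x):
    """Signed score of feature vector x under a (weights, bias) classifier."""
    w, b = classifier
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_per_feature(feature_sets, labels):
    """Train one classifier for each named feature."""
    return {name: train_perceptron(samples, labels)
            for name, samples in feature_sets.items()}


# Toy training set: four patches, each described by two different features.
labels = [1, 1, -1, -1]
feature_sets = {
    "high": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "low": [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]],
}
classifiers = train_per_feature(feature_sets, labels)
```

Each entry of `classifiers` is tied to the feature it was trained on, reflecting the point that each unique feature or feature composite has its own corresponding trained classifier.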
Detecting Using Trained Classifier(s)
Given a trained classifier or classifiers, this classifier or classifiers may be used as part of a detection system to detect objects of interest in images. FIG. 8 depicts a process flow for using a detection system that includes a classifier trained using multi-scale with perspective context features according to embodiments of the present invention.
In embodiments, the process commences by obtaining (805) one or more image patches to form a set of image patches. In embodiments, the set of image patches may be obtained from one or more input images using an existing detector or detectors, such as AdaBoost or SVM. In embodiments, the confidence value(s) of the initial detector may be set so that the detected image patches are over-inclusive, thereby reducing the possibility of excluding true image patches in the initial detection. As a result of being over-inclusive, a large number of false positives will be included; however, the subsequent detection can assist in eliminating these false positives. In embodiments, the image patches may be all or some subset of the scan windows from an input image. It shall be noted that the image patches may be in the form of locations in an image.
In embodiments, for an image patch, a multi-scale with perspective context feature is computed (810) from the high-resolution, mid-resolution, low-resolution, and perspective context features generated from the image patch. A multi-scale with perspective context feature for an image patch may be obtained as explained above. After forming a multi-scale with perspective context feature, the object detection phase comprises using (815) the feature for the image patch and the trained classifier to determine whether the image patch contains the object of interest. For example, in embodiments, the classifier applied to an image patch outputs a score that may be used to determine whether the object of interest is present in the image patch, subject to the score meeting a threshold value requirement. It shall be noted that meeting a threshold value requirement may mean equaling or exceeding a threshold value or may mean exceeding a threshold value. As can be seen from the provided description, embodiments of the object detection of the current application can allow for improved object detection using multi-scale and perspective information.
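By way of illustration and not limitation, the following Python sketch applies a trained classifier to the feature of an image patch and checks the resulting score against a threshold value requirement, covering both readings noted above (equaling or exceeding, or strictly exceeding). The linear form of the classifier, the function names, and the example weights are assumptions of this example.

```python
def meets_threshold(score, threshold, inclusive=True):
    """True if the score satisfies the threshold value requirement.

    inclusive=True  -> score must equal or exceed the threshold
    inclusive=False -> score must strictly exceed the threshold
    """
    return score >= threshold if inclusive else score > threshold


def classify_patch(feature, weights, bias, threshold=0.0, inclusive=True):
    """Score a feature with a linear classifier and apply the threshold."""
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return score, meets_threshold(score, threshold, inclusive)


# A feature whose score lands exactly on the threshold.
feature, weights, bias = [1.0, 2.0], [0.5, 0.25], 0.0
score_a, detected_inclusive = classify_patch(feature, weights, bias, threshold=1.0)
score_b, detected_strict = classify_patch(feature, weights, bias,
                                          threshold=1.0, inclusive=False)
```

The two calls differ only in how a score exactly equal to the threshold is treated, which is the one case in which the two readings of the requirement disagree.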
FIG. 9 depicts an alternative object detection method using different combinations of multi-scale and perspective feature trained classifiers according to embodiments of the present invention. In contrast to the method depicted in FIG. 8 which had a single classifier, the method of FIG. 9 has multiple trained classifiers.
In embodiments, the process commences by obtaining (905) one or more image patches, which may be obtained as previously discussed. In embodiments, for an image patch, high-resolution, mid-resolution, low-resolution, and perspective context features are obtained (910) for the image patch, just as was done for the method of FIG. 8. However, unlike the method of FIG. 8 in which all the features were combined into a single multi-scale with perspective context feature, the individual features may also be used. Alternatively or additionally, different permutations and different combinations of any of the features may be used alone or in concert for training and detection. One skilled in the art shall recognize, however, that each unique feature or feature composite should have a corresponding trained classifier.
After forming the features, the object detection phase comprises using (915) one or more of the features for the image patch and one or more of the corresponding trained classifiers to classify the image patch.
In an embodiment, a classifier may be selected based upon most closely matching the size of the input image patch. For example, if the image patch has a size equal to or close to the mid-resolution size, the trained mid-resolution classifier may be used in conjunction with the mid-resolution feature of the image patch to classify that image patch. One skilled in the art shall recognize that if such an embodiment is employed, then only the size-corresponding feature need be generated. In embodiments, the size-corresponding feature may include the perspective context feature. In yet other alternative embodiments, the size-corresponding feature classification may be compared with, or used in conjunction with, one or more other classifications, such as with a multi-scale feature classification, when classifying whether or not the image patch contains the object of interest based upon classifier outputs. One skilled in the art shall recognize other combinations and configurations that may be employed.
FIG. 10 depicts some sample object detection results according to embodiments of the present invention. FIG. 10 compares detection rates (precision/recall) and the corresponding processing time for different configurations or embodiments of feature resolutions, evaluated using captured datasets. As shown, when the features are extracted at a higher resolution (36×72) (i.e., a normalized image patch size of 36×72 pixels), comparatively higher precision and recall are achieved. When a lower resolution (12×24) is used, the running time is smaller. However, in embodiments, a scale-adaptive feature extraction (i.e., the features are extracted according to the size of the original image patch, and the closest resolution is chosen for extracting features) achieved even higher precision and recall at a comparably low running time cost. One skilled in the art shall recognize that other scale-adaptive configurations may be used.
FIG. 11 also depicts some sample object detection results according to embodiments of the present invention. FIG. 11 compares the detection rates (in terms of Receiver Operating Characteristic (ROC) curves) of different embodiments of proposed context feature components. As shown, the combination of all feature components (a multi-scale feature with perspective context) 1140 achieved the highest detection rates on the sample image datasets.
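By way of illustration and not limitation, the following Python sketch selects the resolution, and hence the corresponding classifier, whose normalized size most closely matches the size of an input image patch. Measuring closeness by difference in pixel area, the function name `closest_resolution`, and the example table of sizes (in particular the 16×32 low-resolution entry) are assumptions of this example.

```python
def closest_resolution(patch_size, resolutions):
    """Return the name of the resolution closest in area to the patch.

    patch_size:  (width, height) of the input image patch
    resolutions: mapping from resolution name to (width, height)
    Ties are resolved in favor of the first entry in `resolutions`.
    """
    w, h = patch_size

    def area_gap(name):
        rw, rh = resolutions[name]
        return abs(rw * rh - w * h)

    return min(resolutions, key=area_gap)


# Illustrative sizes only; an embodiment may use other sizes.
resolutions = {"high": (64, 128), "mid": (32, 64), "low": (16, 32)}

choice_small = closest_resolution((14, 30), resolutions)
choice_medium = closest_resolution((30, 70), resolutions)
choice_large = closest_resolution((60, 120), resolutions)
```

Once a resolution name is chosen, only the feature for that resolution needs to be generated and passed to the classifier trained for it.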
System Embodiments
FIG. 12 depicts a classifier training system according to embodiments of the present invention. The system 1205 comprises an image patch generator 1215, a multi-scale and perspective context feature generator 1245, and a classifier or classifiers trainer 1250. In the depicted embodiment, the multi-scale and perspective context feature generator 1245 comprises a perspective context feature generator 1240 and a multi-scale feature generator 1220. One skilled in the art shall recognize that other system configurations may be used to perform the classifier training. It shall also be noted that a system may be configured for both training and detection and that the same or like components for both tasks may be configured to function in both capacities. In embodiments, system 1205, or parts thereof, may be implemented using a computing system, such as one described with reference to FIG. 14.
In embodiments, image patch generator 1215 receives one or more images 1210 and generates image patches from the one or more images 1210. In embodiments, image patch generator 1215 may use one or more existing detectors, such as (by way of example and not limitation) AdaBoost or SVM, to extract image patches and perspective context information from one or more input images. One skilled in the art shall recognize that if the image patches and perspective information are supplied directly to the classifier training system, image patch generator 1215 may not be used or may be omitted from the classifier training system. In any event, image patches with labels indicating whether the image patch does or does not include the object of interest are provided to the multi-scale and perspective feature generator 1245 and to the classifier trainer 1250.
The multi-scale and perspective feature generator 1245 receives the image patches and perspective context information and generates one or more features according to the various embodiments. In embodiments, the multi-scale feature generator 1220 comprises a high-resolution feature generator 1225, a mid-resolution feature generator 1230, and a low-resolution feature generator 1235. In embodiments, the high-resolution feature generator 1225 produces high-resolution features for image patches using one or more methods previously described with reference to FIG. 2. In embodiments, the mid-resolution feature generator 1230 produces mid-resolution features for image patches using one or more methods previously described with reference to FIG. 3. In embodiments, the low-resolution feature generator 1235 produces low-resolution features for image patches using one or more methods previously described with reference to FIG. 4. The multi-scale and perspective feature generator 1245 further comprises a perspective context feature generator 1240 that generates one or more perspective context features using one or more methods previously described with reference to FIG. 5. One skilled in the art shall recognize that the image patch generator 1215 may operate as the perspective context feature generator when it identifies a location and size of an image patch. In embodiments, the multi-scale and perspective context feature generator 1245 may combine two or more of the features for an image patch into a feature. One skilled in the art shall recognize that a number of combinations may be formed by generator 1245, some of which are discussed above with respect to FIGS. 6 and 7.
For example, by way of illustration and not limitation, the multi-scale and perspective context feature generator 1245 may combine the high-resolution feature, the mid-resolution features, the low-resolution feature, and the perspective context feature for an image patch into a single multi-scale with perspective context feature for the image patch.
In embodiments, the multi-scale and perspective context feature generator 1245 provides features associated with the image patches to the classifier trainer 1250. The classifier trainer 1250 uses the features and the associated labels to train one or more classifiers. The trained classifier or classifiers may then be used for object detection in images.
FIG. 13 depicts a classifier/detector system 1305 according to embodiments of the present invention. The system 1305 comprises an image patch generator 1315, a multi-scale and perspective context feature generator 1345, and a classifier/detector 1350. In the depicted embodiment, the multi-scale and perspective context feature generator 1345 comprises a perspective context feature generator 1340 and a multi-scale feature generator 1320. In embodiments, system 1305, or parts thereof, may be implemented using a computing system, such as one described with reference to FIG. 14. One skilled in the art shall recognize that other configurations may be used to perform classification. It shall be noted that, in embodiments, training system 1205 differs from classifier system 1305 in the training module 1250 and detector 1350. Accordingly, one skilled in the art shall recognize that a combined system that can perform both training and classifying may be created by including with the common components a training module and a detector and a switching means, such as a mechanical or software implemented switch, to switch between the two modes.
In embodiments, image patch generator 1315 receives one or more images 1310 and identifies image patches in the one or more images 1310 supplied to the classifier system 1305. In embodiments, the image patch generator 1315 also identifies perspective context information, such as image patch size and image patch position, for each image patch. In embodiments, image patch generator 1315 may use one or more existing detectors, such as (by way of example and not limitation) AdaBoost or SVM, to extract image patches and perspective context information from one or more input images. The image patches with their associated perspective context information are provided to the multi-scale and perspective feature generator 1345.
The multi-scale and perspective feature generator 1345 receives the image patches and perspective context information and generates one or more features according to the various embodiments. In embodiments, the multi-scale feature generator 1320 comprises a high-resolution feature generator 1325, a mid-resolution feature generator 1330, and a low-resolution feature generator 1335. In embodiments, the high-resolution feature generator 1325 produces high-resolution features for image patches using one or more methods previously described with reference to FIG. 2. In embodiments, the mid-resolution feature generator 1330 produces mid-resolution features for image patches using one or more methods previously described with reference to FIG. 3. In embodiments, the low-resolution feature generator 1335 produces low-resolution features for image patches using one or more methods previously described with reference to FIG. 4. The multi-scale and perspective feature generator 1345 further comprises a perspective context feature generator 1340 that generates one or more perspective context features using one or more methods previously described with reference to FIG. 5. One skilled in the art shall recognize that the image patch generator 1315 may operate as the perspective context feature generator when it identifies a location and size of an image patch.
In embodiments, the multi-scale and perspective context feature generator 1345 may combine two or more of the features for an image patch into a feature. One skilled in the art shall recognize that a number of combinations may be formed by generator 1345 for use in classification, as discussed previously. For example, by way of illustration and not limitation, the multi-scale and perspective context feature generator 1345 may combine the high-resolution feature, the mid-resolution features, the low-resolution feature, and the perspective context feature for an image patch into a single multi-scale with perspective context feature for the image patch.
In embodiments, the multi-scale and perspective context feature generator 1345 provides features associated with the image patches to the classifier 1350. The detector/classifier 1350 receives the feature for an image patch, which is input to a corresponding classifier. In embodiments, responsive to the output of the classifier for that feature meeting or exceeding a threshold level, the image patch is deemed to contain the object of interest; otherwise, the image patch is deemed to not contain the object of interest.
In embodiments, a user may select the feature and/or corresponding classifier to be used in detection. For example, system 1305 may allow a user to use a single multi-scale with perspective context feature (such as in the method of FIG. 8). Alternatively, a user may select different approaches, such as those discussed with respect to FIG. 9. For example, in embodiments, the scale/resolution feature may be selected to match the size of the image patch. Accordingly, a corresponding scale/resolution classifier may be used for detecting. In embodiments, the system may perform multiple detection methods for the same image patch.
Cascaded Object Detection Using Features from Enclosing Regions
It shall be noted that aspects of the features presented herein may also be used in conjunction with, or separately from, the multi-scale features systems and methods. For example, embodiments of features disclosed herein may be used in cascaded object detection systems and methods.
As noted previously, existing methods typically only use features from the image patch, which results in high false detection rates. Even when cascaded systems have been employed to help reduce the high false detection rates, the cascaded system classifiers typically only use different descriptors that are extracted from the same region (i.e., the image window or image patch). Such systems are only moderately successful in reducing the high false detection rates because the information extracted from the same region is generally still not discriminative enough.
In embodiments, cascaded systems and methods use feature descriptors extracted from a context region for an image patch as a second stage of the cascaded classifier to reduce false detection rates. In embodiments, these feature descriptors extracted for the context region are the same type as extracted for the image window.
FIG. 14 depicts a method for generating a cascade feature for an image patch according to embodiments of the present invention. In embodiments, a cascade feature generation process may comprise obtaining an image window (e.g., image window 205) and an enclosing region (e.g., context region 1405) with the same center location. Similar to the mid-resolution feature (see FIG. 3 and the corresponding written description), in embodiments, the context region is divided to form three groups or regions. As depicted in the embodiment in FIG. 14, the context region is divided into regions 1, 2, and 3, and the like-numbered regions are combined into three patches or regions (e.g., 1410-1, 1410-2, and 1410-3). It shall be noted that the context region may be divided into different portions or regions, and may be combined in different ways. In embodiments, if the image window has dimensions l×w and the context region has dimensions double that (2l×2w), each group of regions (e.g., 1410-1, 1410-2, and 1410-3) forms a patch of the same size as the center scan window image patch (e.g., 1410-A/205). Additionally, or in alternative embodiments, one or more of the image portions (e.g., 1410-x) may be normalized to a set size, similar to the mid-resolution features of FIG. 3.
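By way of illustration only, one possible division of a 2l×2w context region into a center window and three window-sized patches may be sketched as follows. The particular layout (four corner blocks, top and bottom strips, left and right strips) is an assumption made for this sketch and is not asserted to be the division shown in FIG. 14; the helper names are likewise hypothetical, and l is treated as the vertical dimension.

```python
from typing import Dict, List

Image = List[List[float]]  # row-major grid of pixel values


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width sub-image whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def hstack(a: Image, b: Image) -> Image:
    """Place b to the right of a (both must have the same number of rows)."""
    return [ra + rb for ra, rb in zip(a, b)]


def vstack(a: Image, b: Image) -> Image:
    """Place b below a (both must have the same number of columns)."""
    return a + b


def split_context(context: Image) -> Dict[str, Image]:
    """Split a 2l x 2w context region into the centered l x w window and
    three l x w patches assembled from the surrounding border (assumed layout)."""
    rows, cols = len(context), len(context[0])
    if rows % 4 or cols % 4:
        raise ValueError("sketch assumes dimensions divisible by 4")
    l, w = rows // 2, cols // 2      # window size
    hl, hw = l // 2, w // 2          # border thickness on each side

    window = crop(context, hl, hw, l, w)

    # Assumed group 1: the four corner blocks, tiled into one l x w patch.
    corners = vstack(
        hstack(crop(context, 0, 0, hl, hw), crop(context, 0, hw + w, hl, hw)),
        hstack(crop(context, hl + l, 0, hl, hw),
               crop(context, hl + l, hw + w, hl, hw)),
    )
    # Assumed group 2: the strips directly above and below the window.
    above_below = vstack(crop(context, 0, hw, hl, w),
                         crop(context, hl + l, hw, hl, w))
    # Assumed group 3: the strips directly left and right of the window.
    left_right = hstack(crop(context, hl, 0, l, hw),
                        crop(context, hl, hw + w, l, hw))

    return {"window": window, "group_1": corners,
            "group_2": above_below, "group_3": left_right}
```

Because the border of a 2l×2w region around an l×w window has area 3lw, any grouping that partitions the border into three equal-area groups yields patches of the same size as the window, which is the property the sketch relies on.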
One or more features may then be computed for each of the four regions (i.e., the three context region patches and the image patch). In embodiments, a histogram of gradient (HoG) feature descriptor may be computed for each of the regions. One skilled in the art shall recognize that other methods or features may be used. In embodiments, the HoG features for the three context region portions (e.g., 1410-1, 1410-2, and 1410-3) are averaged, and the HoG of the image patch (e.g., 1410-A/205) is subtracted from that average to form a cascade feature 1420, which may be used for a second stage of cascaded classifiers. One skilled in the art shall recognize that other feature combinations may be used for the cascade feature.
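By way of illustration only, the combination of the four HoG descriptors into a cascade feature may be sketched as follows. The sketch assumes the descriptors have already been computed as equal-length vectors and reads the combination as the element-wise average of the context descriptors minus the image patch descriptor; the function name `cascade_feature` is hypothetical.

```python
from typing import List, Sequence


def cascade_feature(context_hogs: Sequence[Sequence[float]],
                    window_hog: Sequence[float]) -> List[float]:
    """Average the context-region HoG vectors element-wise, then subtract
    the HoG vector of the image patch (assumed sign convention)."""
    n = len(window_hog)
    if not context_hogs or any(len(h) != n for h in context_hogs):
        raise ValueError("all descriptors must have the same length")
    count = len(context_hogs)
    mean = [sum(values) / count for values in zip(*context_hogs)]
    return [m - c for m, c in zip(mean, window_hog)]
```

For example, with context descriptors `[1, 2]`, `[3, 4]`, and `[5, 6]` and an image patch descriptor `[1, 1]`, the average is `[3, 4]` and the resulting difference is `[2, 3]`.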
FIG. 15 depicts system and method embodiments of cascaded classification according to embodiments of the present invention. In embodiments, the cascaded classifier system 1505 comprises an initial, or first stage, detector 1510 and a cascade-feature-based rejecter 1515 as a second stage classifier. In embodiments, the first stage comprises a HoG AdaBoost detector using features extracted from the image patch (e.g., image patch 1410-A/205). This stage yields a set of image patches 1525 that are labeled as containing the object of interest. However, there are likely to be a number of false positives (e.g., 1525-FP). Accordingly, a second stage classification may be performed on the set of image patches 1525 to filter out some or all of the false positives. In embodiments, the second stage, or cascaded, classifier 1515 is a difference-HoG-Feature classifier or rejecter that uses the cascade feature 1420, or some version thereof, to improve the precision by rejecting some of the false detections.
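By way of illustration only, the two-stage flow described above may be sketched as follows. The patch records, score fields, and threshold values are hypothetical placeholders: in practice the first stage would be a trained detector operating on image patch features and the second stage a trained rejecter operating on the cascade feature.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Patch = TypeVar("Patch")


def cascaded_detect(patches: Sequence[Patch],
                    first_stage_accepts: Callable[[Patch], bool],
                    second_stage_rejects: Callable[[Patch], bool]
                    ) -> Tuple[List[Patch], List[Patch]]:
    """Run the first-stage detector, then drop candidates that the
    second-stage rejecter flags as false detections."""
    candidates = [p for p in patches if first_stage_accepts(p)]
    kept = [p for p in candidates if not second_stage_rejects(p)]
    return candidates, kept


# Hypothetical patches with precomputed scores, for demonstration only.
demo_patches = [
    {"id": "a", "stage1_score": 0.9, "reject_score": 0.1},
    {"id": "b", "stage1_score": 0.8, "reject_score": 0.7},  # false positive
    {"id": "c", "stage1_score": 0.2, "reject_score": 0.0},
]
demo_candidates, demo_kept = cascaded_detect(
    demo_patches,
    first_stage_accepts=lambda p: p["stage1_score"] >= 0.5,
    second_stage_rejects=lambda p: p["reject_score"] >= 0.5,
)
```

In this sketch the second stage only ever removes candidates, so it can improve precision but cannot recover objects missed by the first stage.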
One skilled in the art shall recognize that the cascade feature, the cascade systems, and the cascade methodologies may be combined with other features, systems, and methods previously discussed or referenced herein.
Having described the details of embodiments of the various inventions, an exemplary system 1600, which may be used to implement one or more aspects of the present inventions, will now be described with reference to FIG. 16. As illustrated in FIG. 16, the system includes a central processing unit (CPU) 1601 that provides computing resources and controls the computer. The CPU 1601 may be implemented with a microprocessor or the like, and may also include a graphics processor and/or a floating point coprocessor for mathematical computations. The system 1600 may also include system memory 1602, which may be in the form of random-access memory (RAM) and read-only memory (ROM).
A number of controllers and peripheral devices may also be provided, as shown in FIG. 16. An input controller 1603 represents an interface to various input device(s) 1604, such as a keyboard, mouse, or stylus. There may also be a scanner controller 1605, which communicates with a scanner 1606. The system 1600 may also include a storage controller 1607 for interfacing with one or more storage devices 1608 each of which includes a storage medium such as magnetic tape or disk, or an optical medium that might be used to record programs of instructions for operating systems, utilities and applications which may include embodiments of programs that implement various aspects of the present invention. Storage device(s) 1608 may also be used to store processed data or data to be processed in accordance with the invention. The system 1600 may also include a display controller 1609 for providing an interface to a display device 1611, which may be a cathode ray tube (CRT), a thin film transistor (TFT) display, or other type of display. The system 1600 may also include a printer controller 1612 for communicating with a printer 1613. A communications controller 1614 may interface with one or more communication devices 1615, which enables the system 1600 to connect to remote devices through any of a variety of networks including the Internet, a local area network (LAN), a wide area network (WAN), or through any suitable electromagnetic carrier signals including infrared signals.
In the illustrated system, all major system components may connect to a bus 1616, which may represent more than one physical bus. However, various system components may or may not be in physical proximity to one another. For example, input data and/or output data may be remotely transmitted from one physical location to another. In addition, programs that implement various aspects of this invention may be accessed from a remote location (e.g., a server) over a network. Such data and/or programs may be conveyed through any of a variety of machine-readable media, including magnetic tape or disk or optical disc, or a transmitter/receiver pair.
The present invention may be conveniently implemented with software. However, alternative implementations are certainly possible, including a hardware implementation or a software/hardware implementation. Any hardware-implemented functions may be realized using ASIC(s), digital signal processing circuitry, or the like. Accordingly, the “means” terms in the claims are intended to cover both software and hardware implementations. Similarly, the term “computer-readable medium” as used herein includes software and/or hardware having a program of instructions embodied thereon, or a combination thereof. With these implementation alternatives in mind, it is to be understood that the figures and accompanying description provide the functional information one skilled in the art would require to write program code (i.e., software) or to fabricate circuits (i.e., hardware) to perform the processing required.
In accordance with further aspects of the invention, any of the above-described methods or steps thereof may be embodied in a program of instructions (e.g., software), which may be stored on any suitable non-transitory computer-readable medium, or conveyed to a computer or other processor-controlled device, for execution. Alternatively, any of the methods or steps thereof may be implemented using functionally equivalent hardware (e.g., application specific integrated circuit (ASIC), digital signal processing circuitry, etc.) or a combination of software and hardware. In embodiments, one or more of the methods may be implemented using one or more processing units/systems.
While the inventions have been described in conjunction with several specific embodiments, it is evident to those skilled in the art that many further alternatives, modifications, and variations will be apparent in light of the foregoing description. Thus, the inventions described herein are intended to embrace all such alternatives, modifications, applications and variations as may fall within the spirit and scope of the appended claims.