Object recognition is a subset of the technological discipline of computer vision, whereby a computer is able to extract from an image, information that is necessary to solve a task. In the case of object recognition, the task is identification and classification of pre-specified, or learned, target objects within the image. A classical problem in computer vision is that of determining whether or not the image data contains some specific target object, feature, or activity. This task can conventionally be solved robustly and without effort by a human, despite the fact that the image of the target object(s) may vary somewhat in different viewpoint, size/scale, object translation or rotation, or even where the target object is partially obstructed or obscured in a given image. However, the problem is not fully and satisfactorily solved in computer vision for the non-specific, general case—arbitrary target objects in arbitrary situations. Conventional methods for dealing with this problem can, at best, solve it only for specific target objects; such as simple geometric objects (e.g., polyhedra), human faces, or printed or hand-written characters; and in specific situations, typically described in terms of well-defined illumination, background, and pose, or position and orientation of the target object relative to the camera.
Conventional appearance based methods of object recognition typically use both positive and negative training. Positive training uses example images, or “exemplars,” of the target object in which the target object looks different and/or is presented under varying conditions; for example changes in lighting, changes in the color of the target object, changes in viewing angle/orientation, or changes in the size and/or shape of the target object; to train the machine to recognize the target object. This training is necessarily “domain specific,” it requires training using exemplars in the same category as the target object (e.g. a machine is trained to recognize a car with exemplars of cars).
Negative training uses example images of objects that are not the target object to train the machine to recognize what the target object does not “look like.” Conventional object recognition negative training methods are not domain specific, they do not train using negative exemplars from the same or similar object class (e.g., a machine is not trained to recognize a car by showing it images of only other man made transportation machines such as trains, airplanes, and bicycles). Instead, conventional object recognition negative training proceeds by presenting the computer with an immense breadth of images to teach it what the desired object does not look like (e.g., a machine may be negatively trained to recognize a car by showing it images of negative examples of such varied objects as flowers, staples, fish, forests, bicycles and hats). Training with only one or a few negative samples has been thought unlikely to train a machine to reliably distinguish all iterations of the desired object, driving conventional practitioners to train using large negative sample sets, many members of which are likely irrelevant, significantly driving up the time and cost required for reliable object recognition negative training.
The difficulty in achieving accuracy in object recognition is always related to the selection of positive (+ve) and negative (−ve) samples since these determine the visual patterns the classifier searches for to determine if the object in the image under test contains the object or not. The complexity and sensitivity arises in selecting a representative set of −ve samples that is at once sufficient and as small as possible, because the training process is very costly in terms of computational resources, internet bandwidth, and to a lesser degree storage. The current invention effectively solves all three constraints by minimizing the number of negative samples required to achieve the desired precision and recall rates.