One of the key challenges in computer vision is to detect and classify objects in digital images. The task is difficult because (i) the appearance of objects within the same object category can vary considerably (e.g., motorbikes can have different shapes, colors, and textures), (ii) objects can be seen from many different viewpoints and at different scales, and (iii) objects are often surrounded by cluttered backgrounds.
One existing approach to object classification is to learn a codebook of object features and use the codebook to recognize new instances of objects. Examples of codebook-based object detection include an unsupervised generative model for configurations of the codebook words of objects, a shape model that specifies where a codebook entry may appear on an object, and a combination of different detectors and descriptors with a classifier. However, the object features detected by existing object classification systems are sparse, so these systems yield only a sparse set of objects.
Furthermore, existing object classification systems face a variety of other challenges, including the need to specify the number of parts when learning object models or to rely on motion cues from video sequences. These systems also generalize poorly and are computationally expensive when processing digital images in which objects undergo significant changes in scale and viewpoint.
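For illustration only, the codebook approach discussed above can be sketched as follows. The sketch assumes the codebook is learned by k-means clustering of feature descriptors and that an image is represented as a normalized histogram of its nearest codebook words; neither choice is stated in this background, and the function names `learn_codebook` and `encode`, the evenly spaced initialization, and the toy two-dimensional descriptors are all hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook word closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def learn_codebook(descriptors, num_words, iterations=20):
    """Learn a codebook with plain k-means (an assumed learning method)."""
    # Assumption: initialize the words from evenly spaced training descriptors.
    step = max(1, len(descriptors) // num_words)
    codebook = [list(d) for d in descriptors[::step][:num_words]]
    for _ in range(iterations):
        # Assign every descriptor to its nearest word.
        clusters = [[] for _ in codebook]
        for d in descriptors:
            clusters[nearest_word(d, codebook)].append(d)
        # Move each word to the mean of its cluster; keep it if the cluster is empty.
        for k, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                codebook[k] = [sum(m[i] for m in members) / len(members)
                               for i in range(dim)]
    return codebook


def encode(descriptors, codebook):
    """Represent an image as a normalized histogram of codebook words."""
    histogram = [0.0] * len(codebook)
    for d in descriptors:
        histogram[nearest_word(d, codebook)] += 1.0
    total = sum(histogram)
    return [h / total for h in histogram] if total else histogram


# Toy descriptors forming two well-separated groups (hypothetical data).
training_descriptors = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
                        (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
codebook = learn_codebook(training_descriptors, num_words=2)

# Descriptors from a new image: one near the first group, two near the second.
new_image_descriptors = [(0.05, 0.05), (4.9, 5.0), (5.2, 5.1)]
histogram = encode(new_image_descriptors, codebook)
```

In a sketch of this kind, the resulting histogram would be passed to a classifier to recognize new object instances. When the descriptors come only from sparsely detected features, the histogram is built from few samples.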
The figures depict various embodiments of the invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.