Computer vision techniques are increasingly used to automatically detect or classify objects or events in images. The ability to differentiate among objects is important to the efficient functioning of many computer vision systems. For example, in certain applications it is important for a computer vision system to distinguish between animate objects, such as people and pets, and inanimate objects, such as furniture and doors. Pattern recognition techniques are often applied to images to determine a likelihood (probability) that a given object or class of objects appears in the image. For a detailed discussion of pattern recognition or classification techniques, see, for example, R. O. Duda and P. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973); R. T. Chin and C. R. Dyer, “Model-Based Recognition in Robot Vision,” ACM Computing Surveys, 18(1), 67–108 (March, 1986); or P. J. Besl and R. C. Jain, “Three-Dimensional Object Recognition,” ACM Computing Surveys, 17(1), 75–145 (March, 1985), each incorporated by reference herein.
Appearance-based techniques have been used extensively for object recognition because of their inherent ability to exploit image-based information. Appearance-based techniques attempt to recognize objects by finding the best match between a two-dimensional image representation of the object's appearance and stored prototypes. Generally, appearance-based methods perform the comparison in a lower-dimensional subspace of the higher-dimensional image representation. Common examples of appearance-based techniques for recognition and classification of objects include Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Neural Networks.
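By way of illustration only, the subspace matching described above can be sketched as follows. The sketch assumes PCA computed with a singular value decomposition and a nearest-prototype rule in the reduced space; the function names (fit_pca, project, classify), the class labels and the synthetic 64-value "images" are hypothetical and are not taken from any of the cited references.

```python
import numpy as np

def fit_pca(images, n_components):
    """Learn a PCA subspace from flattened images of shape (n_samples, n_pixels)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, components):
    """Map flattened images into the lower-dimensional subspace."""
    return (images - mean) @ components.T

def classify(image, mean, components, prototype_coords, prototype_labels):
    """Return the label of the stored prototype nearest to the image in the subspace."""
    coords = project(image[None, :], mean, components)
    distances = np.linalg.norm(prototype_coords - coords, axis=1)
    return prototype_labels[int(np.argmin(distances))]

# Synthetic stand-ins for stored prototypes (assumption: two classes, 64 pixels each).
rng = np.random.default_rng(0)
base_person = rng.normal(size=64)
base_pet = rng.normal(size=64)
prototypes = np.vstack([
    base_person + 0.05 * rng.normal(size=(5, 64)),
    base_pet + 0.05 * rng.normal(size=(5, 64)),
])
labels = ["person"] * 5 + ["pet"] * 5

mean, components = fit_pca(prototypes, n_components=3)
prototype_coords = project(prototypes, mean, components)

# A new, noisy view of the second class is matched against the prototypes.
query = base_pet + 0.05 * rng.normal(size=64)
predicted = classify(query, mean, components, prototype_coords, labels)
```

In this sketch the comparison is carried out on three coordinates per image rather than on all 64 pixel values, which is the sense in which a lower-dimensional subspace is used for the comparison.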
U.S. patent application Ser. No. 09/794,443, filed Feb. 27, 2001, entitled “Classification of Objects Through Model Ensembles,” and T. Brodsky et al., “Visual Surveillance in Retail Stores and in the Home,” Proc. 2nd European Workshop on Advanced Video-Based Surveillance Systems, 297–310 (2001), disclose an object classification engine that distinguishes between people and pets in a residential home environment. Initially, speed and aspect ratio information are used to filter out invalid moving objects, such as furniture. Thereafter, gradient images are extracted from the remaining objects and applied to a Radial Basis Function Network (RBFN) to classify moving objects as people or pets.
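Purely as a hedged sketch of the two-stage flow just described, and not as the implementation disclosed in the cited application, the stages might be arranged as follows. The threshold values, the use of gradient magnitude as the "gradient image," the Gaussian form of the basis functions, and all function names are assumptions introduced for illustration; in practice the network centers, widths and weights would be learned from training data rather than set by hand.

```python
import numpy as np

def is_candidate(speed, width, height,
                 min_speed=0.5, min_aspect=0.2, max_aspect=5.0):
    """Stage 1: discard objects whose speed or aspect ratio is implausible.

    All thresholds are hypothetical placeholders.
    """
    aspect = height / width
    return speed >= min_speed and min_aspect <= aspect <= max_aspect

def gradient_magnitude(image):
    """Stage 2a: gradient image, here taken to be the per-pixel gradient magnitude."""
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, then columns
    return np.hypot(gx, gy)

def rbf_scores(feature, centers, widths, weights):
    """Stage 2b: Gaussian radial basis layer followed by a linear output layer."""
    sq_dist = np.sum((centers - feature) ** 2, axis=1)
    hidden = np.exp(-sq_dist / (2.0 * widths ** 2))
    return hidden @ weights

# Toy input: an 8x8 image with one vertical edge (assumed data, for illustration).
image = np.zeros((8, 8))
image[:, 4:] = 1.0
feature = gradient_magnitude(image).ravel()

# Hand-set network parameters standing in for trained ones.
centers = np.vstack([feature, np.zeros_like(feature)])
widths = np.array([1.0, 1.0])
weights = np.eye(2)  # one output per class

class_names = ["person", "pet"]
if is_candidate(speed=1.0, width=10, height=20):
    scores = rbf_scores(feature, centers, widths, weights)
    label = class_names[int(np.argmax(scores))]
```

The early filter keeps the comparatively expensive gradient extraction and network evaluation from being applied to objects that can be rejected on simple measurements alone.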
While currently available classification schemes perform well in a closed environment, such as a residential home, they suffer from a number of limitations which, if overcome, could greatly improve the ability of such classification schemes to classify unknown objects. In particular, while most conventional classification schemes exploit known information about the form or function of the objects to be classified, few, if any, currently attempt to build object category hierarchies using purely image-based information.