Automatically determining the semantic classification (e.g., indoor, outdoor—sunset, picnic, beach) of an arbitrary image is a difficult problem. Much research has been done recently, and a variety of classifiers and feature sets have been proposed. The most common design for such systems has been to use low-level features (e.g., color, texture) and statistical pattern recognition techniques. Such systems are exemplar-based, relying on learning patterns from a training set. Examples are M. Szummer and R. W. Picard, “Indoor-outdoor image classification”, in Proceedings of IEEE Workshop on Content-based Access of Image and Video Databases, 1998, and A. Vailaya, M. Figueiredo, A. Jain, and H. J. Zhang, “Content-based hierarchical classification of vacation images”, in Proceedings of IEEE International Conference on Multimedia Computing and Systems, 1999.
Semantic scene classification can improve the performance of content-based image organization and retrieval (CBIR). Many current CBIR systems allow a user to specify an image and search for images similar to it, where similarity is often defined only by color or texture properties. This so-called “query by example” has often proven to be inadequate due to its simplicity. Knowing the category of a scene a priori helps narrow the search space dramatically. For instance, knowing what constitutes a party scene allows us to consider only party scenes in our search to answer the query “Find pictures of Mary's birthday party”. This way, the search time is reduced, the hit rate is higher, and the false alarm rate is expected to be lower.
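As a rough illustration of how a known scene category narrows a query-by-example search, the following minimal sketch ranks images by color-histogram similarity, optionally restricted to one category. The `IndexedImage` record, the histogram-intersection measure, and the function names are assumptions made for this example; they are not taken from any particular CBIR system.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class IndexedImage:
    """Hypothetical database record; field names are illustrative only."""
    name: str
    category: str                # scene label, e.g. "party" or "beach"
    histogram: Sequence[float]   # normalized color histogram (sums to 1)


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def query_by_example(
    query_hist: Sequence[float],
    database: List[IndexedImage],
    category: Optional[str] = None,
    top_k: int = 3,
) -> List[IndexedImage]:
    """Rank images by color similarity to the query.

    When a scene category is given, only images in that category are
    compared, which shrinks the search space before any ranking is done.
    """
    candidates = [
        img for img in database
        if category is None or img.category == category
    ]
    ranked = sorted(
        candidates,
        key=lambda img: histogram_intersection(query_hist, img.histogram),
        reverse=True,
    )
    return ranked[:top_k]
```

Without the category filter, an image from an unrelated scene that happens to share the query's colors can outrank the relevant images; with the filter, it is never considered.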
Classification of unconstrained consumer images in general is a difficult problem. Therefore, it can be helpful to use a hierarchical approach, in which classifying images into indoor or outdoor images occurs at the top level and is followed by further classification within each subcategory, as suggested by Vailaya et al.
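The hierarchical arrangement described above can be sketched as a two-stage dispatch: a top-level classifier picks indoor or outdoor, and a second classifier specific to that branch refines the label. The feature names and threshold rules below are toy stand-ins invented for this example, not the classifiers of Vailaya et al.

```python
from typing import Callable, Mapping, Optional, Tuple

Features = Mapping[str, float]
Classifier = Callable[[Features], str]


def classify_hierarchically(
    features: Features,
    top_level: Classifier,
    sub_level: Mapping[str, Classifier],
) -> Tuple[str, Optional[str]]:
    """Return (coarse label, fine label).

    The fine label is None when no sub-classifier is registered for the
    coarse label chosen at the top level.
    """
    coarse = top_level(features)
    refine = sub_level.get(coarse)
    fine = refine(features) if refine is not None else None
    return coarse, fine


# Toy stand-in classifiers (hypothetical features and thresholds).
def toy_indoor_outdoor(features: Features) -> str:
    return "outdoor" if features["sky_fraction"] > 0.2 else "indoor"


def toy_outdoor_scene(features: Features) -> str:
    return "sunset" if features["warm_color_fraction"] > 0.5 else "beach"
```

In practice each stage would be a trained statistical classifier; the point of the sketch is only that each sub-classifier sees images from a single branch and therefore faces a narrower problem.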
Still, current scene classification systems often fail on unconstrained image sets. The primary reason appears to be the enormous variety of images found within most semantic classes. Exemplar-based systems must account for such variation in their training sets, yet even hundreds of exemplars do not necessarily capture all of the variability inherent in some classes.
Consequently, a need exists for a method that overcomes the above-described deficiencies in image classification.
While the advent of digital imaging has created an enormous number of digital images, and thus the need for scene classification (e.g., for use in digital photofinishing and in image organization), it has also brought with it a powerful source of information that has been little exploited for scene classification: camera metadata embedded in the digital image files. Metadata (or “data about data”) for cameras includes values such as date/time stamps, presence or absence of flash, exposure time, and aperture value. Most camera manufacturers today store metadata using the EXIF (EXchangeable Image File Format) standard (http://www.exif.org/specifications.html).
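To make the metadata fields concrete, the following minimal sketch converts a dictionary of already-decoded EXIF tags into a typed record. It assumes the tags have been extracted from the image file by some EXIF reader (not shown) and are keyed by the standard tag names `DateTimeOriginal`, `Flash`, `ExposureTime`, and `FNumber`; the `CameraMetadata` record and `parse_camera_metadata` function are hypothetical names introduced here. The sketch reads the aperture from `FNumber` rather than from the APEX-valued `ApertureValue` tag, which is a simplifying choice.

```python
from dataclasses import dataclass
from datetime import datetime
from fractions import Fraction
from typing import Any, Mapping, Optional


@dataclass
class CameraMetadata:
    """Hypothetical record holding the metadata values named in the text."""
    captured_at: Optional[datetime]
    flash_fired: Optional[bool]
    exposure_time_s: Optional[float]
    f_number: Optional[float]


def parse_camera_metadata(tags: Mapping[str, Any]) -> CameraMetadata:
    """Build a CameraMetadata record from decoded EXIF tags.

    Missing tags become None, since cameras do not always record every field.
    """
    raw_time = tags.get("DateTimeOriginal")
    raw_flash = tags.get("Flash")
    raw_exposure = tags.get("ExposureTime")
    raw_fnumber = tags.get("FNumber")
    return CameraMetadata(
        # EXIF stores timestamps as "YYYY:MM:DD HH:MM:SS".
        captured_at=(
            datetime.strptime(raw_time, "%Y:%m:%d %H:%M:%S") if raw_time else None
        ),
        # Bit 0 of the EXIF Flash tag is set when the flash fired.
        flash_fired=bool(int(raw_flash) & 1) if raw_flash is not None else None,
        # Rational values are accepted here as "1/60"-style strings or numbers.
        exposure_time_s=(
            float(Fraction(raw_exposure)) if raw_exposure is not None else None
        ),
        f_number=float(Fraction(raw_fnumber)) if raw_fnumber is not None else None,
    )
```

A record of this kind could then be supplied to a scene classifier alongside, or in place of, image-derived features.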