Automatic scene classification of images may be used in a wide variety of applications, including computer vision systems and media asset organization and retrieval systems. Many automatic scene classification systems classify images based at least in part on a content-based analysis of the images. Each image typically is represented by a set of low-level features (e.g., texture, color, and shape) that are extracted from the image. The images are classified by inputting the corresponding features into a classifier, such as a Support Vector Machine, which has been trained on pre-labeled images in a target scene type class (e.g., indoor/outdoor, city/landscape, and sunset/mid-day). Based on the input features, the classifier determines whether or not new image instances should be classified into the target scene type class.
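The content-based classification step described above can be illustrated with a minimal sketch. It assumes a linear Support Vector Machine whose decision function is a weighted sum of the features plus a bias, and it uses mean color values as the only low-level feature. The function names, the weights, and the bias are hypothetical placeholders for a model trained on pre-labeled images; they are not taken from any particular system.

```python
def color_features(pixels):
    """Return the mean R, G, B values, scaled to [0, 1], of 8-bit (r, g, b) pixels."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / (255.0 * n) for c in range(3)]


def linear_svm_decision(features, weights, bias):
    """Signed score w.x + b of a linear SVM; a positive score favors the target class."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def in_target_class(features, weights, bias):
    """Classify a feature vector into the target scene type class (True) or not (False)."""
    return linear_svm_decision(features, weights, bias) > 0.0


# Hypothetical parameters standing in for a trained model (assumption, not real data).
WEIGHTS = [-1.0, 0.0, 2.0]
BIAS = -0.5

bluish_image = [(60, 120, 255)] * 4  # toy image made of four identical pixels
print(in_target_class(color_features(bluish_image), WEIGHTS, BIAS))
```

A practical system would use richer texture, color, and shape features and would learn the weights from labeled training images rather than fixing them by hand.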
Some automatic scene classification systems augment content-based scene classification information with metadata that is associated with the images to improve the accuracy with which images are classified into various scene type classes. Such metadata corresponds to structured data that describes information relating to the images, such as characteristics of the images or conditions occurring at or near the times the images were captured. Most digital cameras, for example, encode camera metadata in the header (typically an EXIF header) of the image file containing the corresponding image data. Exemplary types of camera metadata include date/time stamps, whether or not a flash was used, focal length, exposure time, aperture value, subject distance, and brightness value.
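As an illustration of how such camera metadata might be held once it has been read from the image file header, the following sketch defines a simple record for the exemplary fields listed above and converts the numeric fields into a fixed-order vector. The class name, the field names, the units, and the placeholder used for missing values are all assumptions made for this example; parsing of the EXIF header itself is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CameraMetadata:
    """Hypothetical container for the exemplary camera metadata fields."""
    timestamp: Optional[str] = None            # date/time stamp, kept as text
    flash_fired: Optional[bool] = None         # whether or not a flash was used
    focal_length_mm: Optional[float] = None
    exposure_time_s: Optional[float] = None
    aperture_value: Optional[float] = None
    subject_distance_m: Optional[float] = None
    brightness_value: Optional[float] = None


def to_feature_vector(meta, missing=0.0):
    """Return the numeric fields in a fixed order, substituting `missing` for absent values."""
    raw = [
        meta.flash_fired,
        meta.focal_length_mm,
        meta.exposure_time_s,
        meta.aperture_value,
        meta.subject_distance_m,
        meta.brightness_value,
    ]
    return [missing if value is None else float(value) for value in raw]


example = CameraMetadata(flash_fired=True, exposure_time_s=0.01)
print(to_feature_vector(example))
```

Cameras do not always record every field, which is why each field is optional in this sketch.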
In one automatic scene classification approach, a final estimate of image class is produced from a combination of a metadata-based estimate of image class and a content-based estimate of image class. In this approach, the content-based estimate of image class is obtained by extracting a plurality of color and texture features from image sub-blocks, inputting the features into a Support Vector Machine that generates estimates of image class for each of the sub-blocks, and combining these estimates to produce an overall content-based classification estimate for the entire image.
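The two-stage combination described above can be sketched as follows. The approach as described does not specify how the sub-block estimates are combined or how the metadata-based and content-based estimates are merged, so this example assumes a plain average over sub-blocks and a weighted average for the final estimate. The function names and the default weight are hypothetical.

```python
from statistics import mean


def combine_block_estimates(block_probs):
    """Average the per-sub-block class probabilities into one content-based estimate.

    Averaging is an assumption made for this sketch; other combination rules are possible.
    """
    return mean(block_probs)


def fuse_estimates(content_prob, metadata_prob, content_weight=0.5):
    """Weighted average of the content-based and metadata-based estimates (assumed rule)."""
    return content_weight * content_prob + (1.0 - content_weight) * metadata_prob


# Hypothetical per-sub-block outputs of the classifier for one image.
block_probs = [0.8, 0.6, 0.7, 0.9]
content_estimate = combine_block_estimates(block_probs)
final_estimate = fuse_estimates(content_estimate, metadata_prob=0.25)
print(content_estimate, final_estimate)
```

Because a feature vector must be extracted and classified for every sub-block, the cost of this approach grows with the number of sub-blocks per image.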
In general, the content-based scene classification methods described above are computationally intensive and require significant memory resources, making them unsuitable for application environments, such as embedded environments, in which processing and memory resources are significantly constrained. In addition, many of the low-level features used in these methods cannot be determined for certain types of scenes, such as night scenes and snow scenes.
What are needed are systems and methods of classifying images into targeted scene type classes that may be implemented within the significant processing and memory constraints of typical embedded application environments. In addition, it would be desirable to have systems and methods that are capable of classifying images into scene type classes for which low-level feature descriptors cannot be determined.