The proliferation of digital cameras and scanners has led to an explosion of digital images, creating large personal image databases. Organizing and retrieving images and videos is already a problem for the typical consumer. Currently, a typical consumer's digital image collection spans only a few years. As the time spanned by the average digital image and video collection increases, the organization and retrieval problem will continue to grow, and automated tools for efficient image indexing and retrieval will be required.
Many methods of image classification based on low-level features such as color and texture have been proposed for use in content-based image retrieval. A survey of low-level content-based techniques (“Content-based Image Retrieval at the End of the Early Years,” A. W. M. Smeulders et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), December 2000) provides a comprehensive listing of relevant methods that can be used for content-based image retrieval. The low-level features commonly described include color; local shape characteristics derived from directional color derivatives and scale-space representations; image texture; image transform coefficients, such as the discrete cosine transform coefficients used in JPEG coding; and properties derived from image segmentation, such as shape, contour and geometric invariants. Although these features can be computed efficiently and matched reliably, they usually correlate poorly with semantic image content.
There have also been attempts to compute semantic-level features from images. In WO 01/37131 A2, visual properties of salient image regions are used to classify images. In addition to numerical measurements of visual properties, neural networks are used to classify some of the regions using semantic terms such as “sky” and “skin.” The region-based characteristics of the images in the collection are indexed so that other images matching the characteristics of a given query image can be found easily. U.S. Pat. No. 6,240,424 B1 discloses a method for classifying and querying images using the primary objects in an image as clustering centers. Images matching a given unclassified image are found by formulating an appropriate query based on the primary objects in that image. U.S. Patent Application Publication No. 2003/0195883 A1 discloses computing an image's category from a pre-defined set of possible categories, such as “cityscapes.”
However, these semantic-level features also do not match the way users recall and search for images in their collections. Users often recall photographs by the event that was captured. For example, photographs may be identified as “Grand Canyon vacation,” “Mom's birthday party,” “Joe's baseball league” and so on. Current software provides mechanisms for manually entering such tags or captions to identify photographs. However, a need exists to automate this labor-intensive process so that a user can search by common types of events without having to tag the images first. Further, the user could then combine event type with other semantic features, such as the people present in the image, the location or the activity, to narrow the search to relevant images.