As is known in the art, there has been a dramatic growth in the number and size of digital libraries of images. The National Geographic Society and the Louvre, for instance, have transferred much of their extensive collections to digital media. New images are being added to these digital databases at an ever-increasing rate. As is known, a digital image is an image which may be represented as a two-dimensional array of pixels, with each of the pixels represented by a digital word. With the increase in the number of available digital pictures, the need has arisen for more complete and efficient annotation (attaching identifying labels to images) and indexing (accessing specific images from the database) systems. Digital image/video database annotation and indexing services provide users, such as advertisers, news agencies and magazine publishers, with the ability to browse, via queries to the system, and to retrieve images or video segments from such databases.
As is also known, a content based image retrieval system is an image retrieval system which classifies, detects and retrieves images from digital libraries based directly on the content of the image. Content based image processing systems may be used in a variety of applications including, but not limited to, art gallery and museum management, architectural image and design, interior design, remote sensing and management of earth resources, geographic information systems, scientific database management, weather forecasting, retailing, fabric and fashion design, trademark and copyright database management, law enforcement and criminal investigation and picture archiving and communication systems.
Conventional content based image/video retrieval systems utilize images or video frames which have been supplemented with text corresponding to explanatory notes or key words associated with the images. A user retrieves desired images from an image database, for example, by submitting textual queries to the system using one or a combination of these key words. One problem with such systems is that they rely on the restricted predefined textual annotations rather than on the content of the still or video images in the database.
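The keyword-based retrieval described above may be illustrated with a minimal sketch. The database contents, the image identifiers and the function name `retrieve_by_keywords` are hypothetical and chosen only for illustration; they do not correspond to any particular system. The sketch shows that retrieval depends solely on the annotations attached to each image, so an image whose content matches the query but whose annotation omits the key word is not returned.

```python
# Hypothetical annotated database: image id -> set of key words.
# The ids and key words are invented for illustration only.
annotations = {
    "img_001": {"beach", "sunset"},
    "img_002": {"mountain", "snow"},
    # Suppose this image also shows a beach, but the annotator
    # recorded only "portrait"; the pixels are never consulted.
    "img_003": {"portrait"},
}


def retrieve_by_keywords(annotations, query_terms):
    """Return ids of images whose annotations contain every query term."""
    query = {term.lower() for term in query_terms}
    return sorted(
        image_id
        for image_id, words in annotations.items()
        if query <= words  # subset test: all query terms must be annotated
    )


beach_results = retrieve_by_keywords(annotations, ["beach"])
```

In this sketch, a query for "beach" returns only `img_001`, even though `img_003` is assumed to depict a beach as well, because matching is performed against the predefined text and not against image content.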
Still other systems attempt to retrieve images based on a specified shape. For example, to find images of a fish, such systems would be provided with a specification of a shape of a fish. This specification would then be used to find images of a fish in the database. One problem with this approach, however, is that fish do not have a standard shape and thus the shape specification is limited to classifying or identifying fish having the same or a very similar shape.
Still other systems classify images or video frames based on image statistics, including color and texture. The difficulty with these systems is that, for a given query image, even though the images returned may have the same color, textural, or other statistical properties as the query image, they might not be part of the same class as the query image. Moreover, such systems are unable to encode global scene configurations. That is, such systems are unable to encode the manner in which attributes such as color and luminance are spatially distributed over an image.
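The limitation of purely statistical descriptions may be illustrated with a minimal sketch. The two 4x4 single-channel images, the function names `color_histogram` and `top_minus_bottom`, and the choice of a top-half versus bottom-half comparison are all assumptions made for illustration; they are not taken from any particular retrieval system. The sketch shows two images with identical pixel-value histograms whose spatial arrangements nevertheless differ.

```python
from collections import Counter


def color_histogram(image):
    """Count how often each pixel value occurs, ignoring pixel positions."""
    return Counter(pixel for row in image for pixel in row)


def top_minus_bottom(image):
    """Mean of the top half minus mean of the bottom half.

    A crude, hypothetical descriptor of spatial layout, used here only
    to show that layout can differ while the histogram does not.
    """
    half = len(image) // 2
    top = [p for row in image[:half] for p in row]
    bottom = [p for row in image[half:] for p in row]
    return sum(top) / len(top) - sum(bottom) / len(bottom)


# Hypothetical images (1 = bright pixel, 0 = dark pixel).
# Bright region above a dark region, e.g. sky over ground.
sky_over_ground = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# The same eight bright and eight dark pixels, scattered over the image.
checkerboard = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

same_histogram = color_histogram(sky_over_ground) == color_histogram(checkerboard)
layout_a = top_minus_bottom(sky_over_ground)
layout_b = top_minus_bottom(checkerboard)
```

Here the two histograms compare equal, so a system relying on that statistic alone could not distinguish the images, whereas even a simple measure that takes pixel position into account yields different values for the two.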
It would thus be desirable to provide a technique which may be used in a general automated scene classification and detection system and which allows the encoding of global context in the class model. These models may be subsequently used for image classification and retrieval from a database. It would be particularly desirable to have the system be capable of automatically learning such class models from a set of example images specified by a user.