This invention relates to image processing systems and more particularly to content based image classification and identification systems.
As is known in the art, there has been a dramatic growth in the number and size of digital libraries of images. The National Geographic Society and the Louvre, for instance, have transferred much of their extensive collections to digital media. New images are being added to these digital databases at an ever increasing rate. As is known, a digital image is an image which may be represented as a two-dimensional array of pixels with each of the pixels represented by a digital word. With the increase in the number of available digital pictures, the need has arisen for more complete and efficient annotation (attaching identifying labels to images) and indexing (accessing specific images from the database) systems. Digital image/video database annotation and indexing services provide users, such as advertisers, news agencies and magazine publishers, with the ability to browse through such databases, via queries to the system, and to retrieve images or video segments from them.
As is also known, a content based image retrieval system is an image retrieval system which classifies, detects and retrieves images from digital libraries based directly on the content of the image. Content based image processing systems may be used in a variety of applications including, but not limited to, art gallery and museum management, architectural image and design, interior design, remote sensing and management of earth resources, geographic information systems, scientific database management, weather forecasting, retailing, fabric and fashion design, trademark and copyright database management, law enforcement and criminal investigation and picture archiving and communication systems.
Conventional content based image/video retrieval systems utilize images or video frames which have been supplemented with text corresponding to explanatory notes or key words associated with the images. A user retrieves desired images from an image database, for example, by submitting textual queries to the system using one or a combination of these key words. One problem with such systems is that they rely on the restricted predefined textual annotations rather than on the content of the still or video images in the database.
Still other systems attempt to retrieve images based on a specified shape. For example, to find images of a fish, such systems would be provided with a specification of a shape of a fish. This specification would then be used to find images of a fish in the database. One problem with this approach, however, is that fish do not have a standard shape and thus the shape specification is limited to classifying or identifying fish having the same or a very similar shape.
Still other systems classify images or video frames based on image statistics including color and texture. The difficulty with these systems is that, for a given query image, even though the images returned may have the same color, textural, or other statistical properties as the example image, they might not be part of the same class as the query image. Moreover, such systems are unable to encode global scene configurations. That is, such systems are unable to encode the manner in which attributes such as color and luminance are spatially distributed over an image.
It would thus be desirable to provide a technique which may be used in a general automated scene classification and detection system and which allows the encoding of global context in the class model. These models may be subsequently used for image classification and retrieval from a database. It would be particularly desirable to have the system be capable of automatically learning such class models from a set of example images specified by a user.
In accordance with the present invention, an image processing system and method of operation within such a system are disclosed. The image processing system utilizes a class model defined by one or more relative relationships between a plurality of image patches. The relative relationships are encoded in a global deformable template. This template provides the information which describes the overall organization of images within an image class. The class model may be pre-defined or generated by an image processing system.
In one aspect of the present invention, a method of generating a class model includes the steps of selecting a first image region from a plurality of image regions and identifying a first relative relationship between a property of a first one of the plurality of image regions and a like property of a second one of the plurality of image regions. The first and second image regions and the first relationship between them may be encoded in a deformable template by storing an elastic or flexible connection between the two regions. With this particular arrangement, a class model which can be used to detect images of that class in a database is provided. The class model may be stored in a storage device of an image processing system. The particular properties and relationships included in the model may be selected in accordance with a variety of factors including but not limited to the class of images to which the model is to be applied. For example, in one particular embodiment relating to classification of images of environmental or natural scenes, the relative relationships between the plurality of image patches correspond to relative spatial and photometric relationships. Consistent relationships between a plurality of like example images may be identified to provide a model defined by a plurality of image patches and a plurality of relative spatial and photometric relationships between the image patches. The model may be encoded in the form of a deformable template.
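The idea of keeping only those relationships that are consistent across like example images can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the patch names, the use of mean luminance as the single patch property, the tolerance `tol`, and the function names `relation` and `learn_class_model` are all assumptions made for illustration.

```python
from itertools import combinations


def relation(a, b, tol=0.05):
    """Qualitative relation between two scalar patch properties (assumed encoding)."""
    if a - b > tol:
        return ">"
    if b - a > tol:
        return "<"
    return "="


def learn_class_model(examples, tol=0.05):
    """Keep only the patch-pair relations that agree across every example.

    examples: list of dicts mapping a (hypothetical) patch name to a scalar
    property, here mean luminance in the range [0, 1].
    """
    patch_names = sorted(examples[0])
    model = {}
    for p, q in combinations(patch_names, 2):
        observed = {relation(ex[p], ex[q], tol) for ex in examples}
        if len(observed) == 1:  # the relation is the same in all examples
            model[(p, q)] = observed.pop()
    return model


# Two example images of the same class, each reduced to three patches.
examples = [
    {"top": 0.9, "middle": 0.6, "bottom": 0.2},
    {"top": 0.7, "middle": 0.8, "bottom": 0.3},
]
model = learn_class_model(examples)
print(model)
```

In this toy data the bottom patch is darker than both other patches in every example, so those two relations survive, while the middle/top relation differs between the examples and is discarded.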
In accordance with a further aspect of the present invention, an image model for classifying or detecting images includes a plurality of image patches coupled by a plurality of relative image patch property relationships. The patch properties may correspond to properties including but not limited to one or more of spatial, color, photometric, texture, luminance and shape properties of the image patches. It should be noted that the image model is not a model of any particular object; rather, it is a model which represents relative relationships, such as relative spatial and photometric relationships, within an image. The model may be used in a configural recognition system. The configural recognition system utilizes the context or overall configuration of an image for identification and/or classification of an image. This differs from conventional approaches in which image understanding is based at least in part on recognizing individual salient image sub-parts. The configural recognition technique of the present invention utilizes image models which are provided from a global arrangement of sets of qualitative relationships (such as spatial and photometric relationships) between regions of an image. The image may be provided as a low resolution image. The image models are implemented as deformable templates having a plurality of image patches which are related by a plurality of relative relationships. With this configural recognition approach, the technique of the present invention can be used in a plurality of applications including but not limited to scene classification, face recognition and understanding of biological motion.
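A deformable template of patches related by qualitative relationships can be sketched in Python as follows. This is a hedged illustration only: the brute-force search over small displacements, the `slack` parameter standing in for the elastic connection, the relation kinds "above", "brighter" and "darker", and the patch names are assumptions and are not taken from the disclosure.

```python
from itertools import product


def matches(image, template, relations, slack=1):
    """Return True if some small displacement of the patches satisfies all relations.

    image:     2-D list of luminance values (a low resolution image).
    template:  dict mapping a patch name to its nominal (row, col) position.
    relations: list of (patch_a, patch_b, kind) tuples.
    slack:     maximum displacement per axis, modelling the flexible connection.
    """
    rows, cols = len(image), len(image[0])
    names = sorted(template)

    def candidates(name):
        r0, c0 = template[name]
        return [
            (r0 + dr, c0 + dc)
            for dr in range(-slack, slack + 1)
            for dc in range(-slack, slack + 1)
            if 0 <= r0 + dr < rows and 0 <= c0 + dc < cols
        ]

    def holds(kind, pos_a, pos_b):
        lum_a = image[pos_a[0]][pos_a[1]]
        lum_b = image[pos_b[0]][pos_b[1]]
        if kind == "above":      # relative spatial relationship
            return pos_a[0] < pos_b[0]
        if kind == "brighter":   # relative photometric relationship
            return lum_a > lum_b
        if kind == "darker":
            return lum_a < lum_b
        raise ValueError(f"unknown relation: {kind}")

    # Try every joint placement; accept only globally consistent ones.
    for placement in product(*(candidates(n) for n in names)):
        pos = dict(zip(names, placement))
        if all(holds(kind, pos[a], pos[b]) for a, b, kind in relations):
            return True
    return False


scene = [
    [0.9, 0.9, 0.9, 0.9],
    [0.8, 0.8, 0.9, 0.8],
    [0.3, 0.4, 0.3, 0.3],
    [0.2, 0.2, 0.2, 0.1],
]
template = {"sky": (0, 1), "ground": (3, 1)}
relations = [("sky", "ground", "above"), ("sky", "ground", "brighter")]

print(matches(scene, template, relations))        # bright region over dark region
print(matches(scene[::-1], template, relations))  # same image flipped vertically
```

The model here says nothing about what the patches depict; it records only that one region lies above another and is brighter than it, so the vertically flipped image fails to match even though its color statistics are identical.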
The configural recognition technique of the present invention proceeds without the need for complex abstractions past the image. The use of low resolution images derived from low frequency images allows a reduction in the complexity of the problem and shifts the focus of the image away from potentially confusing image details provided by high frequency image data. Even though example images in a single image class may differ in absolute color and spatial arrangement, a class model (a scene concept which represents all of the images in a class) is provided by using qualitative spatial and photometric relationships. Finally, by encoding in a deformable template an overall structure of selected image regions and their relative relationships, the system is able to capture the global organization of the class. In the detection process only globally consistent solutions are considered, thus reducing the amount of processing by the system and increasing its robustness.
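One simple way to obtain a low resolution image that suppresses high frequency detail is block averaging, sketched below in Python. Block averaging is an assumption chosen for brevity (it acts as a crude low-pass filter followed by subsampling); the text does not specify how the low frequency image is derived, and the function name `downsample` and the parameter `factor` are hypothetical.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list of values.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [
                image[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


image = [
    [0, 2, 4, 4],
    [2, 4, 4, 4],
    [10, 10, 8, 8],
    [10, 10, 8, 8],
]
low_res = downsample(image, 2)
print(low_res)  # [[2.0, 4.0], [10.0, 8.0]]
```

Pixel-level variation inside each block disappears, while the coarse layout (smaller values in the upper half, larger values in the lower half) is preserved, which is the information the qualitative relationships rely on.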
In a still further aspect of the present invention, an image may be divided into a plurality of image regions and groups of image regions may be formed based on their attributes and relative relationships to other image regions. Image models may then be formed from properties of the groups of image regions.
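The grouping of image regions by attribute can be sketched in Python as a connected-component pass over a grid of region values. This is only one possible reading: the use of a single luminance attribute, 4-connected adjacency, the tolerance `tol`, and the function names `group_regions` and `summarize` are assumptions for illustration.

```python
def group_regions(grid, tol=0.1):
    """Group 4-connected cells whose values differ from a neighbour by at most tol.

    grid: 2-D list holding one attribute value (assumed: luminance) per region.
    Returns a list of groups, each a list of (row, col) cells.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[None] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            gid = len(groups)
            labels[r][c] = gid
            stack, members = [(r, c)], []
            while stack:  # depth-first flood fill from the seed cell
                y, x = stack.pop()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and labels[ny][nx] is None
                        and abs(grid[ny][nx] - grid[y][x]) <= tol
                    ):
                        labels[ny][nx] = gid
                        stack.append((ny, nx))
            groups.append(members)
    return groups


def summarize(grid, members):
    """Properties of one group from which a model could be formed."""
    n = len(members)
    return {
        "size": n,
        "mean": sum(grid[y][x] for y, x in members) / n,
        "centroid_row": sum(y for y, _ in members) / n,
    }


grid = [
    [0.9, 0.9, 0.9],
    [0.9, 0.9, 0.9],
    [0.2, 0.2, 0.2],
]
groups = group_regions(grid)
summaries = [summarize(grid, g) for g in groups]
print(summaries)
```

The toy grid yields two groups, a bright one covering the upper two rows and a dark one covering the bottom row; relative relationships such as "group 0 is above and brighter than group 1" could then be read from the summaries.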