1. Field of the Invention
The present invention generally relates to a method and system for locating a multi-region object in an image or a video database.
2. Description of the Related Art
A challenging problem of content-based retrieval is the retrieval of images or video clips depicting visual patterns or objects. Such localization of images and regions in images containing objects is useful in a number of applications. For example, when the object to be localized is a 2-dimensional (2D) slide pattern representing a projected transparency, such localization can be used for the synchronization of slides and video. This makes for ease of browsing of long videos using slides as reference points.
Similarly, retrieval of images containing objects can allow a more advanced yet natural way of querying the database such as for the detection of similar trademarks. Frequently, the objects depicted in images are composed of multi-colored regions. In the case of slides or transparencies for example, the background region serves as the colored backdrop in which several multi-colored text or graphics regions may lie.
Thus, localization of regions in images that are likely to contain a given object is an essential problem in object recognition and image indexing of databases. It has been shown that such region selection can significantly reduce the search involved in object recognition by ranking candidate regions and focusing the search for an object on the likely candidate regions in an image.
Region localization is also a key aspect of image indexing of databases, where it is desirable to quickly localize an object query without doing a detailed search of either the image database or regions within images. Often, the objects that need to be localized possess interesting regions such as color regions, whose layout also characterizes object shape in a distinctive manner. It therefore seems reasonable to use both color and layout information to perform object localization and matching.
Even though querying based on colored objects seems natural and straightforward, it is actually a very difficult problem for the following reasons.
First, because of different imaging conditions that include illumination changes, occlusions, and scene clutter, the same object or pattern can appear very different from one image to another. Further, in many applications, robust localization is desirable even when the object appears in a different pose. Finally, in a large database of images or videos, one cannot afford to search the entire collection. Instead, an “index” of the images needs to be created, so that only the relevant images or videos that are likely to contain the object are searched.
A few attempts have been made in the past to address this problem. These approaches have two parts, one in which they isolate color regions, and a second one in which they reason about the layout of color regions. The color region information is often described through local color histograms [Das et al., CVPR 98; Matas et al., ICCV 95], or by performing a complete bottom-up color image segmentation [Syeda-Mahmood, Proc. ECCV '92]. The predominant way of capturing layout information is through graphs, such as region adjacency graphs [Syeda-Mahmood, Proc. ECCV '92], color adjacency graphs [Matas, Proc. ICCV 95], spatial proximity graphs [Das et al., CVPR 97], etc. Region localization is then achieved by a variant of subgraph matching using the color of the regions as a first filter, followed by layout constraints such as adjacencies as a second-level filter.
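The two-level filtering described above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function names, the use of exact color labels in place of histograms, and the dictionary-based adjacency graphs are all assumptions made for illustration. Color serves as the first filter, and adjacency between already-matched regions serves as the second-level filter in a simple backtracking subgraph match.

```python
from typing import Dict, List, Optional, Set

Colors = Dict[str, str]          # region name -> color label (hypothetical encoding)
Adjacency = Dict[str, Set[str]]  # region name -> names of adjacent regions


def color_candidates(query_colors: Colors, image_colors: Colors) -> Dict[str, List[str]]:
    """First filter: image regions whose color label equals the query region's."""
    return {
        q: [r for r, c in image_colors.items() if c == qc]
        for q, qc in query_colors.items()
    }


def match_layout(
    query_colors: Colors,
    query_adj: Adjacency,
    image_colors: Colors,
    image_adj: Adjacency,
) -> Optional[Dict[str, str]]:
    """Second-level filter: backtracking search for an assignment of query
    regions to distinct image regions that preserves every query adjacency.
    Returns the first assignment found, or None if there is none."""
    cands = color_candidates(query_colors, image_colors)
    q_regions = list(query_colors)

    def extend(i: int, assignment: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(q_regions):
            return dict(assignment)
        q = q_regions[i]
        for r in cands[q]:
            if r in assignment.values():
                continue  # each image region may be used only once
            # Every already-assigned neighbour of q must map to a neighbour of r.
            ok = all(
                assignment[n] in image_adj.get(r, set())
                for n in query_adj.get(q, set())
                if n in assignment
            )
            if ok:
                assignment[q] = r
                found = extend(i + 1, assignment)
                if found is not None:
                    return found
                del assignment[q]
        return None

    return extend(0, {})


# Hypothetical slide: a blue background region adjacent to a white text region.
query_colors = {"bg": "blue", "text": "white"}
query_adj = {"bg": {"text"}, "text": {"bg"}}
image_colors = {"r1": "blue", "r2": "white", "r3": "white", "r4": "red"}
image_adj = {"r1": {"r3", "r4"}, "r2": {"r4"}, "r3": {"r1"}, "r4": {"r1", "r2"}}
print(match_layout(query_colors, query_adj, image_colors, image_adj))
```

In this sketch the search is exponential in the worst case, and a single mislabeled or split region removes a valid match, which is consistent with the limitations discussed for graph-based approaches.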
A desirable object localization method must be fast, and must account for changes in imaging conditions including appearance changes, illumination changes, occlusions, and the presence of distractors.
Prior to the present invention, none of the region localization methods mentioned above has met these challenging requirements. They fail either because the color region detection and matching are inaccurate, or because the spatial layout is captured inaccurately through approximate representations such as proximity layout graphs.
Bottom-up color segmentations give a large number of spurious color regions, while color histograms are sensitive to changes in appearance and illumination. These cause problems in spatial interpretation when object regions that visually appear adjacent in the image are not adjacent according to the color segmentation.
Region adjacency graph representations, on the other hand, require more or less accurate segmentation, and since they use subgraph matching, they can become computationally prohibitive even for images with a small number of regions.
Thus, while it is desirable to query objects in a database using their color and spatial layout, there exists no technique that can do so in a way that is robust to changes in pose, occlusions, and color changes of objects. Further, the existing techniques do not support the indexing of spatial layout, with the result that all images of a database may have to be examined.
In view of the foregoing and other problems, disadvantages, and drawbacks of the conventional methods and structures, an object of the present invention is to provide a method and structure in which objects can be represented by the spatial layout of their regions in a pose-invariant fashion.
Another object is to provide a method and system for locating multi-colored objects in a database of unsegmented images or video frames using the color and positional arrangement of regions on the object.
Yet another object of the invention is to develop a robust method of object localization that is illumination and pose-invariant as well as tolerant to occlusions and scene clutter.
In a first aspect, a method of locating a multi-featured object in an image or video database is presented that includes identifying the relevant images and the regions within images that are likely to contain a multi-featured query object using a combination of color and geometric layout in an illumination and affine-invariant manner. Further, the technique presented supports indexing of the database so as to avoid a detailed search of every image of the database.
With the unique and unobvious features of the present invention, a method and system are provided for robustly localizing a multi-colored object or pattern in a database of images (or video clips). The colors on the object or images are described using an illumination and pose-invariant color descriptor. The layout of colored regions is described as a region affine structure by using affine-intervals between pairs of regions on the object. The resulting color and shape layout information on images is represented in a compact and efficient data structure called an “interval hash tree”. The interval hash tree data structure is a general index structure for representing a set of overlapping rectangles, and has been disclosed in the above-mentioned related U.S. patent application Ser. No. 09/593,131, incorporated herein by reference. The present invention makes use of this data structure to represent affine intervals, which are rectangles.
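The notion of an affine interval as a rectangle can be illustrated with a minimal sketch. This summary does not define the construction, so the following is an assumption made for illustration only: the points of one region are expressed in affine coordinates relative to three non-collinear basis points, and the bounding rectangle of those coordinates is taken as the interval. All function names are hypothetical, and the overlap test is a plain pairwise check, not the interval hash tree itself. The sketch relies on the standard fact that affine coordinates are unchanged when the same affine transformation is applied to both the basis and the region.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (a_min, a_max, b_min, b_max)


def affine_coords(p: Point, o: Point, e1: Point, e2: Point) -> Point:
    """Solve p - o = a*(e1 - o) + b*(e2 - o) for (a, b) using Cramer's rule."""
    ux, uy = e1[0] - o[0], e1[1] - o[1]
    vx, vy = e2[0] - o[0], e2[1] - o[1]
    px, py = p[0] - o[0], p[1] - o[1]
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        raise ValueError("basis points are collinear")
    return ((px * vy - py * vx) / det, (ux * py - uy * px) / det)


def affine_interval(region: Sequence[Point], basis: Sequence[Point]) -> Rect:
    """Bounding rectangle of a region's points in affine-coordinate space
    (an assumed, simplified stand-in for an affine interval)."""
    coords = [affine_coords(p, basis[0], basis[1], basis[2]) for p in region]
    a_vals = [c[0] for c in coords]
    b_vals = [c[1] for c in coords]
    return (min(a_vals), max(a_vals), min(b_vals), max(b_vals))


def apply_affine(p: Point, m: Tuple[Point, Point], t: Point) -> Point:
    """Apply x -> M x + t, where m holds the two rows of the 2x2 matrix M."""
    (a, b), (c, d) = m
    return (a * p[0] + b * p[1] + t[0], c * p[0] + d * p[1] + t[1])


def rects_overlap(r: Rect, s: Rect) -> bool:
    """True if two axis-aligned rectangles share at least one point."""
    return r[0] <= s[1] and s[0] <= r[1] and r[2] <= s[3] and s[2] <= r[3]


# Hypothetical object: three basis points from one region, four corners of another.
basis: List[Point] = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
region: List[Point] = [(1.0, 1.0), (3.0, 1.0), (3.0, 2.0), (1.0, 2.0)]

# The same object seen under an arbitrary (invertible) affine change of pose.
m, t = ((2.0, 1.0), (0.5, 3.0)), (10.0, -5.0)
basis_t = [apply_affine(p, m, t) for p in basis]
region_t = [apply_affine(p, m, t) for p in region]

print(affine_interval(region, basis))
print(affine_interval(region_t, basis_t))  # same rectangle up to rounding
```

Because the rectangle computed from the transformed view coincides with the one computed from the original view, a query rectangle can be compared against stored rectangles by an overlap test such as `rects_overlap`; an index over rectangles would replace the pairwise test in a large database.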
A key feature of the invention lies in the modeling of spatial layout of regions on objects in an affine-invariant fashion, and in the development of the technique of region hashing for the robust localization of multi-colored objects in a way that advantageously accounts for changes due to illumination, pose changes and occlusions of objects, and the presence of scene clutter. The indexing present in region hashing also provides for direct localization of relevant images and regions within images containing the object without incurring a detailed search.