The invention relates generally to the field of digital image processing, and in particular to digital image processing techniques relating to the identification and representation of shape components of an image.
Rapid advances in digital imaging are leading to an explosion of large collections of digital images. Once digital imaging truly takes hold in the home consumer market, practically every household will have a collection of digital images. If tradition holds, most of these collections (being archived to tape or CDROM) will end up, e.g., stored in a shoebox on a closet shelf, rarely, if ever, to be viewed again. However, digital imaging offers the ability to prevent this outcome in ways never possible with analog imaging. For example, computer applications can be created to assist in the formation of album pages, storytelling, photoquilts, etc. The success of these applications depends on the ease with which consumers can access their images. If they have to randomly look through the collection of digital images, or worse yet, through the index prints on the CD case containing the digital image CDs, for the images they want, then they will quickly throw up their hands in frustration and never use the collection again. However, if a computer were to automatically organize their images, based on each image's content, then retrieval would be simple, fast and effective.
The underlying technology that will be common across all these applications combines the tools of digital image processing with those of database management. The digital image processing tools extract information from the image that provides a compact representation of an image's content. The database management tools provide organizational structures for fast, effective retrieval of images based on their extracted content representation. Currently known database technologies are disclosed in International Application No. WO 9,852,119 ("Feature and Region Based Method for Database Image Retrieval", which involves classifying images by features and region parameters, and searching a database for images within some threshold of the request), European Patent No. 872,803 ("Image Processing Apparatus for Managing Image Data") and U.S. Pat. No. 5,852,823 ("Image Classification and Retrieval System Using a Query-by-Example Paradigm"). Current technologies for image content extraction and representation allow for content characterization in terms of low level features: global color, color composition, texture, and shape. The present invention addresses the issue of shape-based image content representation, organization and retrieval. None of the above-cited database references addresses the issue of shape.
In general, the definition for the task of shape-based image retrieval is as follows:
Given a query image, retrieve images within the database whose regions have a shape similar to those of the query.
There are three significant issues that must be addressed when developing a shape-based retrieval method:
Shape Representation: encoding the shape information in a form useful for organization, similarity determination, and efficient storage and retrieval.
Similarity Measure: producing results that are consistent with human visual perception. This measure is highly dependent upon the shape representation.
Index Structure: providing the organizational capabilities of the representation for efficient retrieval. The type of index depends upon the shape representation being used.
Existing solutions to the problem of image retrieval based on shape have addressed these issues in different ways. NETRA (W. Y. Ma, "Netra: A Toolbox for Navigating Large Image Databases," Ph.D. Thesis, Dept. of Electrical and Computer Engineering, University of California, Santa Barbara, 1997) uses Fourier Descriptors as the shape representation. As is well known, Fourier Descriptors are not invariant to scale, translation, rotation and starting point, and therefore must be normalized. In NETRA, the rotation and starting point normalization is achieved by throwing away the phase information (rotation and starting point only affect the phase). Scale invariance is achieved by dividing the magnitudes by the magnitude of the lowest frequency component. The Euclidean distance metric is used to measure the similarity of the normalized magnitudes. NETRA does not address the issue of indexing.
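By way of illustration only, the magnitude-based normalization described above may be sketched as follows. This is a minimal sketch and not the NETRA implementation: the function names, the number of retained coefficients, and the use of a naive discrete Fourier transform are all assumptions made here. The sketch further assumes that the boundary is traced counter-clockwise, so that the lowest non-zero frequency coefficient is non-zero.

```python
import cmath
import math

def fourier_descriptor(boundary, n_coeffs=8):
    """Magnitude-only, scale-normalized descriptor of a closed boundary.

    boundary: list of (x, y) points sampled in order along the contour
    (assumed counter-clockwise; hypothetical helper, not from the source).
    """
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    coeffs = []
    # Skip k = 0 (the centroid term), which removes the effect of translation.
    for k in range(1, n_coeffs + 1):
        s = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        coeffs.append(s / n)
    # Discarding phase removes the effect of rotation and starting point.
    mags = [abs(c) for c in coeffs]
    # Dividing by the lowest-frequency magnitude removes the effect of scale.
    scale = mags[0]
    return [m / scale for m in mags]

def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

Under these assumptions, a boundary and a rotated, scaled, translated copy of it traced from a different starting point yield descriptors whose Euclidean distance is zero up to rounding error.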
In the article "Similar Shape Retrieval Using Structural Feature Index" (J. E. Gary, R. Mehrotra, Information Systems, Vol. 18, No. 7, 525-537, 1993), the shape is represented as the structural components of the shape's boundary. The structural components are normalized and organized in a point access method index. Similarity is determined through a correspondence measure between the query structural component and the component retrieved through an index search.
The general approach used in Photobook (A. Pentland, R. W. Picard, S. Sclaroff, "Photobook: Tools for Content-Based Manipulation of Image Databases," SPIE, Vol. 2185, 34-47) is identified as semantics-preserving image compression, i.e., compact representations that preserve essential image similarities. Their choice for shape is the Finite Element Method models of objects described in the article "Modal Matching for Correspondence and Recognition" (S. Sclaroff, A. Pentland, M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 201, 1993). The representation is modes of free vibration of the finite element model of the selected feature points of the shape. Similarity is measured in terms of the deformation energy required to match the query shape to the database shape. Subsequent work developed a method for organizing the representations (S. Sclaroff, "Deformable Prototypes for Encoding Shape Categories in Image Databases," Pattern Recognition, Vol. 30, No. 4, 627-641, 1997).
In the article "A Content-based Image Retrieval System" (C. Huang, D. Huang, Image and Vision Computing, Vol. 16, 149-163, 1993), the shape description comprises moments and Fourier Descriptors of the shape's boundary and feature points (junction and curvature) to capture internal structure. Retrieval is done in two phases. First, the moments and Fourier Descriptors of the query shape's boundary are compared, using a city block distance measure, to those of every image in the database, and the top twenty candidates are chosen. Final similarity is determined through a complex hash table of the feature points.
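By way of illustration only, the first phase of such a two-phase retrieval (an exhaustive comparison under a city block distance, keeping the best candidates) may be sketched as follows. The function names and the dictionary-based database layout are assumptions made here for illustration and are not taken from the cited article; the second, hash-table phase is not shown.

```python
import heapq

def city_block(a, b):
    """City block (L1) distance between two feature vectors of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b))

def top_candidates(query, database, k=20):
    """Return the ids of the k database entries closest to the query.

    database: dict mapping an image id to its feature vector
    (hypothetical layout chosen for this sketch).
    """
    return heapq.nsmallest(
        k, database, key=lambda image_id: city_block(query, database[image_id])
    )
```

Because every database entry is visited, the cost of this phase grows linearly with the size of the database, which is why a more expensive comparison is reserved for the short candidate list.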
The QBIC system (which is described in U.S. Pat. No. 5,579,471) initially represented shape through a combination of heuristic features such as area, circularity, eccentricity, major axis orientation, and a set of algebraic moment invariants. Similarity is judged through the Euclidean distance metric. As discussed in U.S. Pat. No. 5,579,471, similar heuristic features and moments do not guarantee perceptually similar shapes. More sophisticated shape descriptors were explored in the article "Retrieving Image by 2D Shape: A Comparison of Computation Methods with Human Perceptual Judgments," (B. Scassellati, S. Alexopoulos, M. Flickner, Proc. SPIE Storage and Retrieval for Image and Video Databases II, San Jose, Calif., 2-14, 1994), such as parametric curve distance, turning angle, sign of curvature, and a modified Hausdorff distance, with no decisive result.
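By way of illustration only, two heuristic features of the kind listed above, area and circularity, may be computed for a polygonal region as sketched below. The function names are hypothetical, and the circularity definition used here (4πA/P², equal to 1 for a circle) is one common convention assumed for this sketch; the cited patent may define the feature differently.

```python
import math

def polygon_area(poly):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    n = len(poly)
    total = 0.0
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def polygon_perimeter(poly):
    """Sum of the edge lengths of the closed polygon."""
    n = len(poly)
    return sum(
        math.hypot(poly[(i + 1) % n][0] - poly[i][0],
                   poly[(i + 1) % n][1] - poly[i][1])
        for i in range(n)
    )

def circularity(poly):
    """Assumed convention: 4*pi*area / perimeter**2."""
    return 4.0 * math.pi * polygon_area(poly) / polygon_perimeter(poly) ** 2
```

Such features compress a shape into a few numbers, so visibly different shapes can share similar values, which is consistent with the observation above that similar features do not guarantee perceptually similar shapes.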
In the article "Shape-Based Retrieval: A Case Study with Trademark Image Databases," (A. K. Jain, A. Vailaya, Pattern Recognition, Vol. 31, No. 9, 1369-1390, 1998), the retrieval process comprises two phases with a different shape representation in each phase. In the first phase, "fast pruning", the shape representation comprises edge angles (a histogram of edge directions) and moment invariants. This representation of the query shape is compared against all images in the database in order to retrieve the top ten candidates to pass on to the next phase. The second phase improves the similarity ranking by employing deformable templates. Again, they reported difficulty agreeing with human subjective results.
In the article "Parts of Recognition" (D. D. Hoffman, W. A. Richards, Cognition, Vol. 18, 65-96, 1985), Hoffman and Richards advocated that shape decomposition should precede shape description. The challenge here is to ensure that the decomposition scheme is invariant under rigid transformations, robust to local deformations, stable under global transformations, and produces components consistent with human perception. Their formulation leads to a boundary-based decomposition scheme that divides a plane curve into segments at negative curvature minima. They further developed this concept into a full representation scheme they referred to as codons: boundary segments that are bounded by negative curvature minima, described by curvature maxima and zeros, and subsequently classified into one of six possible types. In the article "Parts of Visual Objects: An Experimental Test of the Minima Rule" (M. L. Braunstein, D. D. Hoffman, A. Saidpour, Perception, Vol. 18, 817-826, 1989), Braunstein et al. present psychophysical evidence supporting the concepts behind codons. In the article "Parts of Visual Form: Computational Aspects" (K. Siddiqi, B. B. Kimia, IEEE PAMI, Vol. 17, 239-251, 1995), Siddiqi and Kimia point out many situations for which codons fail. They argue that this is due to the method being limited to analyzing only the boundary and, therefore, ignoring the interior of the shape. Instead, they propose a scheme, motivated by the general principle of "form from function", that decomposes the shape into neck-based and limb-based components. They show that this decomposition scheme is robust under rigid and local deformations, and stable under minor global transformations. Their experiments on test shapes with human generated ground truth demonstrate the ability of the neck-limb based decomposition scheme to mimic a human's perception of the shape's components.
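By way of illustration only, the idea of dividing a boundary at negative curvature minima may be sketched for a polygonal boundary as follows. This is a minimal sketch and not the codon scheme of the cited article: the signed turning angle at each vertex is used here as a stand-in for curvature, the polygon is assumed to be traced counter-clockwise (so that concave vertices have negative turning angles), and the function names are hypothetical.

```python
import math

def turning_angles(poly):
    """Signed turning angle at each vertex of a closed polygon.

    Positive at convex vertices and negative at concave vertices,
    assuming counter-clockwise traversal.
    """
    n = len(poly)
    angles = []
    for i in range(n):
        x0, y0 = poly[i - 1]
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0   # incoming edge
        bx, by = x2 - x1, y2 - y1   # outgoing edge
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def negative_minima(poly):
    """Indices of vertices where the turning angle is negative and a local minimum."""
    k = turning_angles(poly)
    n = len(k)
    return [
        i for i in range(n)
        if k[i] < 0 and k[i] <= k[i - 1] and k[i] <= k[(i + 1) % n]
    ]
```

The indices returned are candidate cut points at which the boundary would be divided into segments. Ties between neighboring vertices are all reported, so a practical scheme would need a rule for merging them.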
An object of the invention is the retrieval of images containing regions with shape characteristics similar to a query shape.
The present invention is directed to overcoming one or more of the problems set forth above. Briefly summarized, according to one aspect of the present invention, a method for representing an image in terms of the shape properties of its identified segments of interest comprises the steps of: (a) providing a digital image with identified segments of interest; (b) analyzing a segment of interest to automatically identify one or more of its perceptually significant components; (c) representing a perceptually significant component in terms of its shape properties, thereby providing a shape representation; (d) characterizing an image segment as a composition of the shape properties of its perceptually significant components, thereby providing a characterized image segment; and (e) repeating steps (b) to (d) to represent the image as a composition of its characterized segments.
The technique presented according to the invention assumes the input to be an image segmented into regions of interest. Each segment of interest is automatically analyzed to identify its perceptually significant components. In this invention, this is accomplished with an adaptive morphological filter that removes perceptually insignificant shape information. The shapes of these components are then characterized, e.g., with normalized Fourier Descriptors, which characterize a region's shape in terms of the frequency content of its boundary. Since an image segment may have more than one perceptually significant component, the programmatic shape representation of the segment is the composition of the shape representations of each perceptually significant component. Each perceptually significant component of each image segment is stored and indexed for efficient retrieval of candidate shapes in response to a query; for example, an R-Tree is used to index keys generated from the normalized Fourier Descriptors. The process of determining the similarity of two image segments comprises the comparison of the segments' perceptually significant components; for example, with Fourier Descriptors as the shape characterization scheme, similarity is determined with a Euclidean distance metric and Fourier Descriptor normalization error compensation.
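By way of illustration only, the manner in which a morphological filter can remove minor shape detail from a segment may be sketched as follows. The adaptive filter of the invention is not reproduced here; a plain morphological opening (erosion followed by dilation) with a fixed 3x3 structuring element is substituted as an assumption made for this sketch, and the function names and the list-of-lists binary mask layout are hypothetical.

```python
# Offsets of a fixed 3x3 structuring element (assumed; not the adaptive filter).
_OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def erode(mask):
    """Keep a pixel only if its whole 3x3 neighborhood is in bounds and set."""
    h, w = len(mask), len(mask[0])
    return [
        [int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                 for dr, dc in _OFFSETS))
         for c in range(w)]
        for r in range(h)
    ]

def dilate(mask):
    """Set a pixel if any in-bounds pixel of its 3x3 neighborhood is set."""
    h, w = len(mask), len(mask[0])
    return [
        [int(any(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                 for dr, dc in _OFFSETS))
         for c in range(w)]
        for r in range(h)
    ]

def opening(mask):
    """Erosion followed by dilation: removes features thinner than the element."""
    return dilate(erode(mask))
```

Applied to a binary mask of a compact region with a one-pixel-wide protrusion, the opening retains the region and removes the protrusion. A fixed element treats all detail of a given width alike, whereas the filter of the invention is described as adaptive.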
The advantage of the invention is that it provides a simple, fast and effective method for the archival and retrieval of images based on the shape properties of perceptually significant components of identified segments in an image. The centerpiece and principal advantage of this invention is the automatic identification and representation of perceptually relevant shape components of the identified image segments. The process of comparing segments of two images comprises the determination of the similarity of the perceptually relevant shape components of the segments. These are combined together into a system for retrieving images from a database that contains segments with shape properties similar to those of a query image or sketch. This method will find application in any product wishing to provide image archival and retrieval capabilities where the computer will automatically organize images, based on the image's content.
These and other aspects, objects, features and advantages of the present invention will be more clearly understood and appreciated from a review of the following detailed description of the preferred embodiments and appended claims, and by reference to the accompanying drawings.