Data structures, such as two-dimensional pixel arrays, are being generated at an ever-increasing rate. For instance, computer screen images are generated by algorithm and by scanning, and X-ray, CT, MRI, and NASA satellite, space telescope and solar explorer systems generate thousands of images every day. To make optimum use of said images, however, convenient methods of data characterization, storage and retrieval are required. For example, a medical doctor might obtain an X-ray image of a patient's chest but must rely on "diagnostic art" to arrive at a diagnosis. Were it possible to determine an index which characterizes said X-ray image and also enables easy storage and retrieval thereof, it would be possible to compare said index to a catalog of indices of various X-ray images which are known to be associated with various healthy or pathologic conditions. Thus diagnosis could be moved toward the very desirable goal of being objectively definite in a mathematical sense.
Continuing, it must be understood that conventional data bases are stored as text, with organization being in terms of fields and values. Examples are business product lists, customer lists, sales data, etc. To retrieve such data a user must issue a query in text format, similar to what is done in natural languages. It is essentially impossible to use such an approach to store and retrieve the contents of most data images, because there is no convenient, manageable way to describe such data images in terms of said fields and values. Data images are instead typically stored in the form of compressed digital files of hundreds of thousands of binary numbers, and said storage technique does not facilitate easy image characterization, storage and retrieval. And, while it is possible to describe a data image with a text index, examining the data image still requires that the data associated with said index be retrieved. It is also possible to assign an arbitrary serial number to a data image to facilitate data storage and retrieval, but under this approach the serial number provides no insight into the image and, again, examining the data image requires accessing the image data per se.
A preferred approach to the characterization of data images, which provides an index for use in storage and retrieval thereof, is to base the index on features in the data image. To arrive at such an index, however, is typically computationally complex, requiring hundreds of thousands of calculations. That is, said index must typically be extracted from a data image "off-line". Characteristic indices so determined are called "image indices", and ideally render a concise description, not only of an image's color and intensity content on a row and column basis, but also of the nature and shape of objects therein. A problem arises, however, in that many image features cannot be easily described. Geometric shapes in a data image, for example, can require a combination of text annotation and numeric values, and often the result is not at all concise.
Relevant considerations in developing an approach to extracting "image indices" from a data image or data set include:
1. Uniqueness--different images/sets should have different associated image indices (i.e., an image index should be non-degenerate);
2. Universality--image/set indices must be extractable from essentially any kind of image to be characterized, stored and retrieved by use thereof;
3. Computation--image/set indices must be easily computed from any data image to be characterized, stored and retrieved by use thereof;
4. Conciseness--image/set indices must be concise and easy to store;
5. Invariance--descriptive features in a data image/set must tolerate change of scale, rotation and translation transformations, image object position shifting, and calibration of color and pixel intensity, and return essentially unchanged image indices;
6. Noise resistance--random noise entering image/set data should not significantly change the image index extracted therefrom.

Previous attempts at extracting an image index for image/set data have focused on use of:

pixel intensity and color distributions (see an article titled "Query By Image And Video Content: The QBIC System", IEEE Trans. on Computers, (Sep. 1995));
pixel texture patterns (see a book titled "Digital Image Processing", Gonzalez, Addison-Wesley Pub. (1992)); and
edge and boundary-line shapes (see a book titled "Digital Image Processing And Computer Vision", Schalkoff, John Wiley & Sons, (1989)),

etc. as the basis of approach. These techniques are mainly based on the calculation of the statistics of a data image in a pixel arrangement. Said techniques often lack Universality in that they work when applied to a certain type of data image, but not when applied to other types of data images. Moreover, many previous approaches are not image transformation invariant and do not tolerate entry of noise.

Continuing, one approach which provides a rotationally invariant result is termed "Equal Angular Sampling". Said method provides a concatenation of numbers which are distances from a centroid in a data image to an intersection point with an object boundary. Said technique encounters problems, however, where objects have irregular shapes or concave boundaries, and/or where holes are encountered.

The use of Moment Invariants to describe the geometrical shape features of data images was proposed more than thirty (30) years ago by Hu in an article titled "Visual Pattern Recognition By Moment Invariants", IRE Trans. on Information Theory, IT-8, (February 1963). The method is based on modeling an image as a physical object with masses distributed in two dimensional space. It typically treats the pixel intensities as the probability distribution value of the object masses. The central moments in various orders are calculated on the distributions. A set of moment invariants is derived by making algebraic combinations of the moments. The most important property of the technique is that the resulting descriptive quantities are transformation invariant (i.e., the moment invariants remain unchanged when the image undergoes scaling, rotation, translation, intensity, or color palette changes). See an article titled "Recognitive Aspects Of Moment Invariants", by Abu-Mostafa et al., IEEE Trans. on Pattern Analysis and Mach. Intell., Vol. PAMI-6, No. 6, (November 1984).

Additional references of interest are:

"Image Analysis Via the General Theory Of Moments", Teague, J. Opt. Soc. America, Vol. 70, No. 8, (Aug. 1980), which discloses that a 2D shape obtained from moment invariants defined on the second central moments can be viewed as an elliptic approximation of the shape; and
"A Transformation-Invariant Recursive Subdivision Method For Shape Analysis", Zhu and Poh, IEEE Proc. of the 9th Int. Conf. on Pattern Recog., Rome, Italy, (Nov. 14-17, 1988).

Continuing, it is to be appreciated that Statistical and Moment-based descriptions of data can distinguish data images at only very rough levels. That is, an image index associated with a data image is not unique and could be arrived at by analysis of an alternative data image. In addition, the computations involved in practicing said Statistical and Moment-based approaches can be complicated and time consuming and can require both character and numeric symbols in a resultant image index. And the use of the moment invariant approach can involve computing an image index from moments of high order.

With the present invention in mind a Search of Patents was performed, with the result being that very little was found. A Patent to Windig, U.S. Pat. No. 5,841,891, is disclosed, however, as it identifies the calculation of Eigenvalues, but in a method for enhancing images. A Patent to Shimura et al., U.S. Pat. No. 5,644,756, is also identified as it describes generating calculated feature data for identifying images, with application in image identification. A Patent, U.S. Pat. No. 5,608,862 to Enokida, is disclosed as it describes development of a tag which indicates the length of data in hierarchically coded image data. U.S. Pat. No. 5,572,726 to Hasuo is disclosed as it describes an index image for use in retrieval of data. A Patent to Tsujumura et al., U.S. Pat. No. 5,586,197, is disclosed as it describes using color as a basis of searching for a data image in an image database. Finally, U.S. Pat. No. 4,742,558 to Ishibashi et al. is disclosed as it describes use of a hierarchical structure in image retrieval and display.

Even in view of the prior art there remains a need for a convenient method of characterizing data images, and multidimensional data sets, so that they can be easily stored and retrieved. Said method should provide an index which demonstrates Uniqueness, Universality, Computational Ease, Conciseness, Invariance to data image change of scale, rotation and translation of data image object, position shifting, linear calibration of color and pixel intensity, and Resistance to random noise. In answer, the present invention provides a method of extracting a data index from an image or data set, which data index is comprised of a concatenation of Eigenvalue calculation mediated index elements determined at a plurality of hierarchical depth data levels.

Steps recited for said method, in its several variations, are:

a. determining Eigenvalues for essentially the entire data set, said Eigenvalues being a major axis and (N-1) minor axes of a characteristic virtual data set mathematical object, then calculating a first non-degenerate data set index element using a formula which operates on said major axis and at least one of said (N-1) minor axes;
b. dividing said essentially entire "N" dimensional data set into at least first and second data set parts about at least one axis selected from the group consisting of: (said major axis and said (N-1) minor axes), and for at least one of said at least first and second data set parts independently determining "N" Eigenvalues therefor, said "N" Eigenvalues being a major axis and (N-1) minor axes of a mathematical object for said at least one of said at least first and second data set parts, and then calculating at least one additional non-degenerate data set index element using formula(s) which operate on said major axis and at least one of said (N-1) minor axes in said at least one of said at least first and second data set parts, and return a non-degenerate result; and
c. concatenating at least two resulting non-degenerate data set index elements in any functional order to provide said identifying data set index (I).

a. determining Eigenvalues for essentially the entire data image, said Eigenvalues being a major axis and a minor axis of a characteristic virtual ellipse, then calculating a first (I1) non-degenerate data image index element using a formula which operates on said major and minor axes Eigenvalues;
b. dividing said essentially entire two dimensional data image into at least first and second data image parts about an axis selected from the group consisting of: (said minor axis and said major axis), and for each of said at least first and second data image parts independently determining Eigenvalues therefor, said Eigenvalues being a major axis and a minor axis for a first of said separate characteristic virtual ellipses, and a major axis and a minor axis for a second of said separate characteristic virtual ellipses, and optionally independently determining a major axis and a minor axis for at least some of any additional data image parts, and then calculating at least second (I2) and/or third (I3) non-degenerate data image index elements determined from two of said at least two data image parts, using formulas which return a non-degenerate result;
c. concatenating at least two of said first (I1), second (I2) and/or third (I3) non-degenerate data image index elements in any functional order to provide said identifying data image index (I).

a. determining Eigenvalues for essentially the entire data image, said Eigenvalues being a major axis ($\lambda_{11}$) and a minor axis ($\lambda_{21}$) of a characteristic virtual ellipse, then calculating a first (I1) non-degenerate data image index element using the formula: ##EQU3##
b. dividing said essentially entire two dimensional data image into first and second data image parts about an axis selected from the group consisting of: (said minor axis and said major axis), and for each of said first and second data image parts independently determining Eigenvalues therefor, said Eigenvalues being a major axis ($\lambda_{12}$) and a minor axis ($\lambda_{22}$) for the first of said separate characteristic virtual ellipses, and a major axis ($\lambda_{13}$) and a minor axis ($\lambda_{23}$) for the second of said separate characteristic virtual ellipses, and then calculating second (I2) and third (I3) non-degenerate data image index elements using the formulas: ##EQU4##
c. concatenating said first, second and third non-degenerate data image index elements to provide said identifying data image index (I) by a selection from the group consisting of:
I=I1 I2 I3;
I=I1 I3 I2;
I=I2 I1 I3;
I=I2 I3 I1;
I=I3 I1 I2; and
I=I3 I2 I1.

a. determining Eigenvalues for essentially the entire data image, said Eigenvalues being a major axis ($\lambda_{11}$) and a minor axis ($\lambda_{21}$) of a characteristic virtual ellipse, then calculating a first (I1) non-degenerate data image index element using the formula: ##EQU5##
b. dividing said essentially entire two dimensional data image into first and second data image parts about an axis selected from the group consisting of: (said minor axis and said major axis), and for each of said first and second data image parts independently determining Eigenvalues therefor, said Eigenvalues being a major axis ($\lambda_{12}$) and a minor axis ($\lambda_{22}$) for the first of said separate characteristic virtual ellipses, and a major axis ($\lambda_{13}$) and a minor axis ($\lambda_{23}$) for the second of said separate characteristic virtual ellipses, and then calculating second (I2) and third (I3) non-degenerate data image index elements using the formulas: ##EQU6##
c. dividing each of said first and second image parts from step b., each about an axis selected from the group consisting of: (said minor axis and said major axis thereof), to produce third, fourth, fifth and sixth image parts and for at least one of said third, fourth, fifth and sixth image parts independently determining Eigenvalues thereof, said determined Eigenvalues being selected from the group consisting of:
d. concatenating said first (I1), and at least one produced non-degenerate data index element(s) selected from the group consisting of said: (second (I2), third (I3), fourth (I4), fifth (I5), sixth (I6) and seventh (I7) non-degenerate data index elements), in any functional order, to provide said identifying data image index (I).

a. determining Eigenvalues for essentially the entire data image, said Eigenvalues being a major axis ($\lambda_{11}$) and a minor axis ($\lambda_{21}$) of a characteristic virtual ellipse, then calculating a first (I1) non-degenerate data image index element using a formula which operates on said Eigenvalues;
b. dividing said essentially entire two dimensional data image into first, second, third and fourth data image parts using said minor axis and said major axis as dividing means, said first, second, third and fourth data image parts being oriented in a first, second, third and fourth quadrant pattern defined by said major and minor axes, in said two dimensional data image;
c. for at least one of said first, second, third and fourth data image parts independently determining Eigenvalues of a characteristic virtual ellipse therefor, said Eigenvalues being selected from the group consisting of:
d. calculating at least one additional non-degenerate data image index element using formula(s) which operate on Eigenvalues corresponding to said at least one of said first, second, third and fourth data image parts determined in step c.; and
e. concatenating at least two resulting non-degenerate data index elements, in any functional order, to provide said identifying data image index (I).

a major axis ($\lambda_{12}$) and a minor axis ($\lambda_{22}$) for the first of said separate characteristic virtual ellipses, a major axis ($\lambda_{13}$) and a minor axis ($\lambda_{23}$) for the second of said separate characteristic virtual ellipses, a major axis ($\lambda_{14}$) and a minor axis ($\lambda_{24}$) for the third of said separate characteristic virtual ellipses, and a major axis ($\lambda_{15}$) and a minor axis ($\lambda_{25}$) for the fourth of said separate characteristic virtual ellipses;
e. concatenating said resulting five non-degenerate data index elements, in any functional order, to provide said identifying data image index (I).

a. determining Eigenvalues for essentially the entire data image, said Eigenvalues being a major axis and a minor axis of a characteristic virtual ellipse, then calculating a first (I1) non-degenerate data image index element using a formula which operates on said major and minor axes Eigenvalues; and
b. comparing said first (I1) non-degenerate data image index element for said first data set to that for said second data set.

b. dividing said essentially entire two dimensional data image into at least first and second data image parts about an axis selected from the group consisting of: (said minor axis and said major axis), and for each of said at least first and second data image parts independently determining Eigenvalues therefor, said Eigenvalues being a major axis and a minor axis for a first of said separate characteristic virtual ellipses, and a major axis and a minor axis for a second of said separate characteristic virtual ellipses, and optionally independently determining a major axis and a minor axis for at least some of any additional data image parts, and then calculating at least second (I2) and/or third (I3) non-degenerate data image index elements determined from two of said at least two data image parts, using formulas which return a non-degenerate result; and
c. comparing at least said second (I2) non-degenerate data image index element for said first data set to that for said second data set.
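As a rough illustration of the Eigenvalue-based, hierarchical index described in this section, the following Python sketch computes a three-element index (I1, I2, I3) for a two dimensional data image given as a list of (x, y, intensity) pixels. It is a minimal sketch under stated assumptions and not the formulas of the invention: the index-element formulas are not reproduced in this text, so the ratio of minor to major Eigenvalue used here is an assumed stand-in. The function names, the use of intensity-weighted second central moments to obtain the Eigenvalues, the choice of the minor axis as the dividing line, and the heavier-part-first ordering of the two parts are likewise illustrative assumptions.

```python
import math


def axes(pixels):
    """Return (major, minor, theta, cx, cy) for weighted pixels [(x, y, w), ...].

    major/minor are the Eigenvalues of the intensity-weighted 2x2 matrix of
    second central moments; theta is the direction of the major axis.
    """
    total = sum(w for _, _, w in pixels)
    cx = sum(x * w for x, _, w in pixels) / total
    cy = sum(y * w for _, y, w in pixels) / total
    mu20 = sum(w * (x - cx) ** 2 for x, _, w in pixels) / total
    mu02 = sum(w * (y - cy) ** 2 for _, y, w in pixels) / total
    mu11 = sum(w * (x - cx) * (y - cy) for x, y, w in pixels) / total
    half_trace = (mu20 + mu02) / 2.0
    root = math.hypot((mu20 - mu02) / 2.0, mu11)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    # Clamp the minor Eigenvalue at zero to absorb rounding error.
    return half_trace + root, max(half_trace - root, 0.0), theta, cx, cy


def element(pixels):
    """ASSUMED index element: minor/major Eigenvalue ratio (0.0 if undefined)."""
    if not pixels:
        return 0.0
    major, minor, _, _, _ = axes(pixels)
    return minor / major if major > 0.0 else 0.0


def split_about_minor_axis(pixels):
    """Divide the pixels into two parts about the minor axis (heavier part first)."""
    _, _, theta, cx, cy = axes(pixels)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector along the major axis
    side_a, side_b = [], []
    for x, y, w in pixels:
        projection = (x - cx) * ux + (y - cy) * uy
        (side_a if projection > 0.0 else side_b).append((x, y, w))
    # ASSUMPTION: order the parts by total intensity so that the result does
    # not depend on which way the major-axis direction happens to point.
    return sorted((side_a, side_b), key=lambda part: -sum(w for _, _, w in part))


def image_index(pixels):
    """Concatenate (I1, I2, I3): whole image, then each of the two parts."""
    pixels = [(x, y, w) for x, y, w in pixels if w > 0]
    if not pixels:
        return (0.0, 0.0, 0.0)
    first, second = split_about_minor_axis(pixels)
    return (element(pixels), element(first), element(second))
```

Because the Eigenvalues are computed from central moments normalized by total intensity, the assumed ratio is unchanged by translation, by uniform scaling of the coordinates, by rotation, and by multiplying all intensities by a constant, which is the kind of invariance listed among the considerations above. Deeper hierarchical levels could be sketched by applying `split_about_minor_axis` again to each part.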