The present invention relates to digital image analysis and, more particularly, to a low-level digital image classification system. A major objective of the present invention is to provide for fast, effective, low-level image classification that can be implemented with reduced hardware/software requirements.
Humans engage in image classification whenever they look at an image and identify objects of interest. In images, humans readily distinguish people from other objects, man-made features from natural features, text from graphics, and so on. With specialized training, humans are adept at recognizing significant features in specialized images such as satellite weather images and medical tomographic images.
Suitably equipped machines can be programmed and/or trained for image classification, although machine recognition is less sophisticated than human recognition in many respects. Computerized tomography uses machine classification to highlight potential tumors in tomographic images; medical professionals examining an image for evidence of tumors take advantage of the highlighting to focus their examination. Machine classification is also used independently; for example, some color printer drivers classify image elements as either text or graphics to determine an optimal dithering strategy for simulating full-range color output using a limited color palette.
Most machine image classification techniques operate on digital images. Digital images are typically expressed in the form of a two-dimensional array of picture elements (pixels), each with one (for monochromatic images) or more (for color images) values assigned to it. Analog images to be machine classified can be scanned or otherwise digitized prior to classification.
The amount of computational effort required for classification scales dramatically with the number of pixels involved at once in the computation. The number of pixels is the product of image area and image resolution, i.e., the number of pixels per unit area. As this suggests, faster classification can be achieved using lower resolution images, and by dividing an image into small subimages that can be processed independently; the total computation involved in classifying the many subimages can be considerably less burdensome than the computation involved in classifying an image as a whole. On the other hand, if the subimages are too small to contain features required for classification, or if the resolution is too low for relevant features to be identified, classification accuracy suffers.
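Dividing an image into independent subimages can be sketched as follows. This assumes the image is held as a 2-D NumPy array of pixel values whose height and width are exact multiples of the block size `k`; `to_blocks` is a hypothetical helper name, not part of the system described here.

```python
import numpy as np

def to_blocks(image: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: split an H x W image into non-overlapping k x k blocks.

    Returns an array of shape (H // k, W // k, k, k), where [bi, bj] is the
    block whose top-left pixel is image[bi * k, bj * k]. Assumes H and W are
    multiples of k (edge handling such as padding is not shown).
    """
    h, w = image.shape
    if h % k or w % k:
        raise ValueError("image dimensions must be multiples of the block size")
    # reshape gives [block_row, row_in_block, block_col, col_in_block];
    # swapping the middle axes groups each block's pixels together.
    return image.reshape(h // k, k, w // k, k).swapaxes(1, 2)
```

Each block can then be flattened into a k·k-dimensional vector with `blocks.reshape(-1, k * k)` and classified on its own, so the per-block cost depends only on k·k, not on the size of the whole image.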
Successful "low-level" classification techniques depend on finding suitable tradeoffs between accuracy and computational efficiency in the selection of image resolution and subimage area. In general, subimage area can be imposed by the classification technique, whereas resolution is typically a given. In such cases, subimage area is typically selected to be the minimum required for acceptably accurate classification. The selected subimage area then determines the number of pixels per subimage, and thus the amount of computation required for classification.
When image resolution is optimal for classification, the number of pixels required per subimage can be surprisingly small. For example, 8×8-pixel subimages are typically sufficient for distinguishing text from graphics; 4×4-pixel subimages are typically sufficient to distinguish man-made from natural objects in an aerial image; and 2×2-pixel subimages can be used to distinguish potential tumors from healthy tissue in a computerized tomographic image. Of course, subimages with greater numbers of pixels must be used if the image resolution is greater than optimal for classification purposes.
Low-level classification strives to assign each subimage to a class. Ideally, the assignment would be error free. When this cannot be done, the goal is to minimize the likelihood of error or, if some errors are more costly than others, to minimize the average cost of the errors. Bayes decision theory and related statistical approaches are used to achieve these goals. The required computations must be repeated for each block. Although the amount of computation required for low-level classification is reduced relative to full-view classification, it can still be excessive.
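The per-block decision that these statistical approaches compute can be sketched as a minimum-expected-cost (Bayes) rule. The posterior class probabilities and the cost matrix are assumed inputs (how they would be estimated is not specified here), and `bayes_decision` is a hypothetical name.

```python
import numpy as np

def bayes_decision(posterior: np.ndarray, cost: np.ndarray) -> int:
    """Hypothetical helper: return the class that minimizes expected cost.

    posterior[k] : P(true class = k | observed block)
    cost[k, j]   : cost of deciding class j when the true class is k
                   (cost[k, k] is normally 0)
    """
    # expected_cost[j] = sum_k posterior[k] * cost[k, j]
    expected_cost = posterior @ cost
    return int(np.argmin(expected_cost))
```

With a 0-1 cost matrix the rule reduces to picking the most probable class. Raising the cost of a missed tumor to 10 flips the decision for a block with posterior (0.7, 0.3) over (healthy, tumor): the expected costs become 3.0 for "healthy" and 0.7 for "tumor". A computation of this kind, repeated for every block, is what the table-based approach below moves offline.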
Technological progress has provided both more powerful computers and more efficient image classification techniques. Rather than satisfy the demand for efficient image classification, these advances have fueled demand by proliferating the use of computerized images and raising expectations for real-time image processing.
Recent developments on the Internet, particularly the World Wide Web, illustrate the demand for communication of images, especially in high-bandwidth applications such as interactive video and video conferencing. Internet providers targeting a large audience often must transmit not only the images but also applications, e.g., browsers, for viewing and interacting with the images. The unsophisticated consumers of these images are often not tolerant of delays that might be involved in any classification activities associated with these images. Furthermore, the image providers cannot assume that their consumers will have hardware dedicated to the classification activities, nor can the providers conveniently distribute such dedicated hardware.
Thus, there is an increasing need for more efficient image classification techniques. Preferably, such techniques would achieve high performance even in software implementations that require only a fraction of the processing power available on inexpensive home and desktop computers. When embodied as software, the techniques should be readily distributed by image providers. Whether hardware or software based (or both), improved image classification techniques are desired to enhance all the applications that depend on them.
The present invention provides an image classification system comprising means for converting an image into vectors and a lookup table for converting the vectors into class indices. Each class index corresponds to a respective class of interest. Performing classification using tables obviates the need for computations, allowing higher classification rates.
The lookup table can be single-stage or multi-stage; a multi-stage lookup table permits classification to be performed hierarchically. The advantage of the multi-stage table is that the memory requirements for storing the table are vastly reduced at the expense of a small loss of classification accuracy.
Multi-stage tables typically have two to eight stages. Only the last stage table operates on blocks of the size selected to allow acceptably accurate classification. Each preceding stage operates on smaller blocks than the succeeding stage. The number of stages is thus related to the number of pixels per block.
For example, a four-stage table can be used to classify 4×4-pixel blocks. For each 4×4 image block, the first stage can process the sixteen individual pixels in pairs to yield eight indices corresponding to eight respective 2×1-pixel blocks. The second stage can convert the eight 2×1 block indices to four 2×2 block indices. The third stage can convert the four 2×2 block indices to two 4×2 block indices. The fourth stage can convert the two 4×2 block indices to one 4×4 block classification index.
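The four-stage lookup can be sketched in Python with NumPy. The sketch assumes 8-bit pixels, one two-input table per stage that is reused for every pair at that stage, and that "2×1" means two pixels wide by one pixel tall (the pairing geometry is not fixed by the text). The name `classify_4x4` and the placeholder tables are hypothetical: real tables would be filled offline by the codebook design and table fill-in procedures, whereas these merely quantize brightness so that the example runs.

```python
import numpy as np

def classify_4x4(block, t1, t2, t3, t4):
    """Hypothetical helper: classify one 4x4 block of 8-bit pixels with a
    four-stage table. Each stage table is a 2-D array addressed by a pair of
    inputs, and one table per stage is applied to every pair at that stage."""
    # Stage 1: horizontal pixel pairs -> eight 2x1 block indices, shape (4, 2)
    s1 = t1[block[:, 0::2], block[:, 1::2]]
    # Stage 2: vertical pairs of 2x1 indices -> four 2x2 block indices, shape (2, 2)
    s2 = t2[s1[0::2, :], s1[1::2, :]]
    # Stage 3: horizontal pairs of 2x2 indices -> two 4x2 block indices, shape (2,)
    s3 = t3[s2[:, 0], s2[:, 1]]
    # Stage 4: the two 4x2 indices -> one class index
    return int(t4[s3[0], s3[1]])

# Placeholder tables (hypothetical, for demonstration only): stage 1 maps a
# pixel pair to one of 16 brightness levels, stages 2 and 3 merge levels,
# and stage 4 thresholds the result into class 0 (dark) or 1 (bright).
levels = np.arange(16)
t1 = (np.add.outer(np.arange(256), np.arange(256)) // 2) >> 4
t2 = t3 = np.add.outer(levels, levels) // 2
t4 = (np.add.outer(levels, levels) // 2 >= 8).astype(np.uint8)
```

Classifying one block takes 15 table lookups (eight at stage 1, four at stage 2, two at stage 3, one at stage 4) and no classification arithmetic.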
In this example, each stage processes inputs in pairs. For each 4×4 image vector, the first stage processes eight pairs of pixels. This can be accomplished using eight first-stage tables, by using one first-stage table eight times, or by some intermediate solution. In practice, using a single table eight times affords sufficient performance with minimal memory requirements. Likewise, for the intermediate stages, a single table can be used multiple times per image vector for fast and efficient classification. Note that the number of stages can be reduced by increasing the number of inputs per table; for example, using four inputs per table halves the number of stages required but greatly increases the total memory required for the multi-stage table.
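The memory tradeoff can be checked with a little arithmetic. This sketch assumes 8-bit pixels, one-byte (8-bit) indices at every stage, and one table per stage reused for all pairs; `table_bytes` is a hypothetical helper.

```python
def table_bytes(bits_per_input: int, inputs_per_table: int, output_bytes: int = 1) -> int:
    """Hypothetical helper: bytes for one lookup table, which needs one entry
    for every possible combination of its inputs."""
    return (2 ** (bits_per_input * inputs_per_table)) * output_bytes

# 4x4 block of 8-bit pixels, one-byte indices, one table per stage.
single_stage = table_bytes(8, 16)        # one table addressed by all 16 pixels
two_input = 4 * table_bytes(8, 2)        # four stages of 2-input tables
four_input = 2 * table_bytes(8, 4)       # two stages of 4-input tables

print(single_stage == 2 ** 128)  # True: far beyond any practical memory
print(two_input)                 # 262144 bytes (256 KiB)
print(four_input)                # 8589934592 bytes (8 GiB)
```

Halving the number of stages by moving from two to four inputs per table multiplies the table memory by a factor of 32,768 under these assumptions.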
In most cases, the pixel domain in which an image is expressed is not optimum for accurate classification. For example, more accurate classification can often be achieved when the image is transformed into a spatial frequency domain. While the invention applies to vectors transformed to another domain prior to entry into a lookup table, the invention further provides for the transform to be performed by the classification table itself, so that no transform computation is required at classification time.
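As one concrete spatial-frequency transform (the text does not name one; the two-point orthonormal Haar transform is used here purely as an illustration), a pixel pair (x_0, x_1) maps to a low-frequency coefficient y_0 and a high-frequency coefficient y_1:

```latex
\begin{pmatrix} y_0 \\ y_1 \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} x_0 \\ x_1 \end{pmatrix},
\qquad
\lVert \mathbf{y} - \mathbf{y}' \rVert = \lVert \mathbf{x} - \mathbf{x}' \rVert
```

Because a first-stage table with two 8-bit inputs has only 2^16 = 65,536 addresses, this transform can be evaluated once per address when the table is built, and each entry can store the index of the codebook vector nearest to (y_0, y_1). No transform arithmetic then remains at classification time. The right-hand identity holds because the transform is orthonormal, so it preserves Euclidean distances.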
The method for designing the classification lookup tables includes a codebook design procedure and a table fill-in procedure for each stage. For each stage, the codebook design procedure involves clustering a statistically representative set of vectors so as to minimize some error metric. The vectors are preferably expressed in the domain, e.g., pixel or spatial frequency, most useful for the classification of interest. The dimensionality of the vectors is dependent on the stage and the number of inputs to that stage and preceding stages. For preliminary stages, the error metric is a proximity measure, preferably weighted to preserve information relevant to classification. For the final stage, the preferred error metric takes Bayes risk, i.e., risk of classification error, into account; the Bayes risk can be weighted to reflect differential costs of classification errors.
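One way to take Bayes risk into account in the last-stage error metric is to add a weighted conditional Bayes risk to the squared error. The sketch below assumes that posterior class probabilities are available for each training vector and that each last-stage codebook vector carries a class label; the additive form, the weight `lam`, and the name `last_stage_cost` are illustrative assumptions rather than the metric prescribed here.

```python
import numpy as np

def last_stage_cost(x, codeword, codeword_class, posterior, cost, lam):
    """Hypothetical last-stage design cost: squared error plus weighted Bayes risk.

    x              : training vector
    codeword       : candidate codebook vector
    codeword_class : class label attached to that codebook vector
    posterior[k]   : estimated P(class k | x)
    cost[k, j]     : cost of labelling a class-k vector as class j
    lam            : weight trading proximity against misclassification risk
    """
    distortion = float(np.sum((np.asarray(x) - np.asarray(codeword)) ** 2))
    # Expected cost of assigning x the class of this codeword.
    risk = float(posterior @ cost[:, codeword_class])
    return distortion + lam * risk
```

In codebook design, the nearest-neighbor step would assign each training vector to the codeword that minimizes this cost. Setting `lam` to 0 recovers plain squared error, and enlarging selected entries of `cost` expresses the differential costs of particular classification errors.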
The statistically representative set of vectors can be obtained by selecting a set of training images that match as closely as possible the statistical profiles of the images to be classified. If the images to be classified involve only aerial photographs of terrain, the training images can be aerial photographs of terrain. If the images to be classified vary considerably in content, so should the training images.
The training images are divided into blocks, which are in turn expressed as vectors. The dimensionality of blocks and vectors is stage dependent. The first-stage input blocks are 1×1, so the corresponding vectors are one-dimensional. For each stage, the inputs are concatenated according to the number of stage table inputs. For a first-stage table with two inputs, two 1×1 blocks are concatenated to form a 2×1 block; the corresponding vector is two-dimensional. If the classification is to be performed in a domain other than the pixel domain, the post-concatenation vectors are transformed into that domain. The vectors are then processed according to the Linde-Buzo-Gray (LBG) algorithm, also known as the generalized Lloyd algorithm (GLA), to yield codebook vectors according to the selected error metric.
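A minimal LBG/GLA sketch with an unweighted squared-error metric follows. It assumes the training vectors for one stage have already been formed (concatenated and, if needed, transformed) into an (N, d) NumPy array. The function name `gla`, the random initialization, and the fixed iteration count are illustrative choices; splitting-based initialization and a distortion-based stopping rule are also common.

```python
import numpy as np

def gla(train, k, iters=20, seed=0):
    """Hypothetical helper: generalized Lloyd algorithm (LBG), squared error.

    train : (N, d) array of training vectors for one stage
    k     : codebook size
    Returns a (k, d) codebook.
    """
    rng = np.random.default_rng(seed)
    train = np.asarray(train, dtype=float)
    # Initialize with k training vectors chosen at random without replacement.
    codebook = train[rng.choice(len(train), size=k, replace=False)].copy()
    for _ in range(iters):
        # Nearest-neighbour condition: assign each vector to its closest codeword.
        d2 = ((train[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Centroid condition: move each codeword to the mean of its cell.
        for j in range(k):
            members = train[nearest == j]
            if len(members):  # leave empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook
```

A weighted proximity measure, or a Bayes-risk-aware measure for the last stage, would replace the `d2` computation; the centroid step would then also need to minimize that measure rather than squared error.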
The codebook vectors are assigned indices. For preliminary-stage tables, the indices are preferably fixed-length; these indices represent codebook vectors. For the last stage of a multi-stage classification table, or for the only table of a single-stage classification table, the indices represent classes. If there are only two classes, a single-bit classification index can be used. If there are more than two classes, more bits on average are required for the index. In this case, the index can be fixed-length or variable-length. A variable-length index can represent the classification more compactly when the distribution of image vectors over codebook vectors is nonuniform. To optimize the variable-length code, the error metric for the last-stage codebook design can be subject to an entropy constraint. In any event, the number of classes should be less than or equal to the dimensionality of the image vectors to ensure sufficiently accurate classification.
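The gain from a variable-length class index can be quantified. A fixed-length index over K classes needs ⌈log₂ K⌉ bits, while the empirical entropy of the class labels is a lower bound on the average length of any uniquely decodable variable-length index. The helper names below are hypothetical.

```python
import math
from collections import Counter

def fixed_length_bits(num_classes: int) -> int:
    """Hypothetical helper: bits per index for a fixed-length code."""
    return max(1, math.ceil(math.log2(num_classes)))

def entropy_bits(labels) -> float:
    """Hypothetical helper: empirical entropy of the class labels, the lower
    bound on the average length of a uniquely decodable variable-length index."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

For the labels `[0, 0, 0, 0, 1, 1, 2, 3]`, a fixed-length index needs 2 bits, while the entropy is 1.75 bits; a Huffman code with lengths 1, 2, 3, 3 attains that average for this distribution.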
Once a codebook is designed for a stage, the table fill-in procedure can be executed. In this procedure, the set of all possible combinations of inputs to a stage table defines its addresses. The purpose of this procedure is to assign a same-stage codebook index to each of these addresses so as to optimize classification accuracy.
In the case of a first-stage table, individual pixel inputs are concatenated to define an input vector in the pixel domain. If the classification is to be performed in a domain other than the pixel domain, this vector is transformed accordingly (so that it is in the same domain as the codebook vectors). Each address vector is mapped to the closest codebook vector. While a weighted proximity measure can be used, better results are obtained using an objective proximity measure.
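First-stage fill-in can be sketched as an exhaustive nearest-neighbor search over all 65,536 addresses of a two-input table with 8-bit pixels, using an unweighted squared-error proximity measure. The `transform` argument is a hypothetical hook: when classification is done in another domain, it maps each address vector into the domain of the codebook vectors, so the finished table absorbs the transform. The function name and arguments are assumptions.

```python
import numpy as np

def fill_first_stage(codebook, transform=None, levels=256):
    """Hypothetical helper: fill a two-input first-stage table.

    codebook  : (K, 2) codebook vectors, expressed in the classification domain
    transform : optional hook mapping an (N, 2) array of pixel pairs into
                that domain; None means the pixel domain
    levels    : number of possible values per pixel (256 for 8-bit pixels)
    Returns a (levels, levels) table; entry [x0, x1] is a codebook index.
    """
    cb = np.asarray(codebook, dtype=float)
    x0, x1 = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    # Every possible address vector (x0, x1), in row-major order.
    addresses = np.stack([x0.ravel(), x1.ravel()], axis=1).astype(float)
    if transform is not None:
        addresses = transform(addresses)  # the table absorbs the transform
    # Unweighted squared error, one codeword at a time to bound memory.
    best = np.full(len(addresses), np.inf)
    index = np.zeros(len(addresses), dtype=np.int64)
    for n, c in enumerate(cb):
        d2 = ((addresses - c) ** 2).sum(axis=1)
        closer = d2 < best
        best[closer] = d2[closer]
        index[closer] = n
    return index.reshape(levels, levels)
```

For example, passing a function that computes the two-point Haar transform of each row, together with a codebook expressed in Haar coordinates, yields a table that performs that transform implicitly.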
In the case of second and succeeding-stage tables, the inputs are indices representing codebook vectors for the preceding stage. These must be decoded to yield previous-stage codebook vectors to which a proximity measure can be applied. If the classification is to be performed in the pixel domain, the previous-stage codebook vectors are in the pixel domain. They can be concatenated to match the dimensionality of the same-stage codebook vectors. A suitable proximity measure is used to determine the codebook vector closest to each concatenated address vector. The index associated with the closest codebook vector is assigned to the concatenated address vector.
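For a second or later stage, each address is a pair of previous-stage indices. The sketch below decodes both through the previous-stage codebook, concatenates them, and stores the index of the nearest same-stage codebook vector under squared error. It assumes two-input tables and that the concatenation order [codeword i, codeword j] matches the flattening order used when the same-stage codebook was designed; the function name is hypothetical.

```python
import numpy as np

def fill_next_stage(prev_codebook, codebook):
    """Hypothetical helper: fill a two-input table for a later stage (pixel domain).

    prev_codebook : (Kp, m) previous-stage codebook vectors, used to decode inputs
    codebook      : (K, 2m) same-stage codebook vectors
    Returns a (Kp, Kp) table: entry [i, j] is the index of the same-stage
    codebook vector closest to the concatenation of codewords i and j.
    """
    prev = np.asarray(prev_codebook, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    kp = len(prev)
    # Decode every address (i, j): row i * kp + j holds [codeword i, codeword j].
    pairs = np.concatenate([np.repeat(prev, kp, axis=0), np.tile(prev, (kp, 1))], axis=1)
    # Nearest-neighbour search, one codeword at a time to bound memory.
    best = np.full(len(pairs), np.inf)
    index = np.zeros(len(pairs), dtype=np.int64)
    for n, c in enumerate(cb):
        d2 = ((pairs - c) ** 2).sum(axis=1)
        closer = d2 < best
        best[closer] = d2[closer]
        index[closer] = n
    return index.reshape(kp, kp)
```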
The procedure for second and succeeding stages must be modified if the classification is to be performed in other than the pixel domain. In that case, decoding the indices yields previous-stage codebook vectors in the other domain. An inverse transform is applied to convert these to the pixel domain to permit concatenation. The concatenated pixel-domain vector is then transformed to the other domain, in which the proximity measure is applied to determine the closest same-stage codebook vector. The index is assigned as before. When the codebook design is completed for all stages and the table fill-in procedure has been completed for all addresses of all stage tables, design of the multi-stage table is complete. In the case of a single-stage classification table, the codebook design procedure is similar to that for the last stage of a multi-stage table, while the table fill-in procedure is similar to that for the first stage of a multi-stage classification table.
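The transform-domain variant adds an inverse transform before concatenation and a forward transform afterward. As a simplifying assumption, the sketch applies an orthonormal 1-D DCT-II to the flattened vector; a real system would more likely use a separable 2-D transform matched to the block shape. All names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are basis vectors, so the inverse is the transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def fill_next_stage_dct(prev_codebook, codebook):
    """Hypothetical helper: later-stage fill-in with codebooks stored as DCT coefficients.

    prev_codebook : (Kp, m) previous-stage codewords (transform domain)
    codebook      : (K, 2m) same-stage codewords (transform domain)
    """
    prev = np.asarray(prev_codebook, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    kp, m = prev.shape
    # Inverse transform of each row: x^T = y^T C.
    pixels = prev @ dct_matrix(m)
    # Concatenate in the pixel domain: row i * kp + j holds [pixels i, pixels j].
    pairs = np.concatenate([np.repeat(pixels, kp, axis=0), np.tile(pixels, (kp, 1))], axis=1)
    # Forward transform of each concatenated row: y^T = x^T C^T.
    coeffs = pairs @ dct_matrix(2 * m).T
    # Nearest same-stage codeword in the transform domain (squared error).
    best = np.full(len(coeffs), np.inf)
    index = np.zeros(len(coeffs), dtype=np.int64)
    for n, c in enumerate(cb):
        d2 = ((coeffs - c) ** 2).sum(axis=1)
        closer = d2 < best
        best[closer] = d2[closer]
        index[closer] = n
    return index.reshape(kp, kp)
```

Because this transform is orthonormal and the proximity measure here is unweighted squared error, the table matches a pixel-domain fill-in with the corresponding inverse-transformed codebooks. Working in the transform domain matters in practice because the codebooks themselves were designed there, typically with a weighted measure.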
The invention further provides that the table used for classification has other concurrent uses. For example, the tables can be used for joint classification and compression. In these cases, the output can be a pair of indices, one for classification and another for codebook vector. Alternatively, a single codebook vector index can be output and a decoder can assign the class during decompression.
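Both output options for a joint classification and compression table can be sketched with NumPy, assuming each last-stage codeword carries a class label held in an array `codeword_class`. The function names and toy values are hypothetical.

```python
import numpy as np

def joint_table(codeword_table, codeword_class):
    """Hypothetical helper, option 1: a table whose entries are
    (codeword index, class index) pairs, built from a codeword-index table."""
    return np.stack([codeword_table, codeword_class[codeword_table]], axis=-1)

def decode(index, codebook, codeword_class):
    """Hypothetical helper, option 2: only the codeword index is output; the
    decoder recovers both the reconstruction vector and the class from it."""
    return codebook[index], int(codeword_class[index])
```

Option 1 costs one extra field per table entry; option 2 keeps the table unchanged and moves the class assignment into a small per-codeword array at the decoder.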
When the table is dual purpose, it can be desirable to use codebook measures that are not optimized for classification. For example, measures optimized for image reconstruction may be used in place of measures optimized for classification if fidelity of the reconstructed image is of paramount importance. Otherwise, a weighted combination of classification-optimized and compression-optimized measures can be used in codebook design. In particular, last-stage codebook design can use a weighted combination of perceptual proximity and weighted risk of misclassification.
In accordance with the foregoing, the present invention permits low-level block-based image classification to be performed without computation. As a result, classification can be performed in software at rates that formerly required more powerful general-purpose computers or dedicated image processing hardware. Since the tables can be embodied in software, they can be readily distributed, e.g., over the Internet, so that they can be used locally on images selected by a receiver. Furthermore, the invention allows classification to be performed in a domain other than a pixel domain, where the block-based transformation is designed into the classification table so that no computations are required during image processing. In addition, the invention provides for multi-use tables, such as those for joint classification and compression. These and other features and advantages of the invention are apparent from the description below with reference to the following drawings.