This invention relates to a method and apparatus for image classification. More particularly, the present invention provides a technique to classify an image as a picture, a graphic or a mixed mode image. The classification is based on data accumulated as a result of an approximation of a segmentation process. In this regard, the approximation is HVQ-LUT-based and outputs classification maps indicating whether pixels are background, text or pictures. Said classification maps are optionally filtered to eliminate odd isolated samples, and the resulting counts of picture, text and background pixels are analyzed before concluding whether the image has pictorial, graphical or mixed contents.
While the invention is particularly directed to the art of image segmentation and classification, and will be thus described with specific reference thereto, it will be appreciated that the invention may have usefulness in other fields and applications. For example, the invention may be used in any environment where it would be useful to classify an image as a particular type.
By way of background, one technique commonly utilized in image processing operations is known as vector quantization (VQ). In VQ, a block of X×Y pixels is mapped to a single “codeword” which is defined using a smaller number of bits than the number required by the original block. Codewords are stored in transmitting, receiving and storage devices, and each codeword is associated with a pre-defined set of image data. The codeword to which each pixel block is mapped is the codeword associated with image data that most closely matches the image data in the pixel block. The typical process includes mapping the pixel block to a codeword, storing the codeword or transmitting it to a receiving device, and then mapping the codeword back to image data when it is retrieved from storage or received at the receiving device. Since codebook storage and codeword transmission require less space and time than storage and transmission of original image data, this process greatly reduces the resources required to reproduce the original image data.
There are typically many more combinations of pixel blocks than there are available codewords, and as indicated by the term “quantization”, several input blocks will be mapped to the same single given codeword. For a fixed number of codewords, increasing the size of the pixel block reduces the quality of mapping and reconstruction, since much more of the actual image data must be mapped to the same number of codewords. Some drawbacks of VQ are that codebook design is often very complex, and that large amounts of time are usually required to search through the codebook and to match blocks to the appropriate codeword. While codebook design can be performed off-line, the block matching searches must be performed on-line.
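By way of illustration only, the VQ mapping described above may be sketched as a full search over a codebook. The codebook contents, the squared-error distance measure and the names `encode_block` and `decode_index` are assumptions made for this sketch and are not taken from any cited reference.

```python
# Minimal sketch of full-search vector quantization (VQ).
# CODEBOOK and all function names are hypothetical, chosen for illustration.

def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_block(block, codebook):
    """Return the index of the codeword closest to the block (full search)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(block, codebook[i]))


def decode_index(index, codebook):
    """Map a codeword index back to its associated image data."""
    return list(codebook[index])


# Hypothetical 4-entry codebook for 2x2 blocks flattened in row-major order.
CODEBOOK = [
    (0, 0, 0, 0),          # flat dark block
    (255, 255, 255, 255),  # flat bright block
    (0, 0, 255, 255),      # dark top row, bright bottom row
    (0, 255, 0, 255),      # dark left column, bright right column
]

block = (10, 5, 250, 240)               # input pixel block
index = encode_block(block, CODEBOOK)   # on-line search over every codeword
reconstructed = decode_index(index, CODEBOOK)
```

In this sketch the two-bit index stands in for four eight-bit pixels, and every encoded block requires a comparison against each codeword, which is the on-line search cost noted above.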
In hierarchical vector quantization (HVQ), block matching searches are performed two samples at a time. Thus, look up tables (LUTs) can be used directly to perform HVQ in two or more stages. HVQ is described in U.S. Pat. No. 5,602,589, U.S. Pat. No. 6,470,052 B1 and Vishwanath and Chou, “An Efficient Algorithm for Hierarchical Compression of Video,” Proc. ICIP-94, pages 275–279 (Nov. 13–16, 1994), all of which are incorporated herein by reference. Briefly, referring to FIG. 6, in the first stage of HVQ, two image pixels are mapped to one codeword, reducing the number of samples by a factor of 2. As shown, the number of samples is reduced from eight (8) to four (4) in stage 1. In the next stage (stage 2), the process is repeated to map pairs of codewords into single codewords. Preferably, codewords are grouped in a direction perpendicular to the direction used for the previous level. As the process continues (e.g. on to stage 3), the resulting codewords are mapped to larger and larger amounts of data.
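By way of illustration only, a two-stage HVQ of a 2×2 block may be sketched as follows. The sketch assumes pixels quantized to four levels so that the tables stay small, and the stage codebooks, table names and the function `hvq_encode_2x2` are hypothetical. The full searches are confined to the off-line table construction, and the on-line encoding consists solely of table look-ups.

```python
# Minimal sketch of two-stage hierarchical VQ (HVQ) using look-up tables.
# Codebooks, pixel depth and all names are hypothetical, for illustration only.
from itertools import product


def nearest(vector, codebook):
    """Full search: index of the codeword with the smallest squared error."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[i]))
    return min(range(len(codebook)), key=dist)


LEVELS = 4  # assumption: pixels quantized to 2 bits (values 0..3)

# Stage 1 codewords approximate horizontal pairs of pixels.
STAGE1_CODEBOOK = [(0, 0), (3, 3), (0, 3), (3, 0)]

# Stage 2 codewords approximate 2x2 blocks in row-major order.
STAGE2_CODEBOOK = [
    (0, 0, 0, 0),  # flat dark
    (3, 3, 3, 3),  # flat bright
    (0, 0, 3, 3),  # horizontal edge
    (0, 3, 0, 3),  # vertical edge
]

# Off-line: stage 1 table maps every possible pixel pair to a codeword index.
STAGE1_LUT = {
    pair: nearest(pair, STAGE1_CODEBOOK)
    for pair in product(range(LEVELS), repeat=2)
}

# Off-line: stage 2 table maps every pair of stage 1 indices to a final index.
# The pair is grouped vertically, perpendicular to the stage 1 grouping.
STAGE2_LUT = {
    (top, bottom): nearest(
        STAGE1_CODEBOOK[top] + STAGE1_CODEBOOK[bottom], STAGE2_CODEBOOK)
    for top, bottom in product(range(len(STAGE1_CODEBOOK)), repeat=2)
}


def hvq_encode_2x2(block):
    """On-line encoding of a 2x2 block using table look-ups only."""
    (p00, p01), (p10, p11) = block
    top = STAGE1_LUT[(p00, p01)]        # stage 1: top row pair
    bottom = STAGE1_LUT[(p10, p11)]     # stage 1: bottom row pair
    return STAGE2_LUT[(top, bottom)]    # stage 2: pair of codewords


final_index = hvq_encode_2x2(((0, 1), (3, 2)))
```

Each stage halves the number of samples, so the four pixels of the block are reduced to two stage 1 indices and then to one final index, at a cost of three look-ups and no on-line search.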
HVQ allows for a rough approximation of the content of each image block by using simple look-up table operations. The final codeword represents a block approximation and can therefore be directly mapped to other quantities which describe certain characteristics of the approximated block, such as block activity. HVQ codebook design methods follow standard VQ codebook design algorithms and are usually performed by designing the codebooks for a single stage at a time.
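By way of illustration only, the mapping from a final codeword to a block characteristic may be sketched as one further table indexed by the final codeword. The codebook contents and the activity measure used here (the range of the codeword's pixel values) are assumptions made for this sketch; other measures could equally be tabulated.

```python
# Minimal sketch: mapping a final HVQ codeword index to a block characteristic.
# FINAL_CODEBOOK and the activity measure are hypothetical, for illustration.

FINAL_CODEBOOK = [
    (0, 0, 0, 0),  # flat dark block
    (3, 3, 3, 3),  # flat bright block
    (0, 0, 3, 3),  # strong edge
    (1, 2, 1, 2),  # mild texture
]


def activity(codeword):
    """Assumed activity measure: range of the approximated pixel values."""
    return max(codeword) - min(codeword)


# Off-line: one activity value per final codeword.
ACTIVITY_LUT = [activity(codeword) for codeword in FINAL_CODEBOOK]


def block_activity(final_index):
    """On-line: a single look-up replaces any per-block computation."""
    return ACTIVITY_LUT[final_index]
```

Because the table is computed once from the codebook, the characteristic is obtained for each block without revisiting the original pixel data.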
In Anuradha Aiyer and Robert M. Gray (“Aiyer and Gray”), A Fast, Table-lookup for Classifying Document Images, Proc. of ICIP, Kobe, Japan, 25 PP4.10 (1999), which is incorporated herein by reference, a method to use HVQ to mimic the segmentation of an image is proposed. In this method, the HVQ output is correlated with the output of a segmentation process to assist in classifying portions of an image as picture, background, or text.
Further, in Hui Cheng and Charles A. Bouman (“Cheng and Bouman”), Trainable Context Model for Multiscale Segmentation, Proc. of ICIP, Chicago, Ill., pp. 610–614 (1998), which is incorporated herein by reference, a segmentation method to classify image regions as text, background or picture regions is disclosed. Classification is performed through Bayesian training combined with multiresolution analysis. The method performs segmentation and classifies image pixels as BACKGROUND (B), TEXT (T) or PICTURE (P).
Briefly, the Cheng and Bouman segmenter (“CBS”) works by using binary classification trees to model the transition probabilities between segmentations at adjacent scales. The classification trees can be efficiently trained to model essential aspects of contextual behavior. In addition, the data model in the approach can incorporate the correlation among the wavelet feature vectors across scales.
The present invention contemplates a new and improved technique for image classification that allows for an efficient classification of an input image.