The exemplary embodiment relates to image processing. It finds particular application in connection with an apparatus and method for generation of a representation of an image based on runlength histograms. Implementations of the apparatus and method include retrieval, categorization, and clustering applications, but it is to be appreciated that they are not limited to such applications.
In data processing, some useful operations include automatic or semi-automatic image categorization, automated or semi-automated retrieval of similar images, and clustering of images. For example, given an unorganized database of scanned images of documents, it may be useful to sort or categorize the images into classes, such as by type of document. In many applications, documents such as tax forms, medical records, and the like are currently sorted manually according to type. These applications could benefit from an automated determination as to whether a document is a particular type of form, or a particular page of a form, without the need for optical character recognition. Subsequent processing of the document or document pages could then be based on the determination. In a related application, given a particular document image, it may be useful to identify and retrieve similar images from the database of images.
To enable such techniques to be performed automatically or semi-automatically, some mechanism for automated image characterization based on the content of the image is desirable. Since a digital image is essentially in the form of pixel values, e.g., colorant values, for each of typically millions of pixels, image characterization techniques typically rely on extracting features from the image based on small segments of the image, referred to as patches. Techniques have been developed for categorizing images which rely on training a classifier, or set of classifiers, with information extracted from a large number of training images. The training images are manually labeled with one or more of a set of predefined object categories, such as person, landscape, animal, building, and the like. The classifier learns how to characterize a new image based on its extracted features and the extracted features of the labeled images. Such techniques, however, are labor-intensive in the training phase, often requiring the manual labeling of a large number of images for each class for which the classifier is to be trained. Additionally, they do not readily adapt to scanned documents, which may be scanned upside down.
There remains a need for an automated method for generating a representation of an image, such as a scanned document, which is readily implemented.
Incorporation By Reference
The following references, the disclosures of which are incorporated herein in their entireties by reference, are mentioned.
U.S. Pub. No. 2007/0005356, entitled GENERIC VISUAL CATEGORIZATION METHOD AND SYSTEM, U.S. Pub. No. 2007/0258648, entitled GENERIC VISUAL CLASSIFICATION WITH GRADIENT COMPONENTS-BASED DIMENSIONALITY ENHANCEMENT, and U.S. Pub. No. 2008/0069456 entitled BAGS OF VISUAL CONTEXT-DEPENDENT WORDS FOR GENERIC VISUAL CATEGORIZATION, all by Florent Perronnin, and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints”, ECCV workshop on Statistical Learning in Computer Vision, 2004, disclose systems and methods for categorizing images based on content.
U.S. Pat. No. 7,124,149, issued Oct. 17, 2006, entitled METHOD AND APPARATUS FOR CONTENT REPRESENTATION AND RETRIEVAL IN CONCEPT MODEL SPACE, by Smith, et al., discloses a method and apparatus for extracting a model vector representation from multimedia documents. A model vector provides a multidimensional representation of the confidence with which multimedia documents belong to a set of categories or with which a set of semantic concepts relate to the documents. The model vector can be associated with multimedia documents to provide an index of their content or categorization and can be used for comparing, searching, classifying, or clustering multimedia documents.
U.S. Pat. No. 4,949,392, issued Aug. 14, 1990, entitled DOCUMENT RECOGNITION AND AUTOMATIC INDEXING FOR OPTICAL CHARACTER RECOGNITION, by Barski, et al., discloses a method in which a library of templates defining the spacings between pre-printed lines and the corresponding line lengths for a plurality of different business forms is compared with the image data of an unknown document to determine the known business form (template) to which the document corresponds.
U.S. Pat. No. 5,335,290, issued Aug. 2, 1994, entitled SEGMENTATION OF TEXT, PICTURE AND LINES OF A DOCUMENT IMAGE, by Cullen, et al., discloses a method and apparatus for segmenting a document image into areas containing text and non-text. From a bit-mapped representation of the document image, run lengths are extracted for each scanline. Rectangles are constructed from the run lengths. The method includes classifying each of the rectangles as either text or non-text, correcting for the skew in the rectangles, merging associated text into one or more text blocks, and logically ordering the text blocks.
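By way of illustration only, extracting run lengths from each scanline of a bit-mapped image may be sketched as follows. This is a minimal sketch and not the method of the cited reference; the function names `scanline_run_lengths` and `image_run_lengths` and the list-of-lists bitmap layout are assumptions made for the example.

```python
from itertools import groupby

def scanline_run_lengths(row):
    """Return (value, start, length) for each maximal run of equal pixels in one scanline."""
    runs = []
    start = 0
    for value, group in groupby(row):
        length = sum(1 for _ in group)  # number of consecutive pixels with this value
        runs.append((value, start, length))
        start += length
    return runs

def image_run_lengths(bitmap):
    """Apply the scanline extraction to every row of a bitmap (hypothetical layout: list of rows)."""
    return [scanline_run_lengths(row) for row in bitmap]

# Small example bitmap: 1 = foreground (ink), 0 = background.
bitmap = [
    [0, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
]
runs = image_run_lengths(bitmap)
```

Each run records its pixel value, its starting column, and its length, which is the kind of information from which rectangles could subsequently be constructed.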
U.S. Pat. No. 5,822,454, issued Oct. 13, 1998, entitled SYSTEM AND METHOD FOR AUTOMATIC PAGE REGISTRATION AND AUTOMATIC ZONE DETECTION DURING FORMS PROCESSING, by Rangarajan, discloses a system which automatically detects user-defined zones in a document image of a form, compensating for skew and displacement of the image with respect to an original image of a form. The system further processes the image to remove horizontal and vertical lines, and to create a number of blocks, representing either text or image data. The lines are removed and the blocks formed by runlength smoothing with various parameters. The blocks are labeled such that any set of connected blocks share a unique identification value. Additional data is collected on the commonly labeled blocks to select those blocks useful for defining a template. The template is a collection of vectors between the centroids of each of the selected blocks. A second document image for processing is obtained, and similarly processed to minimize, deskew, and identify blocks and vectors therein. The vectors in the second document image are compared with vectors in a user-selected template to determine the location of user-defined zones in the second document image.
U.S. Pat. No. 5,832,118, issued Nov. 3, 1998, entitled TEXTURE CLASSIFICATION APPARATUS EMPLOYING COARSENESS AND DIRECTIVITY OF PATTERNS, by Kim, discloses an apparatus for classifying a textured image based on pattern coarseness and directivity. The apparatus includes a quantizer for obtaining a quantized image from the textured image, the quantized image containing a plurality of pixels each of which is represented by one of N quantized values, N being a positive integer; a scanning block for scanning the quantized image along M scanning directions, M being a positive integer, to thereby provide M scanned images; and a grey level mean runlengths configuration block for providing a set of runlengths by counting runlengths of pixels having a same quantized value, for each of the M scanned images and each of the N quantized values, to thereby provide M×N sets of runlengths, providing M×N mean runlengths by averaging each set of runlengths, and forming an M×N matrix whose elements are the mean runlengths. The apparatus further includes a pattern coarseness decision block for determining a coarseness of the textured image by using the matrix; a pattern directivity decision block for determining a directivity of the textured image by using the matrix; and a texture classification block for classifying the textured image according to the coarseness and the directivity.
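For illustration, the formation of an M×N matrix of mean runlengths may be sketched as follows. This is a simplified sketch under stated assumptions, not the apparatus of the cited reference: only two scanning directions (horizontal and vertical, so M=2) are used, quantization is uniform over an assumed 0 to 255 grey scale, and the function names are hypothetical.

```python
from itertools import groupby

def quantize(image, n_levels, max_value=255):
    """Map each grey value to one of n_levels uniformly spaced levels (assumed scheme)."""
    return [[min(p * n_levels // (max_value + 1), n_levels - 1) for p in row]
            for row in image]

def scan_lines(image, direction):
    """Return the pixel lines traversed when scanning in the given direction."""
    if direction == "horizontal":
        return [list(row) for row in image]
    if direction == "vertical":
        return [list(col) for col in zip(*image)]
    raise ValueError("unsupported direction: " + direction)

def mean_run_length_matrix(image, n_levels, directions=("horizontal", "vertical")):
    """Build an M x N matrix: one row per direction, one column per quantized value."""
    quantized = quantize(image, n_levels)
    matrix = []
    for direction in directions:
        totals = [0] * n_levels  # summed run lengths per quantized value
        counts = [0] * n_levels  # number of runs per quantized value
        for line in scan_lines(quantized, direction):
            for value, group in groupby(line):
                totals[value] += sum(1 for _ in group)
                counts[value] += 1
        # Values with no runs get a mean of 0.0 (an assumption for the sketch).
        matrix.append([t / c if c else 0.0 for t, c in zip(totals, counts)])
    return matrix

image = [
    [0, 0, 0, 255],
    [0, 0, 0, 255],
]
matrix = mean_run_length_matrix(image, n_levels=2)
# matrix == [[3.0, 1.0], [2.0, 2.0]]
```

In this small example, dark runs are longer horizontally (mean 3.0) than vertically (mean 2.0), which is the kind of directional difference such a matrix is intended to expose.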
U.S. Pat. No. 6,141,464, issued Oct. 31, 2000, entitled ROBUST METHOD FOR FINDING REGISTRATION MARKER POSITIONS, by Handley, discloses a method and system for detecting registration markers having the shape of an ‘X’ on a scanned test target used to calibrate an image processing system. A captured image is converted to a binary image and then segmented into a list of eight-connected components. A set of features is extracted from each component in the list. An accurate decision rule is executed on each component. If a component is found to be a marker, its centroid is returned as its position.
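The segmentation of a binary image into eight-connected components, with the centroid of each component, may be sketched as follows. This is a minimal sketch, assuming a list-of-lists binary image and using a breadth-first flood fill; the feature extraction and decision rule of the cited reference are not reproduced, and the function names are hypothetical.

```python
from collections import deque

def eight_connected_components(binary):
    """Return a list of components, each a list of (row, col) foreground pixels."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    # Visit all eight neighbours, including diagonals.
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < height and 0 <= nx < width
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components

def centroid(pixels):
    """Mean (row, col) position of a component's pixels."""
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)

# A small 'X' shape (connected only through diagonals) plus one isolated pixel.
binary = [
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
]
components = eight_connected_components(binary)
centroids = [centroid(p) for p in components]
# centroids == [(1.0, 1.0), (2.0, 4.0)]
```

Because diagonal neighbours count as connected, the five pixels of the 'X' form a single component, whereas under four-connectivity they would be split into five.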