The present invention relates generally to image segmentation, and more particularly, to a method and system for image segmentation through multiple reductions of the size of an image.
In general, segmentation is the first step in the process of image recognition. Segmentation may be defined as the identification and separation of clusters of mutually close objects, that is, objects that are closer to each other than to any external object. The goal of segmentation is to extract, from the separated clusters, target objects characterized by such parameters as size, shape, granularity, texture, color intensity, and location.
An aerial photograph, for example, may be segmented by identifying various target objects, i.e., landmarks, with different shapes and textures, such as fields, roads, buildings, bodies of water, and the like. Thereafter, the segmented objects may be extracted and compared with a database of such objects in order to identify the geographical location of the scene in the photograph.
Similarly, the process of segmentation is generally the first step in optical character recognition (OCR), in which a document is electronically scanned and converted into a form that can be easily manipulated by, for example, a word processor. Many documents, however, are complex, including two or more columns of text, as well as photographs, diagrams, charts, and other objects. Therefore, such documents are initially segmented in order to extract blocks of text for analysis.
In the OCR context, segmentation is often referred to as “line extraction” because it typically involves segmenting the document into a plurality of lines. Lines are generally the basic unit of extraction because they indicate the flow of the text. In a multi-column document, for example, knowledge of the line layout is essential to reading the columns in the correct order and thus to correctly interpreting the meaning of the text. Moreover, when recognizing a word or character, knowledge of the surrounding words and characters in the line permits the use of contextual and geometric analysis to resolve ambiguities.
Conventionally, segmentation is performed using a “bottom up” or “connected component” approach. This method involves decomposing the image into basic entities (connected components) and aggregating those entities according to some rule. For example, in a page of text, a single character is generally the most basic connected component. During segmentation, a character is identified and assigned a minimum bounding rectangle (MBR), which is defined as the smallest rectangle that completely contains a discrete pattern of a connected component. Thereafter, all of the MBRs within a certain distance from each other are aggregated. If the correct distance is chosen, the aggregated MBRs will form horizontal connected components representing lines of text, which may then be extracted for analysis.
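The conventional bottom-up approach can be illustrated with a minimal Python sketch. It is not the implementation of any particular system: it assumes a binary image given as a list of rows of 0/1 pixels, uses 8-connectivity to find connected components, and measures the distance between two MBRs as the larger of their horizontal and vertical background gaps. The function names `connected_components`, `mbr_gap`, and `aggregate`, and the `max_gap` threshold, are hypothetical.

```python
from collections import deque


def connected_components(image):
    """Return one MBR (top, left, bottom, right) per 8-connected component.

    `image` is a list of rows of 0 (background) / 1 (foreground) pixels;
    MBR coordinates are inclusive pixel indices.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill, growing the MBR as pixels are visited.
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def mbr_gap(a, b):
    """Larger of the horizontal and vertical background gaps between two MBRs (0 if they touch or overlap)."""
    dy = max(0, b[0] - a[2] - 1, a[0] - b[2] - 1)
    dx = max(0, b[1] - a[3] - 1, a[1] - b[3] - 1)
    return max(dx, dy)


def aggregate(boxes, max_gap):
    """Merge MBRs lying within `max_gap` pixels of each other, transitively."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison: the number of distance checks grows as n * (n - 1) / 2.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if mbr_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    merged = {}
    for i, (t, l, b, r) in enumerate(boxes):
        root = find(i)
        mt, ml, mb, mr = merged.get(root, (t, l, b, r))
        merged[root] = (min(mt, t), min(ml, l), max(mb, b), max(mr, r))
    return sorted(merged.values())


# Two "characters" on one line, and two dots on a second line separated by three blank rows.
page = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
lines = aggregate(connected_components(page), max_gap=2)
# lines == [(0, 0, 1, 5), (5, 0, 5, 2)]
```

Because `aggregate` compares every pair of MBRs, the number of distance calculations grows quadratically with the number of connected components, which suggests why this approach becomes expensive on complex documents containing many components.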
Segmentation is performed automatically and almost instantly by the human brain. For example, when a person looks at a document, he or she can easily identify the text portions among a variety of other objects. However, as currently implemented, conventional methods and systems for image segmentation are slow and inefficient. This is particularly true with respect to segmenting complex documents including, for example, more than one column of text, halftone regions, graphics, and handwritten annotations.
Conventional approaches are time consuming because they must decompose the sample image, identify each of the individual connected components, calculate the distances between the components, and aggregate those components within a certain distance from each other. For complex documents, this process can result in a large number of calculations, and accounts for a significant portion of the overall processing time in image recognition. What is needed, then, is a segmentation method and system that is significantly faster than conventional approaches.
The present invention offers a more efficient, holistic approach to image segmentation. Briefly, the present invention recognizes the fact that components of a document, when viewed from a distance, tend to solidify and aggregate. For instance, if a person stands at a distance from a printed page, the lines of text appear to blur and, for practical purposes, become solid lines. This effect can be simulated on a computer by reducing the size or resolution of a scanned image. For example, as shown in FIG. 1, several characters on a line become a single connected component at a reduction of 1:4.
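The blurring effect of size reduction can be illustrated with a minimal Python sketch. The reduction rule used here, OR-reduction (an output pixel is foreground if any pixel in the corresponding factor × factor block is foreground), is an assumption chosen because it fills narrow gaps; the function name `reduce_image` and the toy three-character line are hypothetical and do not reproduce FIG. 1.

```python
def reduce_image(image, factor):
    """Shrink a binary image by an integer `factor` using OR-reduction.

    Each factor x factor block of input pixels becomes one output pixel, set to 1
    if any pixel in the block is 1; partial blocks at the right and bottom edges
    are kept. Two foreground pixels separated by fewer than `factor` background
    pixels along an axis land in the same or adjacent output pixels.
    """
    rows, cols = len(image), len(image[0])
    out_rows = (rows + factor - 1) // factor  # ceiling division
    out_cols = (cols + factor - 1) // factor
    reduced = [[0] * out_cols for _ in range(out_rows)]
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                reduced[r // factor][c // factor] = 1
    return reduced


# A toy line of three 3x3 "characters" separated by 2-pixel gaps.
line = [
    "1110011100111000",
    "1010010100101000",
    "1110011100111000",
    "0000000000000000",
]
image = [[int(ch) for ch in row] for row in line]

# reduce_image(image, 2) -> [[1, 1, 1, 1, 0, 1, 1, 0], [1, 1, 1, 1, 0, 1, 1, 0]]
# reduce_image(image, 4) -> [[1, 1, 1, 1]]
```

On this toy line, a 1:2 reduction merges the first two characters but leaves a gap before the third, because whether a gap of exactly the reduction factor closes depends on how it aligns with block boundaries; at 1:4 the whole line becomes one solid run.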
By exploiting this effect, a more efficient and substantially faster method for image segmentation is realized. According to the present invention, a size reduction unit (134) reduces the size of a sample image (144), and, at the same time, fills small gaps between foreground pixels. As noted above, size reduction tends to solidify clusters of connected components separated by narrow gaps. Thereafter, a connected component analyzer (136) identifies connected components and their associated minimum bounding rectangles in the reduced image (145). Next, a target object filter (138) searches the connected components for target objects, making use of a target object library (146) to identify target objects characterized by such parameters as size, shape, and texture. Finally, an inverse mapper (140) locates the bounding rectangles of the target objects in the original sample image (144), and extracts the associated portions of the image (144) for analysis in a conventional image classifier (142).
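The four stages named above can be outlined in a minimal Python sketch. Everything here is an assumption for illustration: the image is a binary list of rows, the size reduction unit (134) is modeled as OR-reduction by a single integer factor, and the target object library (146) is replaced by a hypothetical `looks_like_text_line` rule that accepts wide, flat rectangles. All function names are hypothetical, and the extracted crops are simply returned rather than passed to an image classifier (142).

```python
from collections import deque


def reduce_image(image, factor):
    # Size reduction (134), modeled here as OR-reduction: an output pixel is 1
    # if any pixel in its factor x factor block is 1, which fills narrow gaps.
    rows, cols = len(image), len(image[0])
    out = [[0] * ((cols + factor - 1) // factor)
           for _ in range((rows + factor - 1) // factor)]
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel:
                out[r // factor][c // factor] = 1
    return out


def bounding_rects(image):
    # Connected component analysis (136): 8-connected flood fill, returning one
    # inclusive MBR (top, left, bottom, right) per component.
    rows, cols = len(image), len(image[0])
    seen, rects = set(), []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or (r, c) in seen:
                continue
            seen.add((r, c))
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(y - 1, 0), min(y + 2, rows)):
                    for nx in range(max(x - 1, 0), min(x + 2, cols)):
                        if image[ny][nx] and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            rects.append((top, left, bottom, right))
    return rects


def looks_like_text_line(rect, min_aspect=3.0):
    # Target object filter (138). HYPOTHETICAL stand-in for the target object
    # library (146): accept rectangles at least `min_aspect` times wider than tall.
    top, left, bottom, right = rect
    return (right - left + 1) >= min_aspect * (bottom - top + 1)


def inverse_map(rect, factor, rows, cols):
    # Inverse mapping (140): scale a reduced-image MBR back to original pixel
    # coordinates, clipped to the original image bounds.
    top, left, bottom, right = rect
    return (top * factor, left * factor,
            min((bottom + 1) * factor, rows) - 1,
            min((right + 1) * factor, cols) - 1)


def segment(image, factor=4):
    """Return (rect, crop) pairs for target objects found in a reduced copy of `image`."""
    rows, cols = len(image), len(image[0])
    regions = []
    for rect in bounding_rects(reduce_image(image, factor)):
        if looks_like_text_line(rect):
            top, left, bottom, right = inverse_map(rect, factor, rows, cols)
            crop = [row[left:right + 1] for row in image[top:bottom + 1]]
            regions.append(((top, left, bottom, right), crop))  # crop would go to the classifier (142)
    return regions


# Toy page: one line of 3-pixel-wide "characters" with 1-pixel gaps, then a square blob.
text_row = "1110111011101110" + "0" * 8
blob_row = "1111" + "0" * 20
page = [[int(ch) for ch in row]
        for row in [text_row] * 3 + ["0" * 24] * 5 + [blob_row] * 4]
regions = segment(page, factor=4)
# regions[0][0] == (0, 0, 3, 15); the square blob is rejected by the aspect test.
```

In this sketch the connected component search runs on an image with roughly 1/factor² as many pixels as the original, and the original image is revisited only inside the rectangles that pass the filter.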