Image segmentation is a process of dividing or separating an image into semantically or visually coherent regions. Each region is a group of connected pixels having a similar attribute or attributes. A basic attribute for segmentation is the luminance amplitude for a monochrome image and the colour components for a colour image.
The proliferation of scanning technology combined with ever-increasing computational processing power has led to many advances in the area of document analysis systems. These systems may be used to extract semantic information from a scanned document, often by means of OCR technology. Such systems can also be used to improve compression of a document image by selectively using an appropriate compression method depending on the content of each part of the page. Improved document compression lends itself to applications such as archiving and electronic distribution.
Segmentation is an early processing stage in document image analysis: low-level pixels must first be segmented into primitive objects before higher-level processes, such as region classification and layout analysis, can be performed. Layout analysis classifies primitive objects into known object types according to predefined rules about document layout. Typically, layout analysis does not operate on the original scanned image data, but on an alternative data set, such as blobs or connected components obtained from a segmentation of the page. In addition to the properties of individual objects, layout analysis may use object grouping to determine their classification.
A number of existing methods for image segmentation are described hereinafter.
Thresholding is the simplest method for segmentation and can be fast and effective if the image to be processed is bi-level (e.g., a black and white document image). However, if the image is complex, with regions of multiple luminance or colour levels, some of these regions may be lost during binarisation. More sophisticated thresholding techniques employ adaptive or multilevel thresholding, where threshold estimation and binarisation are performed at a local level. However, these methods may still fail to segment objects correctly.
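The contrast between a single global threshold and a local (adaptive) threshold can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a greyscale image stored as a list of rows of 0 to 255 values, treats dark pixels as foreground, and uses a local-mean rule with a hypothetical `offset` parameter. Any production method would differ in window shape, statistics and tuning.

```python
from typing import List

Image = List[List[int]]


def global_threshold(img: Image, t: int) -> Image:
    """Binarise with one threshold for the whole page: 1 = dark (foreground)."""
    return [[1 if p < t else 0 for p in row] for row in img]


def adaptive_threshold(img: Image, radius: int = 1, offset: int = 40) -> Image:
    """Binarise each pixel against the mean of its local neighbourhood.

    A pixel is foreground if it is darker than the mean of the
    (2*radius+1) x (2*radius+1) window around it by more than `offset`
    (a hypothetical tuning parameter). Windows are clipped at the border.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            local_mean = sum(vals) / len(vals)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out


# Dark text on a white region (left) and on a mid-grey region (right).
page = [
    [230, 230, 230, 100, 100, 100],
    [230, 40, 230, 100, 20, 100],
    [230, 230, 230, 100, 100, 100],
]
print(global_threshold(page, 128))           # grey region and its text both become 1
print(adaptive_threshold(page, 1, offset=40))  # only the two text pixels become 1
```

With the global threshold the text pixel in the grey region becomes indistinguishable from its background, so that region's content is lost. The local rule recovers both text pixels here, but with `offset=10` it also marks the grey pixels bordering the white region as foreground, a small instance of the residual failures mentioned above.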
Clustering-based methods, such as k-means and vector quantisation, tend to produce good segmentation results, but they are iterative algorithms that require multiple passes over the image data. Thus, such methods can be slow and difficult to implement.
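A minimal k-means sketch over pixel colours makes the multi-pass cost concrete: every iteration re-examines every pixel. This assumes Lloyd's algorithm with a deterministic farthest-point initialisation (chosen here only so the result is reproducible); the function names are illustrative, and real systems often use other initialisations and stopping rules.

```python
from typing import List, Sequence, Tuple

Colour = Tuple[float, ...]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two colours."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(pixels: List[Colour], k: int) -> List[Colour]:
    """Start from the first pixel, then repeatedly add the pixel farthest
    from all centres chosen so far (an assumed, deterministic seeding)."""
    centres = [tuple(map(float, pixels[0]))]
    while len(centres) < k:
        far = max(pixels, key=lambda p: min(_dist2(p, c) for c in centres))
        centres.append(tuple(map(float, far)))
    return centres


def kmeans_colours(pixels: List[Colour], k: int, max_iter: int = 20):
    """Cluster pixel colours; returns (centres, labels, passes).

    `passes` counts full assignment passes over the pixels, which is the
    cost that makes clustering-based segmentation slow on large pages.
    """
    centres = _farthest_point_init(pixels, k)
    labels = None
    for it in range(1, max_iter + 1):
        # Assignment pass: each pixel joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: _dist2(p, centres[j]))
                      for p in pixels]
        if new_labels == labels:
            return centres, labels, it  # converged: no pixel changed cluster
        labels = new_labels
        # Update step: move each centre to the mean colour of its members.
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centres[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centres, labels, max_iter


pixels = [(250, 250, 250), (245, 248, 251), (10, 10, 10),
          (20, 15, 12), (200, 30, 30), (210, 25, 35)]
centres, labels, passes = kmeans_colours(pixels, k=3)
print(labels, passes)  # [0, 0, 1, 1, 2, 2] 2
```

Even on six well-separated colours the algorithm needs two full passes to confirm convergence; on a scanned page with millions of pixels and less distinct colours, the number of passes and the per-pass cost both grow.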
Split-and-merge image segmentation techniques are based on a quadtree data representation, in which a square image segment is split into four quadrants if the original image segment is non-uniform in the attribute. If four neighbouring squares are found to be uniform, those squares are merged into a single square composed of the four adjacent squares. The split-and-merge process usually starts at the full image level. Thus, processing can only begin after the whole page has been buffered, requiring high memory bandwidth. Furthermore, this approach tends to be computationally intensive.
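A compact sketch of the idea follows. It assumes a square greyscale image whose side is a power of two, uses a hypothetical uniformity test (max minus min within a tolerance `tol`), and, as a simplification, merges any adjacent quadtree leaves whose combined range stays within tolerance, rather than only groups of four sibling squares. Note that `split_and_merge` takes the whole image as input, which mirrors the full-page buffering requirement described above.

```python
from typing import List, Tuple

Image = List[List[int]]
Leaf = Tuple[int, int, int]  # (top row, left column, side length)


def _block_range(img: Image, y: int, x: int, s: int) -> Tuple[int, int]:
    vals = [img[j][i] for j in range(y, y + s) for i in range(x, x + s)]
    return min(vals), max(vals)


def _split(img: Image, y: int, x: int, s: int, tol: int, leaves: List[Leaf]) -> None:
    """Recursively split a square into four quadrants until each is uniform."""
    lo, hi = _block_range(img, y, x, s)
    if s == 1 or hi - lo <= tol:
        leaves.append((y, x, s))
        return
    h = s // 2
    for dy, dx in ((0, 0), (0, h), (h, 0), (h, h)):
        _split(img, y + dy, x + dx, h, tol, leaves)


def split_and_merge(img: Image, tol: int):
    """Split phase from the full image, then merge adjacent uniform leaves.

    Assumes a square image with a power-of-two side. Returns (leaves, labels).
    """
    n = len(img)
    leaves: List[Leaf] = []
    _split(img, 0, 0, n, tol, leaves)

    # Map every pixel to the leaf that contains it.
    owner = [[0] * n for _ in range(n)]
    for idx, (y, x, s) in enumerate(leaves):
        for j in range(y, y + s):
            for i in range(x, x + s):
                owner[j][i] = idx

    # Union-find over leaves, tracking each region's value range.
    parent = list(range(len(leaves)))
    lo = [_block_range(img, *leaf)[0] for leaf in leaves]
    hi = [_block_range(img, *leaf)[1] for leaf in leaves]

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for y in range(n):
        for x in range(n):
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < n and nx < n:
                    a, b = find(owner[y][x]), find(owner[ny][nx])
                    if a != b and max(hi[a], hi[b]) - min(lo[a], lo[b]) <= tol:
                        parent[b] = a
                        lo[a], hi[a] = min(lo[a], lo[b]), max(hi[a], hi[b])

    labels = [[find(owner[y][x]) for x in range(n)] for y in range(n)]
    return leaves, labels


img = [[10, 10, 200, 200] for _ in range(4)]
leaves, labels = split_and_merge(img, tol=5)
print(len(leaves), labels)  # four quadrant leaves merged into two regions
```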
Region-growing is a well-known method for image segmentation and is one of the conceptually simplest approaches. Neighbouring pixels having a similar attribute or attributes are grouped together to form a segment region. In practice, however, reasonably complex constraints must be placed on the growth pattern to achieve acceptable results. Existing region-growing methods also have undesirable side effects: they tend to be biased towards the initial seed locations. Different choices of seeds may give different segmentation results, and problems can occur if the seed point lies on an edge.
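The seed bias can be reproduced with a minimal seeded region-growing sketch. It assumes 4-connectivity, a breadth-first growth order, and a hypothetical joining rule: a neighbour joins if its value is within `tol` of the region's current mean. Other criteria (e.g. comparison with the seed value only) behave differently but share the same sensitivity to where growth starts.

```python
from collections import deque
from typing import List, Set, Tuple

Image = List[List[int]]
Pixel = Tuple[int, int]


def grow_region(img: Image, seed: Pixel, tol: float) -> Set[Pixel]:
    """Grow a 4-connected region from `seed`, admitting neighbours whose
    value lies within `tol` of the region's running mean."""
    h, w = len(img), len(img[0])
    region = {seed}
    total = img[seed[0]][seed[1]]
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                mean = total / len(region)
                if abs(img[ny][nx] - mean) <= tol:
                    region.add((ny, nx))
                    total += img[ny][nx]
                    queue.append((ny, nx))
    return region


ramp = [[0, 10, 20, 30, 40, 50]]        # a smooth luminance gradient
print(sorted(grow_region(ramp, (0, 0), 15)))  # [(0, 0), (0, 1), (0, 2)]
print(sorted(grow_region(ramp, (0, 3), 15)))  # [(0, 2), (0, 3), (0, 4)]
```

On the same gradient, the pixel with value 20 ends up in the left-hand region when growth starts at column 0, but in a centred region when growth starts at column 3: the segmentation depends on the seed, not only on the image.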
The proliferation of scanning technology combined with ever-increasing computational processing power has led to many advances in the area of document analysis systems. These systems may be used to extract semantic information from a scanned document, often by means of OCR technology. This technology is used in a growing number of applications, such as automated form reading, and can also be used to improve compression of a document by selectively using an appropriate compression method depending on the content of each part of the page. Improved document compression lends itself to applications such as archiving and electronic distribution.
Some document analysis systems perform a layout analysis to break the document into regions classified according to their content. Typically, the layout analysis does not analyse the original scanned image data, but works with an alternative data set, such as blobs or connected components from a segmentation of the page. The layout analysis may use object grouping in addition to individual object properties to determine their classification.
In general, a binary segmentation of the page is performed to generate data for the layout analysis, and this may be obtained by simply thresholding the original image. One advantage of this binary segmentation is that the segmented objects sit within a simple containment hierarchy that aids the layout analysis. Unfortunately, the layout of many complex colour documents simply cannot be represented completely by a binary image. The reduction in information content inherent in colour-to-binary image conversion may result in degradation of important features and even loss of the detailed structure of the document.
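The containment hierarchy of a binary segmentation can be illustrated with connected-component labelling. This sketch assumes foreground pixels are 1, labels foreground with 8-connectivity and background with 4-connectivity (the usual pairing that keeps holes well defined), and treats any background component not touching the page border as a hole. The helper names are hypothetical. For each hole, the left neighbour of its first pixel in raster order is foreground and belongs to the enclosing component, which gives the parent link.

```python
from collections import deque
from typing import Dict, List, Tuple

Binary = List[List[int]]
N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def label_components(img: Binary, value: int, neighbours) -> Tuple[List[List[int]], int]:
    """Label connected components of pixels equal to `value` (others get -1)."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] == value and labels[y][x] == -1:
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in neighbours:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == value and labels[ny][nx] == -1):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
                count += 1
    return labels, count


def hole_parents(img: Binary) -> Dict[int, int]:
    """Map each hole (background label) to the foreground label enclosing it."""
    fg, _ = label_components(img, 1, N8)
    bg, _ = label_components(img, 0, N4)
    h, w = len(img), len(img[0])
    outer = {bg[y][x] for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and bg[y][x] != -1}
    parents: Dict[int, int] = {}
    for y in range(h):
        for x in range(w):
            b = bg[y][x]
            if b != -1 and b not in outer and b not in parents:
                parents[b] = fg[y][x - 1]  # x > 0 because holes avoid the border
    return parents


# A ring ("o") enclosing a single dot.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(hole_parents(page))  # {1: 0}: the hole lies inside foreground component 0
```

Here the page background contains the ring, the ring contains the hole, and the hole contains the dot: a strict nesting that layout analysis can exploit. A colour segmentation offers no such guarantee.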
A colour segmentation of the page for document analysis therefore has advantages in terms of preserving the content of the page, but brings with it additional complexity. Firstly, the segmentation analysis itself becomes more involved and the processing requirements increase. Secondly, the analysis of the segmented page objects is complicated by the fact that the objects do not form a containment hierarchy. This limits the accuracy and efficiency of the layout analysis.
Document layout analysis systems may also employ techniques for verifying the text classification of a region of a document. Some of these methods use histogram analysis of pixel sums, shadowing and projection profiles. These methods are often unreliable, because robust statistics are difficult to apply to them, and because they are difficult to tune for text that may be either a single line or many lines, and whose character set and alignment within the document are unknown.
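A horizontal projection profile, one of the techniques mentioned above, can be sketched in a few lines. This assumes a binary region with foreground pixels set to 1 and a hypothetical `min_ink` threshold that decides which rows count as text. The need to pick such a threshold, with no knowledge of the line count, character set or alignment, is exactly where these methods become hard to tune.

```python
from typing import List, Tuple

Binary = List[List[int]]


def row_profile(img: Binary) -> List[int]:
    """Horizontal projection profile: foreground pixel count per row."""
    return [sum(row) for row in img]


def text_line_bands(img: Binary, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) bands whose profile is at least `min_ink`."""
    bands: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(row_profile(img)):
        if ink >= min_ink and start is None:
            start = y                      # entering a band of inked rows
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))   # leaving the band
            start = None
    if start is not None:
        bands.append((start, len(img) - 1))
    return bands


region = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(text_line_bands(region, min_ink=1))  # [(1, 2), (4, 5)]
print(text_line_bands(region, min_ink=3))  # [(1, 1), (4, 4)]
```

Raising `min_ink` from 1 to 3 on the same region shrinks each detected line to a single row, and a skewed or tightly spaced paragraph would instead merge lines into one band, so no single setting suits all inputs.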