Document recognition typically involves a stage at which the structure of the document is analyzed. At this stage, various areas are identified in the document image, their sizes and positions are saved to memory, and their classes are detected based on their content, e.g. text, picture, table, chart, or noise.
Thus, picture areas are detected in the document image by the part of the OCR software that is responsible for analysis and is therefore termed the analyzer.
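The output of the analysis stage can be pictured as a list of classified areas. The following is a minimal sketch in Python, not the implementation of any particular OCR product: the names `AreaClass`, `Area`, and `picture_areas` are hypothetical, and storing each area as a bounding rectangle is a simplifying assumption, since picture borders are not always rectangular.

```python
from dataclasses import dataclass
from enum import Enum


class AreaClass(Enum):
    """Classes an area may be assigned, taken from the list in the text."""
    TEXT = "text"
    PICTURE = "picture"
    TABLE = "table"
    CHART = "chart"
    NOISE = "noise"


@dataclass(frozen=True)
class Area:
    """One area found in the document image (hypothetical structure)."""
    # Position and size of the bounding rectangle, in pixels (assumption).
    left: int
    top: int
    width: int
    height: int
    area_class: AreaClass


def picture_areas(areas):
    """Return only the areas that were classified as pictures."""
    return [a for a in areas if a.area_class is AreaClass.PICTURE]


# Illustrative values only, not the result of analyzing a real page.
page = [
    Area(40, 30, 500, 80, AreaClass.TEXT),
    Area(40, 130, 300, 220, AreaClass.PICTURE),
    Area(560, 10, 6, 4, AreaClass.NOISE),
]
```

Calling `picture_areas(page)` on this illustrative list keeps only the second entry. The sketch shows only how classified areas might be stored and filtered; how an area is assigned its class is not covered here.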
Various solutions are available for distinguishing between text and non-text areas, but not for distinguishing between noise and picture areas. When dealing with magazine pages with complicated layouts, finding and identifying picture objects is of great practical importance. Magazine articles often have text printed over photographs, which, in turn, may have non-rectangular borders of various shapes.