The present invention relates to techniques for analyzing data defining an image.
Takeda et al., U.S. Pat. No. 5,228,100, describe techniques for producing from a document image a form display with blank fields and a program to input data to the blank fields. As shown and described in relation to FIG. 1, a document processing apparatus includes a processor, an image input device for storing image data in memory, a printer for printing data from memory, and memory for programs and for data items. As shown and described in relation to FIG. 2, a document format recognition step recognizes an image of a document format to determine a format information item, a document construction step generates document content data associated with the document format, the system creates output data for the document data based on the resultant format and content data, and a document output step prints the output document data on a print form or stores it in a data file. FIGS. 8-70, 87, and 88 relate to a document form or format recognition step. As shown and described in relation to FIGS. 8, 9-a, and 9-b, a physical structure recognition step recognizes a physical structure as a format of a document, with physical structure designating only graphic structure such as the layout of line segments, letters, arcs, or the like, but not explicitly designating any meaning of a document. An area is subdivided into a plurality of blocks, and the system judges whether or not a selected block has a type representing a table. For example, the block may be determined to belong to a table when its horizontal width and its vertical height each exceed predetermined threshold values. If the block type is determined to be other than a table, elements of the construction are recognized. Otherwise, the block is subdivided into subblocks or subregions and a subblock is selected and recognized.
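The block-type judgment described above can be illustrated with a minimal sketch. The `Block` type, the function names, and the threshold values below are hypothetical and chosen only for illustration; Takeda et al. do not specify them, and the sketch is not their implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical bounding box of one block of a subdivided area."""
    x: int
    y: int
    width: int
    height: int


# Assumed threshold values in pixels; the source only says "predetermined".
MIN_TABLE_WIDTH = 200
MIN_TABLE_HEIGHT = 100


def is_table_block(block, min_width=MIN_TABLE_WIDTH, min_height=MIN_TABLE_HEIGHT):
    """Judge a block to be a table when both dimensions exceed their thresholds."""
    return block.width > min_width and block.height > min_height


def classify_blocks(blocks):
    """Label each block 'table' or 'other' using the threshold judgment."""
    return [("table" if is_table_block(b) else "other", b) for b in blocks]
```

In this sketch, a block labeled "table" would go on to be subdivided into subblocks, while a block labeled "other" would have its construction elements recognized directly.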
Bloomberg, U.S. Pat. No. 5,202,933, describes techniques for segmenting text and graphics. Col. 1, lines 38-40, indicate that it is important to send only graphics regions to graphics recognizers. As shown and described in relation to FIG. 1B, one technique eliminates vertical rules and lines; then eliminates horizontal rules and lines; then solidifies remaining text regions to produce a separation mask that can be used to separate text and graphics images. As shown and described in relation to FIGS. 12A-12D, an image contains text and line graphics, and the line graphics contain minor amounts of text; in the separated text image, all of the line graphics and their associated text have been removed, and all of the text blocks remain; in the separated graphics image, all of the text blocks have been removed, and the line graphics and their associated labels remain.
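The order of operations described above (eliminate vertical rules, eliminate horizontal rules, solidify what remains into a mask, then separate) can be sketched as follows. This is a simplified illustration that substitutes run-length filtering and a box dilation for whatever operations Bloomberg actually uses; the function names, the run-length threshold, and the dilation radius are all assumptions. Images are binary lists of lists with 1 for foreground.

```python
def remove_long_runs(image, min_len, vertical):
    """Clear foreground runs of at least min_len pixels along one axis."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    outer, inner = (cols, rows) if vertical else (rows, cols)
    for i in range(outer):
        start = None
        for j in range(inner + 1):
            if j < inner:
                pixel = image[j][i] if vertical else image[i][j]
            else:
                pixel = 0  # sentinel closes a run that touches the border
            if pixel and start is None:
                start = j
            elif not pixel and start is not None:
                if j - start >= min_len:
                    for k in range(start, j):
                        if vertical:
                            out[k][i] = 0
                        else:
                            out[i][k] = 0
                start = None
    return out


def solidify(image, radius):
    """Grow every remaining foreground pixel into a filled square (box dilation)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def text_mask(image, min_rule_len=5, radius=1):
    """Build a separation mask: drop vertical rules, then horizontal, then solidify."""
    no_vertical = remove_long_runs(image, min_rule_len, vertical=True)
    no_rules = remove_long_runs(no_vertical, min_rule_len, vertical=False)
    return solidify(no_rules, radius)


def separate(image, mask):
    """Split an image into (text, graphics) using the mask and its complement."""
    text = [[p & m for p, m in zip(irow, mrow)] for irow, mrow in zip(image, mask)]
    graphics = [[p & (1 - m) for p, m in zip(irow, mrow)] for irow, mrow in zip(image, mask)]
    return text, graphics
```

A run-length filter of this kind would also erase any text stroke as long as the assumed threshold, so the threshold would need to exceed the tallest and widest character strokes in the image.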