1. Field of the Invention
The present invention relates to a method for automatically locating tables in documents, the method comprising the steps of defining a plurality of tiles for a document; determining a horizontal profile and a vertical profile for each tile; detecting lines by means of the profiles; determining at least one rectangle from the lines; and accepting, from the at least one rectangle, a rectangle as a table of the document.
The present invention also relates to an apparatus for automatically locating a table in a document by application of the method.
2. Background of the Invention
Such tables may be, for example, the whole or parts of tables present on forms (a “table”, here, being a two-dimensional assembly of cells). However, the present invention is of particular interest with regard to the automatic locating of title blocks (or “legends”) in technical drawings. In the present document, the term “table” is used to designate all of the aforementioned examples, and table-like structures in general.
The techniques of the present invention will usually, but not exclusively, be applied to representations of documents in the form of bitmaps, obtained by scanning analogue documents or creating digital documents.
There are many fields in which it would be advantageous to be able to automatically detect the location of tables present in documents. One example is form processing: it is desirable to be able to automatically locate cells in tables on the forms, so that database entries can be generated automatically by reading data in the cells. Various proposals have already been made in this field. However, the majority of form-recognition computer programs currently available rely on the recognition of forms in constrained poses within images consisting of the form itself and little else.
Automatic table location is of particular interest in the field of processing technical drawings (such as mechanical engineering drawings or architectural drawings). In this field, if the title block of the drawing can be located automatically, then the following processes can be realized:
automatic folding of the drawing such that the title block remains visible;
positioning of the drawing in the correct orientation (since the title block is generally located in a specified corner with respect to the image, for example, bottom right);
display of the title block at an enlarged scale in the processing of scanned images of drawings, for example using a personal computer, to assist in manual indexing; and
a first step in the automatic indexing of drawings: once the title block has been located, it is then simply a question of extracting the information contained therein.
Various standards have been defined at national and international levels, governing the content and positioning of title blocks (legends) in technical drawings. These standards include ISO 5457, ISO 7200 and the French national standard NF E 04-503. The standards are explained in more detail in European patent EP 1237115.
A method of the type mentioned above is disclosed in European patent EP 1237115, in which tables are located by analysis of sub-regions of the document, which are analogous to the tiles of the present invention. A tile is defined as a sub-region of the image consisting of a number of pixels that form a solid rectangle, solid triangle, solid hexagon or any other solid polygon. The number of pixels of the solid polygon is defined as the tile size. The analysis in EP 1237115 involves determining the location of lines by creating a horizontal profile and a vertical profile based on a sum of black pixels on each row of pixels in a tile, preferably a solid rectangle. Lines having lengths above a certain threshold are determined and assigned to groups. A set of adjacent lines is assigned to a common group if the separation between adjacent pairs of lines within the set is less than a threshold value. A rectangle is selected from all investigated rectangles as the location of a table if it contains a group with the greatest number of lines among all investigated rectangles. When the technique is applied to locating title blocks in technical drawings, the rectangles analyzed correspond to the corners or ends of the document.
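By way of illustration only, the profile-based analysis described above may be sketched as follows for a rectangular tile of a bilevel image. The representation of a tile as a list of rows of 0/1 values, the function names (make_tile, horizontal_profile, vertical_profile, detect_lines, group_lines) and the threshold values are assumptions made for this example and are not taken from EP 1237115. The sketch also assumes lines that are one pixel thick and exactly aligned with the pixel grid.

```python
from typing import Iterable, List

Tile = List[List[int]]  # assumed encoding: 1 = black pixel, 0 = white pixel


def make_tile(size: int, line_rows: Iterable[int]) -> Tile:
    """Build a hypothetical square tile with full-width black rows."""
    rows = set(line_rows)
    return [[1] * size if r in rows else [0] * size for r in range(size)]


def horizontal_profile(tile: Tile) -> List[int]:
    """Sum of black pixels on each row of the tile."""
    return [sum(row) for row in tile]


def vertical_profile(tile: Tile) -> List[int]:
    """Sum of black pixels on each column of the tile."""
    return [sum(col) for col in zip(*tile)]


def detect_lines(profile: List[int], min_length: int) -> List[int]:
    """Positions whose black-pixel count lies above the length threshold."""
    return [pos for pos, count in enumerate(profile) if count > min_length]


def group_lines(positions: List[int], max_gap: int) -> List[List[int]]:
    """Put adjacent lines in a common group while each gap is below max_gap."""
    groups: List[List[int]] = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] < max_gap:
            groups[-1].append(pos)  # close enough to the previous line
        else:
            groups.append([pos])    # gap too large: start a new group
    return groups


if __name__ == "__main__":
    tile = make_tile(8, [1, 3, 7])           # horizontal lines on rows 1, 3, 7
    lines = detect_lines(horizontal_profile(tile), min_length=6)
    groups = group_lines(lines, max_gap=3)
    print(lines)                              # [1, 3, 7]
    print(groups)                             # [[1, 3], [7]]
    print(max(len(g) for g in groups))        # 2, size of the largest group
```

In this sketch, the size of the largest group is the quantity that would be compared between the investigated rectangles. Because the profiles count black pixels, the sketch presupposes a bilevel image.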
A disadvantage of the method according to EP 1237115 is that only corners or ends of documents are analyzed. If, for example, a document has large margins, the title block may not be discovered. Moreover, the determination of title blocks by means of the horizontal profiles and vertical profiles is based on the sum of black pixels on each determined row of pixels in the image. Such a determination is not applicable to a continuous-tone image. Another disadvantage of the method according to EP 1237115 is that slant effects of title blocks are not taken into account.