1. Field of Invention
The present invention relates to the automatic locating of table-like structures present in documents or the like.
2. Description of the Related Art
There are many instances in which it would be advantageous to be able to detect automatically the location of tables present in documents or the like. One example is form processing: it is desirable to be able to locate the cells of tables on the forms automatically, so that database entries can be generated automatically by reading the data in the cells.
Various proposals have been made in this field. However, the majority of form-recognition computer programs currently available on the market rely on recognizing forms in constrained poses within images consisting mainly of the form itself.
The automatic table location technique is of particular interest in the field of processing technical drawings (such as mechanical engineering drawings or architectural drawings). In this field, if the title block of the drawing can be located automatically, then the following processes can be realized:
    the drawing can be folded automatically such that the title block remains visible,
    the drawing can be positioned in the correct orientation, since the title block generally is located in a specified corner with respect to the image, for example, bottom right,
    in the processing of scanned images of drawings, for example using a personal computer, the title block can be displayed at an enlarged scale to assist in manual indexing, and
    a first step in the automatic indexing of drawings can be taken, since once the title block has been located, it is then simply a question of extracting the information contained therein.
Various standards have been defined at national and international level, governing the content and positioning of title blocks (legends) in technical drawings. These standards include ISO 5457 and ISO 7200 and French national standard NF E 04-503.
According to the standard ISO 7200, the legend is a table-like form composed of various rectangular cells or “fields” located within the page. The cells contain information, and three fields are compulsory:
    (1) an identification zone giving an identification number or code to the drawing,
    (2) a title zone, and
    (3) a zone containing the name of the drawing's owner.
According to the standard ISO 5457, the identification portion of the title block needs to be at the right-hand bottom corner of the title block when it is seen in its normal direction of viewing, and needs to have a maximum length of 170 mm. According to the French standard NF E 04-503, the dimensions of the title block should not exceed 190 mm in width and 277 mm in height.
The standard ISO 5457 also specifies that “the position of the title block should be within the drawing space such that the portion of the title block containing the identification of the drawing (registration number, title, origin, etc.) is situated in the bottom right-hand corner of the drawing space, both for sheets positioned horizontally, type X (see FIG. 1(a)), or vertically, type Y (see FIG. 1(b)). The direction of viewing of the title block should correspond in general to that of the drawing. Nevertheless, in order to economise on preprinted drawing sheets, it is permitted to use sheets type X in the vertical position (see FIG. 1(c)) and sheets type Y in the horizontal position (see FIG. 1(d)). In these cases, the identification portion of the title block should be in the right-hand top corner of the drawing space, and orientated such that the title block may be read when viewed from the right.”
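By way of illustration only, the positioning rule quoted above and the dimensional limits cited from ISO 5457 and NF E 04-503 can be sketched as a few checks. This is a minimal sketch under stated assumptions, not part of any standard or of the related art: the function names, the string labels for the corners, and the idea of converting millimetres to pixels from a known scan resolution (dpi) are all hypothetical choices made for the example.

```python
MM_PER_INCH = 25.4

# Limits quoted in the text above.
ISO_5457_MAX_ID_LENGTH_MM = 170.0
NF_E_04_503_MAX_WIDTH_MM = 190.0
NF_E_04_503_MAX_HEIGHT_MM = 277.0


def mm_to_pixels(length_mm: float, dpi: float) -> int:
    """Convert a length in millimetres to pixels (hypothetical helper).

    Assumes the scan resolution in dots per inch is known.
    """
    return round(length_mm / MM_PER_INCH * dpi)


def expected_corner(sheet_type: str, positioned_horizontally: bool) -> str:
    """Corner of the drawing space expected to hold the identification portion.

    Type X sheets are normally horizontal and type Y sheets normally vertical;
    in those cases the quoted rule gives the bottom right-hand corner.
    When a sheet is used in the other position, the rule gives the
    top right-hand corner (title block read from the right).
    """
    if sheet_type not in ("X", "Y"):
        raise ValueError("sheet_type must be 'X' or 'Y'")
    normal_position = (sheet_type == "X") == positioned_horizontally
    return "bottom-right" if normal_position else "top-right"


def within_nf_limits(width_px: int, height_px: int, dpi: float) -> bool:
    """Check a candidate block against the NF E 04-503 size limits."""
    return (
        width_px <= mm_to_pixels(NF_E_04_503_MAX_WIDTH_MM, dpi)
        and height_px <= mm_to_pixels(NF_E_04_503_MAX_HEIGHT_MM, dpi)
    )
```

Such checks describe only what a compliant drawing would look like; they say nothing about drawings that depart from the standards.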
It will be seen that the standards allow some latitude in the positioning of title blocks in technical drawings. Moreover, the standards are constantly evolving and technical drawings do not always comply with the rules defined in these standards. There is particular variability where old drawings are concerned. Thus, conventional techniques for locating a table or cell used by form recognition software are not suitable for locating a title block in drawings.
Usually, technical drawings have borders, a filing margin for taking perforations, a frame for limiting the drawing space, and centering and orientation marks to indicate positioning and orientation. However, no reliance can be placed on these features since they are not always present. Moreover, the title block locating process generally is performed on scanned images of technical drawings, and the above-mentioned features may be absent from the scanned image because the drawing was poorly positioned during scanning. Thus, the title block locating process should be based upon other factors.
A paper “Automated Table Processing: An (Opinionated) Survey” by D. Lopresti and G. Nagy, from Proceedings of GREC'99, pp. 109-134, shows that, where cell location in tables is concerned, in general, it is necessary to extract the table structure from an image of the document by discerning the lines defining the boundaries of the cells. Such methods are not directly applicable to locating title blocks in technical drawings.
“An efficient algorithm for form structure extraction using strip projection” by J-L Chen and H. J. Lee, appearing in “Pattern Recognition”, vol. 31, no. 9, pp. 1353-1368 (1998) proposes a method for extracting the structure of a table from an image. However, this technique is not adapted for locating specific structures, such as legends on technical drawings.
“Extracting Indexing Keywords from Image Structures in Engineering Drawings” by T. Syeda-Mahmood, from the Proceedings of ICDAR'99, pp. 471-474 (1999) specifically deals with the problem of locating title blocks in technical drawings and subsequently extracting information from the title block. A “location hashing” method is employed to find specific two-dimensional structures. However, this technique is complex and has the disadvantage of requiring a learning phase to establish a model for each structure that is to be located. Thus, this method is not suitable given the variability inherent in technical drawings.
In the present inventor's earlier French patent application number 00 03639 filed on Mar. 22, 2000, the problem of locating title blocks is solved based on a new method for table-like form processing. However, once again this method has the limitation of requiring a model for each different type of title block to be located.
A title block locating method is proposed in “A Practical Application of Graphics Recognition: Helping with the Extraction of Information from Telephone Company Drawings” by J-F. Arias, A. Chhabra and V. Misra, in Proceedings of GREC'97, pp. 273-279 (1997). This method is based on the FAST method described in “Detection of Horizontal Lines in Noisy Run Length Encoded Images: The FAST method” by A. Chhabra, V. Misra and J-F. Arias, in “Graphics Recognition—Methods and Applications” ed. R. Kasturi and K. Tombre, Lecture Notes in Computer Science, vol. 1072, pp. 35-48, Springer-Verlag, Berlin, Germany, 1996, which allows the extraction of straight lines from a crop of a drawing. After these straight lines have been extracted, the cells that are not empty are detected. The title block is located by detecting the cell with the largest area that also meets certain width-to-height ratio conditions. This overall technique is specific to documents in which the cells have particular dimensions, and thus is not useful in cases where cell size is likely to vary, such as in title blocks in technical drawings.
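The final selection step of that related-art technique, as summarized above, can be sketched roughly as follows. This is a hedged illustration only and not the cited authors' implementation: the `Cell` type, the `pick_largest_cell` function, and the default ratio bounds are hypothetical, since the summary above does not state the actual width-to-height conditions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cell:
    """A rectangular cell bounded by extracted lines (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int
    is_empty: bool

    @property
    def area(self) -> int:
        return self.width * self.height


def pick_largest_cell(
    cells: Iterable[Cell],
    min_ratio: float = 1.0,   # assumed lower bound on width/height
    max_ratio: float = 4.0,   # assumed upper bound on width/height
) -> Optional[Cell]:
    """Return the non-empty cell of largest area whose width-to-height
    ratio lies within the given bounds, or None if no cell qualifies."""
    candidates = [
        c for c in cells
        if not c.is_empty
        and c.height > 0
        and min_ratio <= c.width / c.height <= max_ratio
    ]
    return max(candidates, key=lambda c: c.area, default=None)
```

Because the selection depends on fixed ratio bounds, a title block whose cells fall outside those bounds would be missed, which illustrates the dependence on cell dimensions noted above.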