1. Field of the Invention
This invention generally relates to a method and apparatus for classifying the document elements of a document by analyzing the relational geometry of the major white regions surrounding the document elements. In particular, this invention relates to a method and apparatus for logically identifying document elements in a document image using structural models.
2. Related Art
Conventional methods for logically identifying elements in a document image are outlined in Nagy et al., "A Prototype Document Image Analysis System for Technical Journals", pp. 10-21, Computer, July 1992. However, no method for logically identifying elements segmented only by major white regions has been disclosed. A method for extracting text regions by analyzing the white space in a document image has been disclosed by Baird et al., "Image Segmentation by Shape-Directed Covers", 10th International Conference on Pattern Recognition, pp. 820-825, 16-21 June 1990. However, the method disclosed by Baird et al. neither recognizes logical structure nor clearly identifies stopping rules for document element extraction.
Other methods have been disclosed for segmenting document elements in a document image, but none of them analyzes the white areas in a document image, segments document elements by extracting and analyzing major white regions, or logically identifies such segmented document elements. White regions are those areas of a document that contain no connected components.
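The definition of a white region given above can be illustrated with a minimal sketch. The sketch is not the method of this invention or of any cited reference. It assumes a binary image stored as a list of rows in which 1 denotes a black pixel and 0 a white pixel, and it finds only the simplest kind of white region: full-width horizontal bands of rows containing no black pixels. The function name `white_bands` and the sample `page` are hypothetical and are introduced for illustration only.

```python
def white_bands(image):
    """Return (top, bottom) row ranges, inclusive, of full-width white bands.

    Assumption: `image` is a list of rows, each a list of 0 (white) or
    1 (black) pixels. A row belongs to a white band only if it contains
    no black pixel, so no connected component can intersect the band.
    """
    bands = []
    start = None  # first row of the band currently being scanned, if any
    for y, row in enumerate(image):
        if not any(row):
            # Row is entirely white: open a band if none is open.
            if start is None:
                start = y
        elif start is not None:
            # A black pixel ends the open band at the previous row.
            bands.append((start, y - 1))
            start = None
    if start is not None:
        # The image ended while a band was still open.
        bands.append((start, len(image) - 1))
    return bands


# Hypothetical 5-row page: two rows of "text" separated by white space.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(white_bands(page))  # [(0, 0), (2, 3)]
```

In this sketch, the band covering rows 2 and 3 separates the two rows that contain black pixels. A practical system would also have to consider white regions that do not span the full width of the page, such as vertical gaps between columns, which this sketch deliberately ignores.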