1. Field of the Invention
The present invention relates to computer-aided analysis of printed media material.
2. Related Art
Computers are increasingly being used to perform or aid in the analysis of documents and printed material. Such analysis includes identifying the location and relative arrangement of text and images within a document. Document layout analysis of this kind can be important in many document imaging applications. For example, it can be used as part of layout-based document retrieval, text extraction using optical character recognition, and other methods of electronic document image conversion. However, such analysis and conversion generally work best on a simple document, such as a business letter or a single-column report, and can be difficult or unworkable when a layout becomes complex or variable.
Complex printed media material, such as a newspaper, often includes columns of body text, headlines, graphic images, and multiple font sizes, with multiple articles and logical elements placed in close proximity to each other on a single page. Attempts to apply optical character recognition in such situations are typically inadequate, resulting in a wide range of errors. These include, for example, the inability to associate text from multiple columns with the same article, the mis-association of text areas that lack an associated headline or that belong to articles crossing page boundaries, and the misclassification of large headline fonts as graphic images.
What are needed, therefore, are systems and/or methods to alleviate the aforementioned deficiencies. In particular, what is needed is an effective and efficient approach to recognizing and analyzing printed media material that is presented in a complex columnar format, in order to segment the printed media material into articles.