1. Field of the Invention
Embodiments of the present invention relate to computer-aided analysis of media material.
2. Background Art
Computers can be used to perform or aid the analysis of documents and printed material. For instance, techniques and systems have been developed to identify blocks of media material, such as columns in a newspaper, from the text and images in a document. The identification of other portions of media material can also be important in many document imaging applications. For example, identifying lines, gutters, and other intervening items may aid in the analysis of media material.
Media material may be analyzed for electronic storage and user searches. Media material to be analyzed often exists in an archived format. For example, archived newspapers are typically analyzed from a sequence of media images scanned from microfilm, in which newspaper edition boundaries are not marked. Each indexed article, however, must be associated with the date of its edition. Attempts to recover the date from the images using optical character recognition (OCR) are often inadequate and frequently fail to extract the correct date.
Other problems also make it difficult to match and organize media images. Media images may be incorrectly scaled or rotated as a result of defects in the scanning process. In addition, portions of images may be missing, smeared, or smudged. In such cases, pixel matching may fail because of slight variations or an imperfect orientation of the media page within the image. Character recognition products often classify poorly scanned text as images. Furthermore, media mastheads may change in size, may be unrecognizable after scanning, or may be overlooked entirely.