1. Field of the Invention
The present invention relates to technology that extracts character information and graphics information from images and organizes the extracted information.
2. Description of the Related Art
Due to the large amount of space required to store paper-based documents, technologies that read documents using scanners or other reading devices, digitize the read documents, and file them in computer devices have received increasing attention in recent years.
When documents are digitized and filed using such technologies, storing the read documents as images causes the character strings in the documents to be stored as images as well. This prevents keyword searches and makes it necessary, when searching for a desired document, to carry out a complex procedure of individually checking each image representing the documents.
For this reason, when digitizing and filing documents, it is desirable to accurately identify areas displaying graphics and areas displaying character strings, and to convert the character strings in the character string areas into text, to facilitate document searches.
An example of a technology which accurately identifies an area of character strings and an area of graphics in a document is disclosed in JP H1-266689A, and application of this technology makes it possible to accurately recognize character strings in documents for conversion to text.
Converting the character strings in the character string areas of a document to text and filing them makes keyword searches possible and facilitates reuse of digitized documents. However, it is sometimes desirable, when reusing digitized documents, to search not only the writing but also the graphs, photographs, etc., contained in a document. With the above-described approach of converting writing to text and filing it, desired information is found only through matches with character strings, and it is therefore impossible to search for graphs, photographs, etc., inside documents.
The present invention has been made in view of the above circumstances, and provides a technology for facilitating searches of graphics areas of digitized documents.