1. Field of the Invention
The present invention relates to electronic filing systems for managing and storing electronic documents, and particularly to an apparatus, a method, and a program for generating a reduced image of an electronic document to be managed.
2. Description of the Related Art
Electronic filing apparatuses (document management apparatuses) are known that link document management information to document image data generated by reading a document with a scanner, and store the linked document management information and document image data so that the stored document images can be searched on the basis of the linked document management information, displayed, and printed. These electronic filing apparatuses register each document together with its linked document management information (document name, number of pages, registration date, keywords, etc.). For example, when a document list or a search-result list is displayed, these items of document management information are shown as information identifying each document.
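The linking of document management information to stored document images can be pictured as a simple record store that is searched by its management information. The following Python sketch is illustrative only: the class names `ManagementInfo`, `StoredDocument`, and `FilingStore`, the in-memory list, and the keyword-matching rule are hypothetical and are not taken from any particular apparatus.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ManagementInfo:
    """Hypothetical schema for the document management information."""
    name: str
    pages: int
    registered: date
    keywords: list[str] = field(default_factory=list)


@dataclass
class StoredDocument:
    info: ManagementInfo
    image_data: bytes  # scanned page image(s); contents are not modeled here


class FilingStore:
    """Minimal in-memory store linking management info to document images."""

    def __init__(self) -> None:
        self._docs: list[StoredDocument] = []

    def register(self, info: ManagementInfo, image_data: bytes) -> None:
        # Store the management information and the image data as one linked record.
        self._docs.append(StoredDocument(info, image_data))

    def search(self, keyword: str) -> list[ManagementInfo]:
        """Return the management info of documents whose name contains the
        keyword or whose keyword list contains it (case-insensitive)."""
        kw = keyword.lower()
        return [
            d.info
            for d in self._docs
            if kw in d.info.name.lower()
            or any(kw == k.lower() for k in d.info.keywords)
        ]


store = FilingStore()
store.register(ManagementInfo("Quarterly report", 12, date(2003, 4, 1), ["finance"]), b"")
print([i.name for i in store.search("finance")])  # ['Quarterly report']
```

Note that a search result of this kind carries only textual management information, which is exactly what a user sees in a document list or search-result list when no reduced image is shown.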
However, it is difficult for a user to grasp the outline of a document at a glance from such document management information alone. To overcome this difficulty, an electronic filing apparatus that generates and registers reduced images (thumbnails) of electronic documents and displays the reduced images in a document list and a search-result list has also been proposed (see, for example, Japanese Patent Laid-Open No. 10-240724).
FIG. 10 is a block diagram depicting an example functional structure of a typical electronic filing apparatus that displays reduced images in a document list and a search-result list. Referring to FIG. 10, the electronic filing apparatus includes, for example, a document-reading section 1000, a reduced-image generating section 1001, a document-storage section 1002, and a display control section 1003. The document-reading section 1000 reads file data from the file system and document data received from a device (not shown) or via a network. The reduced-image generating section 1001 generates the drawing data that is output when the document data read by the document-reading section 1000 is displayed on the screen, and generates from that drawing data image data reduced to an appropriate size, for example, by dot decimation. The document-storage section 1002 links the document data read by the document-reading section 1000 to the reduced image data generated for that document by the reduced-image generating section 1001, and stores the linked data. The display control section 1003 controls the display of the document data and reduced images stored in the document-storage section 1002, and displays a document list based on the reduced images as shown, for example, in FIG. 11.
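Dot decimation, mentioned above as one way the reduced-image generating section 1001 may shrink drawing data, keeps only every k-th pixel in each direction. The following Python sketch assumes the drawing data is available as a row-major 2-D list of pixel values; the function names `decimate` and `decimation_factor` and the `max_side` parameter are hypothetical and only illustrate the technique.

```python
import math


def decimation_factor(width: int, height: int, max_side: int) -> int:
    """Smallest integer step that makes the longer side at most max_side pixels."""
    return max(1, math.ceil(max(width, height) / max_side))


def decimate(pixels: list[list[int]], factor: int) -> list[list[int]]:
    """Dot decimation: keep every `factor`-th row and every `factor`-th column,
    discarding all other pixels (no averaging is performed)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [row[::factor] for row in pixels[::factor]]


# An 8x8 test raster whose pixel value encodes its position (row * 8 + column).
page = [[r * 8 + c for c in range(8)] for r in range(8)]
print(decimate(page, 4))  # [[0, 4], [32, 36]]
```

Because decimation simply discards pixels, it is inexpensive, but thin strokes that happen to fall between the sampled rows and columns disappear entirely; area-averaging filters preserve more detail at a higher computational cost.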
In many cases, the reduced image generated in this way is produced mainly from the output of the first page of the document.
On the other hand, U.S. 2002-0007367 A1 (Foreign Priority: Japanese Patent Laid-Open No. 2002-32364) describes technology for filtering elements of a document and arranging the filtered elements from the top of a page in order of importance to print (or display) the page. U.S. 2002-0007367 A1 is intended to allow users to efficiently recognize the content of a document composed of a plurality of elements when the document is to be printed or displayed.
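The general idea of filtering document elements and laying them out from the top of the page in order of importance can be sketched as follows. This is not the method of U.S. 2002-0007367 A1, whose scoring and layout details are not given here; the `Element` type, the numeric `importance` score, the `threshold`, and the stop-when-full rule are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    importance: float  # hypothetical score; how it is computed is not specified
    height: int        # rendered height in pixels


def arrange_by_importance(elements: list[Element], threshold: float,
                          page_height: int) -> list[tuple[int, str]]:
    """Keep elements scoring >= threshold, then place them from the top of the
    page in descending order of importance, stopping when the page is full.
    Returns (y_offset, text) pairs."""
    kept = sorted((e for e in elements if e.importance >= threshold),
                  key=lambda e: e.importance, reverse=True)
    placed: list[tuple[int, str]] = []
    y = 0
    for e in kept:
        if y + e.height > page_height:
            break
        placed.append((y, e.text))
        y += e.height
    return placed


doc = [Element("body", 0.2, 300), Element("title", 0.9, 40),
       Element("summary", 0.6, 100), Element("footer", 0.05, 20)]
print(arrange_by_importance(doc, 0.1, 1000))
# [(0, 'title'), (40, 'summary'), (140, 'body')]
```

In this sketch the elements' original positions play no part in the output: only their scores determine where they appear.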
In the known technology described above, however, although the outline of a document, such as the layout of the entire document, can be identified from a reduced image (thumbnail) of the document, it is difficult to recognize the characters contained in the document from the reduced image. For this reason, it is very difficult to distinguish, on the basis of such reduced images, documents with similar layouts or documents with characterless layouts (that is, layouts lacking distinctive features), such as those without large characters or graphics.
This difficulty is especially noticeable when a reduced image is generated from document data consisting of text in characters of essentially uniform size, such as an XML document. The XML document data shown in FIG. 12 is a typical example. Reference numeral 1200 denotes the original document data to be input to and stored in an electronic filing system, and reference numeral 1201 denotes a reduced image generated from the document data 1200. It is very difficult for a user of the known electronic filing system to learn the content of the document 1200 from this reduced image 1201.
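A back-of-the-envelope calculation shows why uniformly sized text becomes illegible in a thumbnail. The figures below (10.5-pt text rendered at 96 dpi on an A4 page about 1123 px tall, reduced to a 128-px-tall thumbnail) are illustrative assumptions, not values taken from FIG. 12, and the function name `reduced_glyph_height_px` is hypothetical.

```python
def reduced_glyph_height_px(font_pt: float, dpi: float,
                            page_height_px: float, thumb_height_px: float) -> float:
    """Approximate character height in the thumbnail, in pixels."""
    glyph_px = font_pt / 72 * dpi             # 1 pt = 1/72 inch
    scale = thumb_height_px / page_height_px  # uniform reduction ratio
    return glyph_px * scale


# Assumed example: 10.5-pt text, 96 dpi, A4 height ~1123 px, 128-px thumbnail.
h = reduced_glyph_height_px(10.5, 96, 1123, 128)
print(f"{h:.2f} px")  # 1.60 px
```

A character only about 1.6 px tall cannot carry recognizable letter shapes, so when every character on the page is of roughly this size, nothing in the reduced image stands out except the overall arrangement of text blocks.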
According to the above-described U.S. 2002-0007367 A1, the structure of a document is analyzed so that its elements can be arranged in order of importance, and the elements to be displayed are selected on the basis of their importance. However, since U.S. 2002-0007367 A1 does not consider the generation of reduced images (thumbnails), the same problem arises when a document (e.g., a document that does not include large characters or graphics) is represented as a reduced image. Furthermore, since the elements are rearranged in order of importance, the original layout of the document is completely ignored. Consequently, even documents written in the same format cannot be compared on the basis of their layout, which makes it difficult to determine visually whether one document is similar to another.