1. Field of the Invention
The present invention relates to a method, program, and device for analyzing a document structure, and more specifically to a method, a program, and a device for analyzing a structure of a presentation document.
2. History of Related Art
In recent years, it has been common to use a presentation tool in a personal computer (PC) environment to create documents for use in making a presentation (presentation files).
A presentation file typically includes information such as text, graphics, images, and sounds, and it is created and stored by a presentation tool in various file formats. To expand the range of utilization of many presentation files created in this manner, methods are emerging that convert a presentation file into a form convenient for voice access (read-out) or for searching a file database.
For example, tools are known that convert a Microsoft PowerPoint (®) file (simply referred to as PowerPoint hereafter) into HTML or extract text from it (e.g., http://www.rdpslides.com/pptools/ppt2html/index.html and http://cita.disability.uiuc.edu/software/office/), as is a technique for efficiently searching presentation files (e.g., Published Unexamined Patent Application No. 2004-265097).
Adding meta-information to document information improves the accuracy of document searches and of text mining techniques, which leads to efficient management of a large amount of information. For a text file, meta-information can be added simply, for example by extracting keywords. However, in a file such as a presentation file, in which a user may arbitrarily place objects such as text, graphics, and images, the positions of the objects on a page carry important meaning in addition to the document information itself. Therefore, extracting the position information is essential.
Consider the slide shown in FIG. 1 as an example. If a structured text output such as one indicated below is obtained based on the position of the objects, dependency relationships among the texts can be understood. A “dependency relationship” as used herein refers to: a parent-child relationship (inclusion relationship) detected based on an overlap between objects; a sibling relationship (also referred to as a parallel relationship or sibling) detected based on the position of objects relative to each other; or a link origin and link target relationship between objects represented by graphics such as an arrow.
Exemplary Structured Text Output
    Main image: Car
        ∘ FIG. 1
            □ Space Shuttle
                ∘ FIG. 2
            □ Airplane
                ∘ FIG. 3
            Motorcycle
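As an illustration only, the parent-child (inclusion) relationship described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of the invention: it assumes each object is reduced to an axis-aligned bounding box, treats full containment as the overlap test, and orders siblings top to bottom and then left to right. The names SlideObject, build_tree, and render, and all coordinates, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlideObject:
    """Hypothetical slide object reduced to its text and bounding box."""
    text: str
    left: float
    top: float
    width: float
    height: float
    children: List["SlideObject"] = field(default_factory=list)

    @property
    def area(self) -> float:
        return self.width * self.height

    def contains(self, other: "SlideObject") -> bool:
        # Assumption: "inclusion" means the other box lies fully inside this one.
        return (self.left <= other.left
                and self.top <= other.top
                and self.left + self.width >= other.left + other.width
                and self.top + self.height >= other.top + other.height)


def build_tree(objects: List[SlideObject]) -> List[SlideObject]:
    """Attach each object to the smallest strictly larger box containing it."""
    for obj in objects:
        obj.children = []
    roots = []
    for obj in objects:
        candidates = [o for o in objects
                      if o is not obj and o.area > obj.area and o.contains(obj)]
        if candidates:
            min(candidates, key=lambda o: o.area).children.append(obj)
        else:
            roots.append(obj)
    return roots


def render(nodes: List[SlideObject], depth: int = 0) -> List[str]:
    """Return indented lines; siblings are ordered by (top, left)."""
    lines = []
    for node in sorted(nodes, key=lambda o: (o.top, o.left)):
        lines.append("  " * depth + node.text)
        lines.extend(render(node.children, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to exercise the functions.
    demo = [
        SlideObject("Airplane", left=10, top=10, width=20, height=20),
        SlideObject("Main image: Car", left=0, top=0, width=100, height=100),
    ]
    print("\n".join(render(build_tree(demo))))
```

Link relationships expressed by arrows are not covered by this sketch; they would need a separate test that matches the end points of a connector to nearby objects.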
However, no techniques are known that extract such positional information and generate structured data that is readily applicable to voice access or text mining. Text mining as used herein refers to a technique of analyzing and mining a large amount of data to derive useful information. In view of the above issues, the present invention aims to solve the following problems (1) to (4) in conventional art.
(1) Problem with the Read-out Order
Since presentation files are created by different users in different formats, it is difficult to understand the content of the files with voice access (automatic read-out). For example, screen readers (software for providing voice output of GUI screens) can only read out objects on a slide one-dimensionally in order of depth of the objects (this direction is referred to as the Z coordinate herein, which is a third coordinate relative to the X and Y coordinates). The slide of FIG. 1 will then be read out in the order in which the objects were generated, as in the following example.
    Airplane
    Space Shuttle
    Motorcycle
    Main image: Car
    FIG. 1
    FIG. 2
    FIG. 3
This voice output alone cannot convey positional information about the visual document structure, which makes the content of the document difficult to understand. Unless the creator has deliberately arranged the objects, the screen reader reads them out in the order in which they were generated, and a user does not necessarily create the objects of a presentation file in conceptual order. Therefore, it is difficult to understand the content of the presentation file by causing the screen reader to read out the file.
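The difference between the two read-out orders can be illustrated with a short Python sketch. It is offered only as an assumption-laden illustration: the Shape type, the z field (standing for creation order along the Z coordinate), and the coordinates are hypothetical, and the position-based ordering shown is a simple top-to-bottom, left-to-right sort rather than the structural analysis the invention is concerned with.

```python
from typing import List, NamedTuple


class Shape(NamedTuple):
    """Hypothetical slide object: text, Z index (creation order), position."""
    text: str
    z: int
    top: float
    left: float


def z_order_readout(shapes: List[Shape]) -> List[str]:
    """Order used by a screen reader that follows the Z coordinate."""
    return [s.text for s in sorted(shapes, key=lambda s: s.z)]


def position_readout(shapes: List[Shape]) -> List[str]:
    """Naive alternative: top to bottom, then left to right."""
    return [s.text for s in sorted(shapes, key=lambda s: (s.top, s.left))]


if __name__ == "__main__":
    # Hypothetical layout: the title was created last but sits at the top.
    slide = [
        Shape("Airplane", z=0, top=60, left=50),
        Shape("Space Shuttle", z=1, top=60, left=10),
        Shape("Motorcycle", z=2, top=60, left=90),
        Shape("Main image: Car", z=3, top=10, left=10),
    ]
    print(z_order_readout(slide))
    print(position_readout(slide))
```

Even this naive sort moves the title ahead of the other objects, but it still produces a flat list and says nothing about which objects belong together.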
To solve this, the first of the above-mentioned tools (PPT2HTML) has a function to sort along the Z coordinate as shown in the left part of FIG. 2. This tool allows the read-out order to be changed by selecting, in a select box, an object whose position in the read-out order is to be changed, and by moving the object within the select box using up and down buttons. However, this tool still requires the order to be changed object by object, which is a cumbersome task. Therefore, the ability to sort a plurality of objects as a conceptual unit would reduce the burden on the user.
(2) Problem with Reading out Diagrams
In order for a diagram contained in a presentation file to be read out, a technique is typically used that replaces the diagram with text briefly describing the diagram, i.e., what is called alternative text (Alt text). However, inserting the Alt text into all objects in a presentation file is a laborious task.
For example, in the second of the above-mentioned tools (cita.disability.uiuc.edu/software/office/, the Illinois Web Publishing Wizard), data must be entered into a wizard (a mechanism that facilitates operation of sophisticated application software by presenting questions to be answered interactively) on a screen as shown in FIG. 3 in order to insert the Alt text into all objects. This is a time-consuming task for a slide on which many complicated diagrams appear. Therefore, the ability to handle a diagram made up of a plurality of objects as a group would also reduce the burden on the user in inserting the Alt text.
(3) Problem with Acquiring Information about Positional Relationships between Objects
Conventional art does not allow effective use of graphics helpful for understanding the relationships between objects, such as “arrows,” unless the Alt text is inserted into the graphics.
(4) Applications of Presentation Files to the Field of Natural Language Processing
In analyzing a presentation file using a technique such as text mining, there has been no way other than to analyze separately the text obtained from each object on a slide. If the above-mentioned dependency relationships among objects on the slide were known, text obtained from related objects could be analyzed collectively. Therefore, an improvement in the accuracy of techniques such as text mining could be expected.