With recent developments in networks, typified by the Internet, information can be easily retrieved over a network. In particular, because information on the Internet is structured using HTML (Hypertext Markup Language) descriptions, browsers (application software for retrieving information) with excellent operability have become widespread. As a result, various systems have come into use on personal computers in which not only document information but also voice and moving video pictures are structured and stored in an input apparatus. Here, structuring means forming a link structure or a hierarchical structure.
However, some information, such as image data, is difficult to structure. In many cases, such information is handled as a single batch file, and a large amount of time is required to structure the interior of the image.
Conventionally, to structure an existing document on paper, the document must be converted to character data by, for example, an optical character recognition (OCR) apparatus, or must be typed in from a keyboard. This requires manual labor. An existing document can instead be captured as an image by a scanner, but it is then difficult to divide the image into portions to be structured.
In recent years, information terminals, e.g., personal computers, have become widespread that have a function of creating a document to which voice data is appended and linked (hereinafter referred to as a document with voice data).
The flow of the conventional procedure for creating the document with voice data will be explained with reference to FIG. 1.
FIG. 1 is a flowchart showing the flow of the conventional procedure for creating the document with voice data.
First, a document to which voice data is to be appended is created by keyboard input or is read by a scanning apparatus such as a scanner. The document is then displayed on a screen (S101).
Next, the display on the screen is switched to a voice symbol table, and a voice symbol linked to voice data is selected by clicking with a mouse (S102).
The voice data may be data registered in the system in advance or data newly input through a microphone.
Next, the display on the screen is switched back to the object image, and the voice symbol is pasted onto the document on the screen by dragging the icon of the displayed voice symbol with the mouse (S103).
If there are other voice symbols to be added, the operations in S102 to S103 are repeated (S104).
When all voice symbols have been pasted onto the object image, the document with voice data is complete, and the document creating operation ends.
In the conventional apparatus for creating the document with voice data, however, an operator must repeat the operations in S102 to S103 as many times as there are voice symbols to be added. Therefore, if there are a large number of voice symbols, it takes considerable time to complete the document with voice data.
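The conventional procedure of FIG. 1 (S101 to S104) can be illustrated by the following minimal Python sketch, which only models the operator's workload and is not an implementation of any actual apparatus. The names Document and create_document_with_voice are hypothetical, and counting four manual actions per voice symbol (two display switches, one click, one drag) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A displayed document plus the voice symbols pasted onto it (hypothetical model)."""
    text: str
    pasted: list = field(default_factory=list)  # (symbol_name, position) pairs


def create_document_with_voice(text, symbols_to_add):
    """Walk through S101 to S104 and count manual operator actions.

    symbols_to_add is a list of (symbol_name, position) pairs.
    The per-symbol action count is an illustrative assumption.
    """
    doc = Document(text)  # S101: create or scan the document, then display it
    actions = 0
    for symbol, position in symbols_to_add:  # S104: repeat while symbols remain
        actions += 1  # S102: switch the display to the voice symbol table
        actions += 1  # S102: click to select the voice symbol
        actions += 1  # S103: switch the display back to the object image
        actions += 1  # S103: drag the voice symbol onto the document
        doc.pasted.append((symbol, position))
    return doc, actions


if __name__ == "__main__":
    symbols = [("greeting", (10, 20)), ("note", (30, 40)), ("alarm", (5, 5))]
    doc, actions = create_document_with_voice("memo", symbols)
    print(len(doc.pasted), actions)  # prints "3 12"
```

Under these assumptions the number of manual actions grows linearly with the number of voice symbols, which is the burden described above.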