In recent years, presentations that use presentation software have become popular. Such presentations are usually conducted in the following manner: first, presentation software is run on a personal computer or the like to create page-type rich or plain electronic documents (hereinafter referred to as “presentation documents”) that are displayed at the time of a presentation. The actual presentation is then also conducted by running the presentation software on the personal computer or the like and displaying the presentation documents sequentially by use of its slide show function.
Meanwhile, in order to ensure accessibility for people with hearing difficulties and for elderly people, there has been in recent years a movement to add subtitles to information that originates in the form of voice. For example, with respect to information transmitted through broadcasts, there is a concrete goal of adding subtitles, by 2007, to all broadcast programs to which subtitles should be added.
Against this background, it appears to be highly necessary to add subtitles to the voice in a presentation as well. This is because, although text is present in presentation documents, it provides only fragmentary information in many cases, and because a presenter does not always make a presentation according to his/her presentation document.
One method of adding such subtitles is automatic subtitle creation by use of voice recognition technology. However, current voice recognition technology cannot create completely accurate subtitles. For this reason, editors must ultimately check and edit the result of voice recognition. This kind of editing work has conventionally been performed by hand. To be more specific, editors manually amend the result of the voice recognition while listening to a playback of the corresponding voice.
However, this method has required a great number of editing steps and has had a significant influence on the cost of creating subtitles. In addition, the efficiency of manual editing depends largely on the skill level of individual editors. Thus, an attempt to obtain subtitles efficiently has increased the cost. Moreover, there is a report that long hours of manual operation place an enormous burden on editors.
Meanwhile, presentation software sometimes has a function of embedding caption information (hereinafter referred to as “speaker note”) in the pages of a presentation document. When the content of an exemplary presentation, conducted by a master speaker who uses the presentation document, is embedded in the presentation document as the speaker note, many people can readily imitate the presentation of the master speaker. In addition, when a presenter embeds the content of his/her presentation in the presentation document in advance, the presenter can use it later as a note for his/her presentation. Conventionally, such speaker notes have also generally been created by hand, as in the case of the subtitle editing work. Moreover, the retrieval of desired scenes and words from the record of a presentation has also been conducted (see Patent Documents 1 and 2: Japanese Patent Laid-Open No. Hei07-182365 and Japanese Patent Laid-Open No. 2002-268667, respectively).
Specifically, the invention disclosed in Patent Document 1 creates retrieval files on the basis of motion pictures, voice, pen-based input, mouse-based input and key-based input at an e-meeting, and conference minutes are created while the retrieval files are accessed and the conference data is acquired.
Meanwhile, in the invention disclosed in Patent Document 2, a plurality of keywords are extracted from text data in the presentation document or from voice in the presentation and are registered. Page changing is then performed by voice input.
However, the conventional technologies including those disclosed in Patent Documents 1 and 2 have never established effective cooperation between voice recognition in a presentation and information about presentation documents.
For this reason, there has been a problem that the work of editing uncertain subtitles (hereinafter referred to as “subtitle candidates”) obtained as a result of voice recognition involves wasted effort. In addition, since the creation of speaker notes has been performed independently of voice recognition, there has been a problem that the creation of speaker notes is inefficient and incurs unnecessary cost.
Furthermore, since the retrieval processing has also been conducted by focusing on individual media, such as voice or text data, in isolation, there has been a problem that satisfactory results cannot be provided efficiently.