1. Field of the Invention
The present invention relates to a technology for creating data, such as caption data, based on information such as a voice, and more particularly to a technology for editing data, such as a caption, created by a computer.
2. Description of Background
To secure accessibility to information delivered through broadcasting, a goal has been set that captions will be appended to every possible broadcast program by the year 2007. There will also be an increasing need, in the near future, for captioning motion pictures delivered on the Internet. To deal with this situation, a great deal of research has been conducted on appending captions to broadcasts and motion pictures.
Some of this research is cited here for ease of understanding. One line of prior art provides a system for assisting an expert who transcribes speech for captioning. Two examples can be given. The first is described in Japanese Patent Laid-open No. 2003-216200 (Pages 9, 10, FIG. 6), hereinafter referred to as “Patent Document 1”, and the second is described in Japanese Patent Laid-open No. 2003-223200 (Page 8, FIG. 6), hereinafter referred to as “Patent Document 2”. In Patent Document 1, the transcription work for a caption is assisted by means of a specific reproduction operation, while in Patent Document 2 the work is assisted by means of changing the speech rate.
Another line of research concerns a method for automatically producing a caption by employing voice recognition technology. This method can eliminate the need for an expert to transcribe the caption, and expectations for it are therefore growing. However, current voice recognition technology cannot create a perfectly correct caption. In the end, therefore, the expert still has to check and edit the voice recognition result, and this work requires a large number of steps.
An example of such an editing work will be described with reference to FIG. 23. Here, as depicted in the drawing, suppose there is a voice saying “Imamadeno torikumiga ondemando bijinesu wo jitsugensuru uedeno kateini sugimasen” (phonetically written according to the Japanese phonetic system; this means “Activities until now are a part of the processes in bringing off an on-demand business deal”), and a voice recognition result is obtained as shown in the drawing.
An editor then checks the voice recognition result for errors while listening to the voice from the beginning. For example, suppose the editor finds that “on-deando” (hereinafter, what is described inside the double quotation marks is a Japanese sentence or word, each being phonetically written according to the Japanese phonetic system, or a symbol, unless otherwise stated) on line 5 should be “on-demando”. At this point, the editor first stops the voice. Then, the editor points to line 5 with a mouse and moves the keyboard focus there in order to correct “on-deando” to “on-demando”.
Here, if the editor has forgotten how to correct the incorrect word “on-deando” by the time the keyboard focus has been moved to line 5, he/she listens to the voice once more and then corrects “on-deando” to “on-demando”.
As described above, a great deal of research has long been conducted on appending captions to voices. However, the existing technology has the problems described below.
The first problem is that the existing technology depends heavily on highly skilled labor. Transcription work for a caption converts a voice into characters by spelling out the voice while listening to it. It is therefore not possible for a low-skilled worker to do the work. Moreover, creating a caption by editing a voice recognition result is also difficult for a low-skilled worker, since the work requires one to quickly recognize an error in the voice recognition result and determine the correct character string while listening to the voice. Meanwhile, employing highly skilled workers pushes up the labor cost.
The second problem is that the work is not only inefficient but may also be flawed operationally. A mouse is convenient because it can be operated swiftly to point to an arbitrary place on a screen. A keyboard, meanwhile, is convenient for inputting characters on the screen. However, in transcription work for a caption, the reproduction and stop operations for a voice are done with the mouse, while the caption is input with the keyboard. In addition, in editing work for a caption, an incorrect part is pointed to with the mouse, and the correct character string is input with the keyboard. Therefore, when the mouse and the keyboard are used together, the time spent moving the operator's hand back and forth between the mouse and the keyboard becomes an overhead.
Incidentally, the part where a voice is being reproduced has long been decoupled from the part where transcription or editing work for a caption is performed. Furthermore, reproduction of the voice while characters are being input with the keyboard interferes with concentration on the inputting. Meanwhile, when an editor forgets the content to be corrected, it is necessary to search his/her memory or to explicitly issue a reproduction command. It has thus not been possible to carry out efficient operations.