Various techniques for improving the efficiency of transcription work are conventionally known. For example, in one well-known technique, each of the plural character strings constituting voice text data, which is obtained by performing a voice recognition process on voice data, is displayed on a screen in association with the position of that character string in the voice data (playback position). In this technique, when a character string on the screen is selected, the voice data is played back from the playback position corresponding to the selected character string. A user (transcription worker) therefore selects a character string and corrects it while listening to the voice data.
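As an illustration only, the association between character strings and playback positions in such a conventional technique can be sketched as follows. The names `RecognizedString` and `position_for_selection`, and the use of seconds as the unit of position, are assumptions made for this sketch and are not taken from any particular implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RecognizedString:
    """One character string of the voice text data (hypothetical structure)."""
    text: str                    # character string produced by voice recognition
    playback_position_s: float   # assumed unit: seconds from the start of the voice data


def position_for_selection(strings: Sequence[RecognizedString], selected_index: int) -> float:
    """Return the playback position associated with the selected character string."""
    if not 0 <= selected_index < len(strings):
        raise IndexError("selected_index is outside the displayed character strings")
    return strings[selected_index].playback_position_s
```

In this sketch, selecting the second of two displayed character strings would return the playback position stored with that character string, from which the voice data would then be played back.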
In this technique, however, each of the plural character strings constituting the voice text data must be displayed on the screen in correlation with its playback position in the voice data, which results in a problem of a complicated display control configuration. Accordingly, from the viewpoint of simplifying the configuration, a transcription method in which the user transcribes the voice data without any restriction while listening to it is preferable to a method in which the user corrects the voice recognition result.
In this case, the user is forced to repeatedly stop the playback temporarily and rewind it while transcribing. When the user resumes transcribing after a temporary stop, it is desirable that the playback be resumed from the exact position up to which the transcription has been completed.
Therefore, it is conceivable to set, as a playback starting position indicating the position from which the playback starts, a position rewound by a predetermined amount from the position of the voice data at the time of the temporary stop.
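A minimal sketch of this fixed-amount rewind is given below for illustration. The function name `playback_start_position`, the default rewind amount of 3.0 seconds, and the use of seconds as the unit are all assumptions of the sketch; the text above specifies only that the amount is predetermined.

```python
def playback_start_position(pause_position_s: float, rewind_amount_s: float = 3.0) -> float:
    """Return the playback starting position for a fixed-amount rewind.

    pause_position_s: position of the voice data at the temporary stop (assumed seconds).
    rewind_amount_s:  predetermined rewind amount (3.0 s is an arbitrary illustrative value).
    """
    if pause_position_s < 0 or rewind_amount_s < 0:
        raise ValueError("positions and rewind amounts must be non-negative")
    # Rewind by the predetermined amount, but never before the start of the voice data.
    return max(0.0, pause_position_s - rewind_amount_s)
```

For example, under these assumptions a temporary stop at 42.5 seconds yields a playback starting position of 39.5 seconds, regardless of how far the transcription has actually progressed.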
However, because the difference between the position of the voice data at the time of the temporary stop and the position in the voice data up to which the transcription has been completed is not always constant, it is difficult to resume the playback of the voice data from the position up to which the transcription has been completed. The user therefore has to adjust the position frequently by repeating rewind and fast-forward operations on the voice data, which results in a problem of reduced work efficiency for the user.