When sound, e.g. human speech, is transcribed to text automatically by means of a speech recognition system, it is generally straightforward to associate each word or even smaller lexical subunit, referred to as a text datum in the following, with the corresponding sound segment (also referred to as a sound datum), for instance by automatically including timing data derived from the sound data in the text file produced by the speech recognition system. The timing data can then be used to directly access the text datum corresponding to a given sound datum, and vice versa. Such an association is required in particular for commonly known features such as synchronous playback, wherein a segment of text (text datum), such as a word or a syllable, corresponding to the currently played sound segment is shown to a user, for instance by highlighting the text segment in question on a display. Such a feature is especially useful for the correction of transcriptions established through speech recognition as well as for review and quality assurance.
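By way of illustration only, the association between text data and sound data described above can be sketched as follows. The sketch assumes that each text datum carries the start and end time of its sound datum in milliseconds; the names `TextDatum`, `text_index_at`, `sound_segment_for` and `render_highlight`, as well as the sample timings, are hypothetical and are not taken from any particular speech recognition system.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TextDatum:
    """A word (or smaller unit) together with the timing of its sound datum."""
    text: str
    start_ms: int  # start of the corresponding sound segment (assumed unit: ms)
    end_ms: int    # end of the corresponding sound segment, exclusive


def text_index_at(data, time_ms):
    """Sound -> text: index of the text datum being played at time_ms, or None.

    Assumes `data` is sorted by start_ms and segments do not overlap.
    """
    starts = [d.start_ms for d in data]
    i = bisect_right(starts, time_ms) - 1
    if i >= 0 and time_ms < data[i].end_ms:
        return i
    return None  # before the first segment or inside a pause


def sound_segment_for(data, index):
    """Text -> sound: (start_ms, end_ms) of the sound datum for a text datum."""
    return data[index].start_ms, data[index].end_ms


def render_highlight(data, time_ms):
    """Synchronous playback: mark the currently played text datum with brackets."""
    current = text_index_at(data, time_ms)
    return " ".join(
        f"[{d.text}]" if i == current else d.text for i, d in enumerate(data)
    )


# Hypothetical timing data, as it might be written into the text file.
transcript = [
    TextDatum("the", 0, 180),
    TextDatum("patient", 180, 650),
    TextDatum("reports", 700, 1200),
]

print(render_highlight(transcript, 300))  # the [patient] reports
```

Because the timing data is stored with every text datum, the lookup works in both directions: a playback position selects the text to highlight, and a selected word yields the sound segment to play.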
However, when sound is transcribed manually, which is frequently the case owing to the well-known imperfections of current speech recognition systems, e.g. when dealing with sound data of poor quality or with highly specialized jargon, such an association is generally not available automatically. Therefore, in the prior art, text and sound have to be synchronized manually by marking sound segments with a precision on the order of a few milliseconds and subsequently entering the corresponding text. Such an approach is very time-consuming and thus represents a significant cost factor. Nevertheless, it constitutes an important feature of transcription for further analysis, e.g. in the fields of psychology, marketing, etc. A similar approach has been published by Bainbridge, D., and Cunningham, S. J.: “Making oral history accessible over the World Wide Web”, History and Computing, vol. 10, no. 1-3, pp. 73-81 (1998).
Thus, there is a need in the art for a cost-effective way of synchronizing sound and text in connection with the manual transcription of sound data.
It is the object of the present invention to provide a method for synchronizing sound data and text data, said text data being obtained by manual transcription of said sound data during playback of the latter, which obviates the above-mentioned disadvantages. It is also an object of the present invention to provide a method for synchronized playback of sound data and corresponding text data, which incorporates the inventive method for synchronizing sound data and text data, thus obviating the common disadvantage of the prior art that synchronous playback is reserved exclusively for systems using speech recognition. Furthermore, it is an object of the present invention to provide a system adapted to carry out the respective inventive methods mentioned above.