Finding out what information a speech corpus contains requires listening to the complete corresponding sound signal. With a large corpus, this operation can be very time-consuming.
Techniques for temporal compression of a sound file, such as acceleration or the suppression of signal portions that carry no useful information (for example pauses), do not save much time, given that the content is no longer intelligible once the compression factor reaches a value of 2. Listening time can therefore be reduced by less than half.
Known techniques make it possible to transcribe an audio signal into text. The text obtained in this way may then be displayed, e.g. on a computer screen, and read by a user. Since reading text is faster than listening, users can thus obtain the information they deem pertinent more quickly. However, sound also carries information that is difficult to quantify and to represent by images. Such information includes the expressiveness, gender, and personality of the speaker. The text file obtained by this method does not contain this information. Moreover, automatic transcription of natural language generates numerous transcription errors, and the text obtained may be difficult for the reader to understand.
Patent application FR 08 54340 filed on Jun. 27, 2008 discloses a method of displaying information relating to a sound message in which the sound message is displayed in the form of a chronological visual representation and in which key words are displayed in text form as a function of their chronological position. The key words displayed give viewers information about the content of the message.
That method makes it possible to assess the gist of a message by visual inspection while offering the possibility of listening to the whole message or part of the message.
That method is not suited to a large sound corpus. The number of words displayed is limited, in particular by the size of the screen. Applying that method to a large corpus makes it possible to display only a restricted number of words, which are not representative of the content as a whole. Consequently, it does not give real insight into the content of the corpus.
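The screen-size limit described above can be illustrated with a minimal sketch. It is not the method of the cited application, whose implementation is not given here. It assumes a hypothetical layout in which each key word is drawn as a fixed-width label at a horizontal position proportional to its time in the message, and in which overlapping labels are dropped in favor of those with the higher relevance score. The names `Keyword`, `layout_keywords`, `screen_px`, `label_px`, and the `score` field are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    time_s: float  # chronological position in the message, in seconds
    score: float   # hypothetical relevance score (an assumption of this sketch)


def layout_keywords(keywords, duration_s, screen_px, label_px):
    """Place fixed-width labels on a timeline without overlap.

    Hypothetical greedy layout: key words are considered from highest
    to lowest score, and a key word is kept only if its label does not
    overlap a label that has already been placed.
    Returns a list of (x_position_px, Keyword) in chronological order.
    """
    placed = []
    for kw in sorted(keywords, key=lambda k: k.score, reverse=True):
        # Horizontal position proportional to chronological position.
        x = kw.time_s / duration_s * screen_px
        # Clamp so that the label stays entirely on the screen.
        x = min(x, screen_px - label_px)
        if all(abs(x - other_x) >= label_px for other_x, _ in placed):
            placed.append((x, kw))
    return sorted(placed, key=lambda item: item[0])
```

Under these assumptions, at most `screen_px // label_px` labels can be placed, whatever the duration of the corpus. A 1000-pixel timeline with 100-pixel labels therefore shows no more than ten key words, whether the content lasts one minute or ten hours.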
A zoom function makes it possible to obtain more details, in particular more key words, over a smaller portion of a message. However, to assess the gist of the message, the user must then scan the whole of the document, i.e. zoom in on various parts of the content.
Applying that zoom function to a large number of sections of the content is time-consuming and laborious because it requires many manipulations on the part of the user.
Moreover, if the user wishes to return to a previously viewed section, at least some of the zooming operations previously carried out need to be repeated.
Thus, navigating within large voice content is not easy.
There is therefore a need for quick and simple access to the pertinent information in large voice content.