Since speech is transmitted through the air, only a limited number of speech channels can be used simultaneously in any one setting. For example, only one speech channel is normally publicly acceptable in a meeting or a lecture, and any other speech communication using a channel besides this speech channel is referred to as “idle conversation” and is often deemed inappropriate.
If a meeting or a lecture is regarded as serving to transmit information, such “idle conversation” is “noise” on the single available speech channel and therefore needs to be avoided. However, if a meeting or a lecture is regarded as an opportunity to recall ideas, that is, to inspire participants or the audience to arrive at new ideas or viewpoints, such “idle conversation” can itself be a valuable opportunity to recall ideas. Therefore, idle conversations should not always be dismissed. For example, in a question-and-answer session following an academic conference presentation, a heated discussion is often carried out on a topic slightly removed from the main content of the presentation. There are communities that operate based on these concepts and actively adopt communication in which a non-speech channel, such as chatting, is used in parallel with speech communication. An example of such communication is WISS (Workshop on Interactive Systems and Software), disclosed in Non-Patent Document 1.
Communication using a video as a medium is also carried out in settings other than meetings. Even before the Internet became widespread, close friends living apart would often talk on the phone while watching the same program on TV. In the earliest days of personal-computer communication and the Internet, communication through chatting began, and heated discussions centered on a television program as the shared medium were seen there as well. These days, for example, dedicated threads are posted on major bulletin boards (Non-Patent Document 2), which are a quite common means of communication on the Internet. Further, in recent years, since videos are available on networks, applications that are free of the temporal restrictions of television or radio programs are being developed (Non-Patent Document 3).
In communication combining a video and chatting, the video is used as an exclusive channel and chatting as a secondary channel. It is thus clear that this communication has the same configuration as that of the meeting/lecture communication described above.
In recent years, a model including a media stream, such as speech or video, delivered through a primary channel, and one or a plurality of communication streams carried in parallel through a secondary channel associated with the primary channel, is becoming widespread in various settings.
One of the problems with this communication model is that, when a user is too focused on the communication in the secondary stream, the user may fail to watch or hear part of the primary media stream. In such a case, it would be convenient if the user could easily rewind the primary media stream to the point at which the user became absorbed in the secondary stream.
For example, consider a situation where a plurality of users are enjoying chatting while viewing video contents of a “prime minister's resignation announcement.” In this situation, if the prime minister uses abusive language, the users may break into a heated discussion on abusive language and continuously post messages listing abusive remarks made by past prime ministers and politicians. In this case, since the users are continuously posting messages on a topic (past politicians' abusive remarks) somewhat removed from the topic of “the current prime minister's abusive language” in the original contents of the “prime minister's resignation announcement,” the attention of the chat participants is temporarily distracted from “the prime minister's resignation announcement.” After enjoying the heated discussion through chatting for a while, when the users pay attention to “the prime minister's resignation announcement” again, they notice that the topic has become completely different. To continue viewing the contents, the users need to read and understand what the current topic is. If there were a system for easily rewinding the contents to the scene immediately after the scene of “the prime minister's abusive language,” which led to the series of discussions through chatting, the users could quickly view what they missed and catch up with the topic more easily. With such a system, the users could efficiently browse the entire “prime minister's resignation announcement” without missing anything and could also enjoy chatting fully.
In the communication model assumed herein, the communication in the secondary channel always originates from the communication in the primary channel. In the above example, the series of discussions about past politicians' abusive remarks through chatting originates from the scene of “the prime minister's abusive language” in the video of “the prime minister's resignation announcement.” Thus, the need can be met by a technique that uses the communication in the secondary channel to identify the single point in the primary-channel media stream that led to the series of discussions.
Examples of existing techniques that can be used for this purpose include speech indexing techniques.
Non-Patent Document 4 is an example of an indexing technique that handles speech and a secondary channel associated with the speech. The system disclosed in this document uses a television program and the text of live chatting about the program, and executes indexing as follows. First, the system measures the number of messages posted through chatting per unit time. If the number of messages is particularly large at a certain time, the system regards this as indicating that an event in the television program caused a large response immediately before that time. Next, the system analyzes these messages by referring to the vocabulary and the like appearing in them, thereby extracting “the degree of excitement” and “the degree of disappointment.” Thus, by extracting the times at which events occur on the program side and the messages posted through chatting corresponding to those events, the system can associate certain times in the program with individual chat messages. Namely, the system can execute indexing on a certain television program portion corresponding to a certain chat message.
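The first step above, detecting bursts of chat messages per unit time, can be sketched as follows. This is a minimal illustration, not the actual method of Non-Patent Document 4: the bin width, the spike criterion (here, a multiple of the mean count over occupied bins), and all values are assumptions made for demonstration.

```python
from collections import Counter

def find_response_events(message_times, unit=10.0, threshold=3.0):
    """Bucket chat timestamps (in seconds) into bins of `unit` seconds and
    flag bins whose message count exceeds `threshold` times the mean count
    over occupied bins.  Each flagged bin suggests that a program-side
    event caused a large response just before it."""
    bins = Counter(int(t // unit) for t in message_times)
    if not bins:
        return []
    mean = sum(bins.values()) / len(bins)  # mean over occupied bins only
    return sorted(b * unit for b, n in bins.items() if n >= threshold * mean)

# A burst of messages around t = 60-70 s stands out against the background.
times = [5, 12, 30, 61, 62, 63, 64, 65, 66, 67, 90]
print(find_response_events(times, unit=10.0, threshold=2.0))  # → [60.0]
```

The flagged bin start times would then be matched against program events, and the messages in each flagged bin passed to the vocabulary analysis described above.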
Patent Document 1 is another example of a speech indexing technique. The cross-indexing system for text and speech disclosed in this document roughly operates as follows. First, the system allocates a topic label to all or part of the text. Next, the system calculates the probability that a previously given keyword appears in each of the topics of all or part of the inputted text. Finally, a speech recognition means estimates the likelihood that the keyword appears in an arbitrary section of the inputted speech. By combining this likelihood with the probability that the keyword appears in each topic, the system estimates the correlation between the text and the speech.
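The final combining step can be illustrated with a toy calculation. Patent Document 1 does not fix the combination formula here; the sum-of-products rule and all keyword probabilities below are assumptions made for this sketch.

```python
def correlate(topic_keyword_prob, section_keyword_likelihood):
    """Combine, for each keyword, the probability that it appears in a
    text topic with the likelihood that a recognizer detected it in a
    speech section.  A sum of products is one plausible combination
    rule; the source does not specify the actual formula."""
    return sum(p * section_keyword_likelihood.get(k, 0.0)
               for k, p in topic_keyword_prob.items())

# Hypothetical keyword probabilities for one text topic, and recognizer
# likelihoods for two speech sections.
topic = {"minister": 0.4, "election": 0.3, "resign": 0.3}
section_a = {"minister": 0.9, "resign": 0.8}  # on-topic section
section_b = {"weather": 0.9, "sunny": 0.7}    # unrelated section
print(correlate(topic, section_a) > correlate(topic, section_b))  # → True
```

The speech section with the highest combined score for a topic would be taken as the section correlated with that part of the text.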
While Non-Patent Document 5 adopts a slightly different approach, it is another example of a speech indexing technique, in this case based on a speech summarization technique. In the meeting indexing system disclosed in this document, a speech recognition technique first converts the utterances made in a meeting into text. Using a concept vector previously given to each term, the system determines whether the collection of concept vectors formed by the terms included in the recognized text of one utterance is similar to that of another utterance, and uses the result as a reference to divide the utterances by topic. Thereafter, based on the similarity among topics, the system reconstructs the topic transitions of the entire meeting in the form of a tree. Each node of the tree represents a collection of utterances belonging to a group of topics. Thus, by using this tree-shaped topic network, the system can indicate the first utterance in a meeting whose topic is identical to that of a given utterance.
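The similarity test on concept vectors can be sketched as follows. This is a simplified illustration, assuming cosine similarity, a fixed threshold, and tiny hand-made vectors; the actual system's vectors, similarity measure, segmentation rule, and the subsequent tree construction (omitted here) are not reproduced.

```python
import math

def concept_vector(terms, term_vectors):
    """Sum the concept vectors of the terms in one recognized utterance."""
    dim = len(next(iter(term_vectors.values())))
    v = [0.0] * dim
    for t in terms:
        for i, x in enumerate(term_vectors.get(t, [0.0] * dim)):
            v[i] += x
    return v

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def segment_by_topic(utterances, term_vectors, threshold=0.5):
    """Start a new topic segment whenever an utterance's concept vector is
    insufficiently similar (cosine < threshold) to the previous one."""
    segments, prev = [], None
    for terms in utterances:
        v = concept_vector(terms, term_vectors)
        if prev is None or cosine(prev, v) < threshold:
            segments.append([terms])   # topic change: open a new segment
        else:
            segments[-1].append(terms)
        prev = v
    return segments

# Toy two-dimensional concept vectors (hypothetical): a "politics" axis
# and a "sports" axis.  The four utterances split into two topic segments.
tv = {"budget": [1, 0], "tax": [1, 0], "soccer": [0, 1], "goal": [0, 1]}
utts = [["budget", "tax"], ["tax"], ["soccer", "goal"], ["goal"]]
print(len(segment_by_topic(utts, tv)))  # → 2
```

The resulting segments would then be compared with one another to build the tree-shaped topic network described above.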
The technique of Non-Patent Document 5 is mainly used for speech summarization. Since its output is a topic transition tree obtained by analyzing utterances, it can only provide links among the texts appearing in that tree, that is, from one part of the speech data to another part of the same speech data. However, the topic transition tree is built from text strings obtained by a speech recognition process, and more than one stream may supply such text strings. Thus, by simultaneously inputting both a primary media stream and a secondary language communication channel, the technique can be extended to cross-indexing between the language communication carried out in the secondary channel and the primary media stream.
Non-Patent Documents 6 to 11 are documents introducing techniques and the like applicable to the present invention. The contents of these documents and their relations to the present invention will be described at the relevant portions of the exemplary embodiments.

Non-Patent Document 1: “System for supporting a meeting organized by participants: WISS Challenge,” Computer Software (Japan Society for Software Science and Technology), Vol. 23, No. 4, pp. 76-81, 2006
Non-Patent Document 2: “Live Board,” http://ja.wikipedia.org/wiki/liveboard, the free encyclopedia Wikipedia
Non-Patent Document 3: “Nico Nico Douga,” http://ja.wikipedia.org/wiki/niconicodouga, the free encyclopedia Wikipedia
Non-Patent Document 4: Miyamori et al., “View Generation of TV Content using TV Viewer's Viewpoint based on Live Chat on the Web,” DBSJ (Database Society of Japan) Letters, Vol. 4, No. 1, pp. 1-4, 2005
Non-Patent Document 5: Bessho et al., “Meeting Speech Indexing System Based on Topic Structure Extraction,” IEICE (Institute of Electronics, Information and Communication Engineers) Journal D, Vol. J91-D, No. 9, pp. 2256-2267, 2008
Non-Patent Document 6: Salton et al., “A Vector Space Model for Automatic Indexing,” Communications of the ACM, Vol. 18, No. 11, pp. 613-620, 1975
Non-Patent Document 7: NEC, “Speech Recognition Software CSVIEW/VisualVoice,” http://www.nec.co.jp/middle/VisualVoice/, as of 2008 Sep. 19
Non-Patent Document 8: Rosenfeld, “A maximum entropy approach to adaptive statistical language modeling,” Computer, Speech and Language, Vol. 10, pp. 1-37, 1996
Non-Patent Document 9: Kuhn and de Mori, “A cache-based natural language model for speech recognition,” IEEE Transactions on PAMI, Vol. 12, No. 6, pp. 570-583, 1990
Non-Patent Document 10: Wessel et al., “Confidence measures for large vocabulary continuous speech recognition,” IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 3, pp. 288-298, 2001
Non-Patent Document 11: Isotani et al., “Spontaneous Speech Recognition Technology and Its Applications,” NEC Technical Journal, Vol. 58, No. 5, pp. 30-32, 2005
Patent Document 1: Japanese Patent Kokai Publication No. JP2000-235585A