There are cases where two types of text that target the same event are generated by different generation processes. Assume that one type of text is a first text, and that a set of a plurality of first texts is a first text set. Assume also that the other type of text is a second text, and that a set of a plurality of second texts is a second text set. In the case where there are two such types of text, it is useful to specify, within each first text constituting the first text set, the portions describing content that should be described in the corresponding second text.
For example, in a call center, speech recognition is performed on phone call speech, and a plurality of texts are obtained as a result. Consider the case where the obtained texts are first texts and the set of first texts is a first text set. In many call centers, the operator derives the gist of the phone call and prepares a customer memo constituted by text. Accordingly, at many call centers, there are sets of customer memos corresponding to the first texts in the first text set. Because these customer memos are generated by a different generation process from the first texts, while targeting the same event as the first texts, these customer memos can be viewed as second texts, and a set of customer memos can be viewed as a second text set.
Under such circumstances at a call center, it is important to specify, within each speech recognition text, the portions forming the gist of the phone call that should be written in the corresponding customer memo. Being able to specify such portions enables an analyst to examine only the important portions, for example by highlighting them, which improves analysis efficiency. It also enables subsequent processing such as text mining and searches focused on the portions forming the gist, and, further, the preparation of summaries utilizing the gist of each speech recognition text.
Alternatively, in the case where, for example, a set of research papers is considered to be a first text set, there may be presentation material corresponding to each research paper in the set. In this case, the set of presentation material can be viewed as a second text set. It is then important to specify, from within each research paper (first text), the important portions that should be written in the corresponding presentation material.
Being able to specify, from within each research paper, the portions that should be written in the presentation material enables readers to view the paper efficiently, for example by highlighting those portions. This also facilitates subsequent processing such as text mining, searches and summary preparation, similarly to the case mentioned earlier where a text set obtained by performing speech recognition on phone call speech is viewed as the first text set.
Also, consider the case where summary documents are prepared by two different people respectively summarizing a given document set. In this case, the set of summary documents summarized by one person can be viewed as a first text set, and the set of summary documents summarized by the other person can be viewed as a second text set.
Even under circumstances where two different people respectively prepare summaries, it is important to specify, from within each first text constituting the first text set, portions that should be written in a corresponding second text. Being able to perform such specification enables portions that are considered important by both people to be determined by examining the specified portions, and also enables analysis focused on the determined portions. It also becomes possible to analyze differences between the summaries of both people by examining portions other than the determined portions.
As a technique for specifying portions of one text that are described in another text, a technique of aligning texts, which takes two texts as inputs, is known. With an alignment technique, each text is viewed as a sequence of segments, each segment constituting a block of homogeneous information. It is then determined, for each segment in one text, whether its content appears in any of the segments in the other text.
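The general idea of segment alignment can be illustrated with a minimal sketch. The word-overlap (Jaccard) similarity, the threshold value, and the names `tokenize`, `jaccard` and `align_segments` are assumptions chosen for illustration only; they are not the methods disclosed in Patent Document 1 or Non-patent Document 1.

```python
import re

def tokenize(segment):
    """Lowercase a segment and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", segment.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def align_segments(first_segments, second_segments, threshold=0.3):
    """For each segment of the first text, return the index of the
    best-matching segment of the second text, or None when no segment
    of the second text reaches the (illustrative) threshold."""
    second_tokens = [tokenize(s) for s in second_segments]
    result = []
    for seg in first_segments:
        seg_tokens = tokenize(seg)
        best_index, best_score = None, 0.0
        for j, other in enumerate(second_tokens):
            score = jaccard(seg_tokens, other)
            if score > best_score:
                best_index, best_score = j, score
        result.append(best_index if best_score >= threshold else None)
    return result

# Hypothetical data: segments of a speech recognition text (first text)
# and a one-segment customer memo (second text).
recognized = [
    "hello thank you for calling",
    "my printer shows a paper jam error",
    "have a nice day",
]
memo = ["printer paper jam error reported"]
alignment = align_segments(recognized, memo)
print(alignment)  # [None, 0, None]
```

In this hypothetical example, only the second segment of the first text is judged to have a counterpart in the second text, so it would be the portion specified as forming the gist.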
For example, Patent Document 1 and Non-patent Document 1 disclose specific examples of alignment techniques. Patent Document 1 discloses an alignment technique that efficiently uses a diversity of lexical information and knowledge information as keys to alignment. With the alignment technique disclosed in Patent Document 1, an original-language text is aligned with a translation thereof.
With the alignment technique disclosed in Non-patent Document 1, the topics of the paragraphs to which sentences belong are first determined, and macro-alignment between paragraphs utilizing the topics is executed, as a preliminary step to the alignment in sentence units that is ultimately to be executed. Alignment in sentence units is then executed on the pairs of aligned paragraphs. With the alignment technique disclosed in Non-patent Document 1, the unabridged version of an encyclopedia is aligned with the abridged version.
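The two-stage flow described above (macro-alignment between paragraphs, followed by alignment in sentence units within aligned paragraph pairs) can be sketched as follows. This is only a rough illustration: word overlap is used here as a stand-in for the topic determination of Non-patent Document 1, and the thresholds, the data, and the names `overlap`, `best_match` and `two_stage_align` are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase a text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a, b):
    """Jaccard word overlap between two texts (stand-in for topic similarity)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def best_match(item, candidates, threshold):
    """Index of the candidate most similar to item, or None below threshold."""
    best_index, best_score = None, 0.0
    for j, candidate in enumerate(candidates):
        score = overlap(item, candidate)
        if score > best_score:
            best_index, best_score = j, score
    return best_index if best_score >= threshold else None

def two_stage_align(full_paragraphs, short_paragraphs,
                    para_threshold=0.1, sent_threshold=0.5):
    """Each paragraph is a list of sentences. Returns tuples
    (full_para, full_sent, short_para, short_sent) of aligned sentences."""
    short_joined = [" ".join(p) for p in short_paragraphs]
    pairs = []
    for i, para in enumerate(full_paragraphs):
        # Stage 1: macro-alignment between paragraphs.
        j = best_match(" ".join(para), short_joined, para_threshold)
        if j is None:
            continue
        # Stage 2: sentence-unit alignment, only within the aligned pair.
        for s, sentence in enumerate(para):
            t = best_match(sentence, short_paragraphs[j], sent_threshold)
            if t is not None:
                pairs.append((i, s, j, t))
    return pairs

# Hypothetical unabridged and abridged texts.
full = [
    ["The cheetah is a large cat.", "It is the fastest land animal.",
     "Cheetahs live mainly in Africa."],
    ["The oak is a tree.", "Oak wood is used for furniture."],
]
short = [
    ["Oak wood is used for furniture."],
    ["The cheetah is the fastest land animal."],
]
pairs = two_stage_align(full, short)
print(pairs)  # [(0, 1, 1, 0), (1, 1, 0, 0)]
```

Restricting the sentence-unit comparison to already aligned paragraph pairs reduces the number of candidate sentences considered for each sentence, which is the role the macro-alignment step plays in this sketch.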
Therefore, assuming that a first text and a second text corresponding thereto are the inputs of the alignment technique disclosed in Patent Document 1 or Non-patent Document 1, portions (segments) described in the second text are specified from within the first text.