The present invention relates to a text structure analyzing apparatus analyzing structure of a text described in natural language and an abstracting apparatus generating an abstract by selecting important elements from the text.
In recent years, with the rapid spread of electronic text, techniques for processing text, namely analyzing the structure of a text and selecting important sentences from it, have become increasingly necessary. To generate an abstract by selecting important sentences from a text, it is indispensable to analyze the structure of the text and evaluate the importance degree of each sentence constituting the text.
Japanese Laid-Open Patent Publication No. 2-112069 discloses a conventional automatic abstracting method that evaluates the importance degree of each sentence by analyzing the structure of the text and generates an abstract from the evaluated result. The automatic abstracting method is as follows.
Among the precedent sentences S that include a key word whose character string coincides with the character string of a key word included in a sentence Sj constituting the text, the sentence closest to the sentence Sj is set as the parent sentence of Sj. This operation allows the structure of the text to be expressed as a tree structure.
In the tree structure obtained by this operation, the sentences on the path between the head sentence of the text (the base node of the tree structure) and the last sentence of the text are regarded as important sentences. The chain of important sentences is set as the abstract.
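The conventional method described above can be sketched as follows. This is a minimal illustration and not code from the cited publication: the function name `keyword_chain_abstract`, the representation of each sentence as a set of key words, and the handling of a sentence that shares no key word with any precedent sentence (it is simply left without a parent) are assumptions.

```python
def keyword_chain_abstract(sentence_keywords):
    """Sketch of the conventional method.

    sentence_keywords: list of sets of key words, one set per sentence,
    in text order (hypothetical input format).
    Returns (parents, path), where path lists the sentence indices on the
    chain that ends at the last sentence.
    """
    parents = [None] * len(sentence_keywords)
    for j in range(1, len(sentence_keywords)):
        # Scan backwards so that the closest precedent sentence sharing
        # a key word becomes the parent sentence.
        for i in range(j - 1, -1, -1):
            if sentence_keywords[i] & sentence_keywords[j]:
                parents[j] = i
                break

    # Walk from the last sentence up through its ancestors.
    path = []
    node = len(sentence_keywords) - 1
    while node is not None:
        path.append(node)
        node = parents[node]
    return parents, path[::-1]
```

Because only the closest sentence with a coinciding key word is considered, no comparison between candidate parents takes place, which is the weakness discussed in the problems listed next.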
However, the automatic abstracting method has the following problems:
(1) Mere coincidence between the character strings of key words is not enough to fully capture the connection between two sentences. This tendency is particularly conspicuous when a text consists of a plurality of sub-topics. For example, when the topic switches from one to another, key words different from those that have appeared in the preceding sentences appear many times.
(2) In determining the parent sentence of a sentence S, the candidate sentences are not sufficiently compared with each other to decide which of them is best as the parent sentence. Thus, the conventional method is incapable of analyzing the structure of the text with high accuracy.
(3) The path between the head sentence of the text and the last sentence thereof may be comparatively long. Accordingly, when the sentences included in the path are selected, it is impossible to generate a sufficiently concise abstract.
It is an object of the present invention to provide a text structure analyzing apparatus that analyzes the structure of a text with high accuracy and an abstracting apparatus capable of obtaining a highly accurate and concise abstract.
In order to achieve the object, the present invention provides a text structure analyzing apparatus analyzing a connection between respective elements constituting a text and based on an analyzed result, indicating a structure of the text by means of a tree structure which represents the respective elements as nodes, comprising:
an element appearance position storing section dividing an inputted text into the elements and storing an appearance position relationship among the elements on the inputted text;
a relation degree computing section determining a precedent element of an attention element with reference to the appearance position relationship and computing a relation degree representing strength of a connection between the attention element and each precedent element;
an importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element and an importance degree of a head element of the inputted text;
a structure determining section determining a tree structure of the inputted text by determining the precedent element having an optimum value as an importance degree of the attention element as a parent element of the attention element; and
an output section outputting the determined tree structure of the inputted text.
According to the construction, the parent element of each element in the tree structure of the inputted text is determined in consideration of the relation degree representing the strength of connection between the attention element and each precedent element and the importance degree of each element based on the relation degree. Thus, candidates of the parent element are compared with each other in much consideration of the connection between the two elements. Accordingly, it is possible to analyze the structure of the inputted text with high accuracy by setting only the element having a high degree of relation with the attention element as the parent element.
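The parent selection described above can be illustrated with a short sketch. The text states only that the importance degree of the attention element is based on the relation degrees and on the importance degree of the head element, and that the precedent element giving the optimum value becomes the parent. The concrete rule used here (importance of a candidate parent multiplied by the relation degree, with the maximum taken as optimum) and the names `build_tree` and `relation` are assumptions made for illustration.

```python
def build_tree(relation, head_importance=1.0):
    """Sketch of structure determination.

    relation[j][i] is the relation degree between attention element j and
    precedent element i (i < j); relation[0] is an empty list.
    Returns (parent, importance) lists indexed by element position.
    """
    n = len(relation)
    importance = [0.0] * n
    parent = [None] * n
    importance[0] = head_importance  # the head element is the root
    for j in range(1, n):
        for i in range(j):
            # Assumed combination rule: parent importance x relation degree.
            candidate = importance[i] * relation[j][i]
            if parent[j] is None or candidate > importance[j]:
                importance[j] = candidate
                parent[j] = i
    return parent, importance
```

Unlike the conventional closest-match rule, every precedent element is a candidate, so an earlier element can win over a nearer one when its candidate value is higher.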
In an embodiment, the element is a sentence.
According to the construction, comparison between candidates of the parent sentence can be made with full consideration of the connection between two sentences. Thus, it is possible to analyze the structure of an inputted text with high accuracy by setting only a sentence having a high degree of relation with the attention sentence as the parent sentence.
An embodiment further comprises an important word recognizing section recognizing important words from words constituting the respective elements; and
an important word weighting section weighting each of the recognized important words,
wherein the relation degree computing section has an important word comparing part for comparing a character string of an original form of each of the important words in the attention element with a character string of an original form of each of the important words in the precedent element to compute a relation degree between the attention element and the precedent element, based on a total value of weights of all the important words common to the attention element and to the precedent element and a number of all the important words in the attention element or a number of all the important words in the precedent element.
According to the construction, when important words common to the attention element and the precedent element are present, a relation degree corresponding to the total value of the weights of all the important words common to the attention element and the precedent element is given. In this manner, an optimum relation degree can be obtained according to the degree of connection between the attention element and the precedent element.
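As a rough illustration of this relation degree, the sketch below sums the weights of the important words common to both elements and divides by the number of important words in one of the two elements. The division itself, the default weight of 1.0, the function name `relation_degree`, and the assumption that words are already reduced to their original (base) forms are illustrative choices not fixed by the text.

```python
def relation_degree(attention_words, precedent_words, weights,
                    normalize_by="attention"):
    """Sketch: weighted overlap of important words between two elements.

    attention_words, precedent_words: important words in original form.
    weights: dict mapping a word to its weight (hypothetical; 1.0 if absent).
    normalize_by: "attention" or "precedent", selecting which element's
    important word count is used as the divisor.
    """
    attention = set(attention_words)
    precedent = set(precedent_words)
    common = attention & precedent
    total = sum(weights.get(word, 1.0) for word in common)
    count = len(attention) if normalize_by == "attention" else len(precedent)
    return total / count if count else 0.0
```

Normalizing by a word count keeps a long element from scoring highly merely because it contains many words.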
An embodiment further comprises an important word information storing section in which parts of speech to be recognized as the important words are stored,
wherein the important word recognizing section has a part of speech recognizing section for recognizing parts of speech in the respective elements; and a part of speech comparing section for comparing the recognized parts of speech and parts of speech to be recognized as the important words with each other to recognize words corresponding to parts of speech to be recognized as the important words from among words in the respective elements.
According to the construction, the important words are recognized based on a part of speech set in advance and stored. Thus, the important words can be easily recognized by consulting a dictionary.
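A minimal sketch of this part-of-speech comparison is given below. The stored set `IMPORTANT_POS`, the tag names, and the input format of (word, part of speech) pairs are hypothetical; a real system would obtain the tags from a dictionary or a morphological analyzer.

```python
# Hypothetical contents of the important word information storing section.
IMPORTANT_POS = {"noun", "verb", "adjective"}


def recognize_important_words(tagged_words, important_pos=IMPORTANT_POS):
    """Keep the words whose part of speech is stored as important.

    tagged_words: list of (word, part_of_speech) pairs for one element.
    """
    return [word for word, pos in tagged_words if pos in important_pos]
```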
An embodiment further comprises an important word recognizing section recognizing important words from words constituting the elements;
a meaning recognizing section recognizing meaning of each of the recognized important words; and
a concept system storing section storing a concept system for recognizing rank relationship between meanings of two of the recognized important words, an analogous relationship therebetween, and a part-to-whole relationship therebetween;
wherein the relation degree computing section has a determining section which, with reference to the concept system, regards one of the recognized important words in the attention element and one of the recognized important words in the precedent element as having a semantic connection when the meanings of the two important words have the rank relationship, the analogous relationship, or the part-to-whole relationship therebetween, so as to compute a relation degree between the attention element and the precedent element, based on a total value of weights of all the important words, having the semantic connection, in the attention element and the precedent element and the number of all the important words in the attention element or the number of all the important words in the precedent element.
According to the construction, an important word in the attention element and an important word in the precedent element are regarded as having a semantic connection when the meanings of the two important words have a rank relationship, an analogous relationship, a part-to-whole relationship, or the like. A relation degree is determined according to the total value of weights of all the important words, having the semantic connection, in the attention element and the precedent element. In this manner, an optimum relation degree can be obtained according to the degree of connection between the attention element and the precedent element.
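The semantic variant can be sketched in the same spirit. The toy concept system `CONCEPT_LINKS`, its entries, and the function names are hypothetical. For simplicity the sketch sums only the weights of the attention-side words that have a semantic connection and divides by the number of important words in the attention element; both choices are assumptions.

```python
# Hypothetical concept system: pairs of meanings and their relationship.
CONCEPT_LINKS = {
    ("vehicle", "car"): "rank",
    ("car", "automobile"): "analogous",
    ("car", "wheel"): "part-to-whole",
}


def semantically_connected(a, b, links=CONCEPT_LINKS):
    """True when the concept system links the two words in either order."""
    return (a, b) in links or (b, a) in links


def semantic_relation_degree(attention_words, precedent_words, weights,
                             links=CONCEPT_LINKS):
    """Sketch: weighted share of attention words linked to a precedent word."""
    connected = {
        word for word in attention_words
        if any(semantically_connected(word, other, links)
               for other in precedent_words)
    }
    total = sum(weights.get(word, 1.0) for word in connected)
    return total / len(attention_words) if attention_words else 0.0
```

With this variant, two elements that share no identical character string can still be related, for example when one mentions a car and the other a wheel.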
The present invention provides an abstracting apparatus analyzing a connection between respective elements constituting a text and generating an abstract of the text by imparting an importance degree to the respective elements, based on an analyzed result and selecting the respective elements in the order from a higher importance degree to a lower importance degree, comprising:
an element appearance position storing section dividing an inputted text into the elements and storing an appearance position relationship among the elements on the inputted text;
a specific word list generating section generating a list of specific words by recognizing the specific words from among words constituting a specific element and attaching the generated specific word list to a front of a head element of the inputted text;
a relation degree computing section determining a precedent element of an attention element with reference to the appearance position relationship in which the specific word list is set as a head element and computing a relation degree representing strength of a connection between the attention element and each precedent element;
an importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element and an importance degree of the specific word list;
an element selection section selecting a predetermined number of elements in a descending order from an element having a highest importance degree obtained by computation; and
an output section outputting the selected predetermined number of elements as an abstract of the inputted text.
According to the construction, the abstract of the inputted text can be obtained by computing the importance degree of the attention element based on the relation degree representing the degree of connection between the attention element and each precedent element, and selecting a predetermined number of elements in descending order from the element having the highest importance degree. Candidates of the parent element are thus compared with each other with full consideration of the connection between the two elements, so that only important elements having a high degree of relation with the specific word list are selected as the abstract. Therefore, according to the present invention, it is possible to generate an abstract that is highly accurate and concise.
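The abstracting flow can be sketched end to end as follows. The specific word list is placed in front of the text as the head element, importance degrees are propagated from it, and the highest-ranked elements are selected. The unweighted overlap used as the relation degree, the product-and-maximum rule for the importance degree, the output in text order, and the name `make_abstract` are all assumptions made for illustration.

```python
def make_abstract(specific_words, elements_words, count):
    """Sketch of the abstracting apparatus.

    specific_words: words of the specific word list (e.g. taken from a title).
    elements_words: list of word lists, one per element, in text order.
    count: predetermined number of elements to select.
    Returns (selected indices in text order, importance per element).
    """
    # Index 0 is the specific word list, acting as the head element.
    nodes = [set(specific_words)] + [set(words) for words in elements_words]
    importance = [1.0]
    for j in range(1, len(nodes)):
        best = 0.0
        for i in range(j):
            # Assumed relation degree: shared words / words in element j.
            rel = len(nodes[j] & nodes[i]) / len(nodes[j]) if nodes[j] else 0.0
            best = max(best, importance[i] * rel)
        importance.append(best)

    # Stable sort: ties keep their original text order.
    ranked = sorted(range(1, len(nodes)), key=lambda j: -importance[j])
    selected = sorted(j - 1 for j in ranked[:count])
    return selected, importance[1:]
```

Because the number of selected elements is fixed in advance, the length of the abstract no longer depends on the length of a path through the tree.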
In an embodiment, the element is a sentence.
According to the construction, comparison between candidates of the parent sentence can be made with full consideration of the connection between two sentences. Thus, it is possible to select only a sentence having a high degree of relation with the specific word list as the abstract.
An embodiment further comprises a specific word information storing section in which parts of speech to be recognized as the specific words are stored,
wherein the specific word list generating section has a part of speech recognizing section for recognizing parts of speech of words constituting an element representing a title; and a part of speech comparing section for comparing the recognized part of speech and the parts of speech to be recognized as the specific words with each other to recognize as the specific word a word corresponding to the parts of speech to be recognized as the specific word from among the words constituting the element representing the title.
According to the construction, specific words are recognized based on a part of speech set and stored in advance. Thus, the specific word can be recognized easily by consulting a dictionary.
The present invention provides an abstracting apparatus analyzing a connection between respective elements constituting a text and generating an abstract of the text by imparting an importance degree to the respective elements, based on an analyzed result and selecting the respective elements in the order from a higher importance degree to a lower importance degree, comprising:
an element appearance position storing section dividing an inputted text into the elements and storing an appearance position relationship among the elements on the inputted text;
a fragment dividing section dividing the inputted text into larger fragments than the elements;
a specific word list generating section generating a list of specific words in each of the fragments by recognizing the specific words from among words constituting a specific element and attaching the generated specific word list to a front of a head element of the inputted text;
a relation degree computing section determining a precedent element of an attention element in each of the fragments with reference to the appearance position relationship in which the specific word list is set as a head element and computing a relation degree representing strength of a connection between the attention element and each precedent element;
an in-fragment importance degree computing section computing an importance degree of the attention element in each of the fragments, based on a relation degree between the attention element and each precedent element and an importance degree of the specific word list;
a fragment importance degree setting section setting an importance degree of each fragment;
an entire importance degree computing section computing an importance degree of the attention element in the entire inputted text, based on an importance degree of the attention element in each fragment and an importance degree of the fragment to which the attention element belongs;
an element selection section selecting a predetermined number of elements in a descending order from an element having a highest importance degree, in the entire inputted text, obtained by computation; and
an output section outputting the selected predetermined number of elements as an abstract of the inputted text.
According to the construction, the importance degree of the attention element is computed in each of the fragments, based on the relation degree between the attention element and each precedent element. The importance degree of each fragment to which the attention element belongs is set. The importance degree of the attention element in the entire inputted text (the entire importance degree) is computed, based on the importance degree of the attention element in each fragment and the importance degree of the fragment to which the attention element belongs. The abstract is then generated by selecting a predetermined number of elements in descending order from the element having the highest entire importance degree. In this manner, after the importance degree of each element in the fragment is determined, the entire importance degree is computed in consideration of the importance degree of each fragment. Thus, for each fragment, it is possible to select only an element having a high degree of relation with the specific word list as a candidate for the abstract. Accordingly, even if the contents of the description vary from fragment to fragment, it is possible to generate the abstract covering each fragment without omission, according to the importance degree of each fragment.
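The combination of the two importance degrees can be sketched as below. The text says only that the entire importance degree is based on the in-fragment importance degree and the importance degree of the fragment; the product used here and the names `entire_importance` and `select_elements` are assumptions.

```python
def entire_importance(in_fragment, fragment_importance):
    """Sketch: entire importance = fragment importance x in-fragment importance.

    in_fragment[f][k] is the importance degree of element k inside fragment f.
    fragment_importance[f] is the importance degree set for fragment f.
    """
    return [
        [fragment_importance[f] * value for value in values]
        for f, values in enumerate(in_fragment)
    ]


def select_elements(in_fragment, fragment_importance, count):
    """Select `count` (fragment, element) positions by entire importance."""
    scores = entire_importance(in_fragment, fragment_importance)
    ranked = [
        (score, f, k)
        for f, values in enumerate(scores)
        for k, score in enumerate(values)
    ]
    # Stable sort: ties keep their original text order.
    ranked.sort(key=lambda item: -item[0])
    return sorted((f, k) for _, f, k in ranked[:count])
```

Since each fragment is scored against its own specific word list first, a fragment on a different sub-topic can still contribute its best element to the abstract.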
In an embodiment, the element is a sentence, and the fragment is a paragraph.
According to the construction, it is possible to select only an element having a high degree of relation with the specific word list as the abstract for each paragraph composing an inputted text. Accordingly, even if the contents of the descriptions of fragments are varied, it is possible to generate the abstract of each fragment without omission, according to the importance degree of each fragment.
An embodiment further comprises a fragment importance degree storing section classifying and storing importance degrees to be imparted to the fragments according to an appearance position of each of the fragments in the inputted text,
wherein the fragment importance degree setting section determines an appearance position of an attention fragment on the inputted text with reference to an appearance position relationship among the elements in which the specific word list is set as a head element and sets an importance degree of the attention fragment with reference to an appearance position of each of the fragments stored in the fragment importance degree storing section.
According to the construction, importance degrees of the respective fragments classified according to the appearance position thereof on the inputted text are stored in advance. Thus, for example, a high degree of importance is imparted to a head fragment which is supposed to contain many important elements. In this case, it is possible to generate an abstract by automatically selecting important elements preferentially from a fragment having a higher degree of relation with the head fragment.
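One possible form of the stored table and its lookup is sketched below. The three position classes, the numeric values in `FRAGMENT_IMPORTANCE_BY_POSITION`, and the function names are hypothetical; the text states only that importance degrees are classified by appearance position and that, for example, the head fragment receives a high value.

```python
# Hypothetical contents of the fragment importance degree storing section.
FRAGMENT_IMPORTANCE_BY_POSITION = {"head": 1.0, "middle": 0.5, "last": 0.8}


def fragment_position(index, count):
    """Classify a fragment by where it appears in the inputted text."""
    if index == 0:
        return "head"
    if index == count - 1:
        return "last"
    return "middle"


def set_fragment_importance(count, table=FRAGMENT_IMPORTANCE_BY_POSITION):
    """Return the importance degree assigned to each of `count` fragments."""
    return [table[fragment_position(i, count)] for i in range(count)]
```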
The present invention provides a program recording medium in which a text structure-analyzing program is recorded to function as:
an element appearance position storing section dividing an inputted text into elements and storing an appearance position relationship among the elements on the inputted text;
a relation degree computing section computing a relation degree representing strength of connection between an attention element and each of precedent elements;
an importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element and an importance degree of a head element of the inputted text;
a structure determining section determining a structure of the inputted text by setting the precedent element having an optimum value as the importance degree of the attention element as a parent element of the attention element; and
an output section outputting the determined tree structure of the inputted text.
According to the construction, similarly to the text structure analyzing apparatus described above, candidates of the parent element are compared with each other with full consideration of the connection between the two elements. Accordingly, it is possible to analyze the structure of an inputted text with high accuracy by setting only the element having a high degree of relation with the attention element as the parent element.
The present invention provides a program recording medium in which an abstracting program is recorded to function as:
an element appearance position storing section dividing an inputted text into elements and storing an appearance position relationship among the elements on the inputted text;
a specific word list generating section generating a list of specific words by recognizing the specific words from among words constituting a specific element and attaching the generated specific word list to a front of a head element of the inputted text;
a relation degree computing section computing a relation degree representing the strength of connection between an attention element and each precedent element;
an importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element and an importance degree of the list of specific words;
an element selection section selecting a predetermined number of elements in a descending order from an element having a highest importance degree obtained by the computation; and
an output section outputting the selected predetermined number of elements as an abstract of the inputted text.
According to the construction, similarly to the abstracting apparatus of the embodiment, it is possible to select only an important element having a high degree of relation with the specific word list as the abstract. Therefore, it is possible to generate the abstract which has high accuracy and is concise.
The present invention provides a program recording medium in which an abstracting program is recorded to function as:
an element appearance position storing section dividing an inputted text into elements and storing an appearance position relationship among the elements on the inputted text;
a fragment dividing section dividing the inputted text into larger fragments than the elements;
a specific word list generating section generating a list of specific words by recognizing the specific words from among words constituting a specific element in each fragment and attaching the generated list of specific words to a front of a head element of each fragment;
a relation degree computing section computing a relation degree representing strength of connection between the attention element and each precedent element in each fragment;
an in-fragment importance degree computing section computing an importance degree of the attention element, based on a relation degree between the attention element and each precedent element in each of the fragments and an importance degree of the list of specific words;
a fragment importance degree setting section setting an importance degree of each of the fragments;
an entire importance degree computing section computing an importance degree of the attention element in the entire inputted text, based on an importance degree of the attention element in each fragment and an importance degree of the fragment to which the attention element belongs;
an element selection section selecting a predetermined number of elements in a descending order from an element having a highest importance degree, in the entire inputted text, obtained by computation; and
an output section outputting the selected predetermined number of elements as an abstract of the inputted text.
Thus, similarly to the abstracting apparatus of an embodiment, for each fragment larger than the respective elements, it is possible to select only an element having a high degree of relation with the specific word list as a candidate for the abstract. Accordingly, even if the contents of the description vary from fragment to fragment, it is possible to generate the abstract covering each fragment, according to the importance degree of each fragment.