The quantity of information expressed in natural language and stored in computer-accessible form has grown, and will likely continue to grow, at an exponential rate.
In almost any area of endeavor, merely gaining access to the pertinent information is no longer the principal issue. Instead, the more difficult problem has become evaluating the accessible information within a manageable timeframe. Accordingly, there is a great and growing need for automated methods that can summarize natural language information, prioritize its presentation, or accomplish some combination of both.
While computer-based methods that can summarize natural language already exist, the known methods suffer from severe limitations. For example, the following kinds of methods have been used: statistical, typical-element, and template-based, as well as methods that combine the foregoing.
A statistical method might, for example, summarize product reviews as “Acme makes a good pipe wrench,” because the phrases “Acme,” “pipe wrench,” and “good” frequently appear in close proximity to each other.
A typical-element method might summarize a news item about an earthquake by looking for a number between 3 and 10, and then using that number as the magnitude in a summary template.
Because their understanding of the natural language to be summarized is so shallow, such methods can be easily misled. The result is a misinterpretation of input information that would never confuse a human reader.
For example, it is clearly incorrect to summarize a review that states “only a fool would say that an Acme pipe wrench is good” as saying “Acme pipe wrenches are good,” even though the word “good” appears in the same sentence as the phrases “Acme” and “pipe wrench.”
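The failure described above can be illustrated with a minimal sketch of a co-occurrence summarizer. The phrase list, the function names, the threshold, and the fixed output sentence are all hypothetical, chosen only for illustration; they are not taken from any particular known system.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical watch list of phrases; a real system would derive these from data.
PHRASES = ("acme", "pipe wrench", "good")


def cooccurrence_counts(reviews):
    """Count how often each pair of watched phrases shares a sentence."""
    counts = Counter()
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            # Plain substring matching: no parsing, no handling of negation.
            present = [p for p in PHRASES if p in sentence]
            for pair in combinations(present, 2):
                counts[pair] += 1
    return counts


def naive_summary(reviews, threshold=1):
    """Emit a fixed summary once every phrase pair co-occurs often enough."""
    counts = cooccurrence_counts(reviews)
    if all(counts[pair] >= threshold for pair in combinations(PHRASES, 2)):
        return "Acme pipe wrenches are good"
    return None


reviews = ["Only a fool would say that an Acme pipe wrench is good."]
print(naive_summary(reviews))  # prints "Acme pipe wrenches are good"
```

Because the sketch only counts which phrases share a sentence, the qualifying clause “only a fool would say” has no effect on the result.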
Similarly, typical-element methods are easily confused. For example, if an earthquake report indicates that a building was damaged 9.5 miles from the slipping fault line, it would not be correct to summarize this report as saying “The magnitude of the earthquake was 9.5 on the Richter scale.”
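A comparable sketch shows how a typical-element method can go wrong. It assumes, purely for illustration, that the method takes the first number between 3 and 10 found in the report as the magnitude; the regular expression, the function name, and the template wording are hypothetical.

```python
import re

# Matches integers or decimals such as "7" or "9.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def naive_magnitude_summary(report):
    """Fill a summary template with the first number in the range 3 to 10."""
    for match in NUMBER.finditer(report):
        value = float(match.group())
        if 3 <= value <= 10:
            # The surrounding words ("miles", "magnitude", ...) are never checked.
            return (
                f"The magnitude of the earthquake was {match.group()} "
                "on the Richter scale."
            )
    return None


report = "A building was damaged 9.5 miles from the slipping fault line."
print(naive_magnitude_summary(report))
# prints "The magnitude of the earthquake was 9.5 on the Richter scale."
```

The number is accepted solely because it falls in the expected range, so a distance in miles is reported as a magnitude.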
Accordingly, there exists a need for robust summarization methods that can consistently and accurately summarize a wide range of natural language input.