The present invention relates to text mining. In particular, the present invention relates to extracting key terms and associated key terms for data mining purposes.
Customers often send companies emails that ask questions, suggest new features or improvements, explain how a product is being used, report problems, praise or criticize customer service, and the like. Similarly, Internet message boards, weblogs, focus groups, newsgroups, and the like also generate text that discusses similar product and customer service subjects.
The information contained in these emails and other text sources can be very useful. The information, for example, can permit companies to measure consumer satisfaction, predict market trends, detect problems with products, classify documents for research, and the like.
However, some large companies may receive thousands of emails per day on many subjects. The volume of information available from the Internet and other sources is also very large. Therefore, it can be difficult to retrieve useful information from such a large volume of data.
Text mining is a variation of a field called “data mining,” which tries to find interesting patterns in large databases. A typical data mining application uses consumer purchasing patterns to decide which products to place close together on shelves, which products to offer coupons for, and so on. For example, a customer buying a flashlight is likely to also buy batteries.
Text mining is an extension of data mining. The difference between regular data mining and text mining is that data mining analyzes patterns in structured databases of facts, whereas text mining analyzes patterns and/or associations in unstructured text to extract useful information. Also, databases are designed for programs to process automatically, while text is written for people to read. Presently, computer programs are limited in how well they can “read” and understand text as people do. There are also limitations and costs associated with having people read large volumes of text in order to extract useful information.
Computer-based forms of text mining can include keyword searches and various relevance-ranking algorithms. While these methods can be effective, a user must usually spend a significant amount of time identifying and sorting the relevant documents. Thus, text mining can be time-consuming, tedious, and expensive.
As a result, an improved system and method that addresses one, some, or all of the issues associated with current text mining techniques would have significant utility.