1. Field of the Invention
The present invention relates to determining changes in an amount of word use over time for words within text documents and to summarizing text documents having at least some of the changes. More specifically, embodiments of the invention relate to an apparatus and method for determining changes in an amount of word use for words within text documents, determining a time frame when the changes in the amount of word use occurred, presenting information with respect to the changed amount of word use, and presenting a visual graph summarizing at least portions of the text documents that include words determined to have the changed amount of use.
2. Introduction
Companies often collect a large corpus of unstructured text data continuously over time. The unstructured text data may be, for example, e-mail messages, transcriptions of customer comments, transcriptions of phone conversations, physical mail, medical records, news feeds, blogs, or the like. Each item of data may be generated or collected at a particular point in time. Managers may wish to learn about the contents of the data and the changes that occur over time, including when and why the changes occur, so that they may understand and/or act upon the information contained within the data. Because of the large volume of data, it is too expensive and difficult to individually read each document in the corpus, determine the changes, and summarize the contents of the data. Further, the data's lack of structure makes conventional tools, such as, for example, conventional statistical analysis tools, insufficient to facilitate the understanding of the contents of the data.
Existing tools that perform automatic summarization of textual data typically provide textual output only. While some tools provide visual graphics with respect to word frequencies, they do not provide any other visual graphic information.
Thus, there is a need for a tool that facilitates the understanding of changes in a large-volume corpus of unstructured text and that takes advantage of human cognitive visualization capability.