In virtually any large enterprise, there is an enormous amount of stored information, predominantly in the form of text. The amount of free-form, or unstructured, text data is typically very large compared with the structured data held in databases. For example, free-form text data accounts, on average, for about 80% of the stored information, and its volume frequently doubles every year. Structured data, on the other hand, accounts for only about 20% of the stored data. Much of the free-form text information, particularly in organizational functions such as legal, supply chain, human resources (HR), and the like, is contained in numerous large documents. Comprehending the terms and semantics of a document in order to, for example, locate concepts of interest within it requires painstaking effort. Although electronic storage makes documents easier to browse, it remains difficult and time-consuming to read through large volumes of text to understand and quickly locate the key semantic concepts of interest.
Most word processing software provides a mechanism for searching for individual terms but does not enable extraction of key semantic concepts from a large document. Recent advances in information extraction and text-mining technologies, however, provide mechanisms for extracting such concepts. One text-mining engine that can extract semantic concepts is produced by ClearForest. Another example is APR Corporation's Smart Logik product.