It is estimated that large enterprises are witnessing tremendous yearly growth (more than 100%) in information handling, based on the products and services that they offer. Processing significant parts of the information that is generated is important, so as to facilitate sound and timely decision making with regard to, e.g., policies and processes. Enterprise information systems usually connect to various sources across multiple internal and external third-party application systems.
If enterprises wish to adapt in real time, they may need to monitor, integrate, analyze, and respond to high-volume information from diverse applications and information sources. The amount of information to be considered can be hundreds of terabytes, depending on the granularity levels. One may encounter unstructured information from raw events on a particular enterprise metric from one or more sources, as well as aggregated information such as parts sales, e-commerce transactions, customer complaints, warranty claims, and the like.
Techniques such as filtering, integration, and summarization have been used for managing, searching, and processing high-volume enterprise information. Filtering is discussed in, e.g., Y. Zhang, “Using Bayesian Priors to Combine Classifiers for Adaptive Filtering,” in the Proceedings of SIGIR 2004. Integration is discussed in, e.g., Y. Arens et al., “Query reformulation for dynamic information integration,” in the Journal of Intelligent Information Systems, 1996. Summarization is discussed in, e.g., M. Hu and B. Liu, “Mining and Summarizing Customer Reviews,” in the Proc. of ACM SIGKDD, 2004. These are powerful techniques for managing high-volume enterprise information. Nevertheless, the continued growth in the volume and complexity of information being handled requires continued progress.
It would thus be desirable to overcome the limitations in previous approaches.