High-volume transaction businesses depend on their back-office systems' ability to provide fast and accurate reporting. Because the transactional data may amount to millions of records, an application is needed that can aggregate the transactions and report on them in seconds.
Many data processing applications today are responsible for hundreds of millions or even billions of transactions in total, with millions of new transactions being added every day. With that volume of information flowing into a system, it is important to clearly understand the purpose of the information as it applies to each potential type of user. Different users require different subsets of the information, different tabulations of the information and even different reporting performance. When the Chief Financial Officer (“CFO”) of a company runs a report on revenue by day across 200 million transactions for a month, the system should access only those pieces of “perfected” information necessary to process the request in the most efficient way possible (as an officer of the company, the CFO has very little time or tolerance for waiting). When a network designer wants to determine the performance of a particular piece of equipment over the previous hour to troubleshoot a problem, the network designer's path through the information and the resources allocated (either ad hoc or as part of a capacity plan) are likely to be very different from those of the CFO. Access to information is rarely so much a matter of equality as it is a matter of planning and resource allocation within the real-world needs of an information technology (IT) environment.
A problem to be solved is how a system responsible for storing and retrieving such high volumes of information can match the needs of its users with the resources available, and how those available resources can be identified in real time as the most effective and/or necessary resources to be leveraged in both storing and retrieving information from multiple sources in multiple formats.
High-end database vendors, such as Oracle, attempt to be all things to all people by focusing on indexes, automated aggregation and caching strategies, and an ever-increasing dependence on processing power. Indexes are very important for database processing because they increase retrieval speeds significantly. However, when the volume of data grows to hundreds of millions of records, an index's ability to deliver rapid responses (under 10 seconds) diminishes. Even with very effective indexes, the time required to summarize 200 million records ad hoc is prohibitive. In some cases, to deal with this limitation, databases attempt either to aggregate information automatically (based on the queries that have been processed) or to cache the information (based on the most current information either written or retrieved). These techniques are useful in some cases, but fall short when faced with the huge volume of data contemplated herein.
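The contrast between ad hoc summarization and query-driven caching described above can be illustrated with a minimal sketch. It is not any vendor's implementation; the record layout, the `revenue_by_day_adhoc` function and the `QueryDrivenCache` class are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records as (day, amount) pairs.
# A real system would hold many more fields and vastly more rows.
transactions = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 1, 1), 5.5),
    (date(2024, 1, 2), 7.25),
]


def revenue_by_day_adhoc(rows):
    """Summarize by scanning every row; cost grows with the row count."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day] += amount
    return dict(totals)


class QueryDrivenCache:
    """Keeps a summary after it is first requested (illustrative assumption)."""

    def __init__(self, rows):
        self._rows = rows
        self._cached = None
        self.scans = 0  # number of full scans performed so far

    def revenue_by_day(self):
        if self._cached is None:
            # Cache miss: a full scan of all rows is still required.
            self.scans += 1
            self._cached = revenue_by_day_adhoc(self._rows)
        return self._cached


cache = QueryDrivenCache(transactions)
first = cache.revenue_by_day()   # pays for a full scan
second = cache.revenue_by_day()  # served from the cached summary
```

In this sketch the second request avoids a scan, but the first request still touches every row, and the cached summary would need to be rebuilt whenever new transactions arrive. At hundreds of millions of rows, that first scan is the cost the paragraph above describes as prohibitive.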