Data and information management in a business or enterprise usually involves one or more transactional databases (also known as an Operational Data Store or ODS) where business transactions are recorded. Decision support functionality (in the business context, referred to as business intelligence or BI) typically uses a separate database instance, such as a data mart or a data warehouse. The database instance that provides decision support functionality is often logically separate from the transactional database because it may use a different database schema, such as a star schema. In addition, that instance is often located on a physically separate server (or server cluster) to avoid interfering with the performance of the transactional (operational) server.
Running decision support functions on a physically separate decision support server does avoid impacting the performance of the operational server. A primary drawback of this solution is the additional expense of purchasing, configuring, and maintaining a decision support server. There is also overhead involved in sending (or synchronizing) data from the operational server to the decision support server, which typically involves extract, transform, and load (ETL) operations. The separation of the operational server (or ODS) from the decision support server also prevents real-time decision support (also referred to as operational decision support). Real-time decision support is often required, for example, for product recommendation in e-commerce, where the decision support needs to occur within a few user clicks on a web site.
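By way of illustration only, the synchronization overhead described above can be pictured with a minimal ETL sketch in Python using the standard sqlite3 module. The table names (orders, dim_product, fact_sales), the column names, and the functions make_demo_databases and etl_sync are hypothetical and are not taken from any system discussed here; two in-memory databases stand in for the operational server and the decision support server.

```python
import sqlite3


def make_demo_databases():
    """Create a hypothetical operational store and a tiny star-schema warehouse."""
    ops = sqlite3.connect(":memory:")  # stands in for the operational server (ODS)
    ops.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, "
        "qty INTEGER, unit_price REAL)"
    )
    ops.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "widget", 2, 5.0), (2, "gadget", 1, 20.0), (3, "widget", 1, 5.0)],
    )
    ops.commit()

    dw = sqlite3.connect(":memory:")  # stands in for the decision support server
    dw.executescript(
        """
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY,
                                 product_key INTEGER REFERENCES dim_product,
                                 revenue REAL);
        """
    )
    return ops, dw


def etl_sync(ops, dw):
    """Copy all orders into the warehouse; return the number of rows processed."""
    # Extract: read the transactional rows from the operational store.
    rows = ops.execute(
        "SELECT order_id, product, qty, unit_price FROM orders"
    ).fetchall()
    for order_id, product, qty, unit_price in rows:
        # Transform: map the product name to a dimension key and derive revenue.
        dw.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        (product_key,) = dw.execute(
            "SELECT product_key FROM dim_product WHERE name = ?", (product,)
        ).fetchone()
        # Load: write one fact row per order (re-running replaces the same row).
        dw.execute(
            "INSERT OR REPLACE INTO fact_sales VALUES (?, ?, ?)",
            (order_id, product_key, qty * unit_price),
        )
    dw.commit()
    return len(rows)
```

In this sketch the warehouse reflects only the orders present at the time etl_sync was last called, which is the source of the latency noted above.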
Recently, there has been substantial interest in server consolidation, in which multiple logical server instances run on the same physical server (or server cluster). This often involves the use of a virtualization layer to map multiple logical servers onto the same physical server, either statically or dynamically. As an example, IBM Corporation (Armonk, N.Y.) provides a Hypervisor™ product that enables the partitioning, through DLPAR (dynamic logical partitioning), of an IBM pSeries or iSeries server into multiple logical server partitions, with each partition having its own processors, memory, and I/O adapter cards. Hewlett Packard (Palo Alto, Calif.) Integrity servers provide similar partitioning capability through logical (vPar) and physical (nPar) partitioning.
There are a number of commercially available load balancing engines, such as the IBM Tivoli Intelligent Orchestrator™ (TIO), that provide provisioning and scheduling capabilities to ensure that the quality of service (QoS) required by each computing task is achieved.
Running decision support (or business intelligence) operations on the same server that also runs the transactional server may interfere with and slow down the transactional server's normal operational tasks. This is due to the sharing of central processing units (CPUs) and virtual memory, as well as contention for access to the databases. Load balancing on shared CPUs can be achieved through, for example, the IBM Tivoli Intelligent Orchestrator™.
Interference due to simultaneous access to the same database instance, however, can only be eliminated through the careful partitioning of the database tables and more intelligent scheduling of the decision support related tasks. A known approach to address this problem is to schedule computation-intensive tasks and data-intensive tasks at previously known periods of low system load (e.g., in the middle of the night). The drawback of this approach is that the decision support analysis is not real-time and does not take into account the data that has arrived since the last time the decision support processing was run. If a significant change has taken place in the characteristics of the decision support data, there will be a delay until this change is reflected in the decision support analysis. In the intervening time, poor decisions (e.g., business decisions) could be made.
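The known scheduling approach and its drawback can be sketched as follows, purely for illustration. The window boundaries (1:00 to 5:00), the record layout, and the function names in_low_load_window and unreflected_records are assumptions made for this example and do not describe any particular product.

```python
from datetime import datetime, time


def in_low_load_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` falls inside the assumed low-load period [start, end)."""
    if start <= end:
        return start <= now < end
    # The window wraps past midnight, e.g. 22:00 to 04:00.
    return now >= start or now < end


def unreflected_records(records, last_run):
    """Return the records that arrived after the last decision support run.

    Each record is a (timestamp, payload) tuple; these records are exactly
    the data that the batch analysis has not yet taken into account.
    """
    return [record for record in records if record[0] > last_run]
```

For example, if the last run completed at 2:00 and a batch job is only permitted when in_low_load_window returns True, every record that arrives during the business day remains in the list returned by unreflected_records until the following night.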
U.S. Patent Application No. 2004/0070606 A1, published Apr. 15, 2004, discloses a methodology for preprocessing market data and capturing the preprocessed data as association rules through association rule analysis. Post-processing involves applying predetermined criteria to extract the useful rules. The disclosed method extracts rules in batch during the preprocessing stage and does not reflect the market situation in real time until the market data is processed again.
U.S. Patent Application No. 2003/0191678 A1, published Oct. 9, 2003, discloses a method involving a data management system, an assignment system, and a disruption handling system to facilitate dynamic changing of the data while a solution is being computed. In particular, the disclosed method translates domain-specific data into application-specific data, and stores preprocessed data in conjunction with the real-time data for decision making. However, decision making operations are done in batch.
U.S. Patent Application No. 2003/0144746 A1, published Jul. 31, 2003, discloses a method that preprocesses raw data and communicates it to a process manager, which in turn may access processes to analyze and characterize the data received.
U.S. Patent Application No. 2003/0123446 A1, published Jul. 3, 2003, discloses a method that preprocesses data locally before sending it to a centralized location. The intent of the method is to minimize the communication bandwidth needed for transmitting data to the centralized location.
U.S. Patent Application No. 2002/0077726 A1, published Jun. 20, 2002, discloses a method in which an input processor time-stamps the input data and forwards it to a preprocessor for preprocessing, after which it is stored in a raw data buffer. An additional load balancing measure based on pre-assigned priority is also disclosed. The method addresses the decomposition of newly arrived transactions into a set of smaller decisions and the potential consideration of load balancing.
U.S. Pat. No. 6,321,263 B1, issued Nov. 20, 2001, discloses a method that preprocesses raw data and stores it in a table before it is provided to a central repository. At predefined intervals, the data in the raw data tables is accessed and processed to generate defined sets of statistical data, which are then inserted into statistics tables.
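The general pattern of periodically summarizing raw data tables into statistics tables can be sketched as follows. This is a generic illustration only and is not the method of the cited patent; the table names (raw_data, statistics), the chosen statistics (a count and a sum per item), and the function names are assumptions made for the example.

```python
import sqlite3


def create_tables(conn):
    """Create a hypothetical raw data table and a statistics table."""
    conn.executescript(
        """
        CREATE TABLE raw_data (item TEXT, amount REAL);
        CREATE TABLE statistics (item TEXT PRIMARY KEY, n INTEGER, total REAL);
        """
    )


def refresh_statistics(conn):
    """Rebuild the statistics table from the current contents of raw_data.

    A scheduler would call this at a predefined interval; between calls the
    statistics table does not reflect newly inserted raw rows.
    """
    conn.execute("DELETE FROM statistics")
    conn.execute(
        "INSERT INTO statistics (item, n, total) "
        "SELECT item, COUNT(*), SUM(amount) FROM raw_data GROUP BY item"
    )
    conn.commit()


def get_statistics(conn):
    """Return the statistics rows as a dict mapping item -> (count, total)."""
    rows = conn.execute("SELECT item, n, total FROM statistics").fetchall()
    return {item: (n, total) for item, n, total in rows}
```

Any raw row inserted after a call to refresh_statistics is invisible in the result of get_statistics until the next call, which illustrates why interval-based processing of this kind does not provide real-time decision support.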
Accordingly, improved decision support techniques are needed.