The growth of enterprise software, data warehousing and other strategic data mining resources has increased the demands placed upon the information technology infrastructure of many companies, academic and government agencies, and other organizations. For instance, a retail corporation may capture daily sales data from all retail outlets in one or more regions or countries, or on a worldwide basis. The resulting very large database (VLDB) assets may contain valuable indicators of economic, demographic and other trends.
However, databases and the analytic engines which interact with those databases may have different processing capabilities. For instance, a database itself, which may be contained within a set of hard disk, optical or other storage media connected to associated servers or mainframes, may contain a set of native processing functions which the database may perform. Commercially available database packages, such as Sybase™, Informix™, DB2™ or others, may each contain a different set of base functions. Those functions might include, for instance, the standard deviation, mean (average), or other metric that may be calculated on the data or a subset of the data in the database. Conversely, the analytic engines which may communicate with and operate on databases or reports run on databases may contain a different, and typically larger or more sophisticated, set of processing functions and routines.
Thus, conventional statistical packages such as the SPSS Inc. SPSS™ or Wolfram Research Mathematica™ platforms may contain hundreds or more modules, routines, functions and other processing resources to perform advanced computations such as regression analyses, Bayesian analyses, neural net processing, linear optimizations, numerical solutions to differential equations or other techniques. However, when such an engine is coupled to and operates on data from separate databases, particularly but not limited to large databases, the allocation of the necessary computations between the engine and the database, and the communication which that allocation requires, may not always be optimized.
For instance, most available databases can compute averages on sets of data. When an average is to be computed, it is typically most efficient to compute it within the database, since this eliminates the need to transmit a quantity of data outside the database, compute the function externally and return the result. Moreover, in many instances the greatest amount of processing power may be available in the database and its associated server, mainframe or other resources, rather than in a remote client or other machine.
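The trade-off described above may be illustrated by the following minimal sketch, which is offered only as an example under stated assumptions. It uses Python's built-in sqlite3 module as a stand-in for a commercial database package, and the table name `sales`, the column name `amount` and the helper function names are hypothetical. The first function lets the database compute the average with its native AVG function, so that a single row is returned; the second transmits every row out of the database and computes the same average externally.

```python
import sqlite3


def build_sales_db(amounts):
    """Create an in-memory database holding one sales amount per row.

    The table and column names are illustrative assumptions only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount REAL)")
    conn.executemany(
        "INSERT INTO sales (amount) VALUES (?)",
        [(a,) for a in amounts],
    )
    conn.commit()
    return conn


def average_in_database(conn):
    """Compute the average with the database's native AVG function.

    Returns (average, rows_transferred); only one result row leaves
    the database.
    """
    (avg,) = conn.execute("SELECT AVG(amount) FROM sales").fetchone()
    return avg, 1


def average_in_client(conn):
    """Fetch every row and compute the average outside the database.

    Returns (average, rows_transferred); every row leaves the database.
    Assumes the table is non-empty.
    """
    rows = conn.execute("SELECT amount FROM sales").fetchall()
    values = [row[0] for row in rows]
    return sum(values) / len(values), len(rows)


if __name__ == "__main__":
    conn = build_sales_db([10.0, 20.0, 30.0, 40.0])
    print(average_in_database(conn))
    print(average_in_client(conn))
```

Both functions may be expected to produce the same average, but the number of rows transferred in the second case grows with the size of the table, which is the cost at issue for very large databases.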
On the other hand, the analytic engine and the associated advanced functions provided by that engine may be installed and available only on a separate machine. The analytic engine may be capable of processing a superset of the functions of the database and in fact be able to perform all necessary calculations for a given report, but only at the cost of longer computation time and the need to pass data and results back and forth between the engine and database. An efficient design for shared computation is desirable. Other problems exist.