The present invention relates generally to data structure infrastructure, and in particular to the use of light programs and helper steps in summarizing data.
A common business need is to summarize data that exists in a system. This can be accomplished using a summarization program. A summarization program can involve any number of functional data transformations that bring data from an existing state in a system to a state that is ready, or closer to being ready, to be viewed by an end user (e.g., in a report). A report may be any entity that displays summarized data to an end user. A further business need is to summarize varying volumes of data, which can differ to such an extent that different infrastructures are required to achieve a required level of performance for the summarization program. The different infrastructures can include a bulk infrastructure, which is suited to summarizing a sufficiently large volume of data, and an incremental infrastructure, which is suited to summarizing a sufficiently small volume of data. Various other summarization infrastructures exist, but for the purposes of explaining the solution herein, they can be considered to fall into either the bulk infrastructure category or the incremental infrastructure category described above.
One general purpose of a bulk infrastructure is to allow the summarization of large volumes of data. The bulk infrastructure can have its own code path. For instance, for a given data transformation, there may exist one portion of SQL code written exclusively to run in the bulk mode, another portion of SQL code written exclusively to run in the incremental mode, and other portions of SQL code as needed for other modes. All of the portions of SQL code may accomplish similar functions (for example, including, but not limited to, converting transaction currency into global currency), but the techniques involved in tuning SQL code to perform well for a large volume of data may differ from the techniques involved in tuning SQL code to perform well for a small volume of data. The data may therefore follow different code paths depending on the mode of summarization. The bulk infrastructure can allow bulk operations that, among other restrictions, can rule out the use of partitioned tables. One general purpose of an incremental infrastructure is to allow summarization in parallel and on smaller volumes of data. For example, the incremental infrastructure can have its own code path and can allow the use of partitioned tables or session-specific tables and incremental tuning techniques. Because summarizing data is a complex operation, the summarization infrastructure must allow for efficient tuning techniques, table-space use, recovery techniques, and CPU usage. Some previous solutions, such as data warehousing, can fall into such a category. Such previous solutions do not provide for improved use of parallel processing or the ability to nimbly run different portions of the summarization flow.
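By way of a hedged illustration only, the notion of separate code paths for a single data transformation can be sketched as follows. The sketch uses Python rather than SQL for brevity. All names (RATES, BULK_THRESHOLD, convert_bulk, convert_incremental, select_path, summarize), the exchange rates, and the volume threshold are hypothetical and are not taken from any particular system; the sketch shows only that both paths perform the same currency conversion while being organized differently for different data volumes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical exchange rates into a global currency (illustrative values only).
RATES: Dict[str, float] = {"USD": 1.0, "EUR": 2.0}

# Hypothetical row count at or above which the bulk path is chosen.
BULK_THRESHOLD = 1000


def convert_row(row: Row) -> Row:
    """Return a copy of the row with the amount expressed in the global currency."""
    return {**row, "global_amount": row["amount"] * RATES[row["currency"]]}


def convert_bulk(rows: List[Row]) -> List[Row]:
    """Bulk path: a single pass over the entire data set."""
    return [convert_row(r) for r in rows]


def convert_incremental(rows: List[Row], batch_size: int = 2) -> List[Row]:
    """Incremental path: small batches, each of which could be handled independently."""
    out: List[Row] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        out.extend(convert_row(r) for r in batch)
    return out


def select_path(row_count: int) -> Callable[[List[Row]], List[Row]]:
    """Pick the code path based on the volume of data to summarize."""
    return convert_bulk if row_count >= BULK_THRESHOLD else convert_incremental


def summarize(rows: List[Row]) -> List[Row]:
    """Run the same transformation through whichever path suits the volume."""
    return select_path(len(rows))(rows)
```

In this sketch both paths produce identical results for the same input; only the way the work is organized differs, which mirrors the observation above that the tuning techniques, rather than the function performed, distinguish the bulk mode from the incremental mode.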
In another previous solution, the summarization program spawned database processes to help process work. However, only one instance of a summarization program that required help could run at any one time, severely limiting concurrency. A second instance of the summarization program could not receive help and had to wait until the original program had completely ended. This issue stemmed from the inability to share the help from the spawned processes.
Also, in previous solutions, a rigid summarization program structure may have been defined, with custom code developed for each particular step for performance purposes rather than separate code paths being designed. Accordingly, a nimble and scalable solution with a versatile summarization mechanism is desired, and an improved summarization approach is therefore needed.