The IBM DB2® Universal Database (DB2 UDB) can automatically determine, as a maintenance task, the most effective degree of query parallelism to use across SMP CPUs. DB2 UDB provides an ideal environment for maintaining parallelism in many processing operations. As used herein, the term “parallelism” means the ability to execute a command statement, perform input/output (I/O), or run certain utilities, such as backup, restore, or load, across multiple processors.
Parallelism of operations can reduce the time and expense of complex computing activities. More recently, automatic parallelism selection has been commercialized: during execution, complex queries can benefit from parallel processing, while simple queries can bypass the overhead of the parallel processing infrastructure. Accordingly, the decision on the degree of parallelism can be made dynamically during execution.
In operation, though, challenges can arise. For instance, in LOAD (defined as a DB2 UDB authority and privilege that can be granted at the database level), agents or engine dispatchable units (EDUs), which perform tasks on behalf of the database manager or an application, handle different tasks in an effort to promote parallelism and thereby reduce time and expense. Examples of such agents include the formatters and the ridders.
There may be many formatters, each of which is responsible for parsing raw data from an input source and converting it into an internal record format (IRF). The formatters then pass these IRFs or records to a single RIDder.
There is only one RIDder, and it is responsible for allocating extents and assigning record identifiers (RIDs) to each IRF or record. In this process, parallelism is set to a value of (either by LOAD or by a user modifying the CPU_PARALLELISM option). In this case the RIDder process is performed by the formatter.
Furthermore, in this case there will be no db2lrid process; there will be just a single db2lfrm0 process, which handles both ridding and formatting. The “db2lrid” process performs the following functions: SMP synchronization, RID allocation, and index building. It also controls the synchronization of the LOAD formatter processes.
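The formatter-to-RIDder flow described above can be illustrated with a minimal sketch. It is not DB2 code: the `IRF` class, the `formatter`, `ridder`, and `load` functions, the comma-separated input, and the `(page, slot)` RID layout are all hypothetical stand-ins chosen only to show several parallel formatters feeding one serial RID assigner.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class IRF:
    """Hypothetical internal record format: a parsed row plus its source line."""
    source_line: int
    fields: list


def formatter(raw_rows, out_q):
    # Each formatter parses its own share of the raw input into IRFs.
    for line_no, raw in raw_rows:
        out_q.put(IRF(line_no, raw.split(",")))
    out_q.put(None)  # sentinel: this formatter has finished


def ridder(in_q, n_formatters, slots_per_page=4):
    # The single RIDder assigns RIDs in arrival order.
    # The (page, slot) pair is an assumed layout for illustration only.
    assigned = {}
    next_id = 0
    finished = 0
    while finished < n_formatters:
        irf = in_q.get()
        if irf is None:
            finished += 1
            continue
        assigned[irf.source_line] = divmod(next_id, slots_per_page)
        next_id += 1
    return assigned


def load(raw_lines, n_formatters=3):
    q = queue.Queue()
    rows = list(enumerate(raw_lines))
    workers = [
        threading.Thread(target=formatter, args=(rows[i::n_formatters], q))
        for i in range(n_formatters)
    ]
    for w in workers:
        w.start()
    rids = ridder(q, n_formatters)  # runs serially in the calling thread
    for w in workers:
        w.join()
    return rids
```

In this sketch, which record receives which RID depends on arrival order at the queue, so a formatter cannot know a record's RID at the time it parses that record.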
However, for XML LOAD, it is often desirable to parse XML documents in the formatters, where index keys are also accumulated during the parsing phase. Where there are user-defined XML indexes (referred to herein as “XML indexes,” “values indexes,” or “Value Indexes”), one referential parameter (referred to herein as a “keypart”) needed for accurate page building in a later step is a RID that references the original formatted IRF or record.
Unfortunately, RIDs are assigned by a single process that is distinct from the parsing processes and that may run after parsing, since the XML documents are often parsed in parallel ahead of time by separate processes. Because a RID has been neither generated nor assigned at parse time, the XML index keys remain incomplete. Parsing involves inserting index keys from each XML document into a shared sort, so the parsing is also incomplete: the index keys depend directly on RIDs that have yet to be generated and therefore cannot be inserted into the shared sort.
One possible approach is to reposition this processing to be coincident with that of the ridder. However, this option is not practical in time or effort: the ridder is limited in quantity, it is heavily relied upon for its present functions, and such a repositioning would require the ridder to be directly engaged in inserting each defined index key, which would be known only once its respective RID had been generated by the ridder.
Another possible approach is to provide an interim buffer of collected index keys. More particularly, once the XML document has been inserted and a RID has been assigned, the buffered index keys could be processed (i.e., inserted into a sort or index) in relation to the respective, now known, RID. While this approach may be viable for situations lacking parallelism, such as INSERT or index creation, in a situation requiring parallelism (e.g., LOAD), the resulting processing and flow strain on the limited ridder resource would severely degrade the opportunity for parallelism.
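The interim-buffer approach described above can be sketched as follows. This is an illustrative model only: `parse_document`, `load_with_key_buffer`, the whitespace-split "keys", and the integer RIDs are hypothetical and do not reflect DB2 internals. The point it shows is that every buffered key must wait for, and then be completed in, the single serial RID-assigning step.

```python
def parse_document(text):
    """Stand-in for XML parsing: returns the index key values found in a document."""
    return text.split()


def load_with_key_buffer(documents):
    # Phase 1: parsing could run in parallel, but the keys have no RID yet,
    # so they can only be buffered per document.
    buffered = {doc_id: parse_document(text) for doc_id, text in documents.items()}

    # Phase 2: the single ridder assigns a RID, then must itself complete
    # and insert every buffered key for that document into the shared sort.
    shared_sort = []
    ridder_inserts = 0
    for rid, doc_id in enumerate(documents):  # RIDs modeled as 0, 1, 2, ...
        for value in buffered.pop(doc_id):
            shared_sort.append((value, rid))
            ridder_inserts += 1

    shared_sort.sort()
    return shared_sort, ridder_inserts
```

For example, `load_with_key_buffer({"d1": "b a", "d2": "a"})` performs all three key insertions inside the serial loop, which is the strain on the ridder that the passage describes.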
As a result, parallelism is degraded, performance is limited, and a system constraint arises on the generation of RIDs, the insertion of index keys, and the timely creation of builds. There is therefore a need for a method for sustaining database processing parallelism of one or more parallelized processes while overcoming the RID issues. The present invention addresses such a need.