The invention relates to computer systems, and more particularly to methods for inserting data into a data storage system.
The invention relates to inserting data items into a data storage object, such as inserting rows into a database table. One approach is serial insertion of rows, where the rows are inserted into a table one at a time, by a single data insertion unit, such as a database server or process (xe2x80x9cserverxe2x80x9d). In this approach, a query is executed that selects the rows of data to be inserted into the target table and creates a result set for that query. The server then proceeds through the query result set, one row at a time. The server first selects a row from the result set. The server then searches the target table for an area of free space within the table large enough to store the row. Finally, the server writes the source row to the target table at the point where the server located sufficient free space. The server repeats these steps with every row in the result set. Since this method searches for and uses free space within the bounds of the existing data in the target object, it makes efficient use of the free space within the target object. However, since the server has to search the target database object every time it tries to insert another row, and since only one row can be inserted at a time, this method is quite slow.
Another approach for inserting rows of data into a data storage object, such as a table in a database, is a parallel insertion of rows into the data storage object. In this approach, a query is executed that selects the rows to be inserted into the target object and a result set is generated. Multiple insertion slave units then work on this result set to insert rows into the target object. From this point, the approach branches into two types of parallel inserts.
Referring to FIG. 1A, the first type of parallel insert, used where a data storage object 101 has been partitioned into partitions 116, 118, is an insert above a xe2x80x9chigh water markxe2x80x9d. The high water marks 120, 122 are the points in the partitions 116, 118 above which no data has been stored. The high water marks 120, 122 divide the partitions 116, 118 into formatted spaces 121, 123 and unformatted spaces 117, 119. The formatted spaces 121, 123, which may or may not contain valid data, cannot be written to using parallel insertion methods. The unformatted spaces 117, 119 do not contain valid data, and can be written to using parallel insertion methods. During parallel execution of an insert into a partitioned data storage object 101, slave sets act on the data to be inserted. A query slave set 132 is a set of processes that execute the query and build the result set. A data insertion slave set 134 is a set of processes that receive the rows of data and insert them into the unformatted spaces 117, 119 of the partitions 116, 118 of the data storage object 101, above the high water mark pointers 120, 122. The query slave set 132 has query slaves 102, 104 that read the data to be inserted. The data insertion slave set 134 has data insertion slaves 108, 110 that insert the data into the partitions 116, 118.
A first data insertion slave 108 works on a first partition 116, and a second data insertion slave 110 works on a second partition 118. In the example of FIG. 1A, each data insertion slave 108, 110 works on only one partition 116, 118, and each partition 116, 118 has only one data insertion slave 108, 110 working on it. Alternatively, each data insertion slave can work on multiple partitions, as long as each partition is only worked on by one data insertion slave.
The rows read by the query slaves 102, 104 are typically processed in a random order. These rows, therefore, have to go through a repartitioning phase 106 where they are repartitioned and redistributed amongst the data insertion slaves 108, 110, such that the first data insertion slave 108 gets the rows of data to be inserted into the first partition 116 and the second data insertion slave 110 gets the rows of data to be inserted into the second partition 118. These rows are grouped into inserts 112, 114, which are inserted into the unformatted spaces 117, 119, above the high water marks 120, 122 of the corresponding partitions 116, 118 by the data insertion slaves 108, 110.
This first type of parallel insert is faster than the serial insert described above. The first type of parallel insert also avoids a write conflict, where multiple processes attempt to write data to the same location in the partition 116, 118 at the same time. However, if there is a data skew amongst the partitions 116, 118 then most of the data is being inserted into only a few partitions. This causes only one or a few slaves to actively do the insert, thus defeating the purpose of parallelism in inserts. Most of the data insertion slaves 108, 110 sit idle, which wastes system resources.
For example, in many applications a data storage object, such as a database table, is partitioned on a date column, with each partition containing a specified range of dates, such as a calendar year or a quarter. In these types of applications the rows are mainly inserted in the last (most recent dates) partition. Due to this, the data insertion slave working on the last partition needs to do much more work than other data insertion slaves.
Referring now to FIG. 1B, the second type of parallel insert is used for a non-partitioned data storage object 128, such as a non-partitioned database table or single table fragment (a partition of a simple partitioned table or a sub-partition of a composite partitioned table). In this case a separate data insertion slave set is not needed; the query slave set 132 is responsible for both reading the data rows and doing the inserts. Each query slave 102, 104 is allocated a temporary storage area 124, 126 of memory and inserts all of the rows of data the query slave 102, 104 has read from the data source into its temporary storage area 124, 126. When the query has been completed, the temporary storage areas 124, 126 are merged into the non-partitioned data storage object 128.
This second type of parallel insert allows multiple query slaves 102, 104 to work on a single non-partitioned data storage object 128. Furthermore, the method avoids write conflicts, since each query slave 102, 104 only works on one temporary storage area 124, 126. During merging of the temporary storage areas 124, 126, each temporary storage area 124, 126 has an unused space 130, 131. This unused space 130, 131 is wasted when the temporary storage areas 124, 126 are merged into the non-partitioned data storage object 128. This wastage is not very significant for a single non-partitioned data storage object, but the wastage becomes very large if an extension of this technique is used for inserting data into a partitioned object. Thus methods are needed that allow multiple query slaves 102, 104 to work on a single partition 116, 118 of a partitioned data storage object 101, while avoiding write conflicts and minimizing space wastage.
The present invention is directed to methods for avoiding write conflicts and minimizing space wastage during parallel inserts of data into partitioned data storage objects.
In an aspect of an embodiment of the invention, temporary storage areas are used in conjunction with partitioned data storage objects, allowing multiple slaves to insert data into a single partition of the partitioned data storage object.
In an aspect of a preferred embodiment of the invention, the space above the high water mark becomes a brokered resource, allowing multiple slaves to insert data into a single partition of the data storage object.
In another aspect of a preferred embodiment of the invention, the size of the blocks of data to be inserted is dynamically determined.
In another aspect of a preferred embodiment of the invention, the block of data to be inserted is patched with information to allow direct writing of the block from short-term memory to a long-term storage device.