Due to the large number of active systems running in an enterprise environment, data movements such as export and import operations have become very frequent. The frequency of these operations tends to be constant, but the amount of data involved tends to grow over time. Therefore, the ability to import data into a database in the least possible time is often a very important matter for IT departments; it can mean better availability for a mission-critical application.
There are many different options for bulk loading data into a database. In a Microsoft® SQL Server® computer environment, available from Microsoft Corporation in Redmond, Wash., one could use the BCP command-line utility, a T-SQL BULK INSERT statement, the OLE DB IRowsetFastLoad interface, or other variations known in the art. A common attribute of all these options is that each handles only a single-table operation. The bottleneck in the loading operation lies on the server side, where the database management functions are consumed by the work of receiving the data and appending it to the table. The client side is usually reasonably fast because its tasks are reading and converting the data from the input file and transmitting the data to the server. Once a chunk of data is in the server and being appended to the database table, the client side may process another chunk of data, but it will not submit that chunk until it receives an answer about the first chunk submitted.
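The submit-and-wait behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not an implementation of BCP, BULK INSERT, or IRowsetFastLoad: the function names, the two-field comma-separated input, and the single-worker executor standing in for the server are all hypothetical. The client converts the next chunk while the server appends the current one, but it never has more than one chunk outstanding.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[str, int]


def convert(line: str) -> Row:
    """Client-side work: parse one hypothetical 'name,qty' line into a row."""
    name, qty = line.split(",")
    return (name.strip(), int(qty))


def chunks(lines: Iterable[str], size: int) -> Iterator[List[Row]]:
    """Read and convert input lazily, yielding chunks of up to `size` rows."""
    batch: List[Row] = []
    for line in lines:
        batch.append(convert(line))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def single_table_load(lines: Iterable[str], size: int, table: List[Row]) -> List[int]:
    """Load one table, keeping at most one chunk outstanding at the server."""

    def server_append(batch: List[Row]) -> int:
        # Hypothetical stand-in for the server appending rows to one table.
        table.extend(batch)
        return len(batch)  # the "answer" returned to the client

    acks: List[int] = []
    with ThreadPoolExecutor(max_workers=1) as server:
        pending: Optional[Future] = None
        # Advancing this loop converts the next chunk while the server works.
        for batch in chunks(lines, size):
            if pending is not None:
                acks.append(pending.result())  # block until the previous answer
            pending = server.submit(server_append, batch)
        if pending is not None:
            acks.append(pending.result())
    return acks
```

For example, loading five input lines with a chunk size of 2 yields answers of 2, 2 and 1 rows, and the rows reach the table in input order because each submission waits for the previous answer.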
FIG. 1 is a depiction of a prior art system 100 for transferring data into a database. The input data 102 can be in any electronic form, such as a spreadsheet, comma-separated values, text, or a markup language such as XML. The data format is assumed to be compatible with a data shredder 105, which accepts the data 102 and converts it into rows of data. The data shredder 105 operates on the principle that a basic structure in the data 102 can be utilized or derived such that the data may be formatted into rows suitable for table insertion. For example, the input data 102 may be XML data, and the data shredder 105 may use the explicit structure of XML to develop a dependency graph by which the XML data can be segregated into tables. Typically, a table may depend on another table, in which case one table acts as the parent and the other as its child. Other tables may have no dependency on another table and may therefore be either independent or part of a different dependency chain.
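A shredder of the kind described can be approximated for XML with Python's standard `xml.etree.ElementTree` module. The sketch below is hypothetical: it assumes that each element tag names a table, that each element becomes one row carrying its attributes and the generated key of its parent row, and that the document root is only a container. A depth-first walk emits parent rows before child rows and records, for each table, the tables it depends on.

```python
import xml.etree.ElementTree as ET
from itertools import count
from typing import Dict, List, Optional, Set, Tuple

# (row id, parent row id or None, element attributes)
Row = Tuple[int, Optional[int], Dict[str, str]]


def shred(xml_text: str) -> Tuple[List[Tuple[str, Row]], Dict[str, Set[str]]]:
    """Split nested XML into (table, row) pairs plus a table dependency graph."""
    root = ET.fromstring(xml_text)
    ids = count(1)
    rows: List[Tuple[str, Row]] = []
    deps: Dict[str, Set[str]] = {}  # table -> tables it depends on (its parents)

    def walk(elem: ET.Element, parent_tag: Optional[str], parent_id: Optional[int]) -> None:
        row_id = next(ids)
        # Pre-order: a parent row is emitted before any of its child rows.
        rows.append((elem.tag, (row_id, parent_id, dict(elem.attrib))))
        deps.setdefault(elem.tag, set())
        if parent_tag is not None:
            deps[elem.tag].add(parent_tag)
        for child in elem:
            walk(child, elem.tag, row_id)

    for top in root:  # the document root is treated as a container only
        walk(top, None, None)
    return rows, deps
```

For the input `<doc><a name="x"><b n="1"><c v="p"/></b><b n="2"/></a></doc>`, the dependency graph is `{"a": set(), "b": {"a"}, "c": {"b"}}`, i.e. a parent/child chain a → b → c.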
The data shredder 105 can produce row data in a format that allows a cache 110 to form distinct tables representing the row data. For example, Table A 115 may represent a parent table from row data received from the data shredder 105. Table B 120 may represent child table data related to Table A. Table C 125 may represent row data from the data shredder 105 that is a child of Table B. Since the row data is generally provided in hierarchical form with parent data first, Table A, being a parent data table, may be filled first, followed by Tables B and C, respectively. Once loaded, these tables are in a form suitable for transfer to the database 130.
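The cache's role of forming distinct tables from the row stream can be sketched as grouping rows by table name. The `(table, row)` stream format and the name `build_cache` are assumptions for illustration only. Because Python dictionaries preserve insertion order, the tables appear in the order in which their first rows arrive, which for parent-first input is Table A, then B, then C.

```python
from typing import Dict, Iterable, List, Tuple


def build_cache(stream: Iterable[Tuple[str, tuple]]) -> Dict[str, List[tuple]]:
    """Group shredded rows into per-table lists, keeping arrival order."""
    cache: Dict[str, List[tuple]] = {}
    for table, row in stream:
        # A table is created when its first row arrives; dicts keep that order.
        cache.setdefault(table, []).append(row)
    return cache
```

Feeding it `("A", (1,))`, `("B", (10, 1))`, `("C", (100, 10))`, `("B", (11, 1))` produces tables in the order A, B, C, with both B rows collected under Table B.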
Prior art transfers from the cache organizer 110 to a database 130 are generally performed serially. Table A would be transferred first, followed by Table B and then Table C, to satisfy the need for the database to generate appropriate indexes and other hierarchical structures. The process is also typically serial because only one processor handles the transfer of the tables, and each table is filled completely and then transferred to the database in the order in which it was generated.
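The serial, dependency-ordered transfer can be sketched against an in-memory SQLite database using Python's standard `sqlite3` and `graphlib` modules (Python 3.9 or later). SQLite stands in for the database 130 purely for illustration, and the table layout, the cached rows and the helper name `serial_transfer` are hypothetical. With foreign-key enforcement turned on, a Table C row whose Table B parent is missing is rejected, which is why each table is loaded in full, in topological order, by a single caller.

```python
import sqlite3
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def serial_transfer(conn: sqlite3.Connection,
                    cache: Dict[str, List[tuple]],
                    deps: Dict[str, Set[str]]) -> List[str]:
    """Copy each cached table in full, one at a time, parents before children."""
    order = list(TopologicalSorter(deps).static_order())
    for table in order:
        rows = cache.get(table, [])
        if not rows:
            continue
        marks = ", ".join(["?"] * len(rows[0]))
        # Table names come from the trusted cache, not from user input.
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        conn.commit()  # the whole table lands before the next one starts
    return order


SCHEMA = """
CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES A(id));
CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER NOT NULL REFERENCES B(id));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # reject child rows whose parent is missing
conn.executescript(SCHEMA)
cache = {"C": [(100, 10)], "B": [(10, 1), (11, 1)], "A": [(1, "root")]}
deps = {"A": set(), "B": {"A"}, "C": {"B"}}
order = serial_transfer(conn, cache, deps)  # loads A, then B, then C
```

Note that the cache's own key order (C, B, A) is deliberately scrambled here; the dependency graph, not arrival order, decides the transfer sequence.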
It is well known that the bandwidth of the serial interface 127 from the cache 110 to the database 130 can be greater than the processing speed of the cache organizer 110 itself. Consequently, the database 130 could, in theory, be loaded much faster if the cache organizer 110 could produce data for the database more quickly. However, the single-processor serial transfer mechanism of the prior art limits the speed at which tables destined for the database are made available. Even in multiprocessor environments, a single processor is typically responsible for managing the transfer of tabular data to a database.
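The throughput argument above reduces to the observation that a pipeline moves data no faster than its slowest stage. The helper below is a hypothetical back-of-the-envelope model, not a measurement: it assumes ideal scaling when several producers feed the interface and ignores contention and coordination overhead.

```python
def load_time_s(total_mb: float, producer_mb_s: float, link_mb_s: float,
                producers: int = 1) -> float:
    """Idealized load time: data moves at the rate of the slowest stage.

    Assumes `producers` independent producers scale perfectly; real systems
    lose some of that gain to contention and coordination.
    """
    if producers < 1:
        raise ValueError("need at least one producer")
    supply_rate = producers * producer_mb_s  # rate at which tables are made ready
    return total_mb / min(supply_rate, link_mb_s)
```

With hypothetical figures of 1000 MB of data, a producer rate of 50 MB/s and an interface bandwidth of 200 MB/s, one producer needs 20 s; four ideal producers saturate the interface at 5 s, and adding more producers does not improve on that bound.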
Thus, there is a need for a technique that can perform a time- and resource-efficient transfer of tabular data to a target database. In particular, it would be useful if the technique could make use of a multiprocessor environment. The present invention addresses the aforementioned needs and solves them with additional advantages as expressed herein.