The present invention relates to a method and apparatus for improving retrieval of data from a database. More particularly, the present invention facilitates retrieval of data from the database by providing an apparatus and method for implementing a value based storage structure which may be used to store an entire database or an index of data which is stored in a conventional database.
Many types of database management systems are known. One such management system is a Relational Database Management System (RDBMS), in which data is serially stored in a computer memory and in secondary storage (disk and tape). It is known to provide an index of data stored in a database as a separate index file stored in the computer memory or in secondary storage (disk and tape). The index is a separate file which stores selected information and provides a map for obtaining that information from the main database. In a conventional RDBMS, the index is serially computed and stored like any other file.
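For illustration only, a conventional serially computed index can be pictured as a sorted list of (key, record offset) pairs kept apart from the main records. The record layout, the attribute name, and the functions `build_index` and `lookup` below are hypothetical and are not taken from any particular RDBMS.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical main database: records stored serially; a record's position
# in this list stands in for its offset in the main file.
records: List[Dict[str, str]] = [
    {"name": "Smith", "dept": "Sales"},
    {"name": "Jones", "dept": "Research"},
    {"name": "Brown", "dept": "Sales"},
]


def build_index(records: List[Dict[str, str]], attribute: str) -> List[Tuple[str, int]]:
    """Serially compute a separate index file: sorted (key, record_offset) pairs."""
    entries = [(rec[attribute], offset) for offset, rec in enumerate(records)]
    entries.sort()
    return entries


def lookup(index: List[Tuple[str, int]], key: str) -> List[int]:
    """Use the index as a map back into the main database."""
    keys = [k for k, _ in index]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [offset for _, offset in index[lo:hi]]


dept_index = build_index(records, "dept")
print(lookup(dept_index, "Sales"))  # [0, 2]
```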
Conceptual data structures have been proposed for storing data in a value based storage arrangement. Value based storage typically includes domain cells, value cells, relation cells, and tuple identification cells which are related to the domain cell. Value based storage systems cannot be implemented efficiently on the serial system of a conventional RDBMS. Value based storage systems require unbounded fanout from each domain cell, each value cell, and each relation cell. The value based data structure implemented on a serial system must use explicit pointers.
The value based data structure cannot be implemented efficiently on conventional computer systems because the data structure is a tree of fixed depth four and therefore must allow unbounded fanout from the domain cells, value cells, and relation cells. Unbounded fanout cannot be implemented efficiently on a sequential machine and must instead be modeled through the use of B+ trees.
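The shape described above can be sketched as a tree whose depth is always four (domain, value, relation, tuple identification) while each node keeps an explicit, variable-length list of child pointers. This is a minimal sketch under assumed names (`Cell`, `insert`, `depth` are hypothetical); it does not model the B+ trees a sequential implementation would use to hold the unbounded child lists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    kind: str      # "domain", "value", "relation", or "tid"
    label: object  # domain name, attribute value, relation name, or tuple id
    # Explicit child pointers; the list length (fanout) is unbounded.
    children: List["Cell"] = field(default_factory=list)


def _child(parent: Cell, kind: str, label: object) -> Cell:
    """Find or create a child cell by a serial search of the pointer list."""
    for c in parent.children:
        if c.label == label:
            return c
    c = Cell(kind, label)
    parent.children.append(c)
    return c


def insert(domain: Cell, value: object, relation: str, tid: int) -> None:
    """Record one occurrence: domain -> value -> relation -> tuple id."""
    v = _child(domain, "value", value)
    r = _child(v, "relation", relation)
    r.children.append(Cell("tid", tid))


def depth(cell: Cell) -> int:
    return 1 + max((depth(c) for c in cell.children), default=0)


city = Cell("domain", "CITY")
insert(city, "Boston", "EMPLOYEE", 7)
insert(city, "Boston", "EMPLOYEE", 12)
insert(city, "Boston", "SUPPLIER", 3)
insert(city, "Denver", "EMPLOYEE", 9)
print(depth(city))         # 4
print(len(city.children))  # 2
```

However many tuples are inserted, the depth stays at four; only the lengths of the pointer lists grow.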
An object of the present invention is to provide an apparatus and method which enables a value based indexing system to be used with a relational database management system to facilitate retrieval of data from a database.
Another object of the present invention is to provide an apparatus and method which permits unbounded fanout of data stored in a database.
According to one aspect of the present invention, an indexing system is provided for improving retrieval of data from a database management system based on a query from a user. The database management system includes a main computer and means coupled to the main computer for storing the data. The indexing system includes a parallel computer coupled to the main computer of the database management system, and means for storing a value based index of selected attributes related to the data stored in the storing means of the database management system. The storing means is coupled to the parallel computer. The indexing system also includes means for determining whether the query can be executed at least partially on the parallel computer, and means for executing the query on the parallel computer to obtain at least a partial result to the query from the parallel computer using the value based index stored in the storing means.
In an illustrated embodiment, the value based index of selected attributes includes value cells and related data cells stored adjacent each value cell. The value based index is stored in the storing means by association with related data items without the use of explicit pointers to permit unbounded fanout of the value based index.
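Storing related cells adjacent each value cell can be sketched as a flat array in which each value cell is immediately followed by its relation and tuple identification cells, so relatedness is recovered by position rather than by pointers. The tagged-tuple cell format and the functions `linearize` and `related_cells` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, object]  # (cell kind, payload); no pointer fields


def linearize(index: Dict[str, Dict[str, List[int]]]) -> List[Cell]:
    """Lay out each value cell followed immediately by its related cells."""
    cells: List[Cell] = []
    for value in sorted(index):
        cells.append(("value", value))
        for relation in sorted(index[value]):
            cells.append(("relation", relation))
            for tid in index[value][relation]:
                cells.append(("tid", tid))
    return cells


def related_cells(cells: List[Cell], value: str) -> List[Cell]:
    """Recover a value's related cells by adjacency: stop at the next value cell."""
    out: List[Cell] = []
    inside = False
    for kind, payload in cells:
        if kind == "value":
            if inside:
                break
            inside = payload == value
        elif inside:
            out.append((kind, payload))
    return out


cells = linearize({"Boston": {"EMPLOYEE": [7, 12], "SUPPLIER": [3]},
                   "Denver": {"EMPLOYEE": [9]}})
print(related_cells(cells, "Denver"))  # [('relation', 'EMPLOYEE'), ('tid', 9)]
```

Because no cell stores the address of another, a value cell may be followed by any number of related cells.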
Also in an illustrated embodiment, the indexing system further includes means for mapping a location of a boundary condition for each value cell stored in the value based index storing means. The executing means includes means for distributing data stored in a boundary condition corresponding to a selected value cell of the query across the parallel computer, means for recognizing and marking a span of data cells related to the value cell of the query on the parallel computer, and means for selecting a data cell corresponding to the query from said span of data cells related to the value cell of the query.
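The mapping, distributing, marking, and selecting steps can be simulated serially as follows. This sketch assumes a boundary condition is the half-open range of array positions from a value cell up to the next value cell, assumes each relation cell is followed by one tuple identification cell whose payload is simplified to a tuple of ids, and uses a Python list index to stand in for a processor number. `map_boundaries` and `execute` are hypothetical names.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, object]  # (cell kind, payload)

# Hypothetical linear index: every value cell is followed by
# (relation cell, tuple-id cell) pairs.
CELLS: List[Cell] = [
    ("value", "Boston"),
    ("relation", "EMPLOYEE"), ("tid", (7, 12)),
    ("relation", "SUPPLIER"), ("tid", (3,)),
    ("value", "Denver"),
    ("relation", "EMPLOYEE"), ("tid", (9,)),
]


def map_boundaries(cells: List[Cell]) -> Dict[object, Tuple[int, int]]:
    """Map each value to the half-open range [start, end) of its boundary."""
    starts = [i for i, (kind, _) in enumerate(cells) if kind == "value"]
    bounds: Dict[object, Tuple[int, int]] = {}
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(cells)
        bounds[cells[start][1]] = (start, end)
    return bounds


def execute(cells: List[Cell], bounds, value, relation) -> tuple:
    start, end = bounds[value]
    procs = cells[start:end]  # distribute: processor i holds procs[i]
    # Recognize and mark the span: every related (non-value) cell in the boundary.
    span = [kind != "value" for kind, _ in procs]
    # Select: the processor holding the queried relation cell returns the
    # tuple-id cell stored immediately after it.
    hits = [procs[i + 1][1] for i, (kind, payload) in enumerate(procs)
            if span[i] and kind == "relation" and payload == relation
            and i + 1 < len(procs)]
    return hits[0] if hits else ()


bounds = map_boundaries(CELLS)
print(bounds)                                        # {'Boston': (0, 5), 'Denver': (5, 8)}
print(execute(CELLS, bounds, "Boston", "SUPPLIER"))  # (3,)
```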
In another illustrated embodiment, the related data cells include relation cells and tuple identification cells. A tuple identification list is stored in the storing means. Only a single tuple identification cell is located adjacent each relation cell. The tuple identification cell includes a starting offset value and a run length value corresponding to positions in the tuple identification list.
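A single tuple identification cell holding a starting offset and a run length can be sketched as below. The `TidCell` type and the `build` and `tuple_ids` functions are hypothetical; the sketch assumes the ids for each relation cell are written contiguously into one shared tuple identification list.

```python
from typing import Dict, List, NamedTuple, Tuple


class TidCell(NamedTuple):
    start: int       # starting offset into the tuple identification list
    run_length: int  # number of consecutive entries that belong to this cell


def build(index: Dict[str, Dict[str, List[int]]]) -> Tuple[list, List[int]]:
    """Lay out value/relation cells with one TidCell after each relation cell."""
    cells: list = []
    tid_list: List[int] = []
    for value in sorted(index):
        cells.append(("value", value))
        for relation in sorted(index[value]):
            ids = index[value][relation]
            cells.append(("relation", relation))
            cells.append(("tid", TidCell(len(tid_list), len(ids))))
            tid_list.extend(ids)
    return cells, tid_list


def tuple_ids(cell: TidCell, tid_list: List[int]) -> List[int]:
    """Expand a TidCell into the tuple ids it covers."""
    return tid_list[cell.start:cell.start + cell.run_length]


cells, tid_list = build({"Boston": {"EMPLOYEE": [7, 12], "SUPPLIER": [3]},
                         "Denver": {"EMPLOYEE": [9]}})
print(cells[2])                          # ('tid', TidCell(start=0, run_length=2))
print(tuple_ids(cells[2][1], tid_list))  # [7, 12]
```

In this arrangement every relation cell is followed by exactly one fixed-size cell, however many tuples the relation contributes.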
According to another aspect of the present invention, a method is provided for improving retrieval of data based on a query from a user from a database management system including a main computer, means coupled to the main computer for storing record based data, a parallel computer coupled to the main computer, and a parallel disk array coupled to the parallel computer. The method includes the step of storing a value based index of selected data attributes related to the data in the parallel disk array. The value based index includes value cells and related data cells stored adjacent each value cell. The method also includes the steps of mapping the location of a boundary condition for each value cell and the related cells for each value cell stored in the parallel disk array, selecting a boundary condition corresponding to a selected value cell of the query, and distributing data stored in the parallel disk array within the selected boundary condition to the parallel computer. The method further includes the steps of recognizing and marking a span of data cells related to the selected value cell corresponding to the query on the parallel computer, and selecting a data cell corresponding to the query from said span of data cells related to the selected value cell to produce at least a partial result to the query.
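One way to picture storing the index in a parallel disk array and then distributing a selected boundary condition to the parallel computer is round-robin striping, in which consecutive cells land on different disks so a boundary can be read from several disks at once. Round-robin striping is an assumption chosen for illustration, not a layout stated here; `stripe`, `read_boundary`, and `disks_touched` are hypothetical names, and the string cells are placeholders.

```python
from typing import List


def stripe(cells: List[object], num_disks: int) -> List[List[object]]:
    """Round-robin striping: cell i is stored on disk i % num_disks, slot i // num_disks."""
    disks: List[List[object]] = [[] for _ in range(num_disks)]
    for i, cell in enumerate(cells):
        disks[i % num_disks].append(cell)
    return disks


def read_boundary(disks: List[List[object]], start: int, end: int) -> List[object]:
    """Gather the cells of boundary [start, end) in their original order."""
    d = len(disks)
    return [disks[i % d][i // d] for i in range(start, end)]


def disks_touched(num_disks: int, start: int, end: int) -> List[int]:
    """Disks that hold part of the boundary and can therefore be read concurrently."""
    return sorted({i % num_disks for i in range(start, end)})


cells = ["v:Boston", "r:EMPLOYEE", "t:(0,2)", "r:SUPPLIER", "t:(2,1)",
         "v:Denver", "r:EMPLOYEE", "t:(3,1)"]
disks = stripe(cells, 3)
boundary = read_boundary(disks, 0, 5)   # boundary of the value "Boston"
processors = dict(enumerate(boundary))  # processor p receives cell p of the boundary
print(disks_touched(3, 0, 5))           # [0, 1, 2]
print(processors[0])                    # v:Boston
```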
According to yet another aspect of the present invention, a method is provided for improving retrieval of data based on a query from a user from a database management system including a main computer, means coupled to the main computer for storing the data, a parallel computer coupled to the main computer, and a parallel disk array coupled to the parallel computer. The method includes the steps of storing record based data on the storing means of the database management system, storing a value based index of selected data attributes related to the record based data on the parallel disk array, determining whether the parallel computer can be used to execute the query, and sending the query to the database management system and executing the query on the database management system to produce a final result upon determining that the parallel computer cannot be used to execute the query. The method also includes the steps of sending the query to the parallel computer upon determining that the parallel computer can be used to execute the query, determining whether a final result to the query can be determined from the parallel computer, executing the query on the parallel computer, and sending a final result from the parallel computer to the user upon determining that the parallel computer can be used to determine the final result. The method further includes the steps of sending a partial result to the database management system upon determining that the parallel computer cannot be used to determine the final result, obtaining the final result on the database management system using the partial result received from the parallel computer, and sending the final result from the database management system to the user.
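The decision flow of this method can be sketched with in-memory stand-ins for both machines. The sketch assumes a query is a conjunction of attribute = value predicates, and that the parallel computer "can be used" when at least one predicate names an indexed attribute and "yields a final result" when every predicate does. These criteria, the sample data, and the names `Query`, `parallel_lookup`, `dbms_execute`, and `route` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical record based data held by the database management system.
RECORDS: Dict[int, Dict[str, str]] = {
    7: {"city": "Boston", "dept": "Sales"},
    12: {"city": "Boston", "dept": "Research"},
    9: {"city": "Denver", "dept": "Sales"},
}
# Hypothetical value based index on the parallel side: attribute -> value -> tuple ids.
VALUE_INDEX: Dict[str, Dict[str, Set[int]]] = {"city": {"Boston": {7, 12}, "Denver": {9}}}


@dataclass
class Query:
    predicates: Dict[str, str]  # conjunction of attribute = value tests


def parallel_lookup(attr: str, value: str) -> Set[int]:
    return set(VALUE_INDEX[attr].get(value, set()))


def dbms_execute(preds: Dict[str, str], candidates: Optional[Set[int]]) -> Set[int]:
    """DBMS path: scan all records, or only the candidates from a partial result."""
    tids = RECORDS.keys() if candidates is None else candidates
    return {t for t in tids if all(RECORDS[t][a] == v for a, v in preds.items())}


def route(query: Query) -> Set[int]:
    usable = {a: v for a, v in query.predicates.items() if a in VALUE_INDEX}
    if not usable:                       # parallel computer cannot be used
        return dbms_execute(query.predicates, None)
    partial: Set[int] = set()
    for n, (a, v) in enumerate(usable.items()):  # executed on the parallel computer
        ids = parallel_lookup(a, v)
        partial = ids if n == 0 else partial & ids
    remaining = {a: v for a, v in query.predicates.items() if a not in VALUE_INDEX}
    if not remaining:                    # final result from the parallel computer
        return partial
    return dbms_execute(remaining, partial)  # DBMS finishes the partial result


print(route(Query({"city": "Boston", "dept": "Sales"})))  # {7}
```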
According to still another aspect of the present invention, a system is provided for improving retrieval of data from a database based on a query from a user. The system includes a parallel computer including a plurality of processors electrically coupled to each other in parallel, and means for storing data organized in a value based data structure including value cells and related data cells stored adjacent each value cell. The storing means is coupled to the parallel computer. The system also includes means for mapping the location of a boundary condition including a particular value cell and all data cells related to the particular value cell for each value cell stored in the storing means, means for distributing data within the boundary condition corresponding to a selected value cell of the query across the parallel computer so that each data cell in the boundary condition is read into a separate processor of the parallel computer, and means for executing the query on the parallel computer to obtain at least a partial result to the query from the parallel computer using the value based data stored in the storing means.
In an illustrated embodiment, the executing means includes means for recognizing and marking a span of data cells which are related to the selected value cell on the parallel computer, and means for selecting a data cell corresponding to the query from said span of data cells related to the selected value cell to produce at least a partial result to the query.
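When a block read into the processors holds several value cells, each processor must decide which value cell governs it. A common data-parallel way to do this, substituted here for illustration rather than taken from this description, is an inclusive prefix sum over flags that mark value cells: every processor then knows its segment number. `itertools.accumulate` computes that sum serially; on a parallel computer a scan would take a logarithmic number of steps. `mark_span` and `select` are hypothetical names, and the sketch assumes one tuple identification cell follows each relation cell.

```python
from itertools import accumulate
from typing import List, Tuple

Cell = Tuple[str, object]  # (cell kind, payload); processor i holds procs[i]


def mark_span(procs: List[Cell], value: object) -> List[bool]:
    """Mark cells related to `value` using an inclusive prefix sum of value-cell flags."""
    flags = [1 if kind == "value" else 0 for kind, _ in procs]
    segment = list(accumulate(flags))  # each processor learns which value cell governs it
    target = next((segment[i] for i, (kind, payload) in enumerate(procs)
                   if kind == "value" and payload == value), None)
    if target is None:                 # queried value not in this block
        return [False] * len(procs)
    return [segment[i] == target and procs[i][0] != "value" for i in range(len(procs))]


def select(procs: List[Cell], span: List[bool], relation: str) -> List[object]:
    """Each marked processor holding the queried relation returns its neighbour's tid cell."""
    return [procs[i + 1][1] for i in range(len(procs) - 1)
            if span[i] and procs[i] == ("relation", relation)]


procs: List[Cell] = [
    ("value", "Boston"), ("relation", "EMPLOYEE"), ("tid", (7, 12)),
    ("relation", "SUPPLIER"), ("tid", (3,)),
    ("value", "Denver"), ("relation", "EMPLOYEE"), ("tid", (9,)),
]
span = mark_span(procs, "Denver")
print(span)                             # [False, False, False, False, False, False, True, True]
print(select(procs, span, "EMPLOYEE"))  # [(9,)]
```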
Also in an illustrated embodiment, the related data cells include relation cells and tuple identification cells. The value based data structure is stored in the storing means by association with related data cells without the use of explicit pointers to permit unbounded fanout of the value based data structure.
Additional objects, features, and advantages of the invention will become apparent to those skilled in the art upon consideration of the following detailed description of a preferred embodiment exemplifying the best mode of carrying out the invention as presently perceived.