The disclosure of this patent document contains material which is subject to copyright protection. The owner thereof has no objection to facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves all copyright whatsoever.
This invention relates to digital data processing and, more particularly, to methods and apparatus for database management systems on multiprocessor digital data processing systems.
In addition to performing calculations, computers have traditionally been used to store and retrieve large amounts of data. Early computer systems were typically programmed for this on an ad hoc basis. For example, to track a company""s employees, a program was typically written to handle all steps necessary to input, sort and store employee data in a computer file and, as necessary, to retrieve and collate it to generate reports. Special-purpose software packages, referred to as database management systems (or xe2x80x9cDBMS""sxe2x80x9d), were later developed to handle all but the highest-level of these tasks.
Among the most widely used database management systems are the so-called relational systems. From an operator""s perspective, these store data in two-dimensional tables. For example, each row (or record) of an employee data table might include the following columns (or fields) of information: name of an employee, his or her identification number, address, and department number.
One or more indexes on large tables are generally provided to facilitate the most common data accesses, e.g., look-ups based on employee name.
In relational systems, corresponding rows in two or more tables are identified by matching data values in one or more columns. For example, the department name corresponding to a given employee may be identified by matching his or her department number to a row in a department data table that gives department numbers and department names. This is in contrast to hierarchical, network, and other DBMS""s that use pointers instead of data values to indicate corresponding rows when tables are combined, or xe2x80x9cjoined.xe2x80x9d
Relational DBMS""s typically permit the operator to access information in the database via a query. This is a command that specifies which data fields (columns) are to be retrieved from a database table and which records (rows) those fields are to be selected from. For example, a query for the names of all employees in department 10 might be fashioned as follows:
SELECT name, department_number
FROM employee
WHERE department_number=10
There is no particular ordering of the resulting rows retrieved by the DBMS, unless the query specifies an ordering (e.g., ORDER BY name).
A query may also involve multiple tables. For example, to retrieve department names instead of numbers, the above query might be refashioned as follows:
SELECT name, department_name
FROM employee, department
WHERE department_number=10
AND employee.department_number=department.department_number
A particular relational data table need not be stored in a single computer file but, rather, can be partitioned among many files. This makes such tables particularly suited for use on multiprocessor computer systems, i.e., computer systems having multiple processors and multiple disk drives (or other storage devices) of the type disclosed in U.S. Pat. No. 5,055,999. Unfortunately, prior art DBMS""s have not proven capable of taking full advantage of the power of such multiprocessing systems and, particularly, their power to simultaneously process data (in parallel) from multiple partitions on multiple storage devices with multiple central processing units.
In view of the foregoing, an object of the invention is to provide improved methods and apparatus for database management and, particularly, improved methods and apparatus for data base management capable of operating on multiprocessor systems.
A further object of the invention is to provide improved systems for database management capable of effectively accessing a relational database contained in multiple tables and multiple partitions.
A still further object is to provide improved methods and apparatus for storing and retrieving data for access by a DBMS.
These and other objects are evident in the attached drawings and the description which follows.
The foregoing and other objects are attained by the invention which provides, in one aspect, improvements to digital data processors of the type having a database management system (DBMS) that accesses data records stored in a database table contained among plural independently accessible partitions (e.g., data partitions contained on separate disk drives), where that DBMS has a standard interface for processing queries to access those data records.
The improvement is characterized by a parallel interface that intercepts selected queries prior to substantive processing by the standard interface. The standard interface is often called the xe2x80x9cserverxe2x80x9d interface; it is accessed by clients that are the source of queries. A decomposition element within the parallel interface generates multiple subqueries from the intercepted query. Those subqueries, each representing a request for access to data stored in a respective partition of the table, are applied in parallel to the standard interface in lieu of the intercepted query. Responses by the DBMS to the subqueries are reassembled to generate a final response representing the response the DBMS would have generated to the intercepted query signal itself. Such reassembly can include interleaving the data contained in the responses (e.g., to create a single sorted list) or applying an aggregate function (e.g., sum or average) to that data.
According to a further aspect of the invention, the decomposition element generates the subqueries to be substantially identical to the intercepted signal but including an xe2x80x9cintersecting predicatexe2x80x9d (i.e., additional query conditions) that evaluates true for all data records in respective partitions of said database table and false for all others. This can be, for example, a logically AND""ed condition that evaluates true for records in the respective partition. Continuing the first example above, assuming that the employee database is partitioned randomly across multiple partitions, a subquery for the first partition could be generated as follows (where rowid has three parts, the last of which indicates the partition number):
SELECT name, department_number
FROM employee
WHERE department_number=10 AND
employee.rowid greater than =0.0.1 AND
employee.rowid less than 0.0.2
In another aspect, the invention contemplates a further improvement to a digital data processing system of the type described above, wherein the DBMS responds to selected queries for accessing data records joined from one or more of database tables, and wherein the DBMS includes an optimizer for determining an optimal strategy for applying such queries to the tables. The improvement of this aspect is characterized by an element for identifying, from output of the optimizer, a driving table whose partitions will be targeted by subqueries generated in responding to an intercepted query. The improvement is further characterized by generating the subqueries to include, in addition to the predicate list of the intercepted query, an intersecting predicate for all data records in respective partitions of the driving database table. Those skilled in the art will appreciate that tables referenced in the query other than the driving table need not be identically partitioned to the driving table, nor co-located with its partitions on storage devices. Tables may be accessed through either full-table scans or indexed scans, i.e., whether the DBMS searches all blocks of the relevant partition or only those indicated by a relevant index.
According to another aspect, the invention provides an improvement to a digital data processing system of the type described, wherein the DBMS""s standard interface is invoked by a procedure or function call. The improvement is characterized by functionality for invoking the parallel interface in lieu of the client-side portion of the standard interface in response to such a procedure/function call. And, by responding to a query for generating plural subqueries in the form of further procedures/functions to the standard server interface. The parallel interface can form part of an object code library for linking with a computer program including procedures/function calls for invoking the DBMS.
In still another aspect, the invention contemplates an improvement to a digital data processing system as described above, wherein the standard interface normally responds to insert/select queries by placing requested data from the database table means in a further database table (i.e., as opposed to merely printing the requested data or otherwise outputting it in text form or merely returning the data to the requesting program). The improvement of this aspect is characterized by generating the plural subqueries so as to cause the DBMS to place the data requested from each respective partition in the designated database table.
In yet another aspect of the invention, a digital data processing system as described above can include functionality for executing multiple threads, or xe2x80x9clightweight processes,xe2x80x9d each for applying a respective subquery signal to the DBMS""s interface element. Those threads can be executed in parallel on multiple central processing units, and can be serviced by multiple server processes within the DBMS that also execute in parallel.
Further aspects of the invention provide improvements to a digital data processing system of the type having a storage element (e.g., a disk drive or other random-access media) for storing and retrieving data records, as well as a DBMS having (i) a hashing element to effect storage of data records in xe2x80x9chash bucketxe2x80x9d regions in the storage element, where each record is stored in a root hash bucket region corresponding to a hash function of a selected value of the data record or, alternatively, to effect storage of data records in an overflow hash bucket region associated with that root hash bucket region; and (2) an indexing element to index each stored data record for direct access in accord with a respective value of that data record.
The improvement is characterized by a scatter cluster retrieval element that responds to a request for accessing a data record previously stored via the hashing element, by invoking the indexing element to retrieve that record in accord with the index value thereof, where stored records have previously been indexed by the indexing element with respect to the same fields (columns) used by the hashing element. In a related aspect of the invention, the hashing element stores the data records in hash bucket regions that are sized so as to create at least one overflow hash bucket region per root bucket region, and such that overflow bucket regions for a given root bucket region are distributed roughly evenly across different storage partitions.
Another aspect of the invention provides a digital data processing system of the type described above, in which plural subcursor buffers are associated with each subquery signal for storing results generated by the DBMS""s standard interface means in response to that subquery signal. To assemble all results of those subqueries, a root buffer stores a then-current result, while a fetching element simultaneously assembles a final result signal based upon those results currently stored in selected subcursor buffers. As results are taken from each of those buffers, they are emptied. For each such emptied buffer, a subquery is applied to the standard interface asynchronously with respect to demand for that buffer""s contents in assembling the final result. In the case of queries involving aggregates, the root buffer stores then-current results in a temporary table to be queried later by an aggregate query generated by the decomposition element.
In still other aspects, the invention provides a method for digital data processing paralleling the operation of the digital data processing system described above; i.e., xe2x80x9ctransparentxe2x80x9d to the DBMS client other than by improved performance.