1. Field of the Invention
This invention relates to database partitioning, and more particularly, to the partitioning of tables (also relations) in a relational database, or data objects in a distributed database.
2. Description of the Related Art
Several different schemes exist for partitioning a database table across several different systems. The schemes are chosen to enable parallel evaluation of a database query on a plurality of systems. One scheme uses a primary key to horizontally partition tuples of a table across different systems. Examples include partitioning based on a randomizing hash function of the primary key, or based on key value ranges. A database table may also be partitioned based on information not stored in a tuple (non-key-based), such as the site where the tuple was inserted, the user who inserted the tuple, or the application used to insert the tuple.
These schemes may be used for object-based databases or relational databases. The related art and present invention will be explained in terms of relational database systems. Relational database systems include objects called relations. Typically, a relation is represented (or embodied) in a table having rows and columns. The rows of a table are tuples of the represented relation; the table's columns are the relation's attributes. Relational database systems provide operations such as RESTRICT (or SELECT), JOIN, and PROJECT. Each of these operations yields a new relation. For example, RESTRICT extracts one or more tuples from a relation to yield a new relation. PROJECT extracts one or more attributes from a relation to yield a new relation. Finally, JOIN combines tuples from two relations based on one or more attributes common to both relations to yield a new relation.
The primary key of a relation is an attribute or a set of attributes, having data values unique for each tuple of the relation. As noted above, the primary key is used by schemes to partition a relation across several systems. In detail, different groups of tuples are stored on different systems, and the systems are networked together based on the primary key of the tuples. Each system has a database management system (DBMS) that handles queries for the system. Ideally, a user accessing a database on any of the systems should not be aware of the partitioning scheme employed other than by way of system performance. For example, a user may submit a query requiring access to all the tuples of a relation where the relation is partitioned across multiple systems. The DBMS of the user's system would request copies of the tuples of the relation stored on other systems. The time required to execute and complete the user's request will be a function of the capacity/usage of the other systems and the network interlinking the systems. Meanwhile, users on other systems could be making similar requests for tuples of the relation, further delaying the execution of queries.
One goal of data partitioning is to reduce the time required to process a query. If the tuples are suitably partitioned, parallel execution is possible without too much overhead. Parallel execution is impeded by queries that require access to multiple relations, such as, JOIN queries. An example of the problem of partitioning and parallel execution of database queries is presented with reference to the three tables shown in FIGS. 1, 2, and 3 and a network system shown in FIG. 4.
The tables illustrated in FIGS. 1, 2, and 3 represent CITY, EMPLOYEE and JOB relations. (These relations will now be referred to simply by their proper names). The primary keys for the relations are the city code, employee no., and job no. attributes, respectively. FIG. 4 is a simplified diagram of a networked database system 99 whose parts are joined for cooperative operation by a network 100. The system 99 includes three separate, interlinked computer sites 10, 20, and 30 located, for example, in New York, Chicago, and Los Angeles. Each computer site 10, 20, and 30 has a database management system ("DBMS") 14, 24, and 34, and online data storage 12, 22, and 32 for local database storage. If data partitioning by primary key is employed for the three relations, with equal distribution of records, CITY would be partitioned based on its primary key, the city code attribute, EMPLOYEE would be partitioned based on its primary key, the employee no. attribute, and JOB would be partitioned based on its primary key, the job no. attribute. An example of local tuple storage based on partitioning by primary key for CITY, EMPLOYEE, and JOB in the system 99 is shown in FIG. 5.
In FIG. 5, EMPLOYEE has been partitioned and locally stored in data storage 12, 22, and 32, using the employee no., with equal distribution. JOB has also been partitioned and locally stored in data storage 12, 22, and 32, using the job no., with equal distribution. It will be shown that this type of partition can cause very inefficient processing to compute JOIN. For example, users in New York, Chicago, and Los Angeles may each want to generate a new relation having all tuples for jobs being performed by employees located in their city. Execution of the sequence of steps necessary to complete the query for users in New York is presented.
First, the DBMS 14 New York generates a query sent to DBMS's 24 and 34 to evaluate each tuple of JOB that is locally stored, i.e., in data storage 22 and 32. The DBMS's 24 and 34 must send copies of those tuples of JOB where the job is being performed by an employee working in New York. In addition, the DBMS 14 must evaluate the tuples of JOB stored locally for jobs being performed by an employee working in New York. There are two tuples of JOB stored locally in data storage 12. The evaluation of the tuple for job no. 02 (job name green) is presented. JOB does not contain the city code, it only has the foreign key employee no. So, the DBMS 14 first determines that employee no. 03 is performing job no. 02 (the employee no. attribute has a value of 03). The DBMS 14 then evaluates the tuples locally stored for EMPLOYEE having the attribute employee no.=03. The tuple is not stored locally. So the DBMS 14 sends a query to the other DBMS's 24 and 34 for the tuple of EMPLOYEE for employee no. 03. The DBMS 24 will receive the request and when it has access to the tuples for employee relation stored locally, it will send a copy of the tuple for employee no. 03 over the network to the DBMS 14. The DBMS 14 will then evaluate the copy of the tuple from EMPLOYEE for employee no. 03 and determine that the city code attribute has a value of 02.
The DBMS 14 will receive the tuple for employee 03 indicating that the employee works in city code 02 (attribute city code has a value of 02). The DBMS 14 then may need to determine which city corresponds to city code 02. The tuple of CITY having the city code attribute value of 02 is also not stored locally. So the DBMS 14 may send another query request to DBMS 24 and 34 for the tuple of CITY having city code=02. The DBMS 24 will send a copy of the tuple from CITY with city code=02. Then the DBMS 14 can compare the value of the city name attribute in the CITY tuple sent from DBMS 24 to "New York" and finally determine that the tuple from JOB having the job no. attribute with a value of 02 is not being performed by an employee working in New York. This same procedure would be performed for the other tuple of JOB that is stored locally (job no.=01). At the same time, DBMS 24 and 34 perform the same procedure for each of their locally stored JOB tuples to determine if the job is performed by an employee working in New York, and send the result to DBMS 14. Finally DBMS 14 combines these results to produce an answer to the original query.
Note that the other DBMS's may not respond instantly to the queries received from DBMS 14, since they may be processing queries for local users. As can be seen, this is not an efficient use of bandwidth or resources since records are being sent between systems unnecessarily instead of being evaluated locally by each DBMS, due to the data partitioning scheme. One method to reduce such inefficiency is to co-locate the tuples of two tables that are to be JOINed by partitioning both tables on the joining attribute. For example, both the CITY and EMPLOYEE tables are partitioned on the city code attribute. In this way, a DBMS can determine locally for each local EMPLOYEE tuple the city in which the employee works. However, it then would not be possible to co-locate the JOB table with the EMPLOYEE table by partitioning them on the employee no. attribute since the EMPLOYEE table is already partitioned on the city code attribute.
Another solution proposed in the past was to add more attributes (or columns) to lower level tables (thereby flattening the database). For example, the city code attribute could be added to JOB, extending the number of attributes (or columns) of relation from four to five with information that is redundant with information in the other two relations. This technique has numerous drawbacks. First, the insertion process is made more cumbersome since a tuple from EMPLOYEE must be received to determine the city code attribute for the tuple to be inserted into JOB. Second, this flattens the database by including redundant information in a relation, which consumes storage and creates risk of inconsistent data. Third, it makes the maintenance of the relations more cumbersome since JOB needs to be updated each time the employee relation is updated, e.g., if the city code attribute is modified for a tuple in EMPLOYEE, all tuples having an employee no. attribute equal to that of the updated employee tuple would need their city attribute updated. Thus, a more suitable partitioning system or method is needed to enable parallel execution of database queries for common or logical queries based on how relations interlink.