1. Field
Embodiments of the invention relate to reduction of join operations when archiving related database tables.
2. Description of the Related Art
Relational DataBase Management System (RDBMS) software uses relational techniques for storing and retrieving data in a relational database. Relational databases are computerized information storage and retrieval systems. Relational databases are organized into tables that consist of rows and columns of data. The rows may be called tuples or records or rows. A database typically has many tables, and each table typically has multiple records and multiple columns. A RDBMS may use a Structured Query Language (SQL) interface.
A table in a database may be accessed using an index. An index is an ordered set of references (e.g., pointers) to the records in the table. The index is used to access each record in the table using a key (i.e., one of the fields or attributes of the record, which corresponds to a column). The term “key” may also be referred to as “index key”. Without an index, finding a record requires a scan (e.g., linearly) of an entire table. Indexes provide an alternate technique to accessing data in a table. Users may create indexes on a table after the table is built. An index is based on one or more columns of the table.
A query may be described as a request for information from a database based on specific conditions. A query typically includes one or more predicates. A predicate may be described as an element of a search condition that expresses or implies a comparison operation (e.g., A=3). When a query is received by the RDBMS, the RDBMS executes the query.
A join may be described as an SQL relational operation that allows retrieval of data from two or more tables based on matching column values.
Tables may be archived to reduce the size of the tables by removing data that is old and/or not being accessed anymore. Reduction in size of the tables leads to better overall application performance for applications accessing the tables. Thus, archiving may be described as moving certain data from “original” tables to “archive” tables, so that the data no longer resides in the original tables, but is still available if needed in the archive tables. The typical solution to archiving multiple related tables is, for each table, joining the table being archived to all of the parent tables of that table to ensure the correct rows are being qualified for archival. A parent table may be described as a table that is related to another table (also referred to as a child table) by referential integrity (RI). Referential integrity may be described as a condition that exists when intended references from data in one column of a table to data in another column of the same or a different table are valid. Referential integrity may be enforced by DataBase Management System (DBMS) constraints (System referential integrity) or programmatically within a DBMS application (Application referential integrity). Note that a table may be a parent of one or more tables (i.e., a parent may have multiple children) and a child of one or more tables (i.e., a child may have multiple parents). Unfortunately, this solution requires multi-table database joins. Database joins are very expensive from a resource usage perspective and from a performance perspective.
In conventional systems, when archiving a set of related tables (also referred to as an “archive unit”), an archive unit tree is walked from bottom to top or top to bottom. That is, the archive unit is the set of related tables to be archived, and an archive unit tree may be described as a representation of the related tables in an archive unit. The root table of an archive unit tree is a starting point table. The purpose of the archive unit is to ensure that data being archived is referentially intact. That is, the data being archived is directly or indirectly related to the rows qualified in the starting point table. Rows are qualified based on a predicate provided with a query (also referred to as a statement) requesting the archive. FIG. 1 illustrates an archive unit tree. In FIG. 1, an EMP table 110 is a root table with a WORKDEPT column 112. DEPT table 120 is a child table of the root table, EMP table 110, and is a parent table of the PROJ table 130. The DEPT table 120 includes a DEPTNO column 122. The PROJ table 130 includes a DEPTNO column 132. The EMP table 110 has an associated ARCEMP archive table 114. An archive table (also referred to as an archive target table) may be described as a table that is populated with the rows that are qualified to be archived. The DEPT table 120 has an ARCDEPT archive table 124. The PROJ table 130 has an archive table 134. A connection key may be described as a column in a table that is to be compared against a connection key column in a parent or child table. Conceptually, connection keys are similar to primary and foreign key keys used to relate tables.
As a result of walking the archive tree unit from bottom to top or top to bottom, when tables are archived within the archive tree unit, a JOIN operation is performed between the table to be archived and the parent tables of that table up to the root table in order to qualify the rows to be archived. For example, in FIG. 1, when archiving the PROJ table 130, the PROJ table 130 is joined to the DEPT table 120 through the connection keys DEPTNO 122-DEPTNO 132. Then the DEPT table 120 is joined to the EMP table 110 via the connection keys WORKDEPT 112-DEPTNO 122. The multi-table join processing may be very expensive when there a several levels (i.e., many parent tables between a child table to be archived and the root table) in a branch of the archive unit tree.
The following Statements (1 ), (2 ), and (3) illustrate the processing to archive the PROJ table 130.
Statement (1 ) is a sample statement that selects rows from the EMP table 110 for which the WORKDEPT column 112 has a value of D11.
Statement (1)SELECT A.“EMPNO” FROM “DSN8810”.“EMP” AWHERE (WORKDEPT = ‘D11’ )
Statement (2 ) is a sample statement that selects a DEPTNO column 122 value from the DEPT table 120 for each row for which the row is in the result set of Statement (1 ), and the DEPTNO column 122 value equals the WORKDEPT column 112 value. Statement (2 ) requires a join between the EMP table 110 and the DEPT table 120.
Statement (2)SELECT A.“DEPTNO” FROM “DSN8810”.“DEPT” AWHERE EXISTS (SELECT 1 FROM “DSN8810”.“DEPT” B,“DSN8810”.“EMP”WHERE (WORKDEPT = ‘D11’)AND B.“DEPTNO” = “DSN8810”.“EMP”.“WORKDEPT”AND A.“DEPTNO” = B.“DEPTNO”)
Statement (3 ) is a sample statement that selects a PROJNO column value (not shown in FIG. 1) and a DEPTNO column 132 value from the PROJ table 130 for each row for which the row is in the result set of Statement (2 ), the DEPTNO column 132 value equals the DEPTNO column 122 value, the DEPTNO c WORKDEPT column 112 value. Statement (3 ) requires a join between the PROJ table 130 and the DEPT table 120, and a join between the DEPT table 120 and the EMP table 110.
Statement (3)SELECT A.“PROJNO”, A.“DEPTNO” FROM “DSN8810”.“PROJ” AWHERE EXISTS (SELECT 1 FROM “DSN8810”.“PROJ” B,“DSN8810”.“DEPT”,“DSN8810”.“EMP”WHERE (WORKDEPT = ‘D11’)AND B.“DEPTNO” = “DSN8810”.“DEPT”.“DEPTNO”AND “DSN8810”.“DEPT”.“DEPTNO” =“DSN8810”.“EMP”.“WORKDEPT”AND A.“DEPTNO” = B.“DEPTNO”)
Thus, conventional solutions may require many expensive joins to archive a table, and, therefore, there is a need in the art for improved archiving of tables.