1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a method of optimally determining lossless join operations.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
In a data warehouse, referential integrity constraints are often created to maintain data integrity across different tables. Referential integrity imposes a constraint between tables such that every tuple inserted into a child table has exactly one matching row in the parent table. In other words, joining the parent and child tables does not reduce the number of rows of the child table that satisfy a WHERE condition. Such a join is therefore considered a lossless join.
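The lossless property described above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (parent, child), the column names, and the sample rows are hypothetical and chosen only for illustration. The sketch further assumes that the foreign key column is declared NOT NULL, since a child row with a null key would have no matching parent row and would be dropped by the join.

```python
import sqlite3

def count_rows(conn, sql):
    """Return the single COUNT(*) value produced by the given query."""
    return conn.execute(sql).fetchone()[0]

def build_demo_db():
    """Create a hypothetical parent/child pair linked by a referential integrity constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE child (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL REFERENCES parent(id),  -- assumed NOT NULL
            amount    INTEGER
        );
        INSERT INTO parent VALUES (1, 'a'), (2, 'b');
        INSERT INTO child  VALUES (10, 1, 5), (11, 1, 50), (12, 2, 70);
    """)
    return conn

conn = build_demo_db()

# Query that joins the parent table but selects none of its columns.
with_join = count_rows(conn, """
    SELECT COUNT(*) FROM child c
    JOIN parent p ON c.parent_id = p.id
    WHERE c.amount > 10
""")

# The same query with the parent table eliminated.
without_join = count_rows(conn, "SELECT COUNT(*) FROM child c WHERE c.amount > 10")

print(with_join, without_join)
```

In this sketch both queries return the same count, because each child row matches exactly one parent row, so the join neither removes nor duplicates qualifying child rows.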
It is well known in the research literature that the parent table can be eliminated from a query if none of its columns are selected. Hence, there is a need in the art for an optimizing method that eliminates parent tables where the join between the parent and child tables is a lossless join. Specifically, there is a need in the art for identifying which joins are lossless and which tables are eligible for removal.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for optimizing a query being performed by a computer system to retrieve data from a database stored on the computer system. The query is analyzed to identify any joins therein that are lossless and to identify any tables of the identified joins that are eligible for removal. This analysis includes partitioning the joins into lossless and lossy joins, and partitioning the tables of the joins according to their associated quantifiers, wherein each of the quantifiers has a quantifier state indicating whether the table participates in a join that is lossless. The query is then rewritten to eliminate the identified tables that are eligible for removal.
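The analysis summarized above might be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method: the Join class, the function names, and the eligibility rule are all hypothetical. The sketch assumes that the lossless property of each join is already known (for example, from a referential integrity constraint), and it treats a table as eligible for removal only when the table is the parent of a lossless join, takes part in no lossy join, is not itself the child of any join, and has no columns referenced outside the join predicates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Join:
    """Hypothetical description of one join in a query."""
    child: str
    parent: str
    lossless: bool  # assumed to be known in advance, e.g. from an RI constraint

def partition_joins(joins):
    """Split the joins into (lossless, lossy) lists."""
    lossless = [j for j in joins if j.lossless]
    lossy = [j for j in joins if not j.lossless]
    return lossless, lossy

def removable_tables(tables, joins, referenced):
    """Return the tables that this simplified rule considers eligible for removal.

    `referenced` is the set of tables whose columns are used outside the
    join predicates (select list, WHERE clause, and so on).
    """
    lossless, lossy = partition_joins(joins)
    in_lossy = {t for j in lossy for t in (j.child, j.parent)}
    is_child = {j.child for j in joins}
    lossless_parents = {j.parent for j in lossless}
    return [
        t for t in tables
        if t in lossless_parents      # parent side of a lossless join
        and t not in in_lossy         # takes part in no lossy join
        and t not in is_child         # simplification: not a child elsewhere
        and t not in referenced       # none of its columns are needed
    ]

# Hypothetical query over three tables.
tables = ["sales", "product", "store"]
joins = [
    Join(child="sales", parent="product", lossless=True),
    Join(child="sales", parent="store", lossless=False),
]
result = removable_tables(tables, joins, referenced={"sales", "store"})
print(result)
```

Here only the hypothetical product table qualifies, since store takes part in a lossy join and sales is a child table whose columns are referenced. A fuller treatment would also track a per-quantifier state and handle chains of lossless joins, which this sketch omits.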