1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the optimization of SQL queries in a relational database management system by propagating and exploiting column nullability.
2. Description of Related Art
Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American Nationals Standard Organization (ANSI) and the International Standards Organization (ISO).
In RDBMS software, all data is externally structured into tables. The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages such as C, COBOL, etc. Operators are provided in SQL that allow the user to manipulate the data, wherein each operator operates on either one or two tables and produces a new table as a result. The power of SQL lies on its ability to link information from multiple tables or views together to perform complex sets of procedures with a single statement.
According to the SQL standard (i.e., ISO-ANSI Working Draft: Database Language SQL2 and SQL3; X3H2; ISO/IEC JTC1/SC21/WG3, 1993), column nullability for various set operations is based on the two-valued logic, i.e., a column is either nullable or non-nullable. In the SQL standard, null values can be stored only in nullable columns. For example, the following CREATE statement describes an employee table in the SQL standard:
CREATE TABLE EMPLOYEE (EMPNO INTEGER NOT NULL, LASTNAME CHAR (20), SALARY FLOAT) PA0 SELECT LASTNAME PA0 FROM EMPLOYEE PA0 WHERE EMPNO IS NULL
In the above example, the column "EMPNO" is specified as "NOT NULL", i.e., the "EMPNO" column cannot contain any null values, whereas other columns may or may not contain null values. Consider a simple SELECT query performed against the employee table:
Since the EMPNO column cannot contain null values by definition, it can be verified at query compile time that the above query will return an empty result table. Without this simple and yet important optimization, the RDBMS software would have to fetch all tuples from the employee table and then apply the "IS NULL" predicate to the tuples. However, the "IS NULL" predicate is always false for the above query, since the "EMPNO" column was defined as "NOT NULL."
The above example illustrates the importance of exploiting column nullability information in query optimization. The exploitation of column nullability can mean a potentially huge saving in query execution time. However, determining if a column is nullable becomes less straightforward in presence of complex views or derived tables.
Some researchers have performed important work in this area, as reflected in the following publication: G. N. Paulley and Per-Ake Larson, "Exploiting Uniqueness in Query Optimization," Proceedings of IEEE '94 International Conference on Data Engineering, Houston, Tex., 1994, pp. 68-79. For example, this publication discusses the complexity of exploiting uniqueness in query optimization in the presence of null values. However, the current state-of-the-art in SQL query optimization has few solutions for optimizing query expressions by propagating and exploiting column nullability. Thus, there is a need in the art for improved methods of optimizing SQL queries by propagating and exploiting column nullability.