The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
A data item is said to be “NULL”, or to have a NULL value, if the actual value of the data item is unknown or not meaningful; otherwise, the data item is said to be “NOT NULL”, or to have a NOT NULL value that may be of any datatype that can be associated with the data item. (Thus, a NULL value is not the same as an arithmetic zero or a Boolean FALSE value.) As referred to herein, a data item may be any identifiable portion of data including, but not limited to, a value stored in a particular column of a particular row of a relational table in a relational database, an object stored in a table of an object database, and an instance of Extensible Markup Language (XML) data stored in an XML document.
As used herein, “count( ) query” refers to a query that includes a count( ) operator. Generally, an operator is a set of instructions which, when executed, perform one or more specific operations. An operator may take zero or more input arguments and may be implemented in various ways, such as, for example, as a function, as a subroutine, or as a construct. A count( ) operator, also referred to hereafter as “count operator”, is an operator which, when evaluated against a set of data items, returns the cardinality of the set. (The cardinality of a set of data items is typically the number of items in the set.)
Queries that include count operators are widely used and may be, for example, Structured Query Language (SQL) queries, XML Query Language (XQuery) queries, or the SQL/XML queries. A SQL query is a query that conforms to a particular SQL specification. An XQuery query is a query that conforms to an XQuery specification. A draft specification for XQuery is described in “XQuery 1.0: An XML Query Language”, W3C Candidate Recommendation 3 Nov. 2005, the entire content of which is hereby incorporated by reference for all purposes as if fully set forth herein. A SQL/XML query is a query that includes one or more operators defined according to a SQL/XML standard, such as, for example, the SQL/XML standard defined in INCITS/ISO/IEC 9075-14:2003, the entire content of which is hereby incorporated by reference for all purposes as if fully set forth herein.
Both SQL and the SQL/XML standard provide a count operator that is implemented as a built-in aggregate function which, when evaluated against a set of values, returns the number of NOT NULL values in the set. For example, a SQL count aggregate operator, which is specified in a SQL query that is evaluated against a relational table, would return a number indicating the number of rows from the table that satisfy the condition that the computed input expression to the operator is NOT NULL. The SQL count operator may be evaluated against all rows of the relational table or against a subset of all the rows, where the subset may be determined based on other clauses (e.g. WHERE clause) of the query and any predicates specified therein. In another example, an XQuery count operator, which is specified in an XQuery query that is evaluated against an XQuery data model instance (e.g., a sequence), would return a number of items in the sequence that satisfy the criteria specified in the input expression of the count( ) XQuery query.
Consider, for example, the following SQL/XML query Q0:
Q0. select count(XMLForest(ename) NULL ON NULL)    from empThe XMLForest( ) operator is a SQL/XML function that takes as an input argument the scalar value stored in a particular column of a table and returns an XML instance containing an XML element, where the content of the XML element is a NOT NULL value from the scalar value stored in the particular column and the name of the XML element is the name of the particular column. In query Q0, the “NULL ON NULL” option specified in the XMLForest( ) operator indicates that when the input scalar value argument is a NULL value, the output XML instance is also a NULL value. Thus, when the count( ) aggregate operator is executed against table “emp”, query Q0 would count all the “ename” XML elements that are constructed from the values stored in the “ename” column of each row of table “emp”. If in a particular row the value stored in column “ename” is a NULL value, then according to the semantics of the XMLForest( ) operator with the “NULL ON NULL” option, no “ename” XML element is constructed for that row and consequently the count operator would not count such a row.
Developers that implement count( ) queries in software applications, as well as the users of such applications, typically expect the count( ) queries to execute very fast since the count queries would do nothing more than accumulate a count and return a number. However, the past approaches for evaluating count( ) queries cannot guarantee that the count( ) queries would be evaluated in an optimal way, and consequently cannot guarantee an acceptably fast execution time for the queries. This disadvantage of the past approaches is particularly severe and noticeable for count( ) queries that are executed against large XML documents to count instances of XML data stored therein or against large object database tables to count objects stored therein.
For example, according to one such approach for evaluating a count( ) query against a set of XML documents, a resulting set of instances of XML data is first materialized. (The materialized instances of XML data may be of any XML datatype, such as, for example, the XMLType datatype defined in the SQL/XML standard, or may be constructed from the object relational storage of XMLType). Then, this approach provides for testing the resulting set of instances of XML data to determine whether the instances are NULL values or not. Finally, the instances of XML data that are NOT NULL values are excluded and the resulting set with the remaining NOT NULL instances of XML data is added for accumulating the count result. The disadvantage of this approach is that the resulting set passed into the count operator has to be materialized and constructed before the determination can be made of whether the XML data instances in the set store NULL values or NOT NULL values. However, materializing and constructing a set of instances of XML data is very expensive with respect to the computational resources used (such as, for example, memory, CPU time, and disk space), especially when the set of instances that needs to be materialized is large. Exacerbating the problem even further, in this approach computational resources must be expended for materializing and constructing all the instances of XML data in the resulting set regardless of whether the materialized or constructed XML data instances are NULL or not, even though whether the base input column values to materialize or construct the XML data instance are NULL or not can be determined during query compile time.
Although the disadvantages related to processing NULL data items are presented above with respect to count( ) queries, it is noted that these disadvantages are not unique to count( ) queries or count operators. Rather, these disadvantages are common to any queries that require performing operations on NULL data items, where the operations may be specified in any expressions, select list items, and/or functional constraints of the queries.
Based on the foregoing, there is a clear need for techniques for optimizing queries that that overcome the disadvantages described above.