The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
When used to process a query, a conventional user-defined index can only evaluate a query predicate that includes a user-defined operator supported by the index. After evaluating the user-defined operator, the user-defined index returns a set of row identifiers to a database server. The database server then combines the data rows identified by the row identifiers received from the user-defined index with data rows from other row sources in order to apply any additional filtering and/or ordering that may be specified by the various predicates and clauses of the query being processed. This two-stage data processing for queries that include a user-defined operator is inefficient and adversely affects the performance of the database server at least because the database server needs to use more processing resources such as memory and CPU cycles.
In addition, when a user-defined operator is used in a query in conjunction with additional predicates, the query performance is poor when the additional predicates are selective. This is because the user-defined index, which is used to evaluate the user-defined operator, does not have any information about the additional predicates in the query. As a result, the user-defined index often ends up returning (or identifying) a lot of extra rows to the database server that processes the query, and the database server needs to apply the additional query predicates on all of these extra rows, thus wasting potentially a lot of CPU and I/O cycles.
Moreover, query performance is also poor when user-defined indexes are used to process queries that request ordered result sets. Since a user-defined index is not aware of any ORDER BY clause on the columns of the table in the query, the user-defined index would return an unsorted set of row identifiers when evaluating a user-defined operator specified in the query. A database server processing the query would then need to perform additional processing in order to sort any result set of data rows before the set of data rows is returned in response to the query. This additional processing adversely affects the performance of the database server because it requires that the database server use more processing resources such as memory and CPU cycles. This performance problem is further compounded when a query requests only the top N rows from a result set of rows. In this case, the user-defined index would return to the database server an unsorted set of row identifiers that identify all rows that satisfy the user-defined operator specified in the query, and the database server would need to sort the entire set of identified rows before determining the top N rows which the query is requesting. This, however, results in wasting processing resources both during the evaluation of the user-defined operator by the user-defined index and during the sorting performed by the database server.
As an example, consider a table “DOC” that may be created by the following data definition language statement:
create table DOC (docIDnumber,       author varchar(30),       pub_datedate,       documentCLOB)Suppose that a user-defined index “DOCIDX” is declared on table “DOC” as follows:
create index DOCIDX on DOC (document) indextype is (context)
where the “DOCIDX” index may be used to evaluate a “CONTAINS” operator.
Suppose that a database server receives for evaluation the following query “Q1”:
Q1. selectdocID, author  from DOC where CONTAINS (document, ‘Oracle’) > 0 and pub_date between ‘01/01/2007’ and ‘12/31/2007’order by  pub_dateWhen evaluating query “Q1”, the database server would invoke the functions of the “DOCIDX” index in order to evaluate the “CONTAINS” operator against the “document” column of the “DOC” table. However, since the “DOCIDX” index does not store any information about any of the other columns in the “DOC” table, the “DOCIDX” index cannot be used to evaluate the filtering predicate “pub_date between ‘Jan. 1, 2007’ and ‘Dec. 31, 2007’” and the ORDER BY clause “order by pub_date” of the query. Instead, after evaluating the “CONTAINS” operator, the “DOCIDX” index returns to the database server an unsorted set of row identifiers that identify the rows from the “DOC” table which satisfy the “CONTAINS” operator. Thereafter, the database server would apply the filtering predicate and the ORDER BY clause of the query to the set of rows identified by the row identifiers returned by the “DOCIDX” index. For example, the database server may use the set of row identifiers returned by the “DOCIDX” index to determine which of the identified rows of the “DOC” table satisfy the filtering predicate “pub_date between ‘Jan. 1, 2007’ and ‘Dec. 31, 2007’”; then the database server would sort the resulting set of rows by “pub_date” in accordance with the ORDER BY clause of the query before returning the sorted set of rows in response to query “Q1”. Thus, the use of the user-defined index “DOCIDX” to evaluate query “Q1” results in a two-stage data processing, which not only causes the database server to use more processing resources, such as memory, CPU cycles, and I/O cycles, to process what potentially can be a lot of extra rows, but may also prevent the database server from otherwise optimizing the query for faster processing.