1. Field of the Invention
The present invention is related to predicate processing by iterator functions.
2. Description of the Related Art
A database system is one in which data is stored, retrieved, and processed. Data records in a relational database management system (RDBMS) in a computer are maintained in tables, which are a collection of rows of data all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows of data. Tables in the database are searched using, for example, a Structured Query Language (SQL), which specifies search operations or predicates to perform on columns of tables in the database to qualify rows of data in the database tables that satisfy the search conditions. A predicate is an element of a search condition that expresses or implies a comparison operation (e.g., customer_number>100 or department_name=“finance”).
Relational DataBase Management System (RDBMS) software using a SQL interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
One type of function that may be included in a SQL statement is an iterator function. Iterator functions may also referred to as “table” functions in the DB2® UDB® system (e.g., for OS/390®, Linux®, UNIX®, and Windows® operating systems) available from International Business Machines Corporation. These iterator functions may be written in, for example, C, C++, JAVA®, or SQL. An iterator function receives a set of arguments and returns a table to the SQL statement that invokes the function. A table function may be invoked in the FROM clause of a SELECT statement.
An iterator function may be used to retrieve/extract data from an external source (e.g., a text search engine or a spread sheet) into a database. The result set (i.e., rows of data) returned by an iterator function may be treated as a table and may be referred to as a “virtual table”. When an iterator function is used in the FROM clause of a SELECT statement, columns in rows of data returned by the iterator function are mapped to a virtual table, and the columns of the virtual table may be used in predicate expressions and may be joined with columns of other tables.
The following sample SQL statement (1) includes a transpose (“IBM”) iterator function:
Statement (1)SELECT vcol1.high, vcol1.low, vcol1.time_stampFROM TABLE(FUNCTION transpose(“IBM”)) vtab1(vcol1),audit_table autWHERE vcol1.time_stamp>‘1970-01-01 00:00:00.00000’ AND  vcol1.time_stamp <‘1971-01-01 00:00:00.00000’ AND  vcol1.time_stamp = aut.time_stamp;
The transpose (“IBM”) iterator function operates on timeseries data that stores historical data (e.g., of stock symbols) as binary large objects (BLOBs) and returns a set of rows that are stored in result table “vtab1” with column “vcol1”. The column “vcol1” is of a row datatype. A row datatype may have multiple fields, and, in this example, one of the fields is a timestamp field represented by “t”. Therefore, vcol1.time_stamp is used to access a timestamp field in a row of data. The predicates on vcol1.time_stamp (i.e., vcol1.time_stamp>‘1970-01-01 00:00:00.00000’ AND vcol1.time_stamp<‘1971-01-01 00:00:00.00000’) are used to filter the rows of data with the specified time range and are also used in the join filter (vcol1.time_stamp=aut.time_stamp).
In Statement (1), the qualification (i.e., the predicates) on the table returned by the transpose (“IBM”) iterator function is:
vcol1.time_stamp>‘1970-01-01 00:00:00.00000’ ANDvcol1.time_stamp <‘1971-01-01 00:00:00.00000’ ANDvcol1.time_stamp = aut.time_stamp;
Typically, the iterator function creates a virtual table with a result set. Then, the qualification is applied to the virtual table to filter rows of data in the virtual table. In many cases, only a small percentage of rows of data in the virtual table remain after the qualification is applied. Unfortunately, because the qualification is applied after the rows of data are retrieved for the result set, many rows of data are unnecessarily retrieved for the virtual table.
The following sample SQL statement (2) includes a text search iterator function:
Statement (2)SELECT a.isbn, a.year, b.scoreFROM articles a,  text_search(‘articles’, ‘abstract’, “Bush” in same sentence as  “recession”) bWHERE a.year >= 2000 AND a.isbn = b.key AND b.score >= 0.9ORDER BY 3
The text_search( ) iterator function returns a key and a score for articles that have “Bush” in the same sentence as “recession” from an external data source “b”. The application program submitting SQL statement (2) is interested in articles with a score (assigned by the text_search( ) iterator function) greater than or equal to 0.9. Because the text_search( ) iterator function is not aware of the score qualification, the text_search( ) iterator function returns a virtual table with all articles that have “Bush” in the same sentence as “recession.” Not having knowledge about the qualification and filtering rows of data after the virtual table is created by the text_search iterator function is an overhead on query performance and system resources. Note that the database engine would process the a.year>=2000 AND a.isbn=b.key predicates as the iterator function does not have access to table “a”. Alternatively, a new text_search( ) function may be written with a new parameter for passing in the score to the text_search( ) function. However, if the overloading of the function is not allowed, this leads to a situation in which a user chooses the right name of the parameter to be passed in, requiring new development and release of software whenever there is a need to add a new predicate to the search criteria.
The DB2® Text Extender product (available from International Business Machines Corporation) has a hard-coded solution to this problem. For more information on the DB2® Text Extender, see “DB2® Optimization in Support of Full Text Search”, by Albert Maier and David Simmen, Bulletin of the Technical Committee on Data Engineering, Volume 24, No. 4, IEEE Computer Society, December 2001.
Thus, there is a need in the art for a generalized solution to allow predicate processing by iterator functions.