A database management system (DBMS) comprises the combination of an appropriate computer, direct access storage devices (DASD) or disk drives, and database management software. A relational database management system (RDBMS) is a DBMS which uses relational techniques for storing and retrieving information. An RDBMS, such as the DB2 product from IBM, comprises a computerized information storage and retrieval system in which data is stored on disk drives or DASD for semi-permanent storage. The data is stored in the form of tables which comprise rows and columns. Each row or tuple has one or more columns.
The RDBMS is designed to accept commands to store, retrieve, and delete data. One widely used and well-known set of commands is based on the Structured Query Language or SQL. The term query refers to a set of commands in SQL for retrieving data from the RDBMS. The definitions of SQL provide that an RDBMS should respond to a particular query with a particular set of data given a specified database content. SQL, however, does not specify the actual method to find the requested information in the tables on the disk drives. There are many ways in which a query can be processed, and each consumes a different amount of processor and input/output access time. The method by which the query is processed, i.e., the query execution plan, affects the overall time for retrieving the data. The time taken to retrieve data can be critical to the operation of the database. It is therefore important to select a method for finding the data requested in an RDBMS which minimizes the computer and disk access time, thereby minimizing the cost of executing the query.
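The point that one query can be answered by several methods of differing cost can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table contents, the function names, and the use of "table rows fetched" as a stand-in for processor and input/output time. It is not how any particular RDBMS measures cost.

```python
from bisect import bisect_left

# Hypothetical table of (id, name) rows, stored sorted by id.
ROWS = [(i, f"name{i}") for i in range(1000)]
# Stands in for an index on the id column (assumption for this sketch).
IDS = [row[0] for row in ROWS]


def full_scan(target_id):
    """Fetch every row and test it; return (matches, table rows fetched)."""
    fetched = 0
    matches = []
    for row in ROWS:
        fetched += 1
        if row[0] == target_id:
            matches.append(row)
    return matches, fetched


def index_lookup(target_id):
    """Binary-search the index, then fetch only the matching row."""
    pos = bisect_left(IDS, target_id)
    if pos < len(IDS) and IDS[pos] == target_id:
        return [ROWS[pos]], 1
    return [], 0


scan_result, scan_cost = full_scan(42)
index_result, index_cost = index_lookup(42)
print(scan_result == index_result)  # same answer from both methods
print(scan_cost, index_cost)        # very different toy costs
```

Both functions return the same row for the same request, which is all that SQL requires; the difference lies only in how much work each method performs, and that difference is what an optimizer tries to exploit.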
To execute a SQL statement, such as “select * from t1”, in a RDBMS program like DB2, the statement is presented to the SQL optimizer. The SQL optimizer parses, tokenizes and semantically analyzes the statement, transforming it into the Query Graph Model (QGM) representation of the statement. The QGM representation is then processed to perform a number of heuristic optimizations. The output of this pass is then fed to the cost-based planning stage. The cost-based planning stage processes the optimized QGM, producing an access plan based on LOw LEvel Plan OPerators (LOLEPOPs). The plan produced by the cost-based optimizer is then processed by a Code Generation module (CODGEN) to produce an OPeration Code (OPCODE) based access plan, which can be processed at runtime by a Relational Database System (RDS).
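The sequence of stages described above can be sketched as a chain of small functions. This is a toy model under stated assumptions, not the DB2 implementation: the dictionary used for the QGM, the operator names SCAN and RETURN, and the opcode names OP_SCAN and OP_RETURN are all hypothetical placeholders, and the parser accepts only the one statement form used as the example in the text.

```python
def parse(sql):
    """Tokenize and check a statement of the form 'select * from <table>'."""
    tokens = sql.strip().rstrip(";").split()
    head = [t.lower() for t in tokens[:3]]
    if len(tokens) != 4 or head != ["select", "*", "from"]:
        raise ValueError("this toy parser only accepts: select * from <table>")
    # Hypothetical stand-in for a QGM representation.
    return {"kind": "select", "columns": "*", "table": tokens[3]}


def rewrite(qgm):
    """Placeholder for the heuristic optimization pass (no changes here)."""
    return dict(qgm)


def plan(qgm):
    """Produce a list of hypothetical low-level plan operators."""
    return [("SCAN", qgm["table"]), ("RETURN",)]


def codgen(lolepops):
    """Translate plan operators into hypothetical (opcode, operands) pairs."""
    opcode_for = {"SCAN": "OP_SCAN", "RETURN": "OP_RETURN"}
    return [(opcode_for[op[0]], op[1:]) for op in lolepops]


def compile_statement(sql):
    """Run the stages in the order the text describes."""
    return codgen(plan(rewrite(parse(sql))))


print(compile_statement("select * from t1"))
```

The output of `compile_statement` is the kind of artifact the runtime component would then have to process; a real cost-based planner would weigh alternative operators rather than emit a fixed pair.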
In prior versions of RDBMS programs, such as DB2 (Versions 5.2 and older), the OPCODE based access plan is interpreted at runtime by the Relational Database System (RDS). The Relational Database System examines each OPCODE, and looks up the function which is called to process the OPCODE and its operands. The processing for the OPCODE includes loading the OPCODE's operands and making decisions based on information associated with the OPCODE that was provided at CODGEN time. These decisions are made repeatedly, each time the OPCODE processing function is called, and direct the function of the OPCODE. An alternative implementation involved producing multiple OPCODEs for these similar functions. This approach still results in considerable duplication in underlying OPCODE processing.
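The repeated work described above can be made concrete with a small interpreter sketch. It is a toy model only: the opcode name OP_FETCH, the flag `project_first_column`, the table contents, and the assumption that the handler is invoked once per row are all hypothetical and chosen for illustration. The counters record how often the function lookup and the flag test are repeated even though the flag was fixed when the plan was generated.

```python
# Hypothetical table contents for the sketch.
TABLES = {"t1": [(1, "a"), (2, "b"), (3, "c")]}


def op_fetch(operands, state):
    """Return the next row, or None when the table is exhausted."""
    table, flags = operands      # operands are loaded on every call
    state["decisions"] += 1      # the flag test below is repeated on every call
    rows = TABLES[table]
    pos = state["pos"]
    if pos >= len(rows):
        return None
    state["pos"] = pos + 1
    row = rows[pos]
    # Decision based on information supplied at code generation time.
    if flags["project_first_column"]:
        return (row[0],)
    return row


# Dispatch table mapping each opcode to its processing function.
HANDLERS = {"OP_FETCH": op_fetch}


def run(opcode, operands):
    """Interpret one opcode repeatedly until it reports end of data."""
    state = {"pos": 0, "decisions": 0, "lookups": 0}
    output = []
    while True:
        handler = HANDLERS[opcode]   # function lookup repeated per call
        state["lookups"] += 1
        row = handler(operands, state)
        if row is None:
            break
        output.append(row)
    return output, state


rows, stats = run("OP_FETCH", ("t1", {"project_first_column": True}))
print(rows)
print(stats["lookups"], stats["decisions"])
```

For a three-row table the handler is called four times (three rows plus the end-of-data call), so the same lookup and the same flag test each execute four times although their outcome never changes during the run.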
It will be appreciated that one of the principal problems with existing RDBMS programs, such as the DB2 product, is the fact that the RDBMS includes an interpreter which executes during runtime. Since the interpreter translates and runs the OPCODEs at the same time, operation during runtime is considerably slower than for a compiler-based implementation. In view of the costs associated with replacing existing interpreter-based RDBMS programs, there remains a need for a mechanism which can improve the slower runtime performance of the interpreter phase in such systems.