1. Field of the Invention
The present invention relates to database management systems, and more specifically, to a method and apparatus for accessing data from a database.
2. Background Information
Nearly all high-level programming languages support various constructs (e.g., constants, variables, composite types, collection types, etc.) for manipulating data. A constant is a named item that retains a constant value throughout the execution of a program, as opposed to a variable, which can have its value changed during execution. A composite type is a defined set of one or more simple variables or other composite types, that is regarded in a program as a single entity. An example of a composite type is a record. A collection type is a structure used to store a list of data values having the same data type (e.g., arrays).
A data type is a definition of a set of data that specifies the possible range of values of the set, the operations that can be performed on the values, and, to some extent, suggests the way in which the values are stored in memory. Nearly all high-level languages include the following data types: integer, floating-point number, character, boolean, and pointer. Typically, when the constructs described above are used in a computer program, they are declared to be of a certain data type. For example, a program can declare a character variable or an integer variable. An array may be an array of pointers or an array of boolean values. In addition, compilers for high-level programming languages typically allow at least certain of the supported data constructs to be passed to routines as arguments. The term routine is used herein to refer to a named sequence of source code instructions within a program, which when compiled, can be executed. Thus, the term routine is used herein to refer to procedures, functions, subprograms, etc.
Nearly all compilers for high-level programming languages also include the concept of type checking. Type checking is the examination of the operations in a program during compiling or execution to make sure that the correct data types are being used. For example, if a routine requires two integers as arguments and the program provides an integer and a floating point value as arguments to the routine, then a compiler/interpreter capable of detecting this mismatch would generate a compile-time error. If the error is not detected at compile-time, then it should be detected upon execution of the routine and a run-time error (also referred to as an exception) should be generated. If the error is not detected at compile-time or during execution, the behavior of the program will be unpredictable.
Many programming languages also support conversions between various data types (referred to herein as "data type conversions"). For example, an integer representation of the value 10 may be converted into a floating point representation (1.0E1) of the value 10, a text representation of the number "10", etc. Many programming languages support "explicit" data type conversion. To cause an explicit data type conversion, a program must specifically indicate the data type conversion is to occur (e.g., In PL/SQL, execution of "TO.sub.-- CHAR (10)" yields a character representation of the number 10). Otherwise, an error will be generated as a result of the data type mismatch as described above. In addition to explicit data type conversions, some advanced programming languages (e.g., Oracle SQL and PL/SQL) also support "implicit" data type conversions (the data type conversions are performed without source code that explicitly designates the data type conversions are to occur). Rather than generating an error in response to a data type mismatch, a compiler supporting implicit data type conversions determines if the required conversion is supported as an implicit data type conversion. If the implicit data type conversion is supported, the required instructions to perform the data type conversion are inserted into the compiled code. Otherwise, a compile-time error is generated.
Many high-level programming languages also include the ability to declare variables as either strong (referred to herein as "strong typing") or weak (referred to herein as "weak typing"). If a variable is declared using strong typing, the programming language will not allow the program to change the data type of the variable during program execution. In contrast, if a variable is declared using weak typing, the program is able to change the data type of the variable during program execution.
For certain applications (e.g., database management systems), the above described constructs have proved to be insufficient. A database management system (DBMS) is a layer of software between the physical database (i.e., the files storing the data in the database) and the user. The DBMS manages all requests for database action (for example, queries or updates) from the user. In addition, a DBMS permits centralized control of security and data integrity requirements. As an example, a DBMS may be used for tracking airline reservations or managing payroll information.
Many database systems take advantage of distributed processing by using a client/server architecture. In this architecture, the database is divided into two parts: a frontend or a client portion, and a backend or a server portion. The client portion is the frontend database application that interacts with the users through keyboards, displays, and pointing devices such as mice. The client portion concentrates on requesting, processing, and presenting data managed by the server portion. The server portion runs server software and handles the functions required for concurrent, shared data access. Of course, the client-server architecture can be distributed across a network. For example, each of the client applications may be executed at a different node of the network, while one or more other nodes on the network are used for storing the database and executing the server software to process database requests sent by users at the client nodes.
To request data from a database, a client application may be written in any number of programming languages (e.g., the PL/SQL programming language developed by Oracle Corporation). In response to executing the client application, the client will cause the server portion to perform the required operations on the information in the database. PL/SQL is a structured programming language that supports flow of control statements (e.g., routines, if-then-else statements, exceptions, loops, etc.), as well as statements that provide the data manipulating functionality of SQL (the standard database access language for relational database management systems). In addition to other aspects of SQL, PL/SQL supports the "type system" (e.g., data types and type conversions, etc.) of SQL, as well as the comparison, set, and row operators of SQL.
A PL/SQL engine for executing routines written in PL/SQL code (e.g., packages, procedures, functions, etc.) can be present on either or both of the client and server sides. PL/SQL routines stored on the server side are referred to herein as stored routines. These stored routines can be called by the client applications. As a result of this architecture, PL/SQL routines can be centrally located, thereby reducing network traffic between applications and the database. In addition, users of PL/SQL can access data only as intended by the database developer.
As a result of the insufficiencies of general-purpose high level programming data constructs, special-purpose constructs referred to herein as "cursors" are supported by various database languages (e.g., SQL, PL/SQL, etc.). A cursor is a handle (a name or pointer) to the set of rows produced in response to executing a specific query statement. In the prior art, there are generally two types of cursors, "static cursors" and "dynamic cursors."
The PL/SQL language includes a number of statements for manipulating static cursors, including a declaration statement, an OPEN statement, a FETCH statement, and a CLOSE statement. A static cursor body is defined using the following syntax: EQU CURSOR cursor.sub.-- name [(parameter[, parameter, . . . ])] IS &lt;sql query&gt;;
where [ ] indicates optional and &lt;sql query&gt;is a legal SQL query. The columns and the tables which the static cursor will access must be specified in the &lt;sql query&gt;at the time the static cursor body is defined. An example of a format for a legal SQL query is: EQU SELECT {column.sub.-- name[, column.sub.-- name, . . . ] .vertline. *} FROM table.sub.-- name [WHERE predicate .vertline. parameterized predicate];
where .vertline. indicates "or" and * indicates all columns of the selected tables. The text starting with "SELECT" is referred to as the select statement. The part of the query following "WHERE" is referred to as the where clause or predicate. The predicate is a set of search criteria for accessing data from the database (e.g., empno&gt;2, where empno is the name of a column). Thus, the predicate determines the rows that will be accessed from the database. The predicate may optionally be parameterized. A parameterized predicate allows the set of search criteria to be varied based on one or more variables (e.g., empno&gt;x, where x is a variable). By changing the parameter value during run-time, the rows returned from the database using the static cursor may be varied as allowed by the predicate chosen when the static cursor is defined. However, like the columns and the tables, the predicate is chosen when the static cursor body is defined.
The set of rows returned in response to execution of a query is called the active set. The size of an active set is the number of rows that meets the search criteria of the query. To provide an example of the operation of static cursors, two exemplary database tables are respectively shown below in Tables 1A and 1B. As shown in Table 1A, the database table employee.sub.-- info includes columns for the employee's name (ename), number (empno), department (dept), and salary. As shown in Table 1B, the database table employee.sub.-- position includes columns for the employee's name (ename) and position.
TABLE 1A ______________________________________ ename empno dept salary ______________________________________ Jeff 1 10 20,000 Jim 2 20 10,000 Joe 3 20 12,000 Jack 4 30 50,000 ______________________________________
TABLE 1B ______________________________________ ename position ______________________________________ Jeff Manager Jim Programmer Joe Programmer Jack President ______________________________________
Thus, the statement: EQU CURSOR cursor1 IS SELECT ename, empno FROM employee.sub.-- info;
would generate a static cursor that could be used to access the active set shown below in Table 2. As shown in Table 2, the active set includes the ename and empno columns from the table employee.sub.-- info.
TABLE 2 ______________________________________ ename empno ______________________________________ cursor1 -&gt; Jeff 1 Jim 2 Joe 3 Jack 4 ______________________________________
The OPEN, FETCH, and CLOSE statements are used to control static cursors. In response to the OPEN statement, the server portion of the database executes the query associated with the static cursor, identifies the active set, and positions the cursor before the first row. The OPEN statement has the following format: EQU OPEN cursor.sub.-- name;
Thus, executing the statement "OPEN cursor1;" the server would identify the active set shown in Table 2 and position the static cursor before the first row in the active set.
The FETCH statement causes the server portion to advance the static cursor one row, retrieve the data in the current row, and store the retrieved data in the specified variables. The FETCH statement has the following format: EQU FETCH cursor.sub.-- name INTO {record.sub.-- name .vertline. variable.sub.-- name[, variable.sub.-- name, . . . ]};
where record.sub.-- name represents a previously defined record variable and the one or more variable.sub.-- names represent one or more simple variables.
The CLOSE statement disables the static cursor. The CLOSE statement has the following format: EQU CLOSE cursor.sub.-- name;
FIG. 1 is a block diagram illustrating certain aspects of static cursors. FIG. 1 shows an active set 100 comprised of data selected from one or more tables 110. As previously described, the active set 100 would include one or more columns and zero or more rows. A limitation of static cursors is that the number and names of the columns, the order of the columns, and the number and names of the tables, are fixed or static. Thus, a different static cursor must be defined for each query in which any of these attributes vary.
For example, cursor1 above returns the columns ename and empno from table employee.sub.-- info. However, assuming a user is also interested in different columns from the same table, another static cursor would have to be declared. As an example, the statement: EQU CURSOR cursor2 IS SELECT ename, dept FROM employee.sub.-- info;
would generate a static cursor that could be used to access the active set shown in Table 3.
TABLE 3 ______________________________________ ename dept ______________________________________ cursor1 -&gt; Jeff 10 Jim 20 Joe 20 Jack 30 ______________________________________
As another example, if the user is interested in all of the columns from the employee.sub.-- info table, another static cursor would have to be defined in the source code. Such a static cursor would be generated by the following: EQU CURSOR cursor3 IS SELECT * FROM employee.sub.-- info;
Thus, due to the static nature of the static cursor construct, three different static cursors are required to access different active sets from the same table.
In addition, static cursors are limited in that the selected table(s) are static. For example, to generate an active set from the employee.sub.-- position table shown above in Table 1B, an additional static cursor would have to be defined in the source code (e.g., CURSOR cursor4 IS SELECT * FROM employee.sub.-- position;).
Furthermore, the rows that are returned to the client application using a particular static cursor is limited by the predicate chosen at the definition of the particular static cursor. For example, if "empno=x" is used as the parameterized predicate, the user could not select rows that meet "empno&gt;x" or "empno&lt;x". In addition, at run-time the user cannot vary the rows returned using a static cursor based on criteria (e.g., values in other columns) that was not chosen when the static cursor was defined. Thus, a different static cursor must be declared in the source code of the client application for each such variation of active set needed by the user. These limitations make the static cursor construct cumbersome to program with.
The inability to associate static cursors with different queries at run-time is addressed by another prior art cursor construct referred to herein as "dynamic cursors." According to the standard adopted by the American National Institute of Standards (ANSI), dynamic cursors have the limitation of requiring a "host programming language" or an "embedding language." Within source code written in the host programming language (referred to herein as the "host environment"), blocks of SQL statements may be embedded (referred to herein as "embedded SQL blocks"). Preprocessors can be implemented for a high-level programming language, such as C, COBOL, or ADA, to allow the high-level programming language to act as a host programming language. An "embedded SQL host program" is a compilation unit (i.e., source code to be compiled) that consists of statements in the host programming language and embedded SQL blocks. Any data that needs to be communicated between the host environment and an embedded SQL block must be placed in "embedded host variables." An embedded host variable is a construct generated by the execution of instructions written in the host programming language. An embedded host variable may be used in an embedded SQL block as well.
A limitation of dynamic cursors is that each SQL statement identifier that is used to associate a dynamic cursor with a particular query must be unique in the executable image of an application. Having a unique name for each SQL statement identifier is difficult when a team of programmers is developing different parts of an application over a period of time. In addition, since the SQL programming language supports a different "type system" than prior art host programming languages, data must be converted between the data types supported by host programming languages and SQL. Furthermore, prior art database systems support the use of dynamic cursors on the client side, but do not support the use of dynamic cursors in stored routines on the server side.
A further limitation of static and dynamic cursors is that they cannot be passed between routines as arguments. This greatly reduces the programming flexibility of the static and dynamic cursor constructs. For example, FIG. 2 is a block diagram illustrating the use of static and/or dynamic cursors in one database architecture. FIG. 2 shows a line 200 dividing the database architecture into a client side and a server side. On the client side, there resides a number of client applications 210. Each client application is typically executing on one node (e.g., a work station having one or more processors) or multiple nodes of a computer system. On the server side, there resides one or more sets of server code 220 executing on the same or different nodes of the computer system. One or more of the client applications 210 may be executed on the same nodes as one or more of the sets of server code 220. The client applications 210 and the sets of server code 220 communicate via an Application Program Interface. The Application Program Interface is a layer of software between the client applications and the server code.
Unfortunately, since static and dynamic cursors cannot be passed as arguments between the client and server sides, the required statements to define, open, fetch, and close static and dynamic cursors must reside either all on the client side or all on the server side. For example, a server may be programmed to contain a first stored routine to open a static cursor and a second stored routine to fetch a row from that static cursor and return the row to the client. When a client application needs to retrieve rows from the database, the client application must call the first stored routine to open the static cursor and iteratively call the second stored routine to return one fetched row at a time. Assuming the client application needs 5,000 rows, 5,001 round trips will be performed--one round trip to open the static cursor and one round trip per row fetched. Due to the large number of stored routine calls required using this approach, the statements required to define, open, fetch, and close the required static and/or dynamic cursors are typically placed in the client applications. As illustrated by the following example, this greatly reduces the ability to alter installed client applications.
Typically, a business needs a database to store information in a manner in which its employees (users) can easily access it. For example, a business would want to store payroll information in a manner in which the accounting department employees can easily manage the payroll. Such a business would contact a database supplier to provide the basic server code and database architecture. In addition, that business would contact a client application provider (this may be the same company that provides the server code). The business communicates to the client application provider the functionality its employees will require. The client application provider then writes and compiles the client application(s) (as well as any required stored routines to be added to the server code) for installation at the various client and server sites.
Since each client application must contain the required static or dynamic cursor statements, adding new queries or modifying existing queries requires altering each client application at each client site. Since there can be thousands of client sites, this is a very time-consuming and expensive process. As an example, if the business decides that its employees require a different query from the database, then: 1) the client application source code would have to be altered, recompiled, and reinstalled at all the client sites; and/or 2) a patch would have to be distributed and applied at all of the client sites to alter the client applications. Due to the enormity of the task of updating every client site, it is common practice in the industry for the client application provider to provide only one recompile per database version change (e.g., Oracle 7 to Oracle 8).
One technique used to avoid the limitation of not being able to pass static and dynamic cursors as arguments is to use collections to pass the results of static and dynamic cursors. To illustrate the technique of using collections, assume a variable of the collection type is declared. Each element in the declared collection variable is of a record type consisting of the appropriate fields to hold the result of a row returned by the desired query. The client application calls a stored routine on the server side. Execution of the stored routine executes a query using a static cursor, retrieves as many rows from the result set identified by the query as will fit in the collection variable, and returns the collection variable to the client application as an argument. Of course, the cursor used on the server side in this example cannot be a dynamic cursor because dynamic cursors cannot be used in stored routines.
In addition to the limitations imposed by using a static cursor in the stored routine, the use of the collection variable suffers from additional limitations. For example, the use of collections to return the results of a query requires a significant amount of memory because the memory for the collection variable must be allocated on both the client and server sides. One approach to limiting the memory required to store the collection variable is to limit the number of elements in the collection variable (e.g., create a collection variable with 100 elements). Unfortunately, if the result set being returned has more rows than the number of elements in the array, the entire result set will not be returned. Alternatively, a collection variable can be declared to be of unlimited size. However, since a result set can include a large number of rows, the memory required to store the collection variable can be quite large. If the memory required to store the collection is not available, the technique of using collections is not viable. As another example, flexibility with respect to the number of rows returned to the client is reduced when using collections. As yet another example, the use of a collection to return query results is not evident from the argument list of a procedure.
In contrast to static and dynamic cursor constructs, one database architecture includes an additional construct referred to herein as the "result set" construct. Results sets allow for the return of data according to one or more queries in the form of a single data stream. To create such a stream, one or more SELECT statements are written without INTO clauses in a stored routine. As a result of calling a stored routine that contains such a select statement, the data stream will be created. The contents of the stream may then be accessed by the client.
A limitation of the result set construct is that the data identified by each of the select statements in the stored routine must be accessed sequentially. For example, assuming a first select statement identifies 5,000 rows from a first table and a second select statement identifies 4,000 rows from a second table. The data stream will sequentially provide the 5,000 rows from the first table followed by the 4,000 rows from the second table. As a result, if the client application needs the first 5 rows from both tables, it must sequentially process the 5,000 rows of the first table before reaching the first 5 rows from the second table. In addition, there is no way to back up in the data stream. Once a piece of data has been pulled off the data stream, another data stream must be opened to retrieve it again. Thus, if after processing the 5,005 rows in the above example, the client application determines it needs rows 50-100 from the first table, the client application must either: 1) have stored rows 50-100 when the client application pulled them form the data stream; or 2) open another data stream and re-process rows 1-100 from the first table.
Another limitation of the result set construct is that the resulting data stream does not identify when one row ends and the next begins. Rather, for each select statement the data stream includes a "row type identifier" to mark the beginning of data accessed according to that select statement. To describe what a row type identifier is, it must first be understood that a data type must be specified for each column of a table indicating the type of data that can be stored in that column. For example, the first column of a table may contain character data, the second column may contain integer data, the third column may contain date information, etc. The columns selected from one or more database tables for a particular active set contain the same data types as defined for the selected database columns. Since each column of a database table stores a particular data type, the data types and the order of the data types for the data elements in each row of a database or active set is defined.
In the previous example, the first three data elements in a row of data accessed from the database would respectively be of the character, integer, and date data type. The ordered list of data types defined for rows of a table or an active set is referred to as the table or active set's "row type." A row type identifier is a description of the row type defined by a select statement. Since the data stream does not identify when one row ends and the next begins, the client application must be sophisticated enough to detect the row type identifiers and decode the data stream (detect when each row begins and ends) using the format identified by the row type identifiers. In addition, since data accessed according to multiple select statements is placed serially in the data stream, the client application must be able to detect the row type identifiers to determine when the results from one SELECT statement end and the next begin.
Another limitation of result set constructs is that result set constructs cannot be declared as arguments of routines, and therefore, cannot be passed as arguments between routines. In fact, the data stream generated by a result set is immediately passed back to the client. Another limitation of the result set construct is that it does not fall into any standard high-level programming language construct categories or SQL standard construct categories. Therefore, its use is not intuitive to programmers. The result set construct is described in greater detail in Transact-SQL User's Guide for SYBASE SQL Server.TM., Release 10.0, Sybase, Inc. (1993).