The present invention relates to database information retrieval, and in particular to a method and system for improving storage efficiency in retrieving data in database operations.
Software applications that retrieve data from a relational database into a computer system may specify a Structured Query Language (SQL) statement, such as a SELECT statement, specifying the data to be retrieved, and may associate that statement with a “cursor” object. The application then may perform an OPEN operation against the cursor, and a FETCH operation to retrieve each “row” of data. Prior to executing the FETCH, the application may allocate an area in the computer system's memory in which to retrieve the data. This allocation requires that the application know the size of the data before performing the FETCH. This kind of solution is primarily for “native” application programs that execute on the same computer where the database application is running.
Each row of data consists of one or more “columns” of data, each with a pre-defined “data type”. In one approach the application can use the total length of all of the columns to determine how large an area of memory is needed. For most data types, this length can be acquired from the database, or it can be inferred from the type. Some standard relational data types allow insertion of variable length. When the table and columns are defined to the relational database, a maximum length n is specified for the columns When data is inserted into the table, the actual data provided for these columns can have a length anywhere from 0 to n characters or bytes. For example, one standard data type in relational databases is known as VARYING CHARACTER. This data type typically allows for data up to 32767 bytes in length. However, often there is less, sometimes much less actual data stored in the column for most rows. For a column definition with a size in this range, applications typically just allocate the maximum size for the column, even though there may be less, or much less actual data. As a result, a certain amount of allocated storage will remain unused. In some cases, but not all, such a level of unused storage may be acceptable.
Many relational database implementations support a data type known as a “Large Object” or LOB. Two types of LOBs include a Character LOB (CLOB) type and a Binary LOB (BLOB) type. Similarly to the VARYING CHARACTER type described earlier, the LOB data types allow for variable sizes of data, with a defined maximum. The allowable range for these LOB datatypes is frequently up to 2 Gigabytes (greater than 2 billion bytes). As often is the case with the VARYING CHARACTER type, there can be less, sometimes much less actual data inserted into a LOB column than the maximum allowed.
The reason for over-allocation of storage space has been straightforward, but not necessarily problematic when individual allocations are not too substantial. For example, a database administrator that is defining the table characteristics for a new database table often does not know how much data will be inserted over the life of the table. Consequently, the administrator might choose a maximum size for a LOB column such as 10 MB. However, in practice, the actual amount of data stored in a given column might be less than 100 kb, or 1/100th of this maximum.
For an application that needs to FETCH the data, allocating 10 MB of storage might be prohibitive, particularly if it is necessary to access several of these large columns, or if there are multiple concurrent application processes running in that system, vying for a limited amount of computer memory.
This problem is further compounded by a data type that has become a feature in relational database implementations—XML, or eXtended Markup Language, data type. One standard for this data type defines it as a variant of a LOB, but with no defined maximum length. As a result, even for an application that is able to allocate the maximum length for the memory area, that maximum value is not known before the FETCH occurs.
Some applications have attempted to solve this problem of not knowing the maximum value by guessing at a practical maximum length for buffer size allocation. However, if the guess is too low, many relational database implementations will not provide the actual data length back to the application.
One solution to this problem is to use a “LOB locator” for discovering the actual storage size of the data and for allocating only the necessary amount of storage. A LOB locator provides a way to acquire a “handle” on the LOB value and then manipulate the value via that handle. For example, rather than fetching a LOB directly into an allocated memory area, the application program may fetch a LOB locator that references the LOB data value. The data remains in the database until requested by the application via the locator. The application can then use the locator to find out the size of the LOB value.
This approach has the following drawbacks. A first drawback is performance, because it is necessary to make three requests to the database in order to have only the required amount of memory allocated for the buffer: The initial FETCH request to retrieve the LOB locator; a request to find out how large the LOB value is; and another FETCH to retrieve the LOB data, after allocating the necessary size buffer.
The cost of transferring control to the database three times, rather than just once, can be costly. Additionally, if there is more than one LOB value in the result set of the initial fetch, each additional LOB value will require two more trips to the database (a request to find out the size, and a fetch to retrieve the data)
A second drawback to the “LOB locator” approach is resource consumption within the database to manage the LOB locators. Frequently, applications will scan through many thousands or millions or rows of data. If the database must keep track of each locator that was passed to the application, managing these locators can consume a lot of memory very quickly.
The very use of XML objects can introduce a further complication, when implementers of XML decide not to provide for locator capability on these XML objects in recognition of LOB locator limitations.
In view of the foregoing, it would be desirable to have a method and system to allow retrieval of large objects without using locators and without requiring maximum storage allocation.