One of the most important functions of computers in today's society is their ability to store and retrieve a large amount of data in collections known as computer databases. In order to effectively manage these computer databases, the organization of the databases must be carefully designed.
Some databases, such as relational databases, store their information in indexed tables. Thus, a relational database contains two fundamental kinds of objects: tables, which contain the user data, and non-table objects, such as indexes, which keep track of administrative information to keep the database system running smoothly.
An object may be partitioned or non-partitioned. A non-partitioned object is stored in a single operating system file, called a datafile. A partitioned object is usually much larger and is stored in more than one datafile. When a datafile is added to a database, it is assigned an "absolute file number" which is unique throughout the entire database. The absolute file number of a datafile is an index of the corresponding entry in a control file, which contains operating system specific information about that datafile, such as the operating system file name.
A table object, which houses user data, is divided into one or more data blocks, each containing one or more records (called "rows"). Rows contain one or more columns, which hold the specific information the user stores, such as a customer's name or an automobile part number. Since every data block belongs to some datafile, it is natural to identify each block by the absolute file number of the datafile to which the data block belongs and by the file offset of the data block. The absolute file number provides an easy way to locate the operating system specific information necessary to open the datafile, and it is efficient to access a data block within an opened datafile by the file offset.
As a result, the combination of the absolute file number and the file offset forms a readily useable absolute data block address ("absolute DBA") as a way to make a reference to any particular data block within the database system. An "absolute disk pointer" is a data structure that stores an absolute DBA as a pointer to a data block.
With reference to FIG. 2, database 200 contains twelve objects: data dictionary 240; control file 242; five tables of user data, 250 to 258; and five index files, 260 to 268, built upon tables 250 to 258, respectively. Each object is stored in a datafile, 210 to 220 and 224 to 232, and assigned a unique absolute file number (AFN) of 1 to 12, respectively. Data item 270 is found in table 250, stored at offset 300 in datafile 212 having an AFN of 2. Both index files 260 and 266 are built on table 250 and both contain an absolute disk pointer, 280 and 284 respectively, pointing to data item 270 and having an absolute DBA of 2:300. Given a disk pointer with an absolute DBA of 2:300, the corresponding data item is fetched by looking up the AFN in control file 242 to find the name of datafile 212 and other operating system specific information. With that information, datafile 212 is opened, and the block at offset 300 is retrieved. Similarly, absolute disk pointer 282 in index 268 has an absolute DBA of 11:200, pointing to data item 272 in table 258.
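The lookup described above can be sketched in Python. This is only an illustration under stated assumptions: the `AbsoluteDBA` type, the `CONTROL_FILE` dictionary, and the file names are hypothetical and do not come from the database system itself.

```python
from typing import NamedTuple, Tuple


class AbsoluteDBA(NamedTuple):
    """An absolute data block address: which datafile, and where in it."""
    afn: int     # absolute file number, unique within one database
    offset: int  # file offset of the data block within that datafile


# Hypothetical stand-in for the control file: AFN -> operating system file name.
CONTROL_FILE = {2: "file_afn2.dbf", 11: "file_afn11.dbf"}


def locate(dba: AbsoluteDBA) -> Tuple[str, int]:
    """Resolve an absolute DBA to (operating system file name, file offset)."""
    return CONTROL_FILE[dba.afn], dba.offset


# The absolute DBA 2:300 from the example resolves to the file registered for AFN 2.
print(locate(AbsoluteDBA(afn=2, offset=300)))
```

The point of the sketch is that the AFN is meaningful only relative to one database's control file, which is why the same pointer cannot be interpreted by another database.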
There are usually very many disk pointers in a database. Because reading a data item from a disk is fairly slow, the most used data items are stored in a main-memory cache. Given a disk pointer, containing an absolute file number and a file offset, the process for retrieving a data item according to the disk pointer is shown in the flow chart of FIG. 3. After a disk pointer containing an absolute file number AFN is read (step 300), execution proceeds to step 310, checking the cache for the data item according to the AFN and the file offset of the disk pointer. If the data item is not found in the cache, called a "cache miss," execution branches to step 320, where the datafile corresponding to the AFN is opened using the information stored in the control file for the AFN. The opening step is needed on only the first cache miss and does not have to be redone on following cache misses. Then the data item at the file offset indicated by the file offset portion of the disk pointer is fetched from the opened datafile (step 330). The data item is next inserted into the cache (step 340). On the other hand, if there is a cache hit, the data item is simply retrieved from the cache (step 350) without reading the disk.
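The flow of FIG. 3 can be sketched as follows. This is a simplified illustration, not the system's implementation: the `BlockReader` class, its `opener` callback, the fixed read size, and the in-memory "files" are assumptions made so the example is self-contained, and the cache here never evicts anything.

```python
import io


class BlockReader:
    """Fetches data items by (AFN, offset), consulting a cache first."""

    def __init__(self, control_file, opener):
        self.control_file = control_file  # AFN -> file name (hypothetical)
        self.opener = opener              # file name -> seekable binary file
        self.open_files = {}              # datafiles opened so far, by AFN
        self.cache = {}                   # (AFN, offset) -> data item
        self.disk_reads = 0               # counts reads that missed the cache

    def fetch(self, afn, offset, size=4):
        key = (afn, offset)               # step 300: disk pointer has been read
        if key in self.cache:             # step 310: check the cache
            return self.cache[key]        # step 350: cache hit, no disk access
        if afn not in self.open_files:    # step 320: open only on first miss
            self.open_files[afn] = self.opener(self.control_file[afn])
        datafile = self.open_files[afn]
        datafile.seek(offset)             # step 330: fetch at the file offset
        item = datafile.read(size)
        self.disk_reads += 1
        self.cache[key] = item            # step 340: insert into the cache
        return item


# In-memory stand-in for a datafile: 300 padding bytes, then a 4-byte item.
FILES = {"file_afn2.dbf": bytes(300) + b"ITEM"}
reader = BlockReader({2: "file_afn2.dbf"}, lambda name: io.BytesIO(FILES[name]))

first = reader.fetch(2, 300)   # cache miss: opens the file and reads it
second = reader.fetch(2, 300)  # cache hit: served from memory
print(first, second, reader.disk_reads)
```

Keying the cache on the pair (AFN, offset) mirrors the text: the absolute DBA alone is enough to identify a cached data item.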
The maximum convenient size for a number on a digital computer is determined by the word size used by the computer. For example, if a computer has a 32-bit word size, it is convenient to handle numbers that are 32 bits in size, that is, less than 2^32 or 4,294,967,296. Integers larger than the word size become cumbersome to handle, because they require an additional word to store their value, and additional computing cycles to read, write, and compare the extra word. Consequently, some early database systems limited the total size of a disk pointer to the size of the word supported by the computer platform on which the database resides.
When the disk pointer is limited to 32 bits, the limiting factor for the number of datafiles in a database is not the number of file entries available in a control file, 32 bits, but the number of bits allocated for the absolute file number in a disk pointer, which can be significantly smaller. For example, in a computer system with 32-bit words, the absolute file number can be set at 10 bits and the file offset at 22 bits. The 10-bit absolute file number in a disk pointer would specify up to only 1024 unique datafiles, a number which limits very large computer databases. To circumvent this problem, a conventional approach is to allocate more bits for the absolute file number in the disk pointer.
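The 10-bit/22-bit split in the example can be illustrated with a short packing sketch. The bit layout chosen here (AFN in the high bits, file offset in the low bits) and the function names are assumptions for illustration; the text specifies only the field widths.

```python
AFN_BITS = 10                           # bits reserved for the absolute file number
OFFSET_BITS = 22                        # bits reserved for the file offset
MAX_DATAFILES = 1 << AFN_BITS           # 1024 distinct AFN values
OFFSET_MASK = (1 << OFFSET_BITS) - 1    # low 22 bits


def pack(afn: int, offset: int) -> int:
    """Pack an AFN and a file offset into one 32-bit disk pointer."""
    if not 0 <= afn < MAX_DATAFILES:
        raise ValueError("AFN does not fit in 10 bits")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 22 bits")
    return (afn << OFFSET_BITS) | offset


def unpack(pointer: int):
    """Split a 32-bit disk pointer back into (AFN, file offset)."""
    return pointer >> OFFSET_BITS, pointer & OFFSET_MASK


pointer = pack(2, 300)
print(pointer, unpack(pointer))
```

Every bit given to the AFN field is taken from the offset field, so under this layout widening one field within a fixed 32-bit pointer necessarily narrows the other.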
Upgrading existing databases is a problem if the number of bits allocated to an absolute disk pointer is increased. A database typically has a very large number of disk pointers. These existing disk pointers may no longer hold unused space for expansion, making it impossible to both increase the address space of absolute disk pointers and maintain upward compatibility. If upward compatibility is not maintained, then users will have to export and import existing databases, which are very time-consuming procedures.
Another drawback to the use of absolute file numbers is that they make the transfer of a group of datafiles between two databases more difficult. Each absolute file number within a database is unique, but absolute file numbers are not unique between two different computer databases. As a result, a disk pointer in one computer database may contain the same absolute file number as a disk pointer in another database, even though the datafiles referenced by the two disk pointers are completely different. Since it is common to have a very large number of disk pointers within a datafile, it is difficult to copy a datafile for use in another database. Within the transferred datafile, every disk pointer must be patched, replacing the absolute file number in the disk pointer with a newly assigned absolute file number in the destination database. This procedure may be impossible if a database cannot recognize or enumerate all the disk pointers in the datafiles. Even if the database can visit every disk pointer, doing so is a very time-consuming process.
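The patching step can be sketched as below, assuming the optimistic case in which every disk pointer can be enumerated. The function name, the `afn_map` argument, and the 10-bit/22-bit layout with the AFN in the high bits are assumptions for illustration only.

```python
OFFSET_BITS = 22                        # assumed layout: AFN high, offset low
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def patch_pointers(pointers, afn_map):
    """Rewrite the AFN of every packed disk pointer for a destination database.

    pointers: iterable of 32-bit packed disk pointers from the source database.
    afn_map:  source AFN -> newly assigned AFN in the destination database.
    """
    patched = []
    for pointer in pointers:            # every pointer must be visited once
        afn = pointer >> OFFSET_BITS
        offset = pointer & OFFSET_MASK
        patched.append((afn_map[afn] << OFFSET_BITS) | offset)
    return patched


# Pointers 2:300 and 11:200, moved to a database that assigns AFNs 7 and 8.
source = [(2 << OFFSET_BITS) | 300, (11 << OFFSET_BITS) | 200]
moved = patch_pointers(source, {2: 7, 11: 8})
print([(p >> OFFSET_BITS, p & OFFSET_MASK) for p in moved])
```

Even in this simplified form the work grows with the number of disk pointers, and it presupposes a list of pointers that, as noted above, a real database may not be able to produce.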
Accordingly, there is a need for a way to increase the address space of disk pointers in an upward compatible manner. There is also a need for a way to transfer disk pointers between databases without patching.