1. Field of the Invention
The invention pertains to data storage and retrieval methods and systems in the field of computers, with the particular objective of achieving high scalability, that is, the expansion of processing capability, in data storage and retrieval. Specifically, the invention improves database system performance by enabling a database system with a given level of performance to achieve processing capability from several times to several thousand times greater.
2. Description of Related Art
As described in Jeffrey D. Ullman, Deetabeesu Shisutemu no Genri [Principles of Database Systems] (trans. Kunii et al, Nihon Konpyuutaa Kyokai, 25 May 1985, pp. 45-71), Samuel Leffler et al, UNIX 4.3 BSD no Sekkei to Jissou [The Design and Implementation of UNIX 4.3 BSD] (trans. Akira Nakamura et al, Maruzen K. K., 30 Jun. 1991, pp. 193-191) and Michael J. Folk et al, “Fairu Kouzou” [File Structures], bit (trans. Hiroyuki Kusumoto, Kyouritsu Shuppan K. K., 5 Jun. 1997, pp. 169-191), conventional computer-based database storage and retrieval has essentially relied on hierarchical indices.
The inventor has previously devised a Data Storage and Retrieval System (Japanese Patent 3345628, U.S. Pat. No. 6,415,375 and U.S. Pat. No. 6,584,555) that achieves high performance and ease of maintenance by replacing conventional hierarchical indices with location tables and alternate-key tables, thereby simplifying the complex processing that indices entail, and by searching these tables themselves with binary search.
In the invention of a Database Reorganization System and Database (PCT/JP03/11592) (hereinafter the Database Reorganization System), the inventor further proposed a framework for reorganizing the databases of the Data Storage and Retrieval System while they remain in operation, and showed that adding alternate-key location tables to the alternate-key tables enables efficient reorganization.
A brief description of the Data Storage and Retrieval System proposed by the inventor follows. The system uses location tables and alternate-key tables and retrieves target records by performing binary searches on these tables.
Data records are stored in primary blocks in the order of their primary keys. When a primary block is full and a data record is added to that primary block, an overflow block is linked to that primary block and the data record stored therein. A further overflow block may be linked to an overflow block.
Location table records (or location table entries) that contain the addresses of the primary blocks are held in a location table that occupies a contiguous region.
A location table is secured beforehand in a contiguous region. This region need only be logically contiguous and may span physically separate regions; in that case, an address conversion table may be used to treat them as logically contiguous. The same applies to the contiguous regions described below.
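The address conversion table mentioned above might be sketched as follows. This is a hypothetical illustration, not the inventor's implementation: the class name `AddressConversionTable` and its methods are assumptions, and Python lists stand in for physically separate storage regions. Each logical slot number is converted to a (segment, offset) pair so that the segments can be addressed as one logically contiguous region.

```python
from bisect import bisect_right


class AddressConversionTable:
    """Hypothetical map from logical slot numbers to physical (segment, offset).

    Each Python list in `segments` stands in for a physically separate
    storage region; `starts` records the logical slot at which each begins.
    """

    def __init__(self, segment_sizes):
        self.segments = [[None] * size for size in segment_sizes]
        self.starts = []
        start = 0
        for size in segment_sizes:
            self.starts.append(start)
            start += size
        self.capacity = start  # total number of logically contiguous slots

    def locate(self, logical):
        """Convert a logical slot number to (segment index, offset)."""
        if not 0 <= logical < self.capacity:
            raise IndexError(f"logical slot {logical} out of range")
        # Last segment whose starting slot is <= logical (skips empty segments).
        segment = bisect_right(self.starts, logical) - 1
        return segment, logical - self.starts[segment]

    def __getitem__(self, logical):
        segment, offset = self.locate(logical)
        return self.segments[segment][offset]

    def __setitem__(self, logical, value):
        segment, offset = self.locate(logical)
        self.segments[segment][offset] = value


# Two separate 4-slot regions addressed as logical slots 0..7.
table = AddressConversionTable([4, 4])
table[5] = "address of primary block 5"
print(table.locate(5))  # (1, 1)
```

A binary search over segment start slots keeps the conversion cheap even when many separate regions are chained together.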
A final pointer indicates the end of the region used by a location table. Records are stored in fixed-length storage regions termed “blocks”, each of which is either a primary block or an overflow block. When a record cannot be added to the final primary block, a new primary block is added after it and the record is stored therein.
The term “link” does not denote a physical connection. It is used (here and below) because a primary block that holds the address of a first overflow block, which in turn holds the address of a second overflow block, allows the blocks to be treated as though physically connected.
Because records are stored in this fashion, location table entries are in primary-key order. Retrieval by primary key consists of performing a binary search between the first entry of the location table and the entry pointed to by the final pointer to find the target block, and then finding the target record within that block. Any overflow blocks linked to that block are also searched.
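The block organization described above (primary blocks, overflow chains, and a location table with a final pointer) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the inventor's implementation: object references stand in for block addresses, `BLOCK_CAPACITY` is an arbitrary hypothetical block size, keeping each block sorted is a choice of this sketch, and the names `Block`, `LocationTable`, `add_to_chain` and `add_record_at_end` are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional

BLOCK_CAPACITY = 3  # hypothetical number of records per fixed-length block


@dataclass
class Block:
    records: list = field(default_factory=list)  # (primary_key, data) pairs
    overflow: Optional["Block"] = None           # "link": address of next overflow block

    def is_full(self) -> bool:
        return len(self.records) >= BLOCK_CAPACITY


class LocationTable:
    def __init__(self, size: int):
        self.entries = [None] * size  # region secured beforehand
        self.final = -1               # final pointer: index of last entry in use

    def append_primary_block(self) -> Block:
        if self.final + 1 >= len(self.entries):
            raise RuntimeError("location table region exhausted")
        self.final += 1
        self.entries[self.final] = Block()
        return self.entries[self.final]


def add_to_chain(primary: Block, record) -> None:
    """Store a record in a primary block, linking overflow blocks when full."""
    block = primary
    while block.is_full():
        if block.overflow is None:
            block.overflow = Block()  # link a new overflow block
        block = block.overflow
    block.records.append(record)
    block.records.sort(key=lambda r: r[0])  # keep each block in key order (sketch choice)


def add_record_at_end(table: LocationTable, record) -> None:
    """Add a record whose primary key exceeds every existing key."""
    if table.final < 0 or table.entries[table.final].is_full():
        table.append_primary_block()  # new primary block after the final one
    table.entries[table.final].records.append(record)


table = LocationTable(size=4)
for key in range(1, 8):
    add_record_at_end(table, (key, f"record {key}"))
# Primary blocks now hold keys [1, 2, 3], [4, 5, 6] and [7]; table.final == 2.
add_to_chain(table.entries[0], (2.5, "late arrival"))  # goes to a new overflow block
```

Because the location table only ever grows at the final pointer, appending a record with a new largest key never disturbs existing entries; records landing inside an existing key range are absorbed by that block's overflow chain instead.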
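Retrieval by primary key can be sketched as below. Assumptions not stated in the original: Python object references stand in for block addresses, primary blocks are non-empty, each primary block is compared by the key of its first record, and the target block is taken to be the last one whose first key does not exceed the search key. The names `Block`, `find_block` and `retrieve` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    records: list = field(default_factory=list)  # (primary_key, data) in key order
    overflow: Optional["Block"] = None           # linked overflow block, if any


def find_block(location_table, final, key):
    """Binary search over entries 0..final for the block that should hold key."""
    lo, hi, found = 0, final, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if location_table[mid].records[0][0] <= key:
            found = mid       # candidate; look for a later block that also fits
            lo = mid + 1
        else:
            hi = mid - 1
    return location_table[found]


def retrieve(location_table, final, key):
    """Return the data stored under key, or None if no such record exists."""
    if final < 0:
        return None           # the location table is empty
    block = find_block(location_table, final, key)
    while block is not None:  # search the primary block, then its overflow chain
        for record_key, data in block.records:
            if record_key == key:
                return data
        block = block.overflow
    return None


b0 = Block([(10, "a"), (20, "b")], overflow=Block([(15, "c")]))
b1 = Block([(30, "d"), (40, "e")])
location_table = [b0, b1, None]  # third entry secured but not yet used
print(retrieve(location_table, 1, 15))  # c
```

Only entries up to the final pointer take part in the binary search, so unused space reserved in the location table costs nothing at lookup time.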
While this description addresses retrieval, record updating, addition and deletion may also be implemented with like logic.
Alternate-key records (or alternate-key entries), each made up of an alternate-key value and a primary-key value, are stored in alternate-key blocks in the order of their alternate-key values.
When an alternate-key block is full and an alternate-key entry is added to that alternate-key block, an alternate-key overflow block is linked to the alternate-key block and the alternate-key entry stored therein. A further alternate-key overflow block may be linked to an alternate-key overflow block. Alternate-key location table records (or alternate-key location table entries) that contain the addresses of the alternate-key blocks are held in alternate-key location tables that occupy contiguous regions.
Alternate-key location tables are secured beforehand in contiguous regions.
An alternate-key final pointer is used to indicate the end of the region used by an alternate-key location table.
When an alternate-key entry is added whose alternate-key value is greater than those of all existing alternate-key entries, it is stored in the last alternate-key block; if it cannot be stored there, a new alternate-key block is created and the entry is stored in it.
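The addition rule above for an entry with a new largest alternate-key value can be sketched as follows. This is a hypothetical illustration: alternate-key blocks are modeled as Python lists of (alternate-key value, primary key) pairs, overflow blocks are omitted because this case always appends at the end, and `ALT_BLOCK_CAPACITY`, `AltKeyTable` and `append_entry` are invented names.

```python
ALT_BLOCK_CAPACITY = 2  # hypothetical number of entries per alternate-key block


class AltKeyTable:
    def __init__(self, size: int):
        self.location = [None] * size  # alternate-key location table, secured beforehand
        self.final = -1                # alternate-key final pointer

    def append_entry(self, alt_value, primary_key) -> None:
        """Add an entry whose alternate-key value exceeds all existing ones."""
        entry = (alt_value, primary_key)
        if self.final >= 0:
            last_block = self.location[self.final]
            if alt_value <= last_block[-1][0]:
                raise ValueError("value is not greater than existing values")
            if len(last_block) < ALT_BLOCK_CAPACITY:
                last_block.append(entry)  # fits in the last alternate-key block
                return
        if self.final + 1 >= len(self.location):
            raise RuntimeError("alternate-key location table region exhausted")
        self.final += 1
        self.location[self.final] = [entry]  # create a new alternate-key block


names = AltKeyTable(size=4)
for name, pk in [("Abe", 3), ("Baba", 1), ("Cho", 2)]:
    names.append_entry(name, pk)
print(names.location[:names.final + 1])  # [[('Abe', 3), ('Baba', 1)], [('Cho', 2)]]
```

Each entry pairs an alternate-key value with a primary key, so the alternate-key table never duplicates the data records themselves.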
A set of alternate-key location tables and alternate-key blocks is termed an alternate-key table.
Alternate keys are non-unique keys in a database; in an employee database, for example, they might include name and date of birth. Some databases use no alternate keys, while others use several.
One method of retrieving a record with a given alternate key is to perform a binary search between the first entry in the alternate-key location table and the entry pointed to by the alternate-key final pointer to find the target alternate-key block, and then to search within that block for the alternate-key entry having the target alternate key. Any alternate-key overflow blocks linked to that alternate-key block are also searched.
Next, a binary search is performed on the location table with the primary key of that alternate-key entry to find the target block and find the target record within that block. Any overflow blocks linked to that block are also subjected to the search.
Since alternate keys are non-unique, multiple records may share the same alternate-key value. In that case, if the next alternate-key entry in the alternate-key block has the same alternate-key value, the above operations are repeated.
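The alternate-key retrieval steps above can be sketched as follows. Several details are assumptions of this sketch rather than statements of the original: `lookup_primary` is a hypothetical stand-in for the primary-key retrieval through the location table, alternate-key blocks are non-empty and compared by their first entry, and, because equal values may straddle a block boundary, the scan starts at the last block whose first value is smaller than the target and continues into following blocks while their first value still equals the target. `AltBlock`, `chain_entries` and `retrieve_by_alternate_key` are invented names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AltBlock:
    entries: list = field(default_factory=list)  # (alt_value, primary_key) pairs
    overflow: Optional["AltBlock"] = None        # linked alternate-key overflow block


def chain_entries(block):
    """Yield entries of an alternate-key block and its overflow chain."""
    while block is not None:
        yield from block.entries
        block = block.overflow


def retrieve_by_alternate_key(alt_location: list, final: int, alt_value,
                              lookup_primary: Callable) -> list:
    """Return every record whose alternate key equals alt_value."""
    # Binary search for the last block whose first value is < alt_value.
    lo, hi, start = 0, final, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if alt_location[mid].entries[0][0] < alt_value:
            start = mid
            lo = mid + 1
        else:
            hi = mid - 1
    results = []
    for i in range(start, final + 1):
        block = alt_location[i]
        if i > start and block.entries[0][0] > alt_value:
            break  # later blocks can only hold larger values
        for value, primary_key in chain_entries(block):
            if value == alt_value:
                # Second stage: retrieve the data record by its primary key.
                results.append(lookup_primary(primary_key))
    return results


b0 = AltBlock([("Abe", 3), ("Sato", 1)], overflow=AltBlock([("Sato", 5)]))
b1 = AltBlock([("Sato", 7), ("Ueda", 2)])
records = {1: "r1", 2: "r2", 3: "r3", 5: "r5", 7: "r7"}
print(retrieve_by_alternate_key([b0, b1], 1, "Sato", records.get))  # ['r1', 'r5', 'r7']
```

The two-stage design mirrors the text: the alternate-key table yields only primary keys, and every data record is still reached through the primary-key path.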
While this description addresses retrieval, record updating, addition and deletion may also be implemented with like logic.
Where multiple alternate keys exist, one alternate-key table is created and used for each alternate key.
In the Data Storage and Retrieval System thus characterized, it is possible to make full use of hardware performance and achieve considerably high performance, but no processing performance beyond that hardware limit can be obtained; in other words, the system lacks scalability. This is essentially true of other, conventional methods as well. Because conventional methods are premised on the use of hard disks, they can be accelerated beyond hard-disk speeds by employing high-speed memory devices such as semiconductor memory, but the performance limit of those devices then becomes the upper limit, so these methods still do not essentially achieve scalability.
One prior-art technique for improving scalability is the load balancer, which deploys multiple servers that appear to be a single server. Under high volumes of external processing requests, requests first go to the load balancer, which allocates them among the servers, reducing the load per server and thereby providing scalability. However, this approach suffers from a fatal shortcoming: the multiple servers can share only the processing logic, while the database remains a single entity accessed by every server, so overall performance is bounded by the processing performance of that one database. In other words, even if servers are added as processing requests increase, database performance reaches an upper limit.
One way to achieve a certain degree of scalability in the Data Storage and Retrieval System draws on the inventor's Data Backup and Recovery System (PCT/JP01/03126), which provides a secondary system that is a backup copy of the primary system actually used for updating and referencing data. In addition to serving its original purpose of backup and recovery, the secondary system can be used for referencing data.
Since the volume of update transactions is generally around one-tenth that of referencing transactions, using the secondary system for referencing can relieve the load on the primary system. However, since the secondary system cannot be used for updating, the scalability achievable in this way is limited.
Further, since a secondary system must have the same configuration as the primary system, with the location table, data-storage files (aggregations of blocks) and (zero, one or multiple) alternate-key tables deployed on it as a set, deploying multiple secondary systems for scalability in addition to backup has been a large financial burden.
Another conventional method in general use for improving performance, independent of the Data Storage and Retrieval System, is mirroring of the servers that hold data. However, since a mirror database must be the same size as the original database, this approach requires large storage volumes, and it greatly restricts updating because a mirror database can basically be used only for referencing data. Updates made on the main server cannot be reflected on a mirror server without some delay. Consequently, if data were updated on a mirror server, the result could be invalidated by the lag relative to updates on the main server, so performance improvement using mirror servers cannot be adopted for regular, real-time data processing.
Thus, achieving scalability for databases is a significant problem for which no adequate solution exists. This arises not only from the need to duplicate the data itself but also from the complexity of index structures in conventional methods, which makes duplication troublesome whenever indices are updated.
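A rough bound makes this limit concrete. Assume (these are simplifications not made in the original) that every transaction costs the same, that all U update transactions must run on the primary system, and that all R referencing transactions can be moved to secondary systems, with R approximately 10U. The primary system then still carries U of the original U + R workload, so no number of secondary systems can raise total throughput beyond about

```latex
\frac{U + R}{U} \approx \frac{U + 10U}{U} = 11
```

times that of a single system under this model.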
There has been significant demand in the field of information processing for databases capable of achieving scalability in line with increases in processing volume. The present invention meets such demand.