FIG. 1-1 shows a typical database system DBS comprising a primary memory D, e.g. a primary disc memory D, and a data processing device DPD. In the primary memory D, data of a database DB is stored as a plurality of data blocks p00, p01, . . . p32. Typically the individual data blocks p are called pages of the database DB, i.e. the data in the primary memory device D is organized as a plurality of pages P. As is indicated in FIG. 1-1, each data block or data page P consists of one or more data objects OB. As is shown for the data blocks P02, P32, these pages comprise a plurality N and M of data objects OB02-1 . . . OB02-N and OB32-1 . . . OB32-M, respectively.
The data processing device DPD comprises, as the secondary memory device MM, a main memory MM, e.g. a work memory in a computer, and a processing means PM, e.g. a central processing unit of a computer. Furthermore, the data processing device DPD can comprise a so-called hash-table HT for determining the addresses in the secondary memory device MM. The usage of the hash-table HT is explained in further detail below.
As shown in FIG. 1-1, the main memory MM comprises at least a page cache memory section PCS, which is used as the traditional solution for caching objects in databases. Additionally, the main memory can comprise a work memory or resident data memory section RDS in which data objects can also be stored. Where the page cache memory section PCS is arranged depends on the data processing device configuration. For example, the page cache memory section PCS can also be provided within the processing means PM.
The usage of a page cache memory PCS is well-known in computer technology. Basically, instructions and data may be stored in a cache memory, and if such instructions and data in the cache memory are accessed repeatedly within a short period of time, as often happens with program loops in the processing means PM, then program execution, i.e. the read/write access, is sped up. The cache can normally only hold small parts of the executing instructions or data. When the cache memory is full, its contents are replaced by new instructions or data as they are fetched from the primary memory D. A variety of cache replacement algorithms are used. The objective of these algorithms is to maximize the probability that the instructions or data needed in the data processing device DPD are found in the cache memory. This probability is known as the cache hit ratio. A higher hit ratio means that a larger percentage of the instructions or data are found in the cache and do not require access to the slower primary memory D. The basic idea of using a cache memory can be applied at different points in a computer system in cases where the main memory is not large enough to contain all the programs and their data.
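The replacement behaviour and the hit ratio described above can be illustrated with a minimal sketch. It assumes a least-recently-used (LRU) policy, which is only one of the many possible replacement algorithms and is not prescribed by the text; the class name `LRUCache` and the `fetch` callback standing in for the primary memory D are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry.

    LRU is an assumed policy chosen for illustration only.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> cached value, oldest first
        self.hits = 0
        self.accesses = 0

    def get(self, key, fetch):
        """Return the value for key, calling fetch(key) on a cache miss."""
        self.accesses += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = fetch(key)  # slow path: stands in for the primary memory D
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[key] = value
        return value

    def hit_ratio(self):
        """Fraction of accesses that were served from the cache."""
        return self.hits / self.accesses if self.accesses else 0.0


cache = LRUCache(capacity=2)
for page_id in ["p02", "p00", "p02", "p10", "p02"]:
    cache.get(page_id, fetch=lambda pid: f"contents of {pid}")
print(cache.hit_ratio())
```

In this access sequence the second and third requests for p02 are hits, while p00 is evicted when p10 is loaded into the full cache.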
As already indicated above in the discussion of the database DB, data blocks or segments of a program or of data are often called pages and are transferred from the disc memory D to the main memory MM for processing. When other pages are needed they may replace the pages already in the cache memory if the cache memory is full. The automatic movement of a large program or data segment between the main memory MM and the disc memory D, as the processing means PM executes, is managed by a combination of operating system software and control hardware. The whole process of loading and organizing data in the main memory MM is called memory management.
In FIG. 1-1 the data of the database DB can be managed as follows in connection with the page cache memory section PCS. When the processing means PM wants to perform, for example, a read access to a particular object OB of the database, it first calculates its page identity. The page identity is basically an identifier that tells the processing means PM the page to which the desired data object OB belongs.
For example, if the processing means PM wants to read the data object OB02-2 or the data object OB32-2, which are part of the data blocks (pages) p02 and p32, respectively, the processing means PM first calculates the page identity PID=p02 or PID=p32 in step S1. This page identity identifies the page both while it is still stored in the database DB and after it has been transferred to the page cache memory section PCS.
As shown in FIG. 1-1, the database DB as well as the page cache memory section PCS are both organized in data blocks consisting of pages. That is, the database DB comprises the data organized in pages, and page-like data regions PCSP are also provided in the page cache memory section. The reason is that only complete pages or data blocks are transferred from the primary memory D to the secondary memory MM, even when only a small data object OB is required.
A typical size of a page is 4 kbytes or 8 kbytes. A page is thus a logical construction of one block on the disc. A typical access time for finding a page on disc is 8 ms, whereas only a quarter of a microsecond is necessary for accessing data stored in the main memory MM. Therefore, the time needed for finding the page and transferring the page to the page cache memory section PCS is an important factor that determines the access time to an individual data object.
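To put these figures in proportion, the following sketch computes the ratio between the two access times and an expected access time for a given hit ratio. The simple model used here (a miss costs one disc access plus one memory access) is an assumption made for illustration and is not taken from the text; the names are hypothetical.

```python
DISC_ACCESS_S = 8e-3       # 8 ms to find a page on disc (figure from the text)
MEMORY_ACCESS_S = 0.25e-6  # a quarter of a microsecond in main memory (from the text)


def expected_access_time(hit_ratio):
    """Expected time per access under an assumed two-level model.

    A hit costs one memory access; a miss costs one disc access
    followed by one memory access.
    """
    miss_cost = DISC_ACCESS_S + MEMORY_ACCESS_S
    return hit_ratio * MEMORY_ACCESS_S + (1.0 - hit_ratio) * miss_cost


# One disc access costs as much as roughly 32,000 main memory accesses.
print(round(DISC_ACCESS_S / MEMORY_ACCESS_S))

# Even with 99% of accesses served from memory, the few disc accesses dominate.
print(expected_access_time(0.99) / MEMORY_ACCESS_S)
```

Under this model, even a small fraction of misses raises the average access time by more than two orders of magnitude compared with a pure main memory access.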
When the page identity p02, p32 has been determined in step S1, the next step is to make a lookup in a data structure called the hash-table HT to identify where the desired pages are currently stored in the main memory, more precisely in the page cache memory section PCS. Two scenarios can occur: either the page is already stored in the page cache memory section, or the relevant page has not yet been stored there. If the page that contains the desired data object has already been stored, e.g. if the page P02 containing the desired object OB02-2 has already been stored in the page cache memory, then it is only necessary to read out in step S2 the address location AD-P02 of the relevant page P02 from the hash-table HT. In step S3 the processing means PM issues a read-access request to the main memory to read out the data object from the page P02 at the particular memory location AD-P02.
If, on the other hand, the page has not been stored in the page cache memory section PCS, then the processing means PM first calculates the relevant page identity p32 and issues, in step S1, a load request for loading the relevant page P32 from the database DB; the steps S2, S3 are then repeated with the relevant address AD-P32 of the page where the desired object OB32-2 to be read resides. The data structure used for determining the addresses of the pages is often a hash-based data structure, as explained above.
As can already be seen from the above description, it is always required that the complete page is stored in the page cache memory section PCS in one of its data regions PCSP, even if only a small data object of a few hundred bytes needs to be read by the processing means PM. Furthermore, it should be noted that the page cache memory PCS is only a comparatively small memory and that it will have to be updated with new data from the database after some time. In connection with the hash-table access it is also possible to specify the data regions PCSP which are overwritten first, i.e. before other data regions PCSP are overwritten when new data is loaded from the database DB. Thus, a kind of hierarchy regarding the loading and overwriting of data in the page cache memory section PCS is possible.
Furthermore, at a particular point in time, pages p10, p01, p02, p32, p00, p22 may have been stored in the page cache as shown in FIG. 1-1. However, frequent read accesses have only been performed to the pages p02, p00 (indicated with a hatching from the bottom left corner to the top right corner in FIG. 1-1), whilst pages p01, p22 have only been accessed moderately (indicated with a hatching from the top left corner to the bottom right corner in FIG. 1-1). The page p10 (having no hatching) has not been accessed very frequently. Hereinafter, a page data region PCSP that has been accessed frequently is also called a “hot” page. Likewise, a page data region PCSP that is not accessed frequently is also called a “cold” page. Pages having an access frequency therebetween are called “warm” pages. As can be seen from FIG. 1-1, because complete pages always need to be stored in the page cache, a lot of memory space is occupied in the page cache memory section PCS regardless of how frequently the pages are actually read, since individual data objects can only be accessed by first storing the complete page in the page cache.
One conceivable remedy would be to collect the frequently accessed data objects together on common pages. However, most database systems DBS do not have the possibility to move an object from one page to another page. The reason is that the page identity PID is a part of the references to the data object, and these references can either be external references or references from so-called indexes. Therefore, collecting data objects from “hot” pages on a new page requires that all references to the (only temporarily available) page structure in the page cache memory PCS be updated. This takes time and is impractical.
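The lookup sequence of steps S1 to S3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular database system: the helper `page_identity` derives the page identity from the object label purely by following the naming used in the figure (e.g. OB02-2 belongs to p02), an address is modelled as an index into a list of data regions, and no replacement of regions is shown. All names are hypothetical.

```python
def page_identity(object_id):
    """Step S1: derive the page identity from the object label.

    Hypothetical rule that mirrors the figure labels: "OB02-2" -> "p02".
    """
    return "p" + object_id[2:4]


class PageCache:
    """Page cache section PCS addressed through a hash-table HT."""

    def __init__(self, database):
        self.database = database  # primary memory D: page id -> {object id: value}
        self.hash_table = {}      # HT: page id -> address of the page in the PCS
        self.regions = []         # data regions PCSP, indexed by address
        self.loads = 0            # number of pages loaded from the database

    def read(self, object_id):
        pid = page_identity(object_id)           # step S1
        if pid not in self.hash_table:
            # Page not cached: load the complete page, even for one object.
            self.regions.append(dict(self.database[pid]))
            self.hash_table[pid] = len(self.regions) - 1
            self.loads += 1
        address = self.hash_table[pid]           # step S2
        return self.regions[address][object_id]  # step S3


database = {
    "p02": {"OB02-1": "first object", "OB02-2": "second object"},
    "p32": {"OB32-1": "third object", "OB32-2": "fourth object"},
}
pcs = PageCache(database)
print(pcs.read("OB02-2"), pcs.loads)  # first access loads page p02
print(pcs.read("OB02-1"), pcs.loads)  # same page, no further load
```

The sketch also makes the cost visible: reading a single object of page p32 would copy the whole page into a data region.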
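The distinction between hot, warm and cold pages can be illustrated by classifying pages according to their access counts. The text gives no numerical boundaries, so the two thresholds below, the example counts and the function name are assumptions chosen only for illustration.

```python
HOT_THRESHOLD = 10   # assumed: at least this many accesses makes a page "hot"
WARM_THRESHOLD = 3   # assumed: at least this many accesses makes a page "warm"


def classify_pages(access_counts):
    """Map each page identity to "hot", "warm" or "cold"."""
    labels = {}
    for pid, count in access_counts.items():
        if count >= HOT_THRESHOLD:
            labels[pid] = "hot"
        elif count >= WARM_THRESHOLD:
            labels[pid] = "warm"
        else:
            labels[pid] = "cold"
    return labels


# Hypothetical access counts, arranged to match the hatching described above.
counts = {"p02": 25, "p00": 14, "p01": 5, "p22": 4, "p10": 1}
print(classify_pages(counts))
```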
Most databases DBS have the original data stored as pages on a disc memory and the main memory MM contains a page cache memory PCS which is overwritten with new data from the database DB at specific times when access is required to a particular data object. However, the processing means PM, of course, does not only perform read accesses to the pages in the page cache but also processes data objects and thus updates data on the data objects. For example, if a data object relates to the address of a customer, if the address of the customer changes, then the processing means PM accesses this data object and changes the address specification and then stores the data object again on its page in the page cache PCS. Thus, data on page cache memory pages can be newer than on the disc memory D. Therefore, main memory pages need to be sent back to the disc memory D at times. Usually, a log ensures that updates not on disc are not lost.
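The update path described above can be sketched as follows. The sketch assumes a simple scheme in which every update is appended to a log before the cached page is changed, and changed pages are written back to the disc later; the text does not specify the log format or when pages are written back, so the class `WriteBackCache` and its methods are hypothetical.

```python
class WriteBackCache:
    """Page cache whose pages may be newer than the copies on disc."""

    def __init__(self, disc):
        self.disc = disc    # disc memory D: page id -> {object id: value}
        self.pages = {}     # cached copies of pages
        self.dirty = set()  # page ids changed since the last write-back
        self.log = []       # records of updates not yet on disc

    def update(self, pid, object_id, value):
        if pid not in self.pages:
            self.pages[pid] = dict(self.disc[pid])  # load the page first
        self.log.append((pid, object_id, value))    # log before changing
        self.pages[pid][object_id] = value
        self.dirty.add(pid)

    def write_back(self):
        """Send all changed pages back to the disc memory."""
        for pid in sorted(self.dirty):
            self.disc[pid] = dict(self.pages[pid])
        self.dirty.clear()
        self.log.clear()  # assumed: log records are no longer needed


disc = {"p02": {"OB02-1": "Old Street 1"}}
cache = WriteBackCache(disc)
cache.update("p02", "OB02-1", "New Street 2")
print(disc["p02"]["OB02-1"])  # the disc still holds the old address
cache.write_back()
print(disc["p02"]["OB02-1"])  # the disc now holds the new address
```

Between `update` and `write_back` the new address exists only in the cached page and in the log, which is the window in which the log protects the update.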
Whilst in FIG. 1-1 only a number of pages, not all pages of the database DB, are temporarily stored in the page cache memory section PCS, a particular kind of database, as shown in FIG. 1-2, has all of its data stored in the page cache memory PCS. Databases of this type are called main memory databases and, of course, require an immense memory space in the page cache memory PCS. Whilst this reduces the access time, since it is not necessary first to locate the page in the database DB and to retrieve this page into the page cache memory PCS, the memory requirements are very extensive in the case of large databases DB.
As shown in FIG. 1-1, it is also possible that the secondary memory MM contains not only a first (page cache) memory section PCS but also a second (resident data) memory section RDS. Some new types of database systems DBS have data which always resides in the resident data memory section RDS, with other data only occasionally stored in the page cache memory section. For example, as indicated in FIG. 1-1, data objects OB01-1, OB10-1, OB22-1, OB32-2 are resident in the resident data memory section RDS (which is also organized in page data blocks RDSP), and other parts of the database DB may be stored as pages in the page cache memory PCS. It is even possible that a part of a page or record always resides in the resident data memory section RDS and other parts of the page or record only reside occasionally in the page cache memory section PCS. Of course, as indicated in FIG. 1-1, there must be references REF01, REF22, REF32 between these parts. This makes it easy to move data that resides only occasionally in the page cache, i.e. pages which are not used so frequently (“cold” pages), back to the database DB, whilst only parts of the page, e.g. the object OB22-1, are kept in the resident data memory section RDS. Of course, in this case all references must be made to the resident data memory section, which means that it becomes easy to move the disc data.
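The split between a resident data section and a page cache section can be sketched as follows. This is only an illustration under assumptions: resident objects are kept in a dictionary together with a reference to the page they belong to, and a cold page can be dropped from the page cache without affecting the resident part. The class `TwoSectionMemory` and its methods are hypothetical names.

```python
class TwoSectionMemory:
    """Main memory MM with a resident section RDS and a page cache PCS."""

    def __init__(self, database, resident_ids):
        self.database = database  # disc: page id -> {object id: value}
        self.page_cache = {}      # PCS: page id -> cached page
        self.resident = {}        # RDS: object id -> (value, reference to page)
        self.disc_loads = 0
        for pid, page in database.items():
            for object_id, value in page.items():
                if object_id in resident_ids:
                    self.resident[object_id] = (value, pid)

    def read(self, pid, object_id):
        if object_id in self.resident:
            return self.resident[object_id][0]  # served from the RDS
        if pid not in self.page_cache:
            self.page_cache[pid] = dict(self.database[pid])
            self.disc_loads += 1
        return self.page_cache[pid][object_id]

    def evict(self, pid):
        """Drop a cold page from the page cache; resident parts stay."""
        self.page_cache.pop(pid, None)


db = {"p22": {"OB22-1": "resident part", "OB22-2": "occasional part"}}
mm = TwoSectionMemory(db, resident_ids={"OB22-1"})
print(mm.read("p22", "OB22-2"), mm.disc_loads)  # loads page p22 once
mm.evict("p22")                                 # cold page leaves the PCS
print(mm.read("p22", "OB22-1"), mm.disc_loads)  # still served without a load
```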