A main memory database keeps all of its data resident in main memory, unlike a conventional database, which stores its data in external storage. Its distinguishing feature is that all data access is performed in main memory, so data read/write speeds are several orders of magnitude higher than those of a disk resident database, which greatly improves the performance of database applications. Compared with a disk resident database, a main memory database has a redesigned system architecture, with corresponding improvements in data caching, query optimization, and parallel operation.
Meanwhile, microprocessors (CPUs) have entered the multi-core era. A multi-core microprocessor usually adopts an architecture with a shared CPU cache, which is managed by a hardware-level LRU (least recently used)-like replacement algorithm. When query processing involves a small strong locality (frequently accessed) dataset and a large weak locality (accessed once, or re-accessed only after a long interval) dataset, sequential access to the weak locality dataset causes cache conflicts with the strong locality dataset: the strong locality dataset is evicted from the CPU cache and must be reloaded in subsequent operations. The resulting cache thrashing generates a large number of cache misses and increases data access latency. This phenomenon is called cache pollution. In practice, macroscopic cache pollution refers to cache conflicts between different query processing processes or threads in a shared CPU cache, for example, between a hash join query processing thread and an index join query processing thread. Microscopic cache pollution refers to cache conflicts between datasets with different access characteristics within a single query processing process, for example, between a sequentially scanned external table and a hash table in a hash join.
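The eviction effect described above can be reproduced with a small simulation. The following is a minimal sketch, assuming a fully associative cache with strict LRU replacement (real CPU caches are set-associative and only LRU-like); the function names, the cache capacity, and the workload sizes are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity):
    """Replay (kind, line) accesses against a strict-LRU cache; count misses per kind."""
    cache = OrderedDict()
    misses = {}
    for kind, line in trace:
        key = (kind, line)
        if key in cache:
            cache.move_to_end(key)           # hit: mark as most recently used
        else:
            misses[kind] = misses.get(kind, 0) + 1
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used line
    return misses

def build_trace(hot_lines, rounds, scan_per_round):
    """Each round touches the whole hot set, then scans fresh cold lines once."""
    trace, cold_pos = [], 0
    for _ in range(rounds):
        trace.extend(("hot", h) for h in range(hot_lines))
        for _ in range(scan_per_round):
            trace.append(("cold", cold_pos))  # every cold line is used exactly once
            cold_pos += 1
    return trace

CAPACITY = 64   # hypothetical cache size, in cache lines
HOT = 32        # strong locality dataset fits in half the cache

alone = simulate_lru(build_trace(HOT, rounds=100, scan_per_round=0), CAPACITY)
polluted = simulate_lru(build_trace(HOT, rounds=100, scan_per_round=64), CAPACITY)

print("hot misses without scan:", alone["hot"])      # 32 (compulsory misses only)
print("hot misses with scan:   ", polluted["hot"])   # 3200 (hot set reloaded every round)
```

In this model the hot set would fit in the cache comfortably, yet every sequential scan of 64 single-use lines pushes it out, so each round pays the full reload cost again.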
Page-coloring is a technique for high-speed address mapping between main memory and the CPU cache: the low-order bits of a physical memory address determine the region of the CPU cache into which a memory page is loaded. In existing page-coloring schemes, the page-colors of memory pages with different localities are changed so that data is loaded into specified, non-conflicting cache regions, which isolates strong locality data from cache pollution caused by weak locality data. Currently, a mature cache access optimization method extends operating system kernel modules to manage memory resources based on page-color queues and lets concurrent query processing processes allocate main memory by page-color. The memory address spaces of different processes are thereby kept from overlapping in the CPU cache, which reduces cache conflicts between processes having different data access characteristics. This technique is applicable to the buffer management function of a disk resident database. The data of a disk resident database resides on disk, so it must be loaded into a memory buffer before query processing. Moreover, weak locality data is either not reused or reused only after a long interval. Therefore, through the memory address allocation of the memory buffer, large weak locality datasets can be swapped through buffer memory that corresponds to a small number of memory page-color queues, leaving more memory page-color queues for strong locality datasets to ensure that they have sufficient memory resources. However, the page-color optimization technology of the disk resident database is oriented toward process granularity and cannot provide fine-grained optimization for datasets with different data access characteristics within a single process.
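The low-address-bit mapping behind page-coloring can be sketched as follows. This is an illustrative model only, assuming a physically indexed, set-associative cache with a plain modulo index; the cache geometry (8 MiB, 16-way, 64-byte lines, 4 KiB pages) and the function names are assumptions, and real processors may apply additional address hashing.

```python
CACHE_SIZE = 8 * 1024 * 1024   # assumed shared cache size in bytes
ASSOCIATIVITY = 16             # assumed number of ways
LINE_SIZE = 64                 # assumed cache line size in bytes
PAGE_SIZE = 4096               # assumed memory page size in bytes

NUM_SETS = CACHE_SIZE // (ASSOCIATIVITY * LINE_SIZE)   # 8192 cache sets
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE                 # 64 sets covered by one page
NUM_COLORS = NUM_SETS // SETS_PER_PAGE                 # 128 page-colors

def page_color(phys_addr):
    """Color = low-order bits of the physical page frame number."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def cache_sets_of_page(phys_addr):
    """Set indices that the lines of the page containing phys_addr map to."""
    base = (phys_addr // PAGE_SIZE) * PAGE_SIZE
    return {((base + off) // LINE_SIZE) % NUM_SETS
            for off in range(0, PAGE_SIZE, LINE_SIZE)}

page_a = 5 * PAGE_SIZE                   # page frame 5
page_b = (NUM_COLORS + 5) * PAGE_SIZE    # page frame 133, same color as frame 5
page_c = 6 * PAGE_SIZE                   # page frame 6, a different color

print(NUM_COLORS, page_color(page_a), page_color(page_b), page_color(page_c))
# prints: 128 5 5 6
```

Pages of the same color compete for the same 64 cache sets, while pages of different colors use disjoint sets; this is why placing weak locality and strong locality data on different colors keeps them from evicting each other.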
Page-coloring faces two technical challenges when applied to a main memory database. The first challenge is that the data of a main memory database resides in main memory and is accessed there directly, whereas a disk resident database accesses its data indirectly through a buffer. A large weak locality dataset often occupies a large memory address space, while its weak locality calls for mapping that large memory address space to the smallest possible cache address space, that is, for allocating as few page-colors as possible to a very large memory address space. However, each page-color represents at most 1/n of the available memory address space, where n is the number of page-colors, so the main memory database cannot allocate only a few page-colors to a large weak locality dataset. The second challenge is that, if dynamic page-coloring is employed to change the page-colors of weak locality data pages with the memcpy function before the data is accessed, the problem of being unable to allocate few page-colors to weak locality datasets is solved, but the latency of the memcpy function seriously degrades overall data access performance.
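The 1/n limit can be made concrete with a short calculation. The sketch below assumes that physical memory is divided evenly among the page-colors; the memory size, color count, and dataset size are hypothetical values, and `min_colors_needed` is an illustrative helper.

```python
GiB = 1 << 30

def min_colors_needed(dataset_bytes, total_memory_bytes, num_colors):
    """Fewest page-colors whose combined address space can hold the dataset."""
    bytes_per_color = total_memory_bytes // num_colors   # each color covers 1/n of memory
    return -(-dataset_bytes // bytes_per_color)          # ceiling division

TOTAL_MEMORY = 64 * GiB   # hypothetical physical memory
NUM_COLORS = 128          # hypothetical number of page-colors
WEAK_DATASET = 32 * GiB   # hypothetical weak locality dataset

colors = min_colors_needed(WEAK_DATASET, TOTAL_MEMORY, NUM_COLORS)
print(colors, colors / NUM_COLORS)   # prints: 64 0.5
```

Under these assumptions the weak locality dataset cannot be confined to fewer than 64 of the 128 colors, so it necessarily spreads over half of the cache no matter how the pages are chosen.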
Therefore, the challenge for cache optimization technology in a main memory database is that main memory offers no buffer mechanism through which the page-colors of weak locality datasets with a large address space could be dynamically reallocated as the data is accessed. If physical memory address space is assigned to strong locality datasets and weak locality datasets by page-color, the utilization of the memory address space is low, because allocating many page-colors also means acquiring a large address space. A strong locality dataset needs a large page-color region but does not need a large memory address space, since the dataset itself is small. Conversely, a weak locality dataset needs only a small page-color region but does need a large memory address space to store the large dataset. The quotas for memory address space and for page-color region are therefore difficult to satisfy at the same time.
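The conflict between the two quotas can be illustrated numerically. The sketch below assumes static assignment of physical memory by page-color with memory divided evenly among colors; all sizes and the 96/32 color split are hypothetical, and `address_space_for` is an illustrative helper.

```python
GiB = 1 << 30

def address_space_for(colors, total_memory_bytes, num_colors):
    """Physical address space that comes with a given number of page-colors."""
    return colors * (total_memory_bytes // num_colors)

TOTAL_MEMORY = 64 * GiB   # hypothetical physical memory
NUM_COLORS = 128          # hypothetical number of page-colors

# Strong locality dataset: small data, but given most of the cache (96 colors).
strong_size = 1 * GiB
strong_space = address_space_for(96, TOTAL_MEMORY, NUM_COLORS)
strong_utilization = strong_size / strong_space

# Weak locality dataset: large data, confined to the remaining 32 colors.
weak_size = 32 * GiB
weak_space = address_space_for(32, TOTAL_MEMORY, NUM_COLORS)

print(strong_space // GiB, weak_space // GiB)   # prints: 48 16
print(weak_space >= weak_size)                  # prints: False
```

With this split the strong locality dataset ties up 48 GiB of address space to store 1 GiB of data, while the weak locality dataset is left with 16 GiB for 32 GiB of data, so neither quota is met sensibly.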