With advances in modern computer architecture, fast communication among multi-core processors makes parallel processing possible. Because large main memory configurations are available and affordable, server configurations with hundreds of cores and terabytes of main memory have become a reality.
High-performance database systems, such as in-memory databases, are adapted to make full use of the main memory provided by modern hardware. In such systems, all relevant data may be kept in main memory, so that read operations can be executed without disk I/O. The systems may be designed to minimize the number of CPU cache misses and to avoid CPU stalls due to memory access. One approach for achieving this goal is to use column-based storage in memory, which leads to high spatial locality of data and instructions, so that operations can be executed completely in the CPU cache without costly random memory accesses.
In column-based storage, the entries of a column are stored in contiguous memory locations. Columnar data storage allows highly efficient compression, so that the relevant data can be stored in main memory at lower cost. The data structure that contains the main part of the data is called the main storage. Write operations are not applied to the main storage directly; instead, changes are first recorded in a separate data structure called the delta storage, and are taken over from the delta storage into the main storage asynchronously at some later point in time. The separation into main and delta storage allows high compression and high write performance at the same time. The column store may implement MVCC (Multi-Version Concurrency Control), which is based on having multiple versions of the same data in the database. When data is read, MVCC ensures that the operation reads the right set of versions required to get a correct and consistent view of the database. A Consistent View Manager may determine which version of the database each operation is allowed to see, depending on the current transaction isolation level.
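By way of illustration only, the separation into main and delta storage may be sketched as follows. The sketch assumes dictionary encoding as the compression scheme and a single integer commit timestamp per delta row as a stand-in for MVCC version information; the class `ColumnSketch` and its methods are hypothetical names introduced for this example and do not describe any particular system.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ColumnSketch:
    """Hypothetical single column split into main and delta storage."""

    # Main storage: a sorted dictionary of distinct values plus one
    # small integer ID per row, stored contiguously.
    dictionary: List[str] = field(default_factory=list)
    value_ids: List[int] = field(default_factory=list)
    # Delta storage: append-only (value, commit_timestamp) pairs.
    delta: List[Tuple[str, int]] = field(default_factory=list)

    @classmethod
    def from_values(cls, values):
        # Build the compressed main storage from plain values.
        dictionary = sorted(set(values))
        ids = [bisect_left(dictionary, v) for v in values]
        return cls(dictionary, ids)

    def insert(self, value, commit_ts):
        # Writes only touch the delta storage, so they stay cheap.
        self.delta.append((value, commit_ts))

    def read(self, snapshot_ts):
        # Assumption: all main rows are visible to every reader, and a
        # delta row is visible only if it was committed at or before
        # the reader's snapshot timestamp.
        main_rows = [self.dictionary[i] for i in self.value_ids]
        delta_rows = [v for v, ts in self.delta if ts <= snapshot_ts]
        return main_rows + delta_rows


col = ColumnSketch.from_values(["DE", "US", "DE", "FR"])
col.insert("JP", commit_ts=10)
col.insert("US", commit_ts=20)
print(col.read(snapshot_ts=15))
```

In this sketch, a reader with snapshot timestamp 15 sees the four main rows and the first delta row, but not the row committed at timestamp 20. A Consistent View Manager would play the role of choosing `snapshot_ts` for each operation according to the isolation level.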
As data changes accumulate in the delta storage, the main storage is merged with the delta storage asynchronously in the background. In conventional systems, because merges are computationally expensive and time-consuming, they negatively impact the performance of ongoing transactions and statements running in the foreground. The effect of merges is exacerbated for long-running transactions, which may be blocked for a prolonged period of time or terminated prematurely. As a result, such systems do not process internal merging operations with high concurrency and throughput from the perspective of external transactions. Therefore, conventional systems fail to provide an ideal mechanism to handle merging operations with optimal performance, concurrency, and transparency.
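The cost of a merge can be seen in a minimal sketch, given here for illustration only. It assumes a dictionary-encoded main storage (a sorted dictionary plus one integer ID per row) and an uncompressed delta storage; the function name `merge_delta_into_main` is hypothetical. Because a new value in the delta can shift positions in the sorted dictionary, every row of the main storage is re-encoded, not just the new rows.

```python
from bisect import bisect_left


def merge_delta_into_main(dictionary, value_ids, delta):
    """Return a new (dictionary, value_ids) pair with the delta folded in.

    The inputs are left untouched; a new main storage is built instead.
    """
    # Decode all main rows and append the delta rows in arrival order.
    rows = [dictionary[i] for i in value_ids] + list(delta)
    # Rebuild the sorted dictionary over old and new values.
    new_dictionary = sorted(set(rows))
    # Re-encode every row against the new dictionary. This full pass
    # over the column is what makes the merge expensive.
    new_ids = [bisect_left(new_dictionary, v) for v in rows]
    return new_dictionary, new_ids


main_dict = ["DE", "FR", "US"]
main_ids = [0, 2, 0, 1]          # encodes DE, US, DE, FR
delta = ["JP", "US"]

new_dict, new_ids = merge_delta_into_main(main_dict, main_ids, delta)
print(new_dict)
print(new_ids)
```

Here the arrival of "JP" changes the ID of "US" from 2 to 3, so the existing main rows must be rewritten. The work grows with the size of the whole column, which is consistent with the observation above that merges compete with foreground transactions.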