The present invention relates in general to the field of in-memory database systems, and in particular to a method for encoding data stored in a column-oriented manner, and a data processing system for encoding data stored in a column-oriented manner. Still more particularly, the present invention relates to a computer program product for performing a method for encoding data stored in a column-oriented manner.
In main memory database systems, the memory available for storing table data may be limited. The data must therefore be compressed so that it can be stored completely, and there must still be a way of accessing the original values for query processing. In modern in-memory databases for operational warehousing that use a column-oriented store, scanning the entire table data of all required columns plays a significant role. As a result, the number of frequently accessed values is too large for all of them to be cached in decompressed form.
Since scanning a table is a CPU-intensive operation, massively parallel processing is used to alleviate this bottleneck. To that end, the warehouse data can be distributed across several nodes, each of which first performs the scan separately on a subset of the table data; the partial results are then merged. In a snowflake schema, which is typical for online analytical processing (OLAP) warehouses, most of the data is stored in a central fact table which is connected to multiple small dimension tables. The central fact table stores the measure values of a multidimensional OLAP cube. The dimension values are represented in the fact table only by small IDs; the values themselves are stored in the separate dimension tables, which need to be joined with the fact table if required by a query.
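The scan-then-merge pattern described above can be illustrated with a minimal sketch. It is not taken from the invention: the column contents, the names `REGION_DIM`, `FACT_REGION_ID`, `FACT_REVENUE`, and the use of threads to stand in for separate nodes are all assumptions made for illustration. Each "node" scans only its own row range of the column-oriented fact table and produces partial sums keyed by the small dimension ID; the partial results are then merged, and the dimension table is joined only at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dimension table: small ID -> dimension value.
REGION_DIM = {1: "EMEA", 2: "APAC", 3: "AMER"}

# Hypothetical fact table stored column-wise: one list per column.
FACT_REGION_ID = [1, 2, 1, 3, 2, 1, 3, 3]
FACT_REVENUE = [10, 20, 30, 40, 50, 60, 70, 80]


def partition(n_rows, n_nodes):
    """Split row indices into contiguous ranges, at most one per node."""
    step = max(1, -(-n_rows // n_nodes))  # ceiling division, at least 1
    return [range(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]


def scan_partition(rows):
    """Scan one node's subset; return partial revenue sums per region ID."""
    partial = Counter()
    for i in rows:
        partial[FACT_REGION_ID[i]] += FACT_REVENUE[i]
    return partial


def distributed_sum(n_nodes):
    """Scan all partitions in parallel, merge, then join the dimension table."""
    parts = partition(len(FACT_REGION_ID), n_nodes)
    # Threads stand in for separate nodes in this sketch (an assumption).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(scan_partition, parts))
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds the partial sums together
    # Join: replace each small ID by its value from the dimension table.
    return {REGION_DIM[region_id]: total for region_id, total in merged.items()}


if __name__ == "__main__":
    print(distributed_sum(3))
```

Because the per-node aggregation is a sum, the merged result does not depend on how many partitions are used; deferring the join means the scan itself touches only the small IDs.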