In advanced database systems having in-memory architectures, memory access can be one of the largest performance bottlenecks. Techniques such as dictionary-based compression can reduce the number of input/output (I/O) operations to main memory.
With columnar databases, dictionary encoding can split a column into a dictionary and an attribute vector. The dictionary stores all unique values with corresponding value identifiers. The attribute vector, on the other hand, stores all value identifiers for all entries in the column. Positions within the column are stored implicitly and offsetting is enabled with bit-encoded fixed-length data types. The entries in the dictionary can be sorted and, additionally, compressed to provide for quicker access.
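As an illustration only, the split described above can be sketched in Python. The function names (`dictionary_encode`, `decode`, `lookup_value_id`, `bits_per_entry`) and the sample column are hypothetical and are not taken from any particular database system. The sketch assumes a sorted dictionary in which the value identifier is simply the position of the value in the dictionary.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Split a column into a sorted dictionary and an attribute vector."""
    # The dictionary holds each unique value once; sorting makes the
    # value identifier equal to the value's position in the dictionary.
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    # The attribute vector holds one value identifier per column entry.
    # Row positions are implicit: entry i belongs to row i.
    attribute_vector = [value_id[value] for value in column]
    return dictionary, attribute_vector


def decode(dictionary, attribute_vector, position):
    """Recover the original value stored at a given row position."""
    return dictionary[attribute_vector[position]]


def lookup_value_id(dictionary, value):
    """Binary search in the sorted dictionary; None if the value is absent."""
    i = bisect_left(dictionary, value)
    return i if i < len(dictionary) and dictionary[i] == value else None


def bits_per_entry(dictionary):
    """Fixed bit width needed to store any value identifier."""
    return max(1, (len(dictionary) - 1).bit_length())


# Hypothetical sample column.
column = ["Berlin", "Paris", "Berlin", "Rome", "Paris", "Berlin"]
dictionary, attribute_vector = dictionary_encode(column)
```

For this sample column the dictionary is `["Berlin", "Paris", "Rome"]` and the attribute vector is `[0, 1, 0, 2, 1, 0]`, so each entry fits in two bits. Because every entry has the same width, the entry for a given row can be located by a simple offset calculation.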
With conventional insert-only database systems, dictionary-based encoding can use variable-sized blocks that are allocated in an append-only structure in a page chain. With such an arrangement, all allocations attempt to append data to the last page in the page chain, which can result in cache collisions on the cache lines containing global state information (such as allocation pointers, etc.). These collisions can limit the scalability of parallel data manipulation language (DML) commands and/or parallel dictionary encoding.
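A minimal sketch of such an append-only page chain, again in Python, may help show where the shared state sits. The class name `PageChain`, the 16-byte page size, and the use of a lock are assumptions made for illustration. Python does not model hardware cache lines, so the lock merely stands in for the synchronization that a native implementation would need around the allocation pointer.

```python
import threading


class PageChain:
    """Append-only chain of fixed-size pages holding variable-sized blocks."""

    def __init__(self, page_size):
        self.page_size = page_size
        self.pages = [bytearray(page_size)]
        # Global state: the allocation pointer into the last page.
        # Every allocation reads and updates this single field.
        self.tail_offset = 0
        self.lock = threading.Lock()

    def allocate(self, size):
        """Reserve `size` bytes at the end of the chain."""
        if size > self.page_size:
            raise ValueError("block does not fit in a single page")
        with self.lock:  # all writers serialize on the same shared state
            if self.tail_offset + size > self.page_size:
                # The last page is full: chain a new page and start over.
                self.pages.append(bytearray(self.page_size))
                self.tail_offset = 0
            page_no = len(self.pages) - 1
            offset = self.tail_offset
            self.tail_offset += size
            return page_no, offset

    def append(self, data):
        """Allocate a block and copy `data` into it."""
        page_no, offset = self.allocate(len(data))
        self.pages[page_no][offset:offset + len(data)] = data
        return page_no, offset


# Hypothetical usage with a deliberately tiny page size.
chain = PageChain(page_size=16)
locations = [chain.append(b"abcdefghij"), chain.append(b"klmnop"), chain.append(b"q")]
```

Here the first two blocks fill page 0 exactly and the third block forces a new page. Since every call to `allocate` touches `tail_offset`, concurrent writers all contend for the same piece of state regardless of which values they insert.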