In some columnar in-memory data stores, column values can be dictionary compressed. With such compression, each distinct value in a column is mapped to a unique integer value, and the mapping is one-to-one. These integer values are sometimes referred to as value IDs, or vids as shorthand for value identifiers. Associated with each column is a vector of these vids, which can be referred to as a column data array or an index vector. For storage efficiency, the vids in the vector can be packed so that only n bits are used per entry, where n is the number of bits needed to represent the highest vid; each position in the vector is then logically n bits wide. For example, if n is equal to 2, the vids for the first 32 rows in the column can be stored in the first 64 bits of the index vector.
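The packing described above can be illustrated with a minimal sketch. It assumes 64-bit words and that an entry never straddles a word boundary; neither assumption is stated in the text. The names `bits_needed`, `pack_vids`, and `vid_at` are hypothetical and chosen only for illustration.

```python
WORD_BITS = 64  # assumed word size; the text only mentions "the first 64 bits"


def bits_needed(highest_vid: int) -> int:
    """Number of bits n required to represent the highest vid."""
    return max(1, highest_vid.bit_length())


def pack_vids(vids, n):
    """Pack vids into 64-bit words, n bits per entry.

    Assumption: entries do not straddle word boundaries, so each word
    holds WORD_BITS // n entries (32 entries when n == 2).
    """
    per_word = WORD_BITS // n
    words = [0] * ((len(vids) + per_word - 1) // per_word)
    for row, vid in enumerate(vids):
        if vid >> n:  # also rejects negative vids
            raise ValueError(f"vid {vid} does not fit in {n} bits")
        word, slot = divmod(row, per_word)
        words[word] |= vid << (slot * n)
    return words


def vid_at(words, n, row):
    """Read the vid stored for a given row position."""
    per_word = WORD_BITS // n
    word, slot = divmod(row, per_word)
    return (words[word] >> (slot * n)) & ((1 << n) - 1)


# 40 rows with vids in the range 0..3, so n = 2 bits per entry.
row_vids = [i % 4 for i in range(40)]
n = bits_needed(max(row_vids))
index_vector = pack_vids(row_vids, n)
```

With n equal to 2, the first word of `index_vector` holds the vids for rows 0 through 31, and the remaining 8 rows spill into a second word.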
A hash table maps values of one domain (e.g., strings) to values in another, possibly different, domain (e.g., integers). Consider a column of type string and a hash table mapping string values to vids. Assume the first value inserted into this column is “hello”. This value can be identified within the column with vid 1. Assume the next value inserted is “hello world”. This new value will have a vid of 2. To keep track of these mappings, a hash table is used in which the keys are of type string and the values are of type integer. When a string is being inserted into the column, this hash table can be used to determine whether a vid has already been assigned to it.
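The “hello” example can be sketched as follows, using Python's built-in `dict` as the hash table. The class name `ValueDictionary` and its methods are hypothetical, and assigning vids sequentially starting at 1 is an assumption taken from the example rather than a stated requirement.

```python
class ValueDictionary:
    """Maps string values to vids; a dict serves as the hash table."""

    def __init__(self):
        self._vid_by_value = {}    # hash table: string -> vid
        self._value_by_vid = []    # reverse lookup: index (vid - 1) -> string

    def get_or_assign(self, value: str) -> int:
        """Return the existing vid for value, or assign the next free one."""
        vid = self._vid_by_value.get(value)
        if vid is None:
            vid = len(self._value_by_vid) + 1  # vids start at 1 (assumed)
            self._vid_by_value[value] = vid
            self._value_by_vid.append(value)
        return vid

    def value_of(self, vid: int) -> str:
        """Resolve a vid back to its string value."""
        return self._value_by_vid[vid - 1]


dictionary = ValueDictionary()
first = dictionary.get_or_assign("hello")         # new value, gets vid 1
second = dictionary.get_or_assign("hello world")  # new value, gets vid 2
repeat = dictionary.get_or_assign("hello")        # already present, vid 1 again
```

The reverse list is included only so that a vid can be resolved back to its string; the insertion check itself needs only the hash table.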
Hash tables are often used for certain operations, such as recovery, and for specialized columns that do not require the sorting provided by column dictionaries. For example, a hash table can be used for each delta column to keep track of the top-N most common values in the column, where N is typically small (e.g., the top 10). Regardless of the use, readers and writers need to access the hash table concurrently.
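A minimal sketch of top-N tracking with concurrent readers and writers is shown below. It guards the hash table with a single coarse lock, which is only the simplest way to make concurrent access safe and is not a technique the text prescribes. The class name `TopNTracker` and its methods are hypothetical.

```python
import threading
from collections import Counter


class TopNTracker:
    """Counts value occurrences for one column and reports the top N."""

    def __init__(self, n: int = 10):
        self._n = n
        self._counts = Counter()        # hash table: value -> occurrence count
        self._lock = threading.Lock()   # coarse lock (assumed, for simplicity)

    def record(self, value):
        """Writer path: count one more occurrence of value."""
        with self._lock:
            self._counts[value] += 1

    def top(self):
        """Reader path: the N most common values with their counts."""
        with self._lock:
            return self._counts.most_common(self._n)


tracker = TopNTracker(n=2)


def writer():
    # Each writer inserts "a" 300 times, "b" 200 times, and "c" 100 times.
    for value, times in (("a", 300), ("b", 200), ("c", 100)):
        for _ in range(times):
            tracker.record(value)


threads = [threading.Thread(target=writer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

top_values = tracker.top()
```

A single lock serializes every reader and writer, so it trades throughput for simplicity; it is used here only to keep the sketch short.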