The efficient processing of large amounts of data is becoming increasingly important as businesses, entities and individuals store and/or require access to growing amounts of data.
Traditional data processing techniques, including conventional database management systems (DBMS) and the rapidly growing domain of unstructured data processing, encode data attributes for compact data storage and efficient searching. Encoding can be applied to a single data attribute (column) or to multiple data attributes combined. Encoding a wide range of diverse data into a binary format allows for storage savings. Search operations translate search criteria from the original predicate values into encoded binary values, allowing for efficient data comparison and scanning.
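As an illustration of the encoding and predicate translation described above, the following is a minimal sketch that assumes a simple dictionary encoding of a single attribute. The function names (`build_dictionary`, `encode_column`, `scan_equal`) and the sample column are hypothetical and are not taken from any particular system.

```python
def build_dictionary(values):
    """Assign each distinct value an integer code (sorted so codes are deterministic)."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode_column(values, dictionary):
    """Replace every original value with its integer code."""
    return [dictionary[value] for value in values]


def scan_equal(encoded, dictionary, predicate_value):
    """Translate the predicate value into its code once, then compare integers only."""
    code = dictionary.get(predicate_value)
    if code is None:
        # The value never occurs in the column, so no row can match.
        return []
    return [row for row, c in enumerate(encoded) if c == code]


# Hypothetical sample attribute (column).
colors = ["red", "green", "red", "blue", "green", "red"]
color_dict = build_dictionary(colors)
encoded_colors = encode_column(colors, color_dict)
matching_rows = scan_equal(encoded_colors, color_dict, "red")
```

In this sketch the scan compares small integers rather than the original strings, which is one way the translated predicate can make comparison cheaper.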
In today's rapidly growing content serving domains, encoding is typically applied to both raw data, such as data contained in relational databases, and index data for general content. For example, data contained in relational databases may be held in column stores, which are typically used for business intelligence and data warehousing workloads.
A particular encoding scheme is usually chosen based on the data types and values that storage and search must handle. When the number of distinct data values (i.e., the “cardinality”) of an attribute is small, a fixed number of bits is used to encode each distinct value. The goal of such encoding is to reduce storage requirements and the bandwidth needed to transfer data between different computer nodes and among the different storage hierarchies of a computer system.
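The fixed-width case can be sketched as follows, assuming that the bit width is the smallest number of bits able to distinguish all distinct values and that codes are packed back to back into a single integer. The helper names (`bits_needed`, `pack_codes`, `unpack_codes`) and the sample codes are hypothetical.

```python
def bits_needed(cardinality):
    """Smallest fixed width that can represent `cardinality` distinct codes (at least 1 bit)."""
    return max(1, (cardinality - 1).bit_length())


def pack_codes(codes, width):
    """Pack fixed-width codes back to back into one integer, first code in the lowest bits."""
    packed = 0
    for i, code in enumerate(codes):
        packed |= code << (i * width)
    return packed


def unpack_codes(packed, width, count):
    """Recover `count` fixed-width codes from the packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


# Hypothetical attribute with three distinct values, already encoded as 0, 1, 2.
codes = [2, 1, 2, 0, 1, 2]
width = bits_needed(3)
packed = pack_codes(codes, width)
packed_bytes = (len(codes) * width + 7) // 8  # bytes needed for the packed codes
```

With three distinct values each code needs two bits, so the six values in this sketch occupy two bytes rather than one or more bytes per value.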