Modern processors often include instructions for operations that are computationally intensive but offer a high degree of data parallelism, which can be exploited by an efficient implementation using data storage devices such as single instruction, multiple data (SIMD) vector registers. In SIMD execution, a single instruction operates on multiple data elements concurrently or simultaneously. This is typically implemented by extending the width of resources such as registers and arithmetic logic units (ALUs), allowing them to hold or operate on multiple data elements, respectively.
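The idea of one operation acting on several packed data elements can be imitated in software with SIMD-within-a-register (SWAR) arithmetic. The following is a minimal sketch of that substitute technique, not of the hardware described here: it treats a 64-bit Python integer as a hypothetical register holding eight 1-byte elements and adds two such registers element by element, masking so that no carry propagates from one element into the next. The names `swar_add_u8`, `pack_lanes`, `unpack_lanes`, `LOW7`, and `HIGH1` are illustrative only.

```python
# SWAR sketch (assumption: a 64-bit integer stands in for a register of eight 8-bit lanes).
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every lane
HIGH1 = 0x8080808080808080  # top bit of every lane


def swar_add_u8(a: int, b: int) -> int:
    """Add eight packed unsigned bytes lane-wise, modulo 256 per lane."""
    # Adding only the low 7 bits of each lane can never carry into the next lane.
    low_sum = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is a7 XOR b7 XOR (carry out of bit 6).
    return low_sum ^ ((a ^ b) & HIGH1)


def pack_lanes(values: list[int]) -> int:
    """Place eight byte values into one integer, lane 0 in the lowest byte."""
    return int.from_bytes(bytes(values), "little")


def unpack_lanes(register: int) -> list[int]:
    """Split the integer back into its eight byte lanes."""
    return list(register.to_bytes(8, "little"))


a = pack_lanes([1, 2, 3, 4, 5, 6, 7, 200])
b = pack_lanes([255, 255, 255, 255, 255, 255, 255, 100])
print(unpack_lanes(swar_add_u8(a, b)))  # [0, 1, 2, 3, 4, 5, 6, 44]
```

A single call to `swar_add_u8` produces all eight sums, which mirrors, in software, how a widened ALU operates on every element of a vector register at once.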
The central processing unit (CPU) may provide such parallel hardware to support the SIMD processing of vectors. A vector is a data structure that holds a number of consecutive data elements. A vector register of size L may contain N vector elements of size M, where N=L/M. For instance, a 64-byte vector register may be partitioned into (a) 64 vector elements, with each element holding a data item that occupies 1 byte, (b) 32 vector elements to hold data items that occupy 2 bytes (or one “word”) each, (c) 16 vector elements to hold data items that occupy 4 bytes (or one “doubleword”) each, or (d) 8 vector elements to hold data items that occupy 8 bytes (or one “quadword”) each.
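The relation N = L/M can be made concrete by viewing the same 64-byte register image at each of the four element sizes listed above. This is a minimal sketch; the function name `split_lanes`, the sample register contents, and the little-endian element layout (as on x86) are assumptions for illustration.

```python
def split_lanes(register: bytes, element_bytes: int) -> list[int]:
    """Partition an L-byte register image into N = L // M unsigned M-byte elements."""
    if len(register) % element_bytes != 0:
        raise ValueError("element size M must evenly divide register size L")
    # Assumed little-endian layout: element 0 is built from the lowest-addressed bytes.
    return [
        int.from_bytes(register[i:i + element_bytes], "little")
        for i in range(0, len(register), element_bytes)
    ]


register = bytes(range(64))  # hypothetical 64-byte register contents: 0x00, 0x01, ..., 0x3F
for name, m in {"byte": 1, "word": 2, "doubleword": 4, "quadword": 8}.items():
    elements = split_lanes(register, m)
    print(f"{name:<10} M={m}  N={len(elements)}  element 0 = {elements[0]:#x}")
```

The number of elements is 64, 32, 16, and 8, respectively; only the interpretation of the bits changes, not the register contents.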
A number of applications have large amounts of data-level parallelism and may benefit from SIMD support. However, some applications may have data elements that are smaller than 8 bits and/or do not align to byte (8-bit) boundaries in memory. To maintain SIMD efficiency, these sub-byte elements may first need to be decompressed so that each occupies one full byte before they can be processed in parallel. As a result, such applications may see only limited performance benefits from SIMD operations.
For example, database calculations are common operations used by many types of applications. Some fields within a record may identify one of fewer than 256 choices, e.g.: the age of a motor vehicle's driver (7 bits), one of the fifty U.S. states (6 bits), a day of the month (5 bits), a month of the year (4 bits), a day of the week (3 bits), one of the four base nucleotides in a genome sequence (2 bits), and masculine or feminine gender (1 bit). Especially in very large databases, these fields are frequently compressed to fewer than 8 bits each in order to take up less storage space. Therefore, when counting, sorting, computing, or comparing populations in a database that share the same characteristics, each sub-byte field required for processing may first need to be decompressed. This is precisely the kind of condition that can make it very difficult to process multiple data elements concurrently or simultaneously (i.e., using SIMD operations).
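The compression and decompression described above can be illustrated with a simple bit-packing scheme. This is a minimal sketch under stated assumptions, not a method taken from this disclosure: fields are packed back to back, least significant bit first, and the helper names `pack_fields` and `unpack_fields`, as well as the codings chosen for days and nucleotides, are hypothetical.

```python
def pack_fields(values: list[int], width: int) -> bytes:
    """Pack unsigned `width`-bit fields back to back, least significant bit first."""
    if not 1 <= width <= 8:
        raise ValueError("sub-byte field width must be between 1 and 8 bits")
    acc = 0
    for i, value in enumerate(values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc |= value << (i * width)  # a field may straddle a byte boundary
    return acc.to_bytes((len(values) * width + 7) // 8, "little")


def unpack_fields(packed: bytes, width: int, count: int) -> bytes:
    """Decompress `count` packed `width`-bit fields into one byte per field."""
    if not 1 <= width <= 8:
        raise ValueError("sub-byte field width must be between 1 and 8 bits")
    acc = int.from_bytes(packed, "little")
    mask = (1 << width) - 1
    # Each field is shifted and masked individually: a serial, field-by-field step.
    return bytes((acc >> (i * width)) & mask for i in range(count))


# Days of the week, 3 bits each (hypothetical coding: 0 = Sunday ... 6 = Saturday).
days = [0, 1, 2, 3, 4, 5, 6]
packed_days = pack_fields(days, 3)
print(len(packed_days), "packed bytes vs", len(unpack_fields(packed_days, 3, len(days))), "unpacked")

# Nucleotides, 2 bits each (hypothetical coding: A=0, C=1, G=2, T=3).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
seq = [CODE[base] for base in "GATTACA"]
packed_seq = pack_fields(seq, 2)
```

With 3-bit fields, the third field occupies bits 6 through 8 and therefore spans two bytes, which is the kind of misalignment that prevents the packed data from being loaded directly into byte-sized vector elements.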
To date, potential solutions to the processing difficulties associated with such sub-byte representations, including the need for decompression, have not been adequately explored.