Data formats are designed to enable efficient processing and storage of datasets having a variety of different characteristics. Algorithms that process data in these formats are therefore critical. Unfortunately, current processors are not always capable of working with particular data formats efficiently.
Processor designers have historically provided minimal direct support for application-specific instructions. Thus, software developers have relied on the increasing speed at which existing processors execute a set of instructions to increase the performance of a particular algorithm.
The performance of typical processing units, however, is no longer increasing at the same rate as in the past. Thus, software developers are not able to rely as much on increasing computing power to more quickly process particular data formats.
Single instruction multiple data (“SIMD”) processors perform the same operation on multiple data items simultaneously. SIMD processors exploit data-level parallelism by executing a single instruction against data in multiple registers or subregisters. Thus, the throughput per instruction may be increased accordingly. SIMD processors, however, are typically used for graphics and other multimedia applications. Accordingly, it may be difficult to use the SIMD architecture to process particular data formats efficiently.
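For purposes of illustration only, the data-level parallelism described above can be modeled in software. The following is a minimal sketch, not an implementation of any particular processor: the function name `simd_add`, the choice of four lanes, and the 16-bit lane width are all assumptions made for this example. Each list element stands in for one subregister, and a single call stands in for a single instruction applied to every subregister.

```python
def simd_add(a, b, lane_bits=16):
    """Model one SIMD add: the same operation is applied to every lane.

    `a` and `b` are lists standing in for the subregisters (lanes) of two
    SIMD registers. `lane_bits` is a hypothetical lane width; each lane
    wraps around independently, so no carry crosses into a neighboring lane.
    """
    if len(a) != len(b):
        raise ValueError("both registers must have the same number of lanes")
    mask = (1 << lane_bits) - 1
    # One logical instruction: add corresponding lanes, keep lane_bits bits.
    return [(x + y) & mask for x, y in zip(a, b)]


# Four data items are processed by one modeled instruction.
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])

# A lane that overflows wraps within itself and leaves other lanes untouched.
wrapped = simd_add([0xFFFF, 5], [1, 5])
```

In this model, a scalar processor would need one add per data item, whereas the modeled SIMD instruction covers all lanes in one step, which is the sense in which throughput per instruction may be increased.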
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Terms and Notation
For purposes of explanation, the following terms and conventions are used herein to describe embodiments of the invention:
The term “byte” herein describes a number of contiguously stored bits. While the common usage implies eight bits, the size of a byte may vary from implementation to implementation. For example, a byte may refer to any size including, but in no way limited to: eight bits, sixteen bits, thirty-two bits, sixty-four bits, and so on.
The notation <XY> herein describes a vector of bits, e.g., <10>. Spaces may be added between bits merely to increase the ability to read the contents of the vector, e.g., <1111 0000 1111 0000>.
The notation [J, K] herein describes a set of contiguous values, where J is a first value and K is a second value, which may be equal or different.
The notation “0x” may be used to denote a hexadecimal number. For example, 0x2C may be used to represent the number forty-four. In some embodiments where bit representations may be unwieldy, hexadecimal representations may be used to increase the ability to read and understand the description.
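As a brief illustration of how the hexadecimal and bit-vector notations relate, the following sketch converts a number into the <XY> form used herein. The helper name `to_bit_vector` and its `width` parameter are hypothetical and are introduced only for this example; grouping the bits in fours from the left is likewise an assumption that matches the spacing shown above.

```python
def to_bit_vector(value, width=8):
    """Render a non-negative integer in the <XY> bit-vector notation.

    `width` is the number of bits to show. Bits are grouped in fours,
    starting from the leftmost bit, purely for readability.
    """
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the requested width")
    bits = format(value, "0{}b".format(width))
    groups = [bits[i:i + 4] for i in range(0, width, 4)]
    return "<" + " ".join(groups) + ">"


# 0x2C is the hexadecimal form of forty-four.
forty_four = 0x2C
as_bits = to_bit_vector(forty_four)

# A sixteen-bit value, where the hexadecimal form is the more compact one.
wide = to_bit_vector(0xF0F0, width=16)
```

Here `as_bits` is the string `<0010 1100>` and `wide` is `<1111 0000 1111 0000>`, showing why the four-character hexadecimal form may be easier to read than the sixteen-bit vector.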
The term “register” herein describes a register or subregister, which may itself include one or more smaller subregisters. Unless otherwise specified, a register may be a SIMD register or a register typically used in a scalar processor.
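The relationship between a register and its subregisters can be sketched as follows. This is an illustrative model only: the function name `split_register`, the 32-bit register size, the 8-bit subregister size, and the convention of listing the least significant subregister first are all assumptions made for this example and are not drawn from any particular architecture.

```python
def split_register(value, register_bits=32, sub_bits=8):
    """Return the subregisters of a register, least significant first.

    The register is modeled as a non-negative integer of `register_bits`
    bits that is divided into equally sized subregisters of `sub_bits` bits.
    """
    if register_bits % sub_bits != 0:
        raise ValueError("register size must be a multiple of subregister size")
    if value < 0 or value >= (1 << register_bits):
        raise ValueError("value does not fit in the register")
    mask = (1 << sub_bits) - 1
    count = register_bits // sub_bits
    return [(value >> (i * sub_bits)) & mask for i in range(count)]


# One 32-bit register viewed as four 8-bit subregisters.
subregisters = split_register(0x11223344)

# The same register viewed as two 16-bit subregisters.
halves = split_register(0x11223344, sub_bits=16)
```

Under these assumptions, `subregisters` holds 0x44, 0x33, 0x22, and 0x11, and `halves` holds 0x3344 and 0x1122, illustrating that the same stored bits may be treated as one register or as several smaller ones.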