A common operation in computing systems is to “pack” data or to “unpack” already-packed data. For example, 4-byte data elements may have many unused high-order bits. To conserve space, the data elements may be “packed” by removing a certain number (e.g., 12) of high-order bits from each data element and storing the result in volatile or non-volatile storage. Later, the packed data may be “unpacked” by adding extra bits to each data element, restoring its 4-byte width so that instructions that require 4-byte data elements can be executed.
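The packing example above can be illustrated with a minimal software sketch in C. It assumes that exactly 12 high-order bits are dropped from each 4-byte element (leaving 20 bits), that the packed elements are stored back to back in a little-endian bit stream, and that unpacking zero-fills the restored high-order bits. The names `PACKED_BITS`, `packed_size`, `pack20`, and `unpack20` are hypothetical and chosen only for illustration; the bit-at-a-time loops favor clarity over speed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-bit elements with 12 unused high-order bits, so 20 bits
   of payload remain per element. */
enum { PACKED_BITS = 20 };

/* Number of bytes needed to hold n packed elements. */
static size_t packed_size(size_t n) {
    return (n * PACKED_BITS + 7) / 8;
}

/* Pack: keep only the low PACKED_BITS bits of each element and store them
   back to back in a little-endian bit stream. out must have room for
   packed_size(n) bytes. */
static void pack20(const uint32_t *in, size_t n, uint8_t *out) {
    size_t bitpos = 0;
    memset(out, 0, packed_size(n));
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i] & ((1u << PACKED_BITS) - 1u);
        for (int b = 0; b < PACKED_BITS; b++, bitpos++) {
            if ((v >> b) & 1u)
                out[bitpos / 8] |= (uint8_t)(1u << (bitpos % 8));
        }
    }
}

/* Unpack: rebuild each 4-byte element from its PACKED_BITS payload bits.
   Assumption: the 12 restored high-order bits are zero-filled. */
static void unpack20(const uint8_t *in, size_t n, uint32_t *out) {
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (int b = 0; b < PACKED_BITS; b++, bitpos++) {
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                v |= 1u << b;
        }
        out[i] = v;
    }
}
```

Under these assumptions, three 4-byte elements (12 bytes) pack into `packed_size(3)`, i.e. 8 bytes, and unpacking returns the original values whenever their 12 high-order bits were zero.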
One approach for dealing with packed data involves a dedicated hardware engine or “co-processor.” However, co-processor solutions incur significant overhead in controlling the co-processor's actions and gathering its results. Furthermore, co-processor solutions are limited (hard-coded) in how they can be used, and modern, highly threaded CPU designs need multiple copies of the co-processor hardware.
Another approach for dealing with packed data involves instruction-level solutions, such as “parallel deposit” and “parallel extract.” Relative to co-processor solutions, current instruction-level solutions have less overhead, use relatively less logic, and offer more flexibility. However, current instruction-level solutions require separate hardware structures and a significant amount of control logic, and they do not extend well to data widths larger than 64 bits.
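For readers unfamiliar with these operations, the following C sketch models the commonly described semantics of 64-bit parallel extract and parallel deposit in software. It is a bit-serial reference model under that assumption, not a description of any particular hardware implementation, and the names `pext64_ref` and `pdep64_ref` are hypothetical.

```c
#include <stdint.h>

/* Parallel extract (reference model): gather the bits of src selected by
   mask into the low-order bits of the result, preserving their order. */
static uint64_t pext64_ref(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t out_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                     /* clear lowest set mask bit */
    }
    return result;
}

/* Parallel deposit (reference model): scatter the low-order bits of src
   to the bit positions selected by mask, preserving their order. */
static uint64_t pdep64_ref(uint64_t src, uint64_t mask) {
    uint64_t result = 0;
    uint64_t in_bit = 1;
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1); /* isolate lowest set mask bit */
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;                     /* clear lowest set mask bit */
    }
    return result;
}
```

As a usage example, if a 64-bit word holds two 4-byte elements that each use only their low 20 bits, the mask `0x000FFFFF000FFFFF` selects both payloads: `pext64_ref(0x000ABCDE00012345, 0x000FFFFF000FFFFF)` yields `0x000000ABCDE12345` (the two elements packed into 40 bits), and `pdep64_ref` with the same mask restores the original word.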
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.