This disclosure relates to the field of processors.
To improve the efficiency of multimedia applications, as well as other applications with similar characteristics, Single Instruction, Multiple Data (SIMD) architectures have been implemented in microprocessor systems to enable one instruction to operate on several operands in parallel. In particular, SIMD architectures take advantage of packing many data elements within one register or contiguous memory location. With parallel hardware execution, multiple operations are performed on separate data elements by one instruction, typically resulting in significant performance advantages.
SIMD performance improvements may be difficult to attain in applications involving irregular memory access patterns. For example, applications storing data tables that require frequent and random updates to data elements, which may or may not be stored at contiguous memory locations, typically require rearrangement of the data in order to fully utilize SIMD hardware. This rearrangement of data can result in substantial overhead, thus limiting the efficiencies attained from SIMD hardware.
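The rearrangement overhead described above can be illustrated with a minimal sketch in portable C. It is illustrative only and is not taken from this disclosure: the vector width `VLEN` and the function names `add_contiguous` and `add_indexed` are hypothetical, and the plain loops stand in for whatever SIMD instructions a compiler might emit.

```c
#include <stddef.h>

#define VLEN 8 /* hypothetical vector width, in data elements */

/* Contiguous case: the operands are already packed, so a vectorizing
   compiler may be able to map this loop onto SIMD additions. */
static void add_contiguous(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < VLEN; i++)
        out[i] = a[i] + b[i];
}

/* Irregular case: the operands sit at arbitrary positions in a table.
   They are copied one element at a time into a packed buffer, processed,
   and then copied back one element at a time. The two copy loops are the
   rearrangement overhead; they grow with VLEN. */
static void add_indexed(float *table, const size_t *idx, const float *b)
{
    float packed[VLEN];

    for (size_t i = 0; i < VLEN; i++) /* collect scattered elements */
        packed[i] = table[idx[i]];

    add_contiguous(packed, b, packed); /* the only SIMD-friendly step */

    for (size_t i = 0; i < VLEN; i++) /* write results back */
        table[idx[i]] = packed[i];
}
```

In this sketch, only the call to `add_contiguous` operates on packed data; the surrounding element-by-element copies perform one memory access per data element, so their cost scales with the number of elements handled per operation.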
As SIMD vector widths (i.e., the number of data elements upon which a single operation is performed) increase, application developers and compilers are finding it increasingly difficult to fully utilize SIMD hardware, owing to the overhead associated with rearranging data elements stored at non-contiguous memory locations.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.