1. Field
This disclosure relates generally to data processing systems, and more specifically, to SIMD dot product operations with overlapped operands within a data processing system.
2. Related Art
Increased performance in data processing systems can be achieved by allowing parallel execution of operations on multiple elements of a vector. For example, a single-instruction multiple-data (SIMD) scalar processor (also referred to as a “short-vector machine”) allows limited vector processing using the existing scalar general purpose registers (GPRs). In a data processing system having 32 scalar 64-bit GPRs, for instance, each scalar register may hold two 32-bit vector elements, four 16-bit vector elements, or eight 8-bit vector elements, so that a single instruction can perform two 32-bit, four 16-bit, or eight 8-bit operations in parallel.
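The lane arithmetic above can be modeled in software. The following is a minimal sketch, not the disclosed hardware: it treats a Python integer as one 64-bit GPR, packs 64/width unsigned elements into it (element 0 in the low bits, an assumed ordering), and performs a lane-wise add in which carries are prevented from crossing lane boundaries. The helper names `pack`, `unpack`, and `lane_add` are hypothetical.

```python
MASK64 = (1 << 64) - 1  # model of one 64-bit general purpose register


def pack(elems, width):
    """Pack unsigned `width`-bit elements into one 64-bit word (element 0 in the low bits)."""
    lanes = 64 // width
    assert len(elems) == lanes, "a 64-bit register holds exactly 64 // width elements"
    word = 0
    for i, e in enumerate(elems):
        word |= (e & ((1 << width) - 1)) << (width * i)
    return word


def unpack(word, width):
    """Split a 64-bit word back into its `width`-bit lanes."""
    m = (1 << width) - 1
    return [(word >> (width * i)) & m for i in range(64 // width)]


def lane_add(a, b, width):
    """Lane-wise add modulo 2**width in a single 64-bit operation.

    The top bit of every lane is cleared before the add so that no carry can
    ripple into the neighbouring lane; the top bits are then restored with XOR.
    """
    high = 0
    for i in range(64 // width):
        high |= 1 << (width * (i + 1) - 1)  # top bit of lane i
    low = ((a & ~high & MASK64) + (b & ~high & MASK64)) & MASK64
    return low ^ ((a ^ b) & high)


# Four 16-bit lanes: the third lane wraps around without disturbing the fourth.
a = pack([1, 2, 0xFFFF, 100], 16)
b = pack([10, 20, 1, 200], 16)
print(unpack(lane_add(a, b, 16), 16))  # [11, 22, 0, 300]
```

The same `lane_add` call serves the 32-, 16- and 8-bit cases; only the lane mask changes, which mirrors how one register can be viewed as two, four, or eight elements.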
The SIMD architecture is amenable to performance enhancement for a variety of algorithms, such as image processing and other algorithms that make extensive use of linear filters. However, inefficiencies arise when the dimensions of the underlying hardware vectors do not map efficiently onto the dimensions of the arrays processed by these algorithms.
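One way to see such a mismatch is a linear filter whose kernel length is not a multiple of the vector width. The sketch below is an illustrative model only, assuming each output is computed with zero-padded `lanes`-wide dot products; it is not the technique claimed in this disclosure. The helpers `lane_utilization` and `fir_padded` are hypothetical.

```python
import math


def lane_utilization(taps, lanes):
    """Fraction of SIMD lanes doing useful work when a `taps`-long dot product
    is split into `lanes`-wide vector operations, the last one zero-padded."""
    ops = math.ceil(taps / lanes)
    return taps / (ops * lanes)


def fir_padded(x, coeffs, lanes):
    """Model a FIR filter y[n] = sum_k coeffs[k] * x[n + k], computing each
    output with `lanes`-wide dot products over a zero-padded coefficient vector."""
    pad = -len(coeffs) % lanes            # zeros needed to reach a lane multiple
    padded = coeffs + [0] * pad
    out = []
    for n in range(len(x) - len(coeffs) + 1):
        window = x[n:n + len(coeffs)] + [0] * pad
        acc = 0
        for base in range(0, len(padded), lanes):  # one "vector op" per chunk
            acc += sum(w * c for w, c in zip(window[base:base + lanes],
                                             padded[base:base + lanes]))
        out.append(acc)
    return out


print(lane_utilization(3, 4))                  # 0.75: one of four lanes idle
print(fir_padded([1, 2, 3, 4, 5], [1, 1, 1], 4))  # [6, 9, 12]
```

Under these assumptions a 3-tap kernel on 4-lane vectors leaves a quarter of each operation idle, and a 5-tap kernel needs two operations with only 62.5% of lanes in use.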