Modern processors often include instructions for operations that are computationally intensive but offer a high level of data parallelism, which can be exploited through an efficient implementation using various data storage devices such as, for example, single instruction multiple data (SIMD) vector registers.
Some processors, especially graphics processors, have in the past provided a linear interpolation (often called LERP) primitive. Linear interpolation is a commonly used operation in digital graphics and audio processing. It is usually defined between two points, a and b, with a fraction w between 0 (zero) and 1 (one), as: a+w×(b−a). If the linear interpolation is implemented as defined, one subtraction operation is followed by one multiplication operation, which is followed by one addition operation. Thus this first form has three serially dependent operations. Alternatively, linear interpolation is often defined as: a×(1−w)+w×b. Here the subtraction (1−w) must precede the multiplication a×(1−w), which must precede the final addition, while the product w×b does not depend on either of the first two operations. Thus this second form also has three serially dependent operations and requires one more multiplication. Such implementations may limit the performance advantages otherwise expected, for example, from a wide (large-width) vector architecture.
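For illustration only, the two definitions above can be written out as scalar C functions with one operation per statement, so that the serial dependencies are visible. This is a minimal sketch, not an implementation taken from any particular processor; the function names lerp_delta, lerp_weighted, and lerp_array are hypothetical, and single-precision floating point is an assumption.

```c
#include <stddef.h>

/* First form: a + w*(b - a).
 * Each step consumes the result of the previous one, so the
 * three operations are serially dependent. */
static float lerp_delta(float a, float b, float w)
{
    float d = b - a;   /* 1: subtraction */
    float p = w * d;   /* 2: multiplication, needs d */
    return a + p;      /* 3: addition, needs p */
}

/* Second form: a*(1 - w) + w*b.
 * The longest chain is still three operations (subtract,
 * multiply, add); w*b is independent of the first two steps
 * but adds one more multiplication to the total. */
static float lerp_weighted(float a, float b, float w)
{
    float inv = 1.0f - w;  /* 1: subtraction */
    float pa  = a * inv;   /* 2: multiplication, needs inv */
    float pb  = w * b;     /* extra multiplication, independent of 1 and 2 */
    return pa + pb;        /* 3: addition, needs pa and pb */
}

/* Element-wise interpolation over n elements (hypothetical helper).
 * Iterations are independent of one another, which is the data
 * parallelism a vector architecture could exploit, but every
 * element still passes through the three-step chain. */
static void lerp_array(const float *a, const float *b,
                       const float *w, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = lerp_delta(a[i], b[i], w[i]);
    }
}
```

With a = 2, b = 10, and w = 0.25, both forms return 4. In the element-wise loop the per-element chain length is unchanged, so widening the vector increases throughput per instruction without shortening the three-operation latency.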
To date, potential solutions to such performance limiting issues and bottlenecks have not been adequately explored.