The use of SIMD instructions in the instruction set of the Intel Pentium® III processor is described in an article titled “Applications Tuning for Streaming SIMD Extensions”, by James Abel, Kumar Balasubramanian, Mike Bargeron, Tom Craver and Mike Philpot, published in the Intel Technology Journal, Q2 1999, and publicly available on the Internet. This article will be referred to as Abel et al. In response to a SIMD instruction, the processor treats the content of the operand and result registers as a series of numbers (for example, four eight-bit numbers in a thirty-two-bit register). The processor performs the operation specified by the SIMD instruction a number of times in parallel, each time using a different pair of numbers from the respective input registers as operands. The processor then writes a combined result, which contains the numbers that result from these parallel operations, to the result register specified by the instruction.
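As an illustration only, the following Python sketch models a SIMD add on four eight-bit numbers packed into one thirty-two-bit word, matching the example above. The names `pack`, `unpack` and `packed_add` are hypothetical, the lane order (lane 0 in the least significant byte) is an assumption, and the bit-masking technique (often called SIMD within a register) merely emulates in software what the processor does in hardware; it is not taken from Abel et al.

```python
LANES = 4        # four numbers per register (assumed, as in the example above)
LANE_BITS = 8    # eight bits per number
LANE_MASK = 0xFF


def pack(values):
    """Pack four 8-bit numbers into one 32-bit word, lane 0 in the least significant byte."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """One modelled SIMD add: four independent 8-bit additions, each wrapping modulo 256.

    The low seven bits of every lane are added together (at most 0xFE, so no
    carry leaves the lane); the top bit of each lane is then fixed up with XOR,
    which discards the carry out of that lane instead of passing it on.
    """
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    return (low ^ ((a ^ b) & 0x80808080)) & 0xFFFFFFFF


a = pack([10, 20, 200, 255])
b = pack([1, 2, 100, 1])
print(unpack(packed_add(a, b)))  # [11, 22, 44, 0]
```

Each of the four lanes wraps around modulo 256 independently; the masks keep a carry out of one lane from spilling into the next, which is what distinguishes a packed add from an ordinary thirty-two-bit add.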
The availability of this type of SIMD instruction in the instruction set of a processor reduces the total number of instructions that have to be executed to perform a task wherein the same function has to be applied to large amounts of data, for example an image processing task, such as computer graphics processing, image compression or decompression. The reduction of the total number of instructions increases the speed with which such a task can be performed and reduces the power consumption involved with execution of such a task.
Alignment may cause problems when a task is executed using SIMD instructions. Alignment problems result from the way in which operand data can be loaded from memory into the registers that supply operand data to the SIMD instructions. Typically, operand data can only be loaded starting from addresses that are an integer multiple of a basic address distance. In most cases this is no problem, since the data that has to be processed (e.g. data for successive pixels) is stored successively starting from an aligned address, so that all data can be loaded by successive load instructions. Abel et al. mention the alignment problem in the context of cache line splits. For special cases, Abel et al. describe the use of a “movups” instruction to support loading from unaligned addresses. In addition, Abel et al. describe “shuffling” instructions, which can be used to rearrange the numbers within registers. The need to use these types of instruction increases the number of instructions that must be executed.
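A minimal sketch, assuming a load that only accepts addresses that are multiples of a four-byte basic address distance, shows why an unaligned load costs extra instructions when it has to be built from aligned loads and a shift-and-merge step. `load_aligned`, `load_unaligned`, `WORD_BYTES` and the instruction counts are hypothetical modelling choices; this is neither the “movups” instruction nor the shuffling instructions described by Abel et al., only an emulation of the effect they address.

```python
WORD_BYTES = 4  # assumed basic address distance (and load width) in bytes


def load_aligned(memory, addr):
    """Model of a load that only accepts addresses that are multiples of WORD_BYTES.

    Returns the 4 bytes at addr as one little-endian word (byte at addr in the low lane).
    """
    if addr % WORD_BYTES:
        raise ValueError(f"unaligned address {addr}")
    return int.from_bytes(bytes(memory[addr:addr + WORD_BYTES]), "little")


def load_unaligned(memory, addr):
    """Emulate an unaligned load using only aligned loads plus shifts and an OR.

    Returns (word, ops), where ops counts the modelled instructions.
    """
    offset = addr % WORD_BYTES
    base = addr - offset
    if offset == 0:
        return load_aligned(memory, base), 1          # aligned: a single load
    lo = load_aligned(memory, base)                   # word containing the first bytes
    hi = load_aligned(memory, base + WORD_BYTES)      # word containing the rest
    shift = 8 * offset
    # Merge the two words; "& 0xFFFFFFFF" models the 32-bit register width,
    # it is not counted as an instruction.
    word = ((lo >> shift) | (hi << (32 - shift))) & 0xFFFFFFFF
    return word, 5                                    # 2 loads + 2 shifts + 1 OR


word, ops = load_unaligned(bytes(range(16)), 1)
print(hex(word), ops)  # 0x4030201 5
```

An aligned address costs one modelled load, while every other address costs five operations, two of them loads; a load instruction that accepts unaligned addresses directly would remove the shift-and-merge step.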
One example of the alignment problem occurs during interpolation of image data, which involves combining information for adjacent pixels. Abel et al. describe an interpolation approach wherein the parallelism of SIMD instructions is used to interpolate the different color components of the same pixels simultaneously. In this case, a memory is used wherein the sets of color components for successive pixels are stored one after another.
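The following Python sketch, under stated assumptions, illustrates this interleaved layout: each pixel's four eight-bit components (assumed here to be R, G, B, A, lowest byte first) fill one aligned thirty-two-bit word, so one packed operation interpolates all components of a pixel pair at once, and both operands come straight from aligned words. `pack_rgba`, `unpack_rgba`, `packed_avg` and `interpolate_row` are hypothetical names, and the rounding-down average is a standard bit manipulation, not necessarily the interpolation used by Abel et al.

```python
def pack_rgba(r, g, b, a):
    """One pixel = one 32-bit word, R in the lowest byte (assumed layout)."""
    return r | (g << 8) | (b << 16) | (a << 24)


def unpack_rgba(word):
    """Return the (R, G, B, A) components of a packed pixel."""
    return tuple((word >> s) & 0xFF for s in (0, 8, 16, 24))


def packed_avg(x, y):
    """Lane-wise floor((x + y) / 2) for four 8-bit lanes, without cross-lane carries."""
    return (x & y) + (((x ^ y) & 0xFEFEFEFE) >> 1)


def interpolate_row(row_words):
    """Midpoints between each pair of horizontally adjacent pixels.

    Every pixel occupies its own aligned word, so both operands of each
    packed_avg are plain aligned loads; no realignment is needed.
    """
    return [packed_avg(row_words[i], row_words[i + 1]) for i in range(len(row_words) - 1)]


row = [pack_rgba(0, 100, 255, 255), pack_rgba(10, 50, 1, 255)]
print([unpack_rgba(w) for w in interpolate_row(row)])  # [(5, 75, 128, 255)]
```

The average relies on the identity x + y = 2(x AND y) + (x XOR y); clearing the lowest bit of each lane before the shift keeps bits from moving across lane boundaries, and the floor average of two eight-bit values never exceeds 255, so the final addition cannot carry into the next lane.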
Alternatively, pixel data of one color component for adjacent pixels may be stored in successive memory locations. Preferably, it should be possible to use SIMD instructions to produce interpolated data for a plurality of pixel locations in parallel. In this case, conventionally, a first operand of the SIMD instruction should contain pixel data for a first plurality of adjacent pixels and a second operand should contain pixel data for a second plurality of pixels, whose pixel locations are offset from the locations of the pixels of the first plurality by a fixed offset (typically one pixel position). However, in this case at least one of the operands has to be loaded from an unaligned location, which increases the number of instructions that are needed.
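For the single-component (planar) layout, a minimal Python sketch, again assuming four eight-bit pixels per aligned thirty-two-bit word, shows where the extra instructions come from: the second operand, pixels k+1 to k+4, starts one pixel past an aligned address and has to be assembled from two aligned words with two shifts and an OR. `load_aligned`, `packed_avg`, `interpolate_plane`, `WORD_BYTES` and the operation count are hypothetical modelling choices.

```python
WORD_BYTES = 4  # assumed load width and alignment: four 8-bit pixels per word


def load_aligned(plane, addr):
    """Model of a load that only accepts addresses that are multiples of WORD_BYTES."""
    if addr % WORD_BYTES:
        raise ValueError(f"unaligned address {addr}")
    return int.from_bytes(bytes(plane[addr:addr + WORD_BYTES]), "little")


def packed_avg(x, y):
    """Lane-wise floor((x + y) / 2) for four 8-bit lanes."""
    return (x & y) + (((x ^ y) & 0xFEFEFEFE) >> 1)


def interpolate_plane(plane):
    """Midpoints between horizontally adjacent pixels of one color plane.

    Returns (midpoints, ops). Each group of four outputs needs pixels k..k+3
    (aligned) and pixels k+1..k+4 (offset by one pixel, hence unaligned);
    the latter operand is rebuilt from two aligned words.
    """
    if len(plane) % WORD_BYTES:
        raise ValueError("plane length must be a multiple of WORD_BYTES")
    out, ops = [], 0
    for k in range(0, len(plane) - WORD_BYTES, WORD_BYTES):
        cur = load_aligned(plane, k)                         # pixels k..k+3
        nxt = load_aligned(plane, k + WORD_BYTES)            # pixels k+4..k+7
        shifted = ((cur >> 8) | (nxt << 24)) & 0xFFFFFFFF    # pixels k+1..k+4
        avg = packed_avg(cur, shifted)                       # four midpoints at once
        out.extend((avg >> s) & 0xFF for s in (0, 8, 16, 24))
        ops += 6  # 2 loads, 2 shifts, 1 OR, 1 average
    return out, ops


plane = list(range(0, 120, 10))  # 12 pixels: 0, 10, ..., 110
print(interpolate_plane(plane))  # ([5, 15, 25, 35, 45, 55, 65, 75], 12)
```

In this model each group of four interpolated values costs six operations; if the second operand could be loaded directly from address k+1, the same group would need three (two loads and one average). The sketch only covers pixels whose right-hand neighbour lies in a following word of the plane, so a 12-pixel plane yields eight midpoints.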