1. Field of the Invention
The present application relates generally to data processing and, in particular, to compilation of source code to generate executable code. Still more particularly, the present application relates to a compiler method for reducing redundant read-modify-write code sequences in non-vectorizable code.
2. Description of the Related Art
On a machine that supports operations in registers on data that are shorter than the smallest element that can be stored to memory from a register, a sequence of several instructions is used to combine the contents of the memory location to be modified with the new data. These instructions read the old value of the memory location into an internal register of the machine, combine this old data with the mutating sub-datum, and write (or store) the combined result back into the original storage location in memory. These instructions are referred to as a read-modify-write code sequence.
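By way of illustration only, the read-modify-write code sequence described above may be sketched in C as follows. The sketch assumes a hypothetical memory that can only be written one full 32-bit word at a time, with bytes numbered from the least significant end of each word; the function name store_byte_rmw and its parameters are illustrative and do not correspond to any particular machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: update a single byte in a memory that can only be
   written one full 32-bit word at a time. Bytes are numbered from the
   least significant end of each word (an assumption of this sketch). */
static void store_byte_rmw(uint32_t *mem, size_t byte_index, uint8_t value)
{
    assert(mem != NULL);

    size_t   word  = byte_index / 4;               /* word holding the byte   */
    unsigned shift = (unsigned)(byte_index % 4) * 8; /* bit offset in the word */
    uint32_t mask  = (uint32_t)0xFF << shift;      /* bits to be replaced     */

    uint32_t old    = mem[word];                           /* read   */
    uint32_t merged = (old & ~mask)                        /* modify */
                    | ((uint32_t)value << shift);
    mem[word] = merged;                                    /* write  */
}
```

In this sketch, storing the byte 0xAB at byte index 1 of a word holding 0x11223344 leaves 0x1122AB44 in that word; the three bytes that were not targeted pass through the read and the write unchanged.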
A single instruction stream multiple data stream (SIMD) machine is a computer that performs one operation on multiple sets of data. Performing an operation on multiple sets of data is referred to as “SIMD execution.” SIMD execution is typically used to add or multiply two or more sets of numbers at the same time for multimedia encoding and rendering as well as scientific applications. A SIMD machine loads hardware registers with numbers and performs the mathematical operation on all data in a register, or even a set of registers, simultaneously.
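As a purely illustrative model of SIMD execution, the following C sketch treats a hardware register as four 32-bit lanes and applies one operation to all of them. The type vec4_t and the function vec4_add are hypothetical names introduced for this sketch; on an actual SIMD machine the loop body would correspond to a single instruction rather than four scalar additions.

```c
#include <stdint.h>

/* Hypothetical 128-bit register modeled as four 32-bit lanes. */
typedef struct {
    int32_t lane[4];
} vec4_t;

/* Model of one SIMD add: the same operation is applied to every lane.
   The loop is only a portable stand-in for a single hardware instruction. */
static vec4_t vec4_add(vec4_t a, vec4_t b)
{
    vec4_t r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```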
When processing array data in loops, it may be possible to vectorize the computations. A compiler may vectorize code to improve performance. Vectorization is a process that packs the operations in several successive iterations into one set of operations, wherein the whole of the register is used and, thus, no read-modify-write sequence is required. All of the bits of the memory location receive new values from the register when a store operation occurs. Similarly, within a single basic block of a program, it may be possible to combine several independent congruent operations on contiguous data into a single SIMD operation. This is called extracting “Superword Level Parallelism” (SLP). This avoids the need for read-modify-write sequences. However, there are many cases in which such operations cannot be vectorized. For example, a dependency relationship may be violated by the reordering that is implicit in vectorization, or several sub-parts of the register may not be computed by congruent operations.
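The distinction drawn above may be illustrated, under stated assumptions, by the following two C loops. The function names add_arrays and prefix_sum are hypothetical. In the first loop every iteration is independent, so successive iterations could be packed into full-register operations. In the second loop each iteration reads the value written by the previous one, so the iterations cannot be packed directly in this way, and on a machine of the kind described each narrow store would fall back to a read-modify-write sequence.

```c
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i] and c[i], so several
   successive iterations can be packed into one full-register operation. */
static void add_arrays(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + c[i];
}

/* Loop-carried dependence: a[i] needs the a[i - 1] produced by the
   previous iteration, so executing the iterations together as written
   would violate that dependency relationship. */
static void prefix_sum(int *a, const int *b, size_t n)
{
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```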