1. Field of the Invention
The invention disclosed and claimed herein generally pertains to a Single Instruction Multiple Data (SIMD) code generation method that uses Superword-Level Parallelism (SLP) in connection with mixed isomorphic and non-isomorphic packed code. More particularly, the invention pertains to a method of the above type that uses a permute operator to combine isomorphic and non-isomorphic expressions into an aggregated expression, in order to provide a final output stream.
2. Description of the Related Art
It is well known that computer processing speed has increased through the use of parallel processing. One form of parallel processing relies on a SIMD architecture, which processes multiple data elements packed into a vector register with a single instruction. Examples of such architectures include SSE for Pentium, VMX for PPC 970, CELL, and Dual FPU for BlueGene/L. The type of parallelism exploited by a SIMD architecture is referred to as SIMD parallelism, and the process of automatically generating SIMD operations from sequential computation is referred to as extracting SIMD parallelism.
An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points. In a SIMD processor, the data is understood to be in blocks, and a number of values can all be loaded simultaneously. Typically, SIMD systems include only those instructions that can be applied to all of the data in one operation. Thus, if a SIMD system loads eight data points at once, an add operation that is applied to the data will be applied to all eight values at the same time. SIMD instructions can include add, load, multiply and store instructions.
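The block-wise behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not real SIMD code: the names `LANES`, `simd_add_scalar`, and `add_to_all` are hypothetical, the width of eight lanes follows the example in the text, and each list comprehension stands in for a single vector instruction.

```python
LANES = 8  # assumed vector width: eight data points per SIMD register


def simd_add_scalar(block, value):
    """Model one SIMD add: the same value is added to every lane at once."""
    assert len(block) == LANES
    return [x + value for x in block]


def add_to_all(data, value):
    """Process data in blocks of LANES elements, one modeled SIMD add per block."""
    assert len(data) % LANES == 0  # assumption: no leftover elements to handle
    out = []
    for start in range(0, len(data), LANES):
        block = data[start:start + LANES]          # models one vector load
        out.extend(simd_add_scalar(block, value))  # models one vector add and store
    return out


data = list(range(16))
result = add_to_all(data, 10)  # two modeled vector adds cover all 16 values
```

Sixteen scalar additions are replaced here by two modeled vector additions, which is the source of the speedup a SIMD unit can offer.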
One approach for extracting SIMD parallelism from input code is the Superword Level Parallelism (SLP) algorithm. The SLP approach packs multiple isomorphic statements that operate on data, located in adjacent memory positions, into one or more SIMD operations. The terms “packing” and “pack”, as used herein, refer to combining statements that can be scheduled together for processing. Two statements are “isomorphic” with respect to each other if each statement performs the same set of operations in the same order as the other statement, and the corresponding memory operations access adjacent or consecutive memory locations.
The SLP procedure is illustrated by the following statements of Table 1, which form the body of a loop with iterations (i=0; i<64; i+=1):
TABLE 1
Statements with Isomorphic Relationship
a[3i + 0] = b[3i + 0] * c[3i + 0]
a[3i + 1] = b[3i + 1] * c[3i + 1]
a[3i + 2] = b[3i + 2] * c[3i + 2]
The statements in Table 1 are isomorphic in relation to each other because each statement performs two load operations, one multiply operation, and one store operation in the same order. Moreover, the corresponding memory operations in these statements (or any statements with an isomorphic relation) must access memory locations that are either adjacent or identical. For example, the memory access of a[3i+0] is adjacent to the memory access of a[3i+1]. Likewise, a[3i+1] is adjacent to a[3i+2]. Similarly, the memory accesses of “b” and “c” are adjacent.
SLP proceeds first by analyzing the alignment of each memory reference in a basic block, and then packs adjacent memory references (e.g., a[3i+0] and a[3i+1], or b[3i+0] and b[3i+1]). After memory references are packed, SLP further packs operations that operate on adjacent data, such as b[3i+0]*c[3i+0] and b[3i+1]*c[3i+1]. The SLP procedure and other SIMD code generation techniques address mostly stride-one memory and computation streams. Generally, a stride-one memory access for a given array is defined as an access in which only consecutive memory locations are touched over the lifetime of the loop.
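As a rough illustration of the packing just described, the following Python sketch compares the scalar loop formed by the three Table 1 statements with a packed version. It assumes a vector width of four elements and assumes that the loop has been unrolled far enough for the packed statements to fill whole vectors; the names `scalar_loop`, `packed_loop`, and `LANES` are hypothetical, and lists stand in for vector registers.

```python
N_ITER = 64  # loop bounds from the text: i = 0 .. 63
LANES = 4    # assumed vector width (four elements per register)


def scalar_loop(b, c):
    """The three isomorphic statements of Table 1, executed one element at a time."""
    a = [0] * (3 * N_ITER)
    for i in range(N_ITER):
        a[3 * i + 0] = b[3 * i + 0] * c[3 * i + 0]
        a[3 * i + 1] = b[3 * i + 1] * c[3 * i + 1]
        a[3 * i + 2] = b[3 * i + 2] * c[3 * i + 2]
    return a


def packed_loop(b, c):
    """Same computation after packing: adjacent loads, multiplies, and stores
    are grouped LANES at a time, so each step models one vector operation."""
    total = 3 * N_ITER  # the three statements together touch 192 consecutive elements
    assert total % LANES == 0
    a = [0] * total
    for base in range(0, total, LANES):
        vb = b[base:base + LANES]              # packed load of adjacent b elements
        vc = c[base:base + LANES]              # packed load of adjacent c elements
        va = [x * y for x, y in zip(vb, vc)]   # packed multiply
        a[base:base + LANES] = va              # packed store to adjacent a elements
    return a


b = list(range(3 * N_ITER))
c = [2] * (3 * N_ITER)
same = scalar_loop(b, c) == packed_loop(b, c)
```

Because the three statements jointly access a, b, and c with stride one, the packed version needs 48 modeled vector multiplies in place of 192 scalar multiplies.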
Current SIMD architectures frequently use Multiple Instruction Multiple Data (MIMD) instructions that perform different computations on different elements of a vector. For instance, the ADDSUBPS instruction in SSE3 (Streaming SIMD Extensions by Intel) performs an add operation on odd elements of input vectors, and a subtract operation on even elements of input vectors. As a result, certain SIMD packing techniques have been developed for use with statements that include non-isomorphic expressions.
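The lane-dependent behavior of such an instruction can be modeled as follows. This is a plain Python sketch of the semantics described above, not a binding to any real intrinsic; the function name `addsub` is hypothetical, and lanes are numbered from zero starting at the lowest element.

```python
def addsub(x, y):
    """Model a mixed add/subtract vector operation:
    subtract in even-numbered lanes, add in odd-numbered lanes."""
    assert len(x) == len(y)
    return [xi - yi if lane % 2 == 0 else xi + yi
            for lane, (xi, yi) in enumerate(zip(x, y))]


x = [10.0, 20.0, 30.0, 40.0]
y = [1.0, 2.0, 3.0, 4.0]
result = addsub(x, y)  # [9.0, 22.0, 27.0, 44.0]
```

A single operation of this kind covers two statements that differ only in their arithmetic operator, which is why non-isomorphic statements can still be candidates for packing.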
In one such technique, wherein each statement in received input code individually has a stride-one stream, an effort is made to identify pairs of expressions X and Y within a basic block of the input that are semi-isomorphic with respect to one another. X and Y are defined to be semi-isomorphic with respect to each other if they satisfy at least one of several conditions, such as: X and Y are identical; X and Y are literals; X and Y are loaded from adjacent memory locations; or X and Y are stored to adjacent memory locations and the stored values are semi-isomorphic with respect to each other. As another example, an expression is semi-isomorphic in relation to another expression when one expression performs the same operations in the same order as the other expression, with the exception that at least one mathematical operation is different. For example, one expression will use a “+” operation in lieu of a “−” operation in the other expression.
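One possible reading of these conditions is sketched below as a recursive check over small expression trees. It is an assumption-laden illustration rather than the technique's actual implementation: the tuple encoding (`"lit"`, `"load"`, `"store"`, `"op"`), the function name `semi_isomorphic`, and the choice to treat "adjacent" as "index of Y equals index of X plus one in the same array" are all hypothetical.

```python
def semi_isomorphic(x, y):
    """Return True if expressions x and y satisfy one of the listed conditions.

    Assumed encoding (hypothetical):
      ("lit", value)
      ("load", array, index)
      ("store", array, index, value_expr)
      ("op", operator, left_expr, right_expr)
    """
    if x == y:                       # identical expressions
        return True
    if x[0] != y[0]:                 # different kinds of expression
        return False
    kind = x[0]
    if kind == "lit":                # both are literals
        return True
    if kind == "load":               # loads from adjacent memory locations
        return x[1] == y[1] and y[2] - x[2] == 1
    if kind == "store":              # adjacent stores of semi-isomorphic values
        return (x[1] == y[1] and y[2] - x[2] == 1
                and semi_isomorphic(x[3], y[3]))
    if kind == "op":                 # same shape; the operator itself may differ
        return (semi_isomorphic(x[2], y[2])
                and semi_isomorphic(x[3], y[3]))
    return False


# a[0] = b[0] + c[0]  and  a[1] = b[1] - c[1]: same shape, different operator
s0 = ("store", "a", 0, ("op", "+", ("load", "b", 0), ("load", "c", 0)))
s1 = ("store", "a", 1, ("op", "-", ("load", "b", 1), ("load", "c", 1)))
# a[1] = b[1]: the stored value has a different shape
s2 = ("store", "a", 1, ("load", "b", 1))

pair_ok = semi_isomorphic(s0, s1)
pair_bad = semi_isomorphic(s0, s2)
```

In this sketch `s0` and `s1` qualify because they differ only in using “+” versus “−”, while `s0` and `s2` do not because one stores the result of an operation and the other stores a plain load.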
Another technique, referred to as SLP with interleaved data, receives statements that have non-isomorphic expressions, and then looks at all the inputs of the statement that have a given stride. Sub-streams of received data are scattered, and non-isomorphic computations are made using these sub-streams. This technique, however, is not as efficient as it might be.
It would thus be desirable to provide a new approach for using SLP in the presence of mixed isomorphic and non-isomorphic packed code or statements, wherein efficiency is significantly improved over prior art techniques.