The disclosure relates generally to a method and apparatus for compiling computer-readable computer programs, and more particularly to a compiler for compiling program codes or statements.
Recently, the need to increase performance and power efficiency in modern processors has led to wide adoption of Single Instruction Multiple Data (SIMD) vector units in graphics processing units (GPUs) and other processors. Most vendors support vector instructions, and the trend is toward wider and more powerful vector instructions. However, writing code that makes efficient use of these vector units is difficult and leads to platform-specific implementations. Compiler-based automatic vectorization is one solution to this problem.
Automatic vectorization is the process of taking scalar code and converting as much of it to vector form as is possible and profitable according to a predetermined cost model. Specifically, the Superword Level Parallelism (SLP) vectorization algorithm is a primary way to automatically generate vector code from straight-line scalar code. The SLP vectorization algorithm is implemented in compilers including the GNU Compiler Collection (GCC) and Low Level Virtual Machine (LLVM).
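As an illustration only, the following C sketch shows the kind of straight-line scalar code that an SLP vectorizer looks for, next to a hand-written packed equivalent. The `vec4i` type and the `vec4i_load`, `vec4i_add`, and `vec4i_store` helpers are hypothetical names introduced here to model a 4-lane vector register in portable C; they are not part of GCC, LLVM, or any instruction set.

```c
/* Straight-line scalar code: four independent statements that perform
   the same operation on adjacent array elements. */
void add4_scalar(const int *a, const int *b, int *c)
{
    c[0] = a[0] + b[0];
    c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}

/* Hypothetical 4-lane vector type, modeled as a plain struct. */
typedef struct { int lane[4]; } vec4i;

/* Models one vector load of four consecutive ints. */
static vec4i vec4i_load(const int *p)
{
    vec4i v;
    for (int i = 0; i < 4; i++)
        v.lane[i] = p[i];
    return v;
}

/* Models one lane-wise vector add. */
static vec4i vec4i_add(vec4i x, vec4i y)
{
    vec4i r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = x.lane[i] + y.lane[i];
    return r;
}

/* Models one vector store of four consecutive ints. */
static void vec4i_store(int *p, vec4i v)
{
    for (int i = 0; i < 4; i++)
        p[i] = v.lane[i];
}

/* Packed form: the four scalar statements become two vector loads,
   one vector add, and one vector store. */
void add4_packed(const int *a, const int *b, int *c)
{
    vec4i va = vec4i_load(a);
    vec4i vb = vec4i_load(b);
    vec4i_store(c, vec4i_add(va, vb));
}
```

Whether a real compiler performs this transformation depends on its cost model and on the target's vector width; the sketch only shows the shape of the input and output.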
SLP utilizes the SIMD capability of processor architectures. Specifically, SLP relies on finding sequences of isomorphic instructions to pack together into vectors. However, isomorphic code sequences are not common in practice, and SLP needs isomorphic operations along with consecutive memory accesses to be efficient. Thus, there have been attempts to improve isomorphism where a sequence of instructions is semantically isomorphic but not exactly isomorphic, such as the use of “SHIFT” instructions in place of “MUL” instructions or the introduction of redundant instructions. Isomorphism generally refers to a similarity between program statements that have equivalent operations in the same order and that cause consecutive memory accesses. Conversely, non-isomorphism generally refers to program statements that have equivalent operations in the same order but that cause non-consecutive memory accesses.
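One possible reading of the SHIFT and MUL example is sketched below in C. The function names `scale_mixed` and `scale_uniform` are hypothetical and chosen for illustration. Both functions compute the same values, but the first mixes two different operations across otherwise parallel statements, so an SLP pass that matches on the operation would not see four identical statements, while the second uses one operation throughout.

```c
/* Semantically isomorphic, but not exactly isomorphic: two statements
   multiply by 2 and two statements shift left by 1. For unsigned
   operands both forms produce the same result. */
void scale_mixed(const unsigned *a, unsigned *out)
{
    out[0] = a[0] * 2u;
    out[1] = a[1] << 1;   /* shift used in place of multiply */
    out[2] = a[2] * 2u;
    out[3] = a[3] << 1;   /* shift used in place of multiply */
}

/* Exactly isomorphic: every statement uses the same operation on
   consecutive elements, so the four statements can be packed. */
void scale_uniform(const unsigned *a, unsigned *out)
{
    out[0] = a[0] * 2u;
    out[1] = a[1] * 2u;
    out[2] = a[2] * 2u;
    out[3] = a[3] * 2u;
}
```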
Conventional solutions to improve isomorphism include packing and unpacking non-consecutive memory accesses, or keeping the accesses scalar. However, keeping the accesses scalar does not utilize the SIMD capabilities of the processors, and the packing and unpacking techniques add execution overhead, reducing the overall benefit of vectorization. Further, attempts to improve isomorphism in the presence of control flow have been unsuccessful because they require architectural support for predicated execution, and designing an instruction set to support predicated execution can be difficult.
Another source of non-isomorphism is non-consecutive memory access by computer programs, where the actual memory layout may nonetheless be consecutive. Such non-consecutive accesses tend to make SLP less effective. An existing solution is either to pack and unpack the non-consecutive memory accesses or to keep them scalar. Again, keeping the memory accesses scalar does not utilize the SIMD capabilities of the processors, and the packing and unpacking instructions incur additional execution costs that reduce the overall benefit of vectorization.
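The packing cost described above can be sketched in C as follows. This is a minimal illustration under assumed names (`add_strided_scalar`, `add_strided_packed`) and an assumed stride of 2; it models the vector register as a small local array rather than using any real vector instruction.

```c
/* The reads from `a` use a stride of 2, so the four loaded elements are
   not adjacent in memory even though the array itself is contiguous. */
void add_strided_scalar(const int *a, const int *b, int *c)
{
    c[0] = a[0] + b[0];
    c[1] = a[2] + b[1];
    c[2] = a[4] + b[2];
    c[3] = a[6] + b[3];
}

/* Packed form: the strided elements must first be gathered one by one
   into a temporary that models a vector register. These four extra
   scalar loads are the packing overhead. */
void add_strided_packed(const int *a, const int *b, int *c)
{
    int packed[4];

    /* Packing step: one scalar load and one lane insert per element. */
    packed[0] = a[0];
    packed[1] = a[2];
    packed[2] = a[4];
    packed[3] = a[6];

    /* Models a single lane-wise vector add and a consecutive store. */
    for (int i = 0; i < 4; i++)
        c[i] = packed[i] + b[i];
}
```

In this sketch only one of the three memory streams is strided, yet the packed form already needs four additional element moves before the single vector operation can run.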
Accordingly, there exists a need for an improved method and apparatus for compiling computer-readable computer programs in order to address one or more of the above-noted drawbacks.