The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for providing a data parallel function call that determines if a called routine is data parallel or scalar.
Multimedia extensions (MMEs) have become one of the most popular additions to general-purpose microprocessors. Existing multimedia extensions can be characterized as Single Instruction Multiple Datapath (SIMD) units that support packed fixed-length vectors. The traditional programming model for multimedia extensions has been explicit vector programming using either (in-line) assembly or intrinsic functions embedded in a high-level programming language. Explicit vector programming is time-consuming and error-prone. A promising alternative is to exploit vectorization technology to automatically generate SIMD codes from programs written in standard high-level languages.
Although vectorization has been studied extensively for traditional vector processors decades ago, vectorization for SIMD architectures has raised new issues due to several fundamental differences between the two architectures. To distinguish between the two types of vectorization, the latter is referred to as SIMD vectorization, or SIMDization. One such fundamental difference comes from the memory unit. The memory unit of a typical SIMD processor bears more resemblance to that of a wide scalar processor than to that of a traditional vector processor. In the VMX instruction set found on certain PowerPC microprocessors (produced by International Business Machines Corporation of Armonk, N.Y.), for example, a load instruction loads 16-byte contiguous memory from 16-byte aligned memory, ignoring the last 4 bits of the memory address in the instruction. The same applies to store instructions.
There has been a recent spike of interest in compiler techniques to automatically extract SIMD or data parallelism from programs. This upsurge has been driven by the increasing prevalence of SIMD architectures in multimedia processors and high-performance computing. These processors have multiple function units, e.g., floating point units, fixed point units, integer units, etc., which can execute more than one instruction in the same machine cycle to enhance the uni-processor performance. The function units in these processors are typically pipelined.
Extracting data parallelism from an application is a difficult task for a compiler. In most cases, except for the most trivial loops in the application code, the extraction of parallelism is a task the application developer must perform. This typically requires a restructuring of the application to allow the compiler to extract the parallelism or explicitly coding the parallelism using multiple threads, a SIMD intrinsic, or vector data types available in new programming models, such as OpenCL.
Before a compiler can determine if a portion of code can be parallelized and thereby perform data parallel compilation of the code, the compiler must prove that the portion of code is independent and no data dependencies between the portion of code and other code called by that code exist. Procedure calls are an inhibiting factor to data parallel compilation. That is, data parallel compilation is only possibly when the compiler can prove that the code will correctly execute when data parallel optimizations are performed. When the code calls a procedure, subroutine, or the like, from different portions of code, object modules, or the like, that are not visible to the compiler at the time of compilation, such data parallel compilation is not possible since the compiler cannot verify that the code will correctly execute when the data parallel optimizations are performed.
Moreover, in a single instruction, multiple datapath (SIMD) architecture using a SIMD data parallel model, an important restriction implemented is that the architecture can only follow a single program flow at a time. This restriction makes data parallel compilation impossible in an object oriented model where the object inheritance might provide different methods for different portions of code subject to data parallelism optimizations.