1. Field of the Invention
The invention relates generally to compiler systems and, more specifically, to a method for compiling a parallel thread execution program for general execution.
2. Description of the Related Art
Certain computer systems include a parallel processing subsystem that may be configured to concurrently execute multiple program threads that are instantiated from a common application program. Such systems are able to execute multiple instances of at least a portion of the application program in parallel to achieve execution speedup. CUDA is a parallel programming model known in the art for application programs that may be compiled to execute on parallel processing subsystems. An application program written for CUDA may include sequential C language programming statements and calls to a specialized application programming interface (API) used for configuring and managing parallel execution of program threads. A function associated with a CUDA application program that is destined for concurrent execution on a parallel processing subsystem is referred to as a “kernel” function. An instance of a kernel function is referred to as a thread, and a set of concurrently executing threads may be organized as a thread block. A set of thread blocks may further be organized into a grid. Each thread is identified by an implicitly defined set of index variables. Each thread may access its own instance of the index variables and act independently of other threads based on those index variables.
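By way of illustration only, the thread hierarchy described above may be sketched in plain C. The sketch below is not CUDA code and is not drawn from any particular implementation; the names Dim1, scale_kernel, and launch_scale are hypothetical, and the index variables that a parallel processing subsystem would define implicitly are instead passed as explicit arguments. Only one index dimension is shown, and the thread instances are run one after another rather than concurrently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-dimensional index type (an assumption for this sketch). */
typedef struct { unsigned x; } Dim1;

/* "Kernel" function: one call corresponds to one thread instance.
 * The thread uses its own index variables to select the single
 * array element it is responsible for. */
void scale_kernel(Dim1 block_idx, Dim1 block_dim, Dim1 thread_idx,
                  const float *in, float *out, unsigned n)
{
    /* Global element index derived from block and thread indices. */
    unsigned i = block_idx.x * block_dim.x + thread_idx.x;

    /* Threads whose index falls past the end of the array do nothing. */
    if (i < n) {
        out[i] = 2.0f * in[i];
    }
}

/* Runs every thread of every thread block in the grid, sequentially.
 * grid_dim.x is the number of thread blocks in the grid;
 * block_dim.x is the number of threads in each thread block. */
void launch_scale(Dim1 grid_dim, Dim1 block_dim,
                  const float *in, float *out, unsigned n)
{
    Dim1 b, t;

    assert(in != NULL && out != NULL);

    for (b.x = 0; b.x < grid_dim.x; ++b.x) {          /* each thread block */
        for (t.x = 0; t.x < block_dim.x; ++t.x) {     /* each thread in it */
            scale_kernel(b, block_dim, t, in, out, n);
        }
    }
}
```

For example, calling launch_scale with a grid of three thread blocks of four threads each over a ten-element array runs twelve thread instances; the last two fail the bounds check and leave the output untouched.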
An application program may include certain compiled functions for execution on a general purpose central processing unit (CPU) and other functions compiled for execution on a parallel processing subsystem. The functions compiled for execution on the CPU typically include native CPU instructions. The functions compiled for execution on the parallel processing subsystem typically include instructions for a virtual machine instruction set architecture (ISA) that may be mapped to a native ISA associated with the particular parallel processing subsystem. One virtual machine ISA known in the art is the parallel thread execution (PTX) ISA, which is designed to provide a stable programming model and instruction set for general purpose parallel processing. When an application program comprising compiled PTX kernel functions is loaded for execution within a computer system, the PTX kernel functions are mapped to the ISA of a parallel processing subsystem within the computer system. Certain parallel processing constructs are provided by the parallel processing subsystem, such as thread synchronization, thread identification, and certain specialized graphics operations such as texture map sampling.
In certain scenarios, a user may wish to execute an existing application program, already compiled for distribution to customers, on a general purpose CPU rather than on a parallel processing subsystem. Unfortunately, conventional CPUs are typically configured to execute only native instructions and do not include the parallel processing constructs needed to execute PTX operations. As a consequence, the existing application program cannot be executed on a general purpose CPU by conventional means.
As the foregoing illustrates, what is needed in the art is a technique for executing a compiled parallel application program on a general purpose CPU.