1. Field of the Invention
The present invention generally relates to memory systems and more specifically to unifying the addressing of multiple distinct memory spaces into a single address space.
2. Description of the Related Art
Performance requirements are constantly increasing in data processing systems. Multiple processing units may be configured to operate in parallel by executing multiple parallel threads. For some applications, the parallel threads execute independently. For other applications, the parallel threads share some data; for example, a first thread may compute an input that is used by one or more other threads. In still other applications, the threads may be organized in groups, where data is shared within each group but not between groups.
Multithreaded parallel programs written using a programming model such as CUDA™ C (general purpose parallel computing architecture) and PTX™ (a low-level parallel thread execution virtual machine and virtual instruction set architecture) provided by NVIDIA® access two or more distinct memory address spaces, each having a different parallel scope, e.g., per-thread private local memory, per-group shared memory, and per-application global memory. The programmer specifies the memory address space in each variable declaration and typically uses load and store instructions specific to that memory address space when accessing the variable. For example, three different sets of load/store memory access instructions may be used to access three distinct memory spaces that have different parallel sharing scopes. A first set of load/store memory access instructions may be used to access local memory that is private to each thread. A second set of load/store memory access instructions may be used to access shared memory that is shared among all threads in a single group. A third set of load/store memory access instructions may be used to access global memory that is shared by all threads in all groups.
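To make the distinction concrete, the following C sketch models three memory spaces as separate arrays, each addressed from zero, with one hypothetical load function and one hypothetical store function per space standing in for the space-specific instructions described above. The names `ld_local`, `st_shared`, `SPACE_WORDS`, and so on are illustrative assumptions, not PTX mnemonics or CUDA APIs. The sketch shows only that the same address value selects a different word depending on which instruction is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each memory space is its own array addressed from 0,
   so one address value names a different word in each space. */
enum { SPACE_WORDS = 16 };

static uint32_t local_mem[SPACE_WORDS];  /* private to one thread */
static uint32_t shared_mem[SPACE_WORDS]; /* shared by the threads of one group */
static uint32_t global_mem[SPACE_WORDS]; /* shared by all threads in all groups */

/* One load and one store "instruction" per space: the space is selected by
   which function is emitted, never by the address value itself. */
uint32_t ld_local(uint32_t addr)  { assert(addr < SPACE_WORDS); return local_mem[addr]; }
uint32_t ld_shared(uint32_t addr) { assert(addr < SPACE_WORDS); return shared_mem[addr]; }
uint32_t ld_global(uint32_t addr) { assert(addr < SPACE_WORDS); return global_mem[addr]; }

void st_local(uint32_t addr, uint32_t v)  { assert(addr < SPACE_WORDS); local_mem[addr] = v; }
void st_shared(uint32_t addr, uint32_t v) { assert(addr < SPACE_WORDS); shared_mem[addr] = v; }
void st_global(uint32_t addr, uint32_t v) { assert(addr < SPACE_WORDS); global_mem[addr] = v; }
```

Storing 1, 2, and 3 at address 4 through `st_local`, `st_shared`, and `st_global` writes three different words, so `ld_shared(4)` returns 2 while `ld_global(4)` returns 3.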
In a language like C/C++, the correct memory access instruction must be used to dereference a pointer. Therefore, when a program does not specify a memory address space for a pointer reference, the compiler must determine the memory address space and insert the specific load or store instruction into the compiled program prior to execution of the program. Additionally, the memory access instruction must specify the correct address within that memory address space. When a C/C++ function is compiled separately from the calling code, the compiler does not know which memory space a pointer passed as a function argument references, and therefore does not know which memory access instruction to insert. Similarly, when a function is called from multiple call sites with pointer arguments to different memory spaces, or is called via a function pointer, the compiler cannot determine which memory access instruction to insert. To access the correct memory space for an arbitrary pointer that may point to any of the memory spaces having a different scope, a sequence of several instructions must be inserted into the program. Inserting such a sequence works for some static, compile-time cases, but is not sufficient for dynamic cases, such as multiple calls to a library function via a function pointer, or separately compiled functions.
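The separate-compilation problem can be sketched in the same kind of hypothetical model. Here `sum_words` stands for a separately compiled library function whose pointer argument arrives as a bare address carrying no memory-space information, so its compiler had to commit to one load instruction (by assumption, the global load). A caller that passes an address intended to refer to shared memory silently reads global memory instead. All names are illustrative assumptions, and the sketch models only the problem, not any solution to it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: two memory spaces, each addressed from 0. */
enum { SPACE_WORDS = 8 };

static uint32_t shared_mem[SPACE_WORDS]; /* per-group shared memory */
static uint32_t global_mem[SPACE_WORDS]; /* per-application global memory */

/* Space-specific load "instructions". */
uint32_t ld_shared(uint32_t addr) { assert(addr < SPACE_WORDS); return shared_mem[addr]; }
uint32_t ld_global(uint32_t addr) { assert(addr < SPACE_WORDS); return global_mem[addr]; }

/* Hypothetical library function compiled separately from its callers.
   The argument is a bare address with no space tag, so the compiler had to
   pick one load instruction at compile time; here it picked the global load. */
uint32_t sum_words(uint32_t addr, uint32_t count) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < count; ++i) {
        total += ld_global(addr + i); /* always global, whatever the caller meant */
    }
    return total;
}
```

With `shared_mem[0]` through `shared_mem[3]` set to 10 and global memory zero-initialized, a caller that intends to sum the four shared words gets 0 from `sum_words(0, 4)` rather than the intended 40.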
Accordingly, what is needed in the art is a technique that enables a program to use a common load or store instruction to access memory spaces that each have a different scope.