Field of the Invention
The present invention generally relates to parallel processing and, more specifically, to a parallel architecture for accessing uniform data with reduced bandwidth.
Description of the Related Art
In a single-instruction multiple-thread (SIMT) processing environment, threads are organized into groups of P parallel threads, called warps, that execute the same program. Although the P threads of a thread group execute each instruction of the program in parallel, each thread executes the instruction independently, using its own data and registers, and each thread executes the program's instruction sequence in order. A common parallel programming pattern is an instruction sequence that contains two separate load instructions, each loading one unit of data, followed by an arithmetic instruction that operates on the two loaded units. Consequently, when the thread group processes the two load instructions, the P threads issue 2P memory read requests: one request per thread for each load instruction.
Many parallel algorithms, including tiled matrix multiply, image convolution filters, and image motion estimation, can be organized so that, for one load instruction in the instruction sequence, the memory address is the same for every thread in the group of P threads. When processing such a load instruction, each of the P threads still issues a separate memory read request to retrieve its unit of data from memory. However, because every thread in the group supplies the same memory address, each of these requests retrieves the same unit of data. In such a case, memory bandwidth is wasted retrieving the same unit of data P times, once for each thread in the group, when a single retrieval would suffice.
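The request counts described in the two preceding paragraphs can be checked with a small Python model of the conventional behavior. This is a simplified sketch, not the architecture of the invention: the helper names `issue_load` and `model_instruction_sequence`, the addresses 0x1000 and 0x2000, and the warp size of 32 are illustrative assumptions, since the text leaves P unspecified.

```python
# Simplified model of the conventional SIMT behavior: every thread in a group
# of P threads issues its own memory read request for each load instruction.
# Helper names, addresses, and the warp size are hypothetical.


def issue_load(addresses):
    """Naive scheme: one read request per thread, even when addresses repeat."""
    return list(addresses)


def model_instruction_sequence(p):
    """Model two load instructions executed by a group of p threads.

    Returns (total_requests, redundant_requests).
    """
    # Load 1: each thread reads a different 4-byte element (hypothetical base 0x1000).
    per_thread = [0x1000 + 4 * tid for tid in range(p)]
    # Load 2: every thread reads the same (uniform) address, hypothetical 0x2000,
    # e.g. a filter coefficient shared by all threads in a convolution.
    uniform = [0x2000] * p

    requests_a = issue_load(per_thread)
    requests_b = issue_load(uniform)

    total = len(requests_a) + len(requests_b)            # 2P requests in all
    redundant = len(requests_b) - len(set(requests_b))   # P - 1 duplicate fetches
    return total, redundant


if __name__ == "__main__":
    p = 32  # assumed warp size
    total, redundant = model_instruction_sequence(p)
    print(f"P={p}: {total} read requests, {redundant} of them redundant")
```

With P = 32, the model reports 64 read requests, of which 31 fetch a unit of data that the uniform load has already requested.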
Accordingly, what is needed in the art is a method for efficiently processing the memory read requests issued by the threads in a group of parallel threads when those requests retrieve the same data from memory.