The present invention relates in general to data processing, and in particular to synchronization of computing thread arrays.
Parallel processing techniques enhance throughput of a processor or multiprocessor system when multiple independent computations need to be performed. A computation can be divided into tasks, with each task being performed as a separate “thread.” (As used herein, a thread refers generally to an instance of execution of a particular program using particular input data.) Parallel threads are executed simultaneously using different processing engines, allowing more processing work to be completed in a given amount of time.
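By way of illustration only, the division of a computation into tasks, each performed as a separate thread, can be sketched as follows. The names `square_sum` and `parallel_square_sum` and the choice of four threads are assumptions made for this sketch and do not come from the description above. In a standard CPython interpreter these threads are not guaranteed to execute simultaneously, so the sketch shows only the decomposition into threads, each being the same program applied to different input data.

```python
from concurrent.futures import ThreadPoolExecutor


def square_sum(chunk):
    """The 'program' each thread runs: sum of squares of its own input data."""
    return sum(x * x for x in chunk)


def parallel_square_sum(values, num_threads=4):
    """Divide one computation into num_threads tasks and combine the results."""
    # Each task receives a distinct slice of the input (a strided partition).
    chunks = [values[i::num_threads] for i in range(num_threads)]
    # One thread per task; every thread executes the same program on its chunk.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_sums = list(pool.map(square_sum, chunks))
    # Combine the per-thread partial results into the final answer.
    return sum(partial_sums)
```

For example, `parallel_square_sum(list(range(10)))` yields the same value as computing the sum of squares of 0 through 9 in a single thread.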
A pushbuffer is a mechanism by which one processor sends data and commands to another processor. It behaves as a first-in-first-out (FIFO) queue, but is not necessarily implemented as a hardware FIFO. For example, in a typical personal computer system with a central processing unit (CPU) and a parallel or graphics processing unit (GPU), the CPU writes commands and data for parallel or graphics processing into the pushbuffer, and the GPU processes the data and commands in the same order in which they were written. The data and commands in the pushbuffer typically reside either in the system memory or in the memory attached to the GPU. Software running on the CPU writes the data and commands into memory, and typically also updates registers in the GPU to indicate that more data and commands have been written to the pushbuffer. The GPU reads from the pushbuffer, often via direct memory access (DMA) block transfers, in order to utilize memory bandwidth efficiently. The pushbuffer may vary in size from a few bytes up to many megabytes, and could be even larger on future systems. Systems may use multiple pushbuffers, and the CPU may also write information to the GPU without using the pushbuffer.
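The producer and consumer roles described above can be modeled with a minimal software sketch. Everything named here is hypothetical and chosen for illustration: the class `PushBuffer`, the `put` and `get` indices, and the circular layout are assumptions, not a description of any particular hardware. The `put` index stands in for the GPU register that the CPU updates after writing, and `read_block` stands in for a block transfer by the consumer.

```python
class PushBuffer:
    """Toy model of a pushbuffer: a FIFO kept in ordinary memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity  # memory region holding commands and data
        self.put = 0  # published write position (models a register the CPU updates)
        self.get = 0  # read position, advanced only by the consumer

    def free_slots(self):
        # One slot stays unused so that put == get always means "empty".
        return (self.get - self.put - 1) % self.capacity

    def write(self, entries):
        """Producer side: store entries in memory, then publish the new put."""
        entries = list(entries)
        if len(entries) > self.free_slots():
            raise BufferError("pushbuffer full")
        cursor = self.put
        for entry in entries:
            self.slots[cursor] = entry
            cursor = (cursor + 1) % self.capacity
        # Publishing last models updating the register after the data is written.
        self.put = cursor

    def read_block(self, max_entries):
        """Consumer side: fetch up to max_entries in the order they were written."""
        block = []
        while self.get != self.put and len(block) < max_entries:
            block.append(self.slots[self.get])
            self.get = (self.get + 1) % self.capacity
        return block
```

A producer might call `write([("SET_STATE", 1), ("DRAW", 3)])` and the consumer would then obtain those entries, in that order, from `read_block`. Reading several entries per call mirrors the motivation given above for block transfers, namely fewer and larger accesses to memory.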