Field of the Invention
The present invention relates generally to the field of graphics processing and, more specifically, to the simultaneous execution of compute and graphics applications.
Description of the Related Art
A graphics processing unit (GPU) is a specialized processor that is configured to efficiently process complex graphics and other numerical computations. Each GPU has several on-chip hardware components, such as memory caches and logic operation units, configured to efficiently perform the graphics and numerical computations. In typical computing systems, graphics processing and other computationally intensive operations are off-loaded by the central processing unit (CPU) to the GPU.
Operations performed by the GPU often include graphics operations as well as atomic transactions associated with specific memory locations. An atomic transaction associated with a memory location is a type of read-modify-write (RMW) operation. In an RMW operation, the value stored in the associated memory location is read, modified based on a computation operation, and then written back to the same memory location. While the atomic transaction is in progress, the memory state of the associated memory location is preserved, such that no other operation modifies that location until the atomic transaction is complete.
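By way of illustration only, the read-modify-write sequence described above may be sketched in software as follows. This is a minimal CPU-side sketch using the C++ standard library, not the GPU hardware mechanism discussed herein. The helper name atomic_rmw and its parameters are hypothetical, and the compare-and-swap retry loop is merely one common way to make the sequence appear indivisible.

```cpp
#include <atomic>

// Hypothetical helper (not from the source): applies `modify` to the value
// stored at `location` as a single atomic read-modify-write operation and
// returns the value that was read.
template <typename Modify>
int atomic_rmw(std::atomic<int>& location, Modify modify) {
    int observed = location.load();   // read the stored value
    int desired = modify(observed);   // compute the modified value
    // Write back only if the location still holds `observed`. On failure,
    // compare_exchange_weak reloads `observed` with the current value, so the
    // modification is recomputed and the write is retried.
    while (!location.compare_exchange_weak(observed, desired)) {
        desired = modify(observed);
    }
    return observed;
}
```

For example, calling atomic_rmw on a location holding 5 with a modification that doubles the value and adds one would leave 11 in the location and return 5. For a simple addition, the standard fetch_add member function performs the same kind of operation directly.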
To ensure that both graphics operations and atomic transactions are processed efficiently, typical GPUs include separate dedicated hardware units for executing atomic transactions and for executing graphics operations. However, one drawback of such a hardware design is that each dedicated hardware unit consumes area on the GPU chip while executing only atomic transactions or only graphics operations. Therefore, to achieve acceptable throughput of both graphics operations and atomic transactions, a large portion of the area available on the GPU chip must be devoted to several such dedicated hardware units.
As the foregoing illustrates, what is needed in the art is a mechanism for processing atomic transactions and graphics operations with high throughput without consuming a significant portion of the area available on the GPU chip.