1. Field of the Invention
The present invention is generally directed to computing operations performed in computer systems.
2. Background Art
In some computing environments, it is desirable to have multiple processing blocks or application-specific integrated circuits (ASICs) that can access a single shared resource, such as a shared memory. For example, some computer systems use multiple graphics processing units (GPUs) to improve graphics processing performance. In such computer systems, the GPUs may write to and/or read from a shared memory.
For example, FIG. 1 depicts a block diagram 100 illustrating a system that includes two GPUs: a GPU A 108 and a GPU B 110. Block diagram 100 also includes various software elements, such as an application 102 (e.g., a video game application), an application programming interface (API) 104, and a driver 106, that execute on a host computer system and interact with GPU A 108 and/or GPU B 110 to perform graphics processing operations for output to a display 130. During the performance of these operations, GPU A 108 and GPU B 110 may read from and/or write to a local memory A 118 and a local memory B 128, respectively. In addition, GPU A 108 and GPU B 110 may also read from and/or write to a shared memory 105. Because GPU A 108 and GPU B 110 may each access shared memory 105, there must be a mechanism to ensure that only one GPU accesses a particular location of shared memory 105 at a time. If such a mechanism is not included, the data in shared memory 105 could become corrupted.
A conventional mechanism that is used to restrict access to a shared resource in a multi-processing environment is a semaphore. A semaphore may be implemented as a single memory location that stores a count, which can be read, modified, and written in a single atomic operation. A semaphore may be used, for example, in a producer/consumer environment, to ensure that the producer and the consumer do not access the same portion of the shared memory at the same time. A producer is a process that writes data to a shared memory and then increments the count, thereby indicating that data stored in the shared memory is ready for consumption. The consumer is a process that reads the data from the shared memory that is ready for consumption and then decrements the count stored in the semaphore.
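By way of illustration only, the producer/consumer use of a semaphore described above may be sketched in software as follows. This is a minimal sketch, not a description of any particular GPU implementation: it assumes two host threads in place of two processors, a Python list in place of the shared memory, and the standard-library `threading.Semaphore` in place of the atomically updated count. The names `run_producer_consumer`, `shared_memory`, and `ready` are hypothetical and chosen for this example. Note that `acquire` decrements the count before the read rather than after it, which is the usual software form of the same handshake.

```python
import threading


def run_producer_consumer(num_items=5):
    """Run one producer and one consumer that share a buffer guarded by a counting semaphore."""
    shared_memory = []               # hypothetical stand-in for the shared memory
    ready = threading.Semaphore(0)   # count of items that are ready for consumption
    consumed = []

    def producer():
        for i in range(num_items):
            shared_memory.append(i * i)  # write the data first
            ready.release()              # then atomically increment the count

    def consumer():
        for index in range(num_items):
            ready.acquire()              # block until count > 0, then atomically decrement
            consumed.append(shared_memory[index])  # this slot is guaranteed to be written

    # Start the consumer first to show that it waits for the producer.
    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(run_producer_consumer(5))
```

Because the consumer cannot pass its k-th `acquire` until the producer has completed at least k writes, the consumer never reads a location that has not yet been written, regardless of how the two threads are scheduled.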
The conventional semaphore mechanism could be implemented in a multiple GPU environment, but such an implementation would require a single point of control. For example, a single memory controller could be coupled to each GPU or one of the GPUs could be designed as a “master” GPU. Although such approaches would provide controlled access to a shared memory, they require additional chip area because additional wires would be needed to couple the GPUs to the single memory controller or the “master” GPU. Furthermore, such approaches may result in timing lags because, if a first GPU in the multi-GPU environment stalls, the other GPUs coupled to the first GPU may also stall.
Given the foregoing, what is needed is a method and system that provide a mechanism for granting controlled access to a shared resource, without requiring a single point of control.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.