FIG. 1 is a block diagram of an example graphics processing system 100 or device in which one or more disclosed embodiments may be implemented. The system 100 may be, for example, a computer, a gaming device, a handheld device, a set-top box, a television, a mobile phone, or a tablet computer. The system 100 may include a central processing unit (CPU) 105, a system memory 115, a graphics driver 110 (although as discussed below, multiple graphics drivers are contemplated), a graphics processing unit (GPU) 120, and a communication infrastructure 125. A person of skill in the art will appreciate that the system 100 may include software, hardware, and firmware components in addition to, or different from, those shown in FIG. 1.
The CPU 105 and GPU 120 may be located on the same die, for example, as an accelerated processing unit (APU). The CPU 105 may be any commercially available CPU, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or a customized processor. The CPU 105 and/or GPU 120 may comprise one or more processors coupled using a communication infrastructure, such as communication infrastructure 125. The CPU 105 and/or GPU 120 may also include one or more processors that have more than one processing core on the same die, such as a multi-core processor. The memory 115 may be located on the same die as the CPU 105 and/or GPU 120, or may be located separately from the CPU 105 and/or GPU 120. The memory 115 may include a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.
The CPU 105 may execute an operating system (not shown) and one or more applications, and is the control processor for the system 100. The operating system executing on the CPU 105 may control, facilitate access to, and coordinate the accomplishment of tasks with respect to the system 100.
The graphics driver 110 may comprise software, firmware, hardware, or any combination thereof. In an embodiment, the graphics driver 110 may be implemented entirely in software. The graphics driver 110 may provide an interface and/or application programming interface (API) for the CPU 105 and applications executing on the CPU 105 to access the GPU 120. As described above and herein, there may be more than one graphics driver 110, although only one is shown.
The communication infrastructure 125 may provide coupling between the components of the system 100 and may include one or more communication buses such as Peripheral Component Interconnect (PCI), Accelerated Graphics Port (AGP), and the like.
The GPU 120 provides graphics acceleration functionality and other compute functionality to the system 100. The GPU 120 may include multiple command processors (CP) CP 1 . . . CP n 130 and multiple graphics engines (Engines) Engine 1 . . . Engine n 135, for example, 3D engines, unified video decoder (UVD) engines, or digital rights management (DRM) direct memory access (DMA) engines. The GPU 120 may include a plurality of processors including processing elements such as arithmetic and logic units (ALU). It is understood that the GPU 120 may include additional components not shown in FIG. 1.
The CP 1 . . . CP n 130 may control the processing within the GPU 120 and may be connected to Engine 1 . . . Engine n 135. Each of CP 1 . . . CP n 130 may be associated with a corresponding one of Engine 1 . . . Engine n 135, and each such pair forms an engine block (EB) EB 1 . . . EB n 137. In another embodiment, the CP 1 . . . CP n 130 may be a single command processor. In general, the CP 1 . . . CP n 130 receive instructions to be executed from the CPU 105, and may coordinate the execution of those instructions on Engine 1 . . . Engine n 135 in the GPU 120. In some instances, the CP 1 . . . CP n 130 may generate one or more commands to be executed in the GPU 120 that correspond to each command received from the CPU 105. Logic instructions implementing the functionality of the CP 1 . . . CP n 130 may be implemented in hardware, firmware, or software, or a combination thereof.
The memory 115 may include one or more memory devices and may be a dynamic random access memory (DRAM) or a similar memory device used for non-persistent storage of data. The memory 115 may include a timestamp memory 1-n 160 (corresponding to driver(s)) and indirect buffers 155. During execution, the memory 115 may have residing within it one or more memory buffers 145 through which the CPU 105 communicates commands to the GPU 120.
The memory buffers 145 may correspond to the graphics engines 135 or the engine blocks 137, as appropriate. Memory buffers 145 may be ring buffers or other data structures suitable for efficient queuing of work items or command packets. In the instance of a ring buffer, command packets may be placed into and taken away from the memory buffers 145 in a circular manner. For purposes of illustration, memory buffers 145 may be referred to as ring buffers 145 herein.
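The circular placement and removal of command packets described above can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions and is not the disclosed implementation: the class name RingBuffer, its fields, and the convention that each index names the next slot to be used are all hypothetical.

```python
class RingBuffer:
    """Fixed-capacity circular queue of command packets (illustrative only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.read_idx = 0    # assumed convention: next slot to read
        self.write_idx = 0   # assumed convention: next slot to write
        self.count = 0       # number of packets currently queued

    def put(self, packet):
        """Place a packet in the next free slot, wrapping at the end."""
        if self.count == self.capacity:
            raise BufferError("ring buffer full")
        self.slots[self.write_idx] = packet
        self.write_idx = (self.write_idx + 1) % self.capacity
        self.count += 1

    def get(self):
        """Take the oldest packet, wrapping at the end."""
        if self.count == 0:
            raise BufferError("ring buffer empty")
        packet = self.slots[self.read_idx]
        self.slots[self.read_idx] = None
        self.read_idx = (self.read_idx + 1) % self.capacity
        self.count -= 1
        return packet
```

With a capacity of three, queuing packets "a", "b", and "c", consuming one packet, and then queuing "d" places "d" in the slot vacated by "a", while "b" and "c" stay where they were written.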
The indirect buffers 155 may be used to hold the actual commands (e.g., instructions and data). For example, when the CPU 105 communicates a command packet to the GPU 120, the command packet may be stored in an indirect buffer 155 and a pointer to that indirect buffer 155 may be inserted in a ring buffer 145. As described herein below with respect to FIG. 2, the CPU 105, via driver 110, as a writer of the commands to ring buffers 145, and the GPU 120, as a reader of such commands, may coordinate a write pointer and a read pointer indicating the last item added and the last item read, respectively, in ring buffers 145.
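The indirection described above, in which the ring buffer carries only a pointer while the indirect buffer holds the commands, may be sketched as follows. This is a simplified illustration, not the disclosed implementation. The names CommandQueue, driver_submit, and gpu_fetch, the string handles standing in for pointers, and the command names are hypothetical. As a further assumption, the write and read pointers are kept here as running counts of entries written and consumed, which is one possible convention and differs from pointers that mark the last item added and last item read.

```python
class CommandQueue:
    """Ring of handles plus a table of indirect buffers (illustrative only)."""

    def __init__(self, ring_size):
        self.ring = [None] * ring_size   # stands in for a ring buffer 145
        self.indirect_buffers = {}       # handle -> commands (indirect buffers 155)
        self.write_ptr = 0               # assumed: entries written by the driver
        self.read_ptr = 0                # assumed: entries consumed by the GPU

    def driver_submit(self, handle, commands):
        """Writer side: store the commands, then queue only their handle."""
        if self.write_ptr - self.read_ptr == len(self.ring):
            raise BufferError("ring buffer full")
        self.indirect_buffers[handle] = list(commands)
        self.ring[self.write_ptr % len(self.ring)] = handle
        self.write_ptr += 1

    def gpu_fetch(self):
        """Reader side: take the next handle and dereference it."""
        if self.read_ptr == self.write_ptr:
            return None  # nothing pending
        handle = self.ring[self.read_ptr % len(self.ring)]
        self.read_ptr += 1
        return self.indirect_buffers.pop(handle)
```

In this sketch the ring entries stay small and fixed in size regardless of how many commands an indirect buffer holds, and the reader has caught up with the writer when the two pointers are equal.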
An operation, for example a drawing operation, may require multiple resources. These resources may be associated with more than one operation or graphics engine. When executing such an operation, there are several solutions for buffering the requests for the resources.
When a processor becomes backlogged with the requests, it can store the requests in a buffer, or more particularly a ring buffer, for later execution or even later overwriting. One advantage of a ring buffer is that it does not need to have its command packets shuffled around when one is consumed. This contrasts with non-ring buffers, where it is necessary to shift all remaining packets when one is consumed. Said another way, the ring buffer is well suited as a FIFO buffer while a standard, non-ring buffer is well suited as a LIFO buffer.
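The difference in consumption cost may be made concrete with a short sketch that counts how many packets are moved when the oldest packet is consumed. The function names consume_flat and consume_ring are hypothetical, and the flat buffer is assumed to be a simple array in which the oldest packet sits at index 0.

```python
def consume_flat(buffer):
    """Remove the oldest packet from a flat array; return (packet, moves)."""
    packet = buffer[0]
    moves = 0
    for i in range(1, len(buffer)):
        buffer[i - 1] = buffer[i]   # shift each remaining packet down one slot
        moves += 1
    buffer.pop()                    # drop the now-duplicated last slot
    return packet, moves


def consume_ring(slots, read_idx):
    """Remove the oldest packet from a ring; return (packet, new_read_idx, moves)."""
    packet = slots[read_idx]
    # No packet is moved; only the read index advances, wrapping at the end.
    return packet, (read_idx + 1) % len(slots), 0
```

For a buffer holding four packets, consuming one from the flat array moves the three remaining packets, whereas consuming one from the ring moves none.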
Another memory management tool is the semaphore, which controls access to a common resource. It does this by acting as the gatekeeper to the resource, and noting how much of the resource is free after each processor accesses the resource (or frees up a resource when done). If the resource is free, the semaphore permits the next process to access the resource. If not, the semaphore directs the process to wait.
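The gatekeeping behavior described above may be sketched with a counting semaphore from the Python standard library. The wrapper class ResourceGate and its method names are hypothetical, and the non-blocking acquire is used only so that the "wait" outcome is visible as a return value rather than as a stalled thread.

```python
import threading


class ResourceGate:
    """Counting semaphore guarding a fixed number of resource units."""

    def __init__(self, units):
        # The semaphore's internal counter tracks how many units are free.
        self._sem = threading.Semaphore(units)

    def try_acquire(self):
        """Return True if a unit was free and is now held; False means wait."""
        return self._sem.acquire(blocking=False)

    def release(self):
        """Hand a unit back so that the next requester may proceed."""
        self._sem.release()
```

With two units, the first two requests succeed and a third is told to wait until one of the holders releases its unit.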
These memory management tools create long wait times if the resource is fully used, and the memory and thread use in the ring buffer may also take up resources. This wait time and memory usage may create performance issues for multiple engines that share the resources.