Computing techniques have been developed to allow general-purpose operations to be performed on a GPU (graphics processing unit). A GPU has a large number of simple parallel processing pipelines that are optimized for graphics processing. By moving general-purpose operations that require many similar or identical parallel calculations to the GPU, these operations can be performed more quickly than on the CPU (central processing unit), while processing demands on the CPU are reduced. This can reduce power consumption while improving performance.
The GPU has several different processing engines that are optimized to perform different functions. These engines may include a Blitter Engine, a Render Engine, a Video Decode Engine, a Video Encode Engine, and a Video Enhancement Engine, among others. Each engine processes commands within a context that is scheduled by a separate scheduling processor. The scheduling processor assigns contexts to each engine and manages the execution of the command streams associated with each context.
However, the processing engines, command buffers, and command streamers of a GPU must coordinate the transfer of intermediate values and commands between the different engines. When one engine produces a value that will be consumed by commands executed on another engine, some mechanism must be used to ensure that the value is ready before the consumer uses it. This coordination between the engines can consume significant resources that are then unavailable for executing the commands.
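The producer and consumer dependency described above can be illustrated with a minimal software sketch. It is only an analogy, not a description of any particular GPU: two Python threads stand in for two engines, and a `threading.Event` stands in for whatever hardware signal (for example a semaphore or fence) a real device might use. The names `SharedSlot`, `producer_engine`, `consumer_engine`, and `run_pipeline` are hypothetical and chosen for illustration.

```python
import threading


class SharedSlot:
    """Hypothetical memory location shared by two engines."""

    def __init__(self):
        self.value = None
        # Assumption: an Event stands in for a hardware ready signal.
        self.ready = threading.Event()


def producer_engine(slot, x):
    # Write the intermediate value first, then signal that it is ready.
    slot.value = x * x
    slot.ready.set()


def consumer_engine(slot, results):
    # Block until the producer signals; only then is slot.value safe to read.
    slot.ready.wait()
    results.append(slot.value + 1)


def run_pipeline(x):
    slot = SharedSlot()
    results = []
    consumer = threading.Thread(target=consumer_engine, args=(slot, results))
    producer = threading.Thread(target=producer_engine, args=(slot, x))
    # Start the consumer first to show that correct ordering comes from
    # the signal, not from the order in which the engines are launched.
    consumer.start()
    producer.start()
    consumer.join()
    producer.join()
    return results[0]
```

In this sketch the consumer does no useful work while it waits for the signal, which is a small-scale version of the coordination cost noted above.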