A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs typically include a command streamer that reads in high-level commands, manages a three-dimensional (3D) pipeline, and orchestrates the different stages (e.g., vertex fetch, vertex shading, geometry shading, clip/setup, pixel shading and raster output) across multiple programmable GPU engines and fixed-function hardware. In short, a command streamer is responsible for fetching and executing the commands for a particular engine in the GPU.
Typically, a driver application breaks up commands into batch buffers that are linked via pointers. A command streamer subsequently parses the batch buffers and transmits the commands to a graphics pipeline. To avoid stalling the pipeline, most command streamers pre-fetch commands into a FIFO that the main parser consumes. However, a simple linear pre-fetch of commands incurs a performance penalty whenever it encounters a link to another buffer, because a linear pre-fetcher does not follow the pointer, so the linked buffer can only be fetched after the link has been parsed. Moreover, various graphics applications generate bundles and many small batch buffers. Current command streamer designs incur a delay of a full memory latency on every arbitration between the ring buffer and a batch buffer. Thus, the combination of memory latency and small batch buffers creates a performance bottleneck at the parser of the command streamer. One current solution is to have the driver combine the buffers into larger buffers. However, this solution consumes additional CPU cycles, which, depending on the workload, may degrade both power efficiency and performance.
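To make the bottleneck concrete, the following is a toy cycle-count model, not taken from any actual command streamer design. It assumes that parsing costs a fixed number of cycles per command, that a linear pre-fetcher hides memory latency only within a contiguous buffer, and that every switch between the ring buffer and a batch buffer costs one full memory latency. The function names `linear_prefetch_cycles` and `combined_buffer_cycles` and all parameter values are hypothetical.

```python
from collections.abc import Sequence


def linear_prefetch_cycles(batch_sizes: Sequence[int], mem_latency: int,
                           parse_cost: int = 1) -> int:
    """Cycles for a parser fed by a purely linear pre-fetcher (toy model).

    Assumption: the ring buffer holds one pointer command per batch buffer.
    Each jump into a batch buffer and each return to the ring buffer stalls
    the parser for one full memory latency.
    """
    cycles = mem_latency  # initial fetch of the ring buffer
    for n in batch_sizes:
        cycles += parse_cost       # parse the pointer command in the ring
        cycles += mem_latency      # stall: fetch the linked batch buffer
        cycles += n * parse_cost   # parse the batch buffer's commands
        cycles += mem_latency      # stall: return to the ring buffer
    return cycles


def combined_buffer_cycles(batch_sizes: Sequence[int], mem_latency: int,
                           parse_cost: int = 1) -> int:
    """Same work after a driver merges all batches into one large buffer.

    Only one jump into and one jump out of the merged batch buffer remain.
    The CPU cost of merging on the driver side is NOT modeled here.
    """
    total = sum(batch_sizes)
    return (mem_latency            # initial fetch of the ring buffer
            + parse_cost           # single pointer command
            + mem_latency          # jump into the merged buffer
            + total * parse_cost   # parse all commands
            + mem_latency)         # return to the ring buffer


if __name__ == "__main__":
    sizes = [8] * 64  # 64 small batch buffers of 8 commands each
    print(linear_prefetch_cycles(sizes, mem_latency=200))  # 26376
    print(combined_buffer_cycles(sizes, mem_latency=200))  # 1113
```

Under these assumptions, the 512 commands of real parsing work are dwarfed by the 128 memory-latency stalls in the linear case. Merging recovers most of those cycles on the GPU side, but only by shifting work to the CPU.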