Field of the Invention
The present invention relates to processing units and, in particular, to hardware for parallel command list generation.
Description of the Related Art
Microsoft® Direct3D 11 (DX11) is an API (Application Programming Interface) that supports tessellation and provides improved multi-threading, helping developers build applications that better utilize multi-core processors.
In DX11, each core of a CPU (central processing unit) can execute threads of commands in parallel. Each core, or each of several threads on the same core, generates a separate command list through its own copy of a user-mode driver, which increases the performance of the software application. A command list is an API-level abstraction of a command buffer, which is a lower-level concept. The driver builds up a command buffer as it receives API commands from the application; a command list consists of a completed command buffer plus any additional implementation-defined meta information. The contents of a command list or command buffer are typically executed by a GPU (graphics processing unit). A single thread running on one of the CPU cores submits the command lists for execution in a particular order. The order of the command lists, and therefore the order of the command buffers, is determined by the application program. Command buffers are fed via pushbuffers into the processing unit that executes them, typically a GPU, and are composed of methods to be executed by that processing unit.
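The recording and submission flow described above can be illustrated with a minimal sketch. It is not DX11 code: the `CommandList` class, the `DRAW` method name, and the helper functions are hypothetical stand-ins chosen for illustration. The sketch assumes only that each thread records into its own list and that a single submitting step replays the lists in an order chosen by the application.

```python
import threading


class CommandList:
    """Hypothetical stand-in for a completed command buffer plus metadata."""

    def __init__(self, name):
        self.name = name
        self.methods = []  # methods the processing unit would execute

    def record(self, method, value):
        self.methods.append((method, value))


def build_command_list(name, draws, out, index):
    # Each worker thread records into its own command list; nothing is shared.
    command_list = CommandList(name)
    for draw in draws:
        command_list.record("DRAW", draw)
    out[index] = command_list


def record_in_parallel(work):
    # One thread per (name, draws) entry; results keep the order of `work`.
    out = [None] * len(work)
    threads = [
        threading.Thread(target=build_command_list, args=(name, draws, out, i))
        for i, (name, draws) in enumerate(work)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return out


def submit(command_lists, order):
    # A single thread submits the lists in the application-defined order.
    executed = []
    for i in order:
        for method, value in command_lists[i].methods:
            executed.append((command_lists[i].name, method, value))
    return executed


lists = record_in_parallel([("A", [1, 2]), ("B", [3])])
stream = submit(lists, order=[1, 0])
```

In this sketch the recording order across threads is irrelevant; only the `order` argument passed to `submit` determines the sequence in which the methods reach the processing unit.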
However, DX11 does not allow processor state inheritance across command lists. Instead, the processor state is reset at the beginning of every command list to a so-called “clean slate state.” As a result, each user-mode driver thread must set all the state parameters in the processor at the beginning of its command list. The lack of state inheritance across command lists is a significant drawback because threads cannot cooperate when executing the application program. Moreover, resetting the processor state to the clean slate state requires dozens or hundreds of commands, and this added processing cost reduces overall system performance.
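The effect of the clean slate reset can be modeled with a short sketch. The state parameters in `CLEAN_SLATE`, the `SET` and `DRAW` operations, and the `execute` function are all hypothetical and serve only to illustrate the behavior described above; a real processor has far more state, which is why the reset may take dozens or hundreds of commands.

```python
# Hypothetical state parameters; a real processor has many more.
CLEAN_SLATE = {"blend": "off", "depth_test": "on", "cull": "back"}


def execute(command_lists):
    """Run command lists in order, resetting state before each one."""
    state_writes = 0
    trace = []
    for methods in command_lists:
        # No inheritance: every command list starts from the clean slate.
        state = dict(CLEAN_SLATE)
        state_writes += len(CLEAN_SLATE)  # cost of the reset itself
        for op, key, value in methods:
            if op == "SET":
                state[key] = value
                state_writes += 1
            elif op == "DRAW":
                # Record which blend mode was active for this draw.
                trace.append((key, state["blend"]))
    return trace, state_writes


first = [("SET", "blend", "alpha"), ("DRAW", "mesh0", None)]
second = [("DRAW", "mesh1", None)]
trace, state_writes = execute([first, second])
```

Here the blend mode set by the first command list does not carry over to the second, so the second list would have to set it again, and the reset cost is paid once per command list regardless of how little state the list actually changes.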
As the foregoing illustrates, there is a need in the art for an improved technique that addresses the limitations of current approaches set forth above.