Current processing systems have multiple processing cores to provide parallel processing of computational tasks, which increases the speed of completing such tasks. For example, specialized processing chips such as graphics processing units (GPUs) have been employed to perform complex operations such as rendering graphics. A GPU is understood as a specialized processing circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs may include hundreds if not thousands of processing cores, since graphics processing may be massively parallelized to speed rendering of graphics in real time. GPUs perform various graphics processing functions by performing calculations related to 3D graphics. These include accelerating memory-intensive work such as texture mapping and rendering polygons, and performing geometric calculations such as the rotation and translation of vertices into different coordinate systems. GPUs may also support programmable shaders, which can manipulate vertices and textures, as well as oversampling and interpolation techniques to reduce aliasing and very high-precision color spaces.
Most graphics systems are built on top of an application programming interface (API), which provides an abstracted way of running GPU programs independent of any particular hardware and operating system. APIs do not, however, provide an efficient method for binding data between the CPU and GPU, or a management system for managing GPU programs. APIs also vary in the degree to which they handle state information and hazards, that is, situations where one operation must wait until the completion of another operation.
GPUs typically operate by performing computational tasks on a series of commands placed in a command queue. An API provides an indirect means of filling the command queue. A command might thus consist of binding a particular resource (an image or a memory buffer) to a specific internal bind point, binding a shader program for execution, or submitting a set of triangles for rasterization.
A GPU processes a command queue serially, meaning that the commands are issued in the order in which they are placed in the command queue. A GPU processes these commands asynchronously from the CPU, and all synchronization must be handled by the API or the application itself. Because the GPU often accepts data from the CPU, the CPU must avoid touching memory that is in use by the GPU. This condition may be referred to as a CPU to GPU hazard. Additionally, the GPU itself is intrinsically parallel, and may execute certain commands in parallel without waiting for one operation to complete before issuing the next one. For example, one command may write into an image, and the next command may attempt to use this image to render an object onto the screen. If the commands are not properly fenced, the second command may begin executing before the write to the image has completed, thereby causing incorrect results. This situation is referred to as a GPU hazard.
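The GPU hazard described above can be illustrated with a small simulation. The following is a minimal sketch and not a model of any real GPU or API: the `Command` class, the `run` function, and the tick-based timing are all hypothetical names and assumptions introduced for illustration. Commands are issued in order, one per tick, but a command is not required to wait for earlier commands to complete unless a fence precedes it.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical command record; not taken from any real graphics API."""
    kind: str           # "write", "read", or "fence"
    resource: str = ""
    value: int = 0
    duration: int = 1   # assumed number of ticks the command needs to complete


def run(queue):
    """Issue commands in queue order, one per tick, without waiting for
    completion unless a fence is encountered. Returns what each read saw."""
    pending = {}        # resource -> list of (completion_tick, value)
    reads = []
    tick = 0
    busy_until = 0      # tick by which all issued commands have completed
    for cmd in queue:
        if cmd.kind == "fence":
            # Stall issue until every earlier command has completed.
            tick = max(tick, busy_until)
            continue
        if cmd.kind == "write":
            done = tick + cmd.duration
            pending.setdefault(cmd.resource, []).append((done, cmd.value))
            busy_until = max(busy_until, done)
        elif cmd.kind == "read":
            # Only writes that completed by the time this command starts
            # are visible (simplification: last issued visible write wins).
            visible = [v for done, v in pending.get(cmd.resource, []) if done <= tick]
            reads.append(visible[-1] if visible else None)
            busy_until = max(busy_until, tick + cmd.duration)
        tick += 1
    return reads


# The write takes 5 ticks, but the read is issued one tick later.
unfenced = [Command("write", "image", 42, duration=5), Command("read", "image")]
fenced = [Command("write", "image", 42, duration=5), Command("fence"), Command("read", "image")]

print(run(unfenced))  # [None]: the read started before the write completed
print(run(fenced))    # [42]: the fence delayed the read until the write was done
```

In this sketch the fence costs idle ticks, which reflects the general trade-off: correctness is restored by giving up some of the parallelism that made the hazard possible.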
The GPU is a complex device that maintains a significant amount of state related to executed commands. This internal state may or may not be visible to a CPU. For example, if a CPU program issues a command to bind a resource, then all commands placed in the queue after that command expect the resource to remain bound until a later command either unbinds it or binds a different resource. Usually, the concept that each GPU command inherits state from the previous command is directly exposed in the API.
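State inheritance of this kind can be sketched in a few lines. The example below is an illustrative assumption rather than the behavior of any specific API: the tuple-based command encoding and the `resolve_state` function are hypothetical. It walks a queue in order and records which resources each draw command would see as bound.

```python
def resolve_state(queue):
    """Walk a command queue in order and return, for each draw command,
    a snapshot of the bindings it inherits from earlier commands."""
    bound = {}      # bind point -> resource currently bound there
    draws = []
    for op, *args in queue:
        if op == "bind":
            slot, resource = args
            bound[slot] = resource          # replaces any earlier binding
        elif op == "unbind":
            (slot,) = args
            bound.pop(slot, None)
        elif op == "draw":
            (name,) = args
            draws.append((name, dict(bound)))  # copy the inherited state
    return draws


queue = [
    ("bind", "texture0", "brick"),
    ("draw", "wall"),
    ("bind", "texture0", "grass"),
    ("draw", "ground"),
    ("unbind", "texture0"),
    ("draw", "sky"),
]

for name, state in resolve_state(queue):
    print(name, state)
# wall {'texture0': 'brick'}
# ground {'texture0': 'grass'}
# sky {}
```

Note that the state seen by any draw cannot be determined without first processing every command ahead of it in the queue, which is the serial dependency that makes state-aware command generation hard to parallelize.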
A command buffer with a command queue is typically generated by a program operating on one or more CPUs. The state-aware nature of the API and the asynchronous nature of the command queue make it difficult for multiple CPUs to add commands for the GPU. This is because state-aware APIs and command queues require at least some knowledge of previous commands, which is intrinsically serial. Thus, it is difficult for two commands to be processed simultaneously, since a preceding command must be processed before following commands can specify the required state changes.
Additionally, due to their different architectural origins, there is often a mismatch between CPU data types and GPU data types, which can introduce significant overhead during the generation of commands.
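As a hedged illustration of this kind of mismatch, the sketch below converts CPU-side values into a GPU-style buffer layout. The layout is an assumption chosen for the example (32-bit little-endian floats, with each 3-component vector padded to 16 bytes, as some GPU buffer layouts require), and `pack_vec3_array` is a hypothetical helper. Python floats are 64-bit, so every element must be narrowed and repacked before upload.

```python
import struct


def pack_vec3_array(vectors):
    """Pack 3-component vectors as 32-bit floats, padding each vector
    to 16 bytes (assumed GPU-side alignment for this example)."""
    out = bytearray()
    for x, y, z in vectors:
        # "<3f4x": three little-endian float32 values plus 4 pad bytes.
        out += struct.pack("<3f4x", x, y, z)
    return bytes(out)


data = pack_vec3_array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)])
print(len(data))                          # 32
print(struct.unpack_from("<3f", data, 16))  # (4.0, 5.0, 6.0)
```

The per-element conversion and padding shown here is the sort of overhead the paragraph refers to: it is paid on the CPU each time such data is prepared for the GPU, and narrowing to 32 bits can also lose precision for values that are not exactly representable.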
Thus, there is a need for a command system that allows CPUs and GPUs to efficiently execute program instructions. There is a further need for a meta language that allows data types to be shared efficiently between a CPU and a GPU. There is also a need for a command format that may be used independently of the processor hardware type.