1. Field of the Invention
The present invention is generally directed to computing operations performed in computing systems. More particularly, the present invention is directed to graphics processing tasks performed in computing systems.
2. Background Art
A graphics processing unit (GPU) is a complex integrated circuit that is specially designed to perform graphics processing tasks. For example, a GPU can execute graphics processing tasks required by an end-user application, such as a video game application. The computational capabilities of GPUs have grown at a rate exceeding that of the corresponding central processing unit (CPU) platforms. This growth, coupled with the explosion of the mobile computing market (e.g., notebooks, mobile smart phones, tablets, etc.) and its necessary supporting server/enterprise systems, has been used to provide a desired quality of user experience.
However, GPUs have traditionally operated in a constrained programming environment, available primarily for the acceleration of graphics. These constraints arose from the fact that GPUs did not have as rich a programming ecosystem as CPUs. Their use, therefore, has been mostly limited to two-dimensional (2D) and three-dimensional (3D) graphics and a few leading-edge multimedia applications, which are already accustomed to dealing with graphics and video application programming interfaces (APIs).
With the advent of multi-vendor supported OpenCL® and DirectCompute®, standard APIs and supporting tools, the use of GPUs has been extended beyond traditional graphics applications. Although OpenCL and DirectCompute are a promising start, many hurdles remain to creating an environment and ecosystem that allows the combination of a CPU and a GPU to be used as fluidly as the CPU for most programming tasks.
In general, there are several layers of software between an end-user application and the GPU. The end-user application communicates with an application programming interface (API). An API allows the end-user application to output graphics data and commands in a standardized format, rather than in a format that is dependent on the GPU. The API communicates with a driver. The driver translates standard code received from the API into a native format of instructions understood by the GPU. The driver is typically written by the manufacturer of the GPU. The GPU then executes the instructions from the driver.
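By way of illustration only, the layering described above can be sketched as follows. This is a minimal sketch under stated assumptions: the class names, the command names, and the numeric opcodes are hypothetical and do not correspond to any real API, driver, or GPU instruction set.

```python
# Hypothetical native opcodes for an imaginary GPU (illustrative assumption only).
NATIVE_OPCODES = {"clear": 0x01, "draw_triangles": 0x02}


class GraphicsAPI:
    """Stand-in for a standardized API: records device-independent commands."""

    def __init__(self):
        self.commands = []

    def clear(self, color):
        self.commands.append(("clear", color))

    def draw_triangles(self, vertex_count):
        self.commands.append(("draw_triangles", vertex_count))


class Driver:
    """Stand-in for a vendor driver: translates standard commands to native form."""

    def translate(self, commands):
        return [(NATIVE_OPCODES[name], argument) for name, argument in commands]


class Gpu:
    """Stand-in for the GPU: records the native instructions it is given."""

    def __init__(self):
        self.executed = []

    def execute(self, instructions):
        for instruction in instructions:
            self.executed.append(instruction)


def render_frame():
    """Application -> API -> driver -> GPU, in the order described in the text."""
    api = GraphicsAPI()
    api.clear(0)
    api.draw_triangles(3)
    gpu = Gpu()
    gpu.execute(Driver().translate(api.commands))
    return gpu.executed
```

In this sketch the application only ever calls the API object; the translation to the hypothetical native opcodes is confined to the driver, which mirrors the separation of concerns described above.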
A standard GPU creates the pixels that make up an image from a higher level description of its components in a process known as rendering. GPUs typically utilize a concept of continuous rendering by the use of pipelines to process pixel, texture, and geometric data. These pipelines are often described as a collection of fixed-function, special-purpose pipelines, such as rasterizers, setup engines, color blenders, and texture mappers, together with programmable stages that can be implemented in shader pipes or shader pipelines. “Shader” is a term in computer graphics referring to a set of software instructions used by a graphics resource primarily to perform rendering effects. In addition, GPUs can also employ multiple programmable pipelines in a parallel processing design to obtain higher throughput. Multiple shader pipelines can also be referred to as a shader pipe array.
In addition, GPUs also support a concept known as texture mapping. Texture mapping is a process used to determine the texture color for a texture mapped pixel through the use of the colors of nearby pixels of the texture, or texels. The process is also referred to as texture smoothing or texture interpolation. However, high image quality texture mapping requires a high degree of computational complexity. Furthermore, GPUs equipped with a single (unified) shader also simultaneously support many types of shader processing, thus raising the demand for higher performance generalized memory access capabilities.
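For illustration, one common form of texture interpolation is bilinear filtering, in which the color at a sample point is a weighted average of the four nearest texels. The following is a minimal sketch, assuming a single-channel texture stored as a list of rows and coordinates expressed in texel units with clamping at the edges; the function name `bilinear_sample` is hypothetical, and the text above does not state which interpolation method a given GPU uses.

```python
import math


def bilinear_sample(texture, u, v):
    """Return the bilinearly interpolated value at texel coordinates (u, v).

    `texture` is a list of rows of floats (an assumed single-channel layout).
    Coordinates outside the texture are clamped to its edges.
    """
    height = len(texture)
    width = len(texture[0])

    # Clamp the sample point to the valid texel range.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)

    # Indices of the four neighboring texels.
    x0 = int(math.floor(u))
    y0 = int(math.floor(v))
    x1 = min(x0 + 1, width - 1)
    y1 = min(y0 + 1, height - 1)

    # Fractional position of the sample point between the neighbors.
    fx = u - x0
    fy = v - y0

    # Interpolate along x on both rows, then along y between the rows.
    top = texture[y0][x0] * (1.0 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1.0 - fx) + texture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy
```

Each sample reads four texels and performs three linear blends, which suggests why higher quality filtering increases both the computational load and the number of memory accesses per pixel.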
The shader engines rely on high speed access to local cache memory for texture, shader code, and other types of data. Preloading a cache with data reduces the execution times of GPU operations because the data need not be fetched from video memory or main system memory, which can be time intensive. This results in improved GPU performance when the same or similar portions of memory are accessed each time a GPU begins execution. Currently, the GPU does not have a dedicated programmable controller that provides the functionality of preloading a cache with data.
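The benefit of preloading can be illustrated with a simple software model. The following is a minimal sketch, assuming a cache that counts each access to backing memory as a miss; the class `PreloadableCache` and the helper `count_misses` are hypothetical names introduced for illustration and do not describe any particular hardware cache or controller.

```python
class PreloadableCache:
    """Toy cache model: a miss is counted whenever backing memory must be read."""

    def __init__(self, backing_memory):
        self.backing_memory = backing_memory  # maps address -> data
        self.lines = {}                       # currently cached addresses
        self.misses = 0

    def preload(self, addresses):
        """Fill the cache ahead of execution; not counted as misses."""
        for address in addresses:
            self.lines[address] = self.backing_memory[address]

    def read(self, address):
        """Return data for `address`, fetching from backing memory on a miss."""
        if address not in self.lines:
            self.misses += 1
            self.lines[address] = self.backing_memory[address]
        return self.lines[address]


def count_misses(preload_first):
    """Read a fixed working set and report how many reads missed the cache."""
    memory = {address: address * 10 for address in range(8)}
    working_set = [0, 1, 2, 3]
    cache = PreloadableCache(memory)
    if preload_first:
        cache.preload(working_set)
    for address in working_set:
        cache.read(address)
    return cache.misses
```

In this model, reading the working set without preloading incurs one miss per address, whereas preloading the same addresses beforehand removes those misses from the execution path. The model ignores cache capacity and eviction, which a real design would have to consider.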
Given the ever increasing complexity of new software applications, the demands on GPUs to provide efficient and high quality rendering, texture filtering and error correction are also increasing.