Computer systems are commonly used for displaying graphical objects on a display screen. The purpose of three dimensional (3-D) computer graphics is generally to create two-dimensional (2-D) images on a computer screen that realistically represent an object or objects in three dimensions. In the real world, objects occupy three dimensions having a real height, a real width and a real depth. A photograph is an example of a 2-D representation of a 3-D space. 3-D computer graphics are generally like a photograph in that they represent a 3-D world on the 2-D space of a computer screen, except the underlying image is generally modeled with 3-D geometry and surface textures.
Images created with 3-D computer graphics are used in a wide range of applications, from video entertainment games to aircraft flight simulators, to portray in a realistic manner an individual's view of a scene at a given point in time. Well-known examples of 3-D computer graphics include special effects in Hollywood films such as Terminator II, Jurassic Park, Toy Story and the like.
One industry that has seen particularly strong growth in the last few years is the computer game industry, and the current generation of computer games applies 3-D graphics techniques to an ever-increasing degree. At the same time, the speed of play is driven faster and faster. This combination has fueled a genuine need for rapid and flexible rendering of 3-D graphics in relatively inexpensive systems.
To create a 3-D computer graphical representation, typically, the objects to be depicted are represented as mathematical models within the computer. For instance, 3-D models can be made up of geometric points within a coordinate system consisting of x, y and z axes, for example, corresponding to width, height, and depth, respectively. Objects are defined by a series of points, called vertices. The location of a point, or vertex, is defined by its x, y and z coordinates (or coordinates in another coordinate system). In graphics terminology, one vertex is a point, two vertices define a line, or a line segment, and three vertices define a triangle; all three are “primitives.” When three or more of these points are connected, a polygon is formed, with the triangle being the simplest polygon.
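The relationship between vertices and primitives described above can be sketched in a few lines of Python. This is only an illustration under assumed names: `Vertex`, `Primitive`, `PRIMITIVE_NAMES` and `classify` are hypothetical and do not correspond to any particular graphics API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Vertex:
    """A point in 3-D space: x = width, y = height, z = depth."""
    x: float
    y: float
    z: float


# Illustrative convention: a primitive is an ordered tuple of vertices.
Primitive = Tuple[Vertex, ...]

PRIMITIVE_NAMES = {1: "point", 2: "line", 3: "triangle"}


def classify(primitive: Primitive) -> str:
    """Name a primitive by its vertex count; more than three is a general polygon."""
    count = len(primitive)
    if count == 0:
        raise ValueError("a primitive needs at least one vertex")
    return PRIMITIVE_NAMES.get(count, "polygon")


a = Vertex(0.0, 0.0, 0.0)
b = Vertex(1.0, 0.0, 0.0)
c = Vertex(0.0, 1.0, 0.0)
d = Vertex(1.0, 1.0, 0.0)

print(classify((a,)))          # point
print(classify((a, b)))        # line
print(classify((a, b, c)))     # triangle
print(classify((a, b, d, c)))  # polygon
```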
Rendering and displaying three dimensional (3-D) graphics on screen typically involves many calculations and computations. In a simple graphics system, such computations occur according to some level of cooperative or shared processing by the central processing unit (CPU) and the graphics processing unit (GPU). In an exemplary scenario, after instructions are processed and some initial computations occur in the CPU, a set of coordinate points or vertices that define the object to be rendered are stored in video memory for further processing by the GPU in the graphics pipeline. A tessellator may break the graphics data down into simple polygons according to predetermined algorithms designed to efficiently cover the surface of the object being represented—this is known as tessellation. Currently, in most graphics pipelines, the data may then be operated upon by one or more procedural shaders, depending upon the instructions that are delivered to the GPU.
Procedural shaders are specialized processing subunits of the GPU for performing specialized operations on graphics data. An example of a procedural shader is a vertex shader, which generally operates on vertices. For instance, a vertex shader can apply computations of positions, colors and texturing coordinates to individual vertices. Vertex shaders perform either fixed or programmable function computations on streams of vertices specified in video memory of the graphics pipeline. Vertex shaders cannot generate additional geometry (vertices), but rather operate on the vertices that are specified for algorithmic transformation via the program downloaded to the vertex shader from the host.
Generally staged after the vertex shader, another example of a procedural shader is a pixel shader. For instance, the outputs of the vertex shader can be passed through a non-programmable unit, called the setup engine, which is a non-procedural shader unit that defines the gamut of pixels on which the pixel shader operates. The gamut of pixels can then be operated on by a pixel shader, which in turn operates on each individual pixel. From the pixel shader, the output values are sent to the frame buffer where they wait to be displayed to the screen, or await retrieval by the host system. While 3-D geometry is generally meshed together with triangles (sets of 3 vertices), currently, there are no procedural shaders that operate directly upon triangles as such (sets of three related vertices), or line segments, or lines, either (sets of two related vertices).
Where programmable, it is noted that the term “shader” is subject to misleading use because a “shader” can refer to the hardware subunit of the GPU that performs the shading and a “shader” can refer to the set of instructions or tokens downloaded to the GPU that are subsequently loaded into memory, e.g., register storage, used by the shader (hardware) to perform the shading. The term can also refer to both working together. Thus, while sometimes the word is used herein generically to refer to either, and accordingly, should be taken to encompass both meanings, where the term “subunit” is also used in connection with the term shader, the terms should be construed to refer to the subunit of the GPU that performs the processing associated with the shading.
It is also noted that the term “frame buffer” in today's graphics architectures does not generally refer to a pre-partitioned, pre-allocated and pre-defined portion of video memory reserved solely for the output of the setup engine or pixel processing unit of the graphics processing unit, but rather the frame buffer generally refers to any memory (generally video memory included for interoperation with the GPU) used in connection with rasterization and/or digital to analog converter (DAC)-out processes. In this regard, though the term rasterization is sometimes used more generally, the processing performed in connection with pixel processing or setup engine processing in the graphics pipeline is generally referred to as rasterization. Scan-out or DAC-out, on the other hand, is the process of transmitting signals to a monitor or LCD based on the contents of the frame buffer.
Today, the values in the frame buffer may be used for purposes other than for display to a screen, but such use is limited. For instance, it may be desired to return the values to an application on the host after they have been operated on by a pixel shader or vertex shader for saving, printing, further transformation, etc. In these instances, the values “technically” may be read; however, retrieving the values in video memory, other than the limited portions devoted to buffering images for display, is an extremely arduous task that requires cooperation with host memory, the CPU and advanced knowledge about how video memory is arranged for use in connection with the particular graphics hardware. This is especially true if any portion of the video memory is to be read other than the frame buffer. While not impossible, in practice, it is far from straightforward to read the intermediate storage of video memory used for output by the tessellator, vertex shaders and pixel shaders. In some cases, the outputs of tessellators, vertex shaders and pixel shaders may never reach video memory, but instead may be limited to a specialized data path for use between GPU graphics processing stages (e.g., the output from a vertex shader is generally transmitted to a pixel shader via a pre-defined path without reaching video memory). GPU operations occur so fast and in accordance with specialized architecture that reading intermediate video memory storage in current graphics pipelines is not feasible until the output reaches the frame buffer for rasterization or other operation.
While the frame buffer memory, which receives the output of the pixel shader in a conventional system, may be accessed in a known fashion by the host CPU to save (or otherwise operate on) graphics data, such as images, processed by the GPU, the data must still be accessed and retrieved back to the host CPU for such further operations. FIG. 1B illustrates what must be done to the output of data from a frame buffer memory to achieve further operation upon the data by the graphics pipeline. After operation of a cycle of pipeline processing as represented by the GPU operations and after retrieving the values from the frame buffer back to the host CPU, the CPU then cooperates to send the values back through the graphics pipeline for any further processing. Thus, what is desired in the state of the art is two main things: (1) the ability to re-use, i.e., re-process, data that is output from sub-components, such as a vertex shader or a pixel shader, inside the graphics pipeline prior to reaching frame buffer memory and (2) the ability to do so recursively without implicating host resources and memory.
As illustrated in FIG. 1A, computing systems are divided between the host CPU and the graphics hardware. The CPU facilitates the making of calls to graphics APIs by applications and services requesting their use. Conventionally, the application and drivers are located on the CPU side and information from those sources is sent to be displayed on a monitor. First, the information is sent from the CPU to the GPU, as packaged by the CPU according to APIs. Then, the information from the application waits in memory until it is accessed by the vertex shader. After the vertex shader concludes its operations, the information, as output from the vertex shader, is output through a special data path to the pixel shader (placing the output in video memory is generally too slow) until it is accessed by the pixel shader, and the vertex shader sits idle or continues processing more data. After the pixel shader has performed its operations, the information is placed in a frame buffer to be scanned out to a display, or sent back to the host for further operation.
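The conventional flow described above, together with the host round trip needed before any further pass, can be modeled roughly as follows. This is a toy sketch, not a description of real hardware: the function names are hypothetical, each data element is a single float, and the setup engine is reduced to a one-vertex-to-one-pixel stand-in.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[float], float]


def gpu_pass(vertex_data: Sequence[float],
             vertex_shader: Stage,
             pixel_shader: Stage) -> List[float]:
    """One trip through the toy pipeline; the result lands in the frame buffer."""
    # Vertex stage: operates on each vertex in the input stream.
    shaded_vertices = [vertex_shader(v) for v in vertex_data]
    # Stand-in for the setup engine: assume one pixel per vertex.
    pixels = shaded_vertices
    # Pixel stage: operates on each pixel, then writes the frame buffer.
    frame_buffer = [pixel_shader(p) for p in pixels]
    return frame_buffer


def process_via_host(data: Sequence[float],
                     passes: Sequence[Tuple[Stage, Stage]]) -> Tuple[List[float], int]:
    """Run several passes, reading the frame buffer back to the host each time."""
    current = list(data)
    readbacks = 0
    for vertex_shader, pixel_shader in passes:
        frame_buffer = gpu_pass(current, vertex_shader, pixel_shader)
        # The host CPU retrieves the values before they can be re-submitted.
        current = list(frame_buffer)
        readbacks += 1
    return current, readbacks


def double(v: float) -> float:
    return v * 2.0


def add_one(p: float) -> float:
    return p + 1.0


result, readbacks = process_via_host([1.0, 2.0], [(double, add_one), (double, add_one)])
print(result, readbacks)  # [7.0, 11.0] 2
```

In this model every additional pass costs one frame buffer read-back, which is the host involvement that the two stated goals aim to avoid.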
Specialized 3-D graphics APIs have been developed that expose the specialized functionality of today's vertex and pixel shaders. In this regard, a developer is able to download instructions, or small programs, to a vertex shader unit that effectively program the vertex shader to perform specialized behavior. For instance, APIs expose functionality associated with increased numbers of registers in vertex shaders, e.g., specialized vertex shading functionality with respect to floating point numbers at a register level. In addition, it is possible to implement an instruction set that causes the extremely fast vertex shader to return only the fractional portion of floating point numbers. A variety of functionality can be achieved through downloading these instructions, assuming the instruction count limit of the vertex shader and the associated register storage are not exceeded.
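As a rough illustration of the fractional-portion operation mentioned above, the following Python sketch applies it component-wise to a register-like tuple of floats. The function name `frac` and the floor-based definition (so that negative inputs yield a value in [0, 1)) are assumptions made for illustration; the text does not specify how a given instruction set defines the operation.

```python
import math
from typing import Sequence, Tuple


def frac(register: Sequence[float]) -> Tuple[float, ...]:
    """Component-wise fractional part, computed here as x - floor(x)."""
    return tuple(component - math.floor(component) for component in register)


print(frac((1.25, 2.5, 3.0, 0.75)))  # (0.25, 0.5, 0.0, 0.75)
print(frac((-0.25,)))                # (0.75,) under the floor-based assumption
```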
Most notably, the functionality of a vertex shader stage is for transformation, lighting and occasionally texture transformation. Basically, transformation is taking position data, the data as to where vertices should be when displayed, and transforming it to data for the monitor, a two dimensional screen space. Traditionally, vertex transformation processes either pass position data without modification or modify the data using matrices. The vertex shader stage is usually limited to performing transformation functions, lighting functions, and some texture functions.
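A minimal sketch of the transformation step follows, assuming a row-major 4x4 matrix applied to a homogeneous position, followed by a perspective divide and a viewport mapping with the origin at the top-left of the screen. All names (`transform_position`, `to_screen`, `IDENTITY`, `SHIFT_RIGHT`) and the viewport convention are illustrative assumptions rather than details taken from the text.

```python
from typing import Sequence, Tuple

Matrix4 = Sequence[Sequence[float]]
Vec4 = Tuple[float, float, float, float]


def transform_position(matrix: Matrix4, position: Sequence[float]) -> Vec4:
    """Multiply a 4x4 matrix by the homogeneous position (x, y, z, 1)."""
    x, y, z = position
    vec = (x, y, z, 1.0)
    out = [sum(row[i] * vec[i] for i in range(4)) for row in matrix]
    return (out[0], out[1], out[2], out[3])


def to_screen(clip: Vec4, width: int, height: int) -> Tuple[float, float]:
    """Perspective divide, then map [-1, 1] coordinates to 2-D screen space."""
    x, y, _z, w = clip
    if w == 0:
        raise ValueError("w must be non-zero for the perspective divide")
    ndc_x, ndc_y = x / w, y / w
    return ((ndc_x + 1.0) * 0.5 * width, (1.0 - ndc_y) * 0.5 * height)


# Passing position data through without modification is the identity matrix.
IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Modifying the data with a matrix: here, a translation of +0.5 along x.
SHIFT_RIGHT = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

origin = (0.0, 0.0, 0.0)
print(to_screen(transform_position(IDENTITY, origin), 640, 480))     # (320.0, 240.0)
print(to_screen(transform_position(SHIFT_RIGHT, origin), 640, 480))  # (480.0, 240.0)
```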
As games increase the level of graphics detail, polygon density increases and lighting and other vertex shading techniques become more important as a vertex processing step. Static lighting, a popular form of lighting due to its high quality, is usually calculated in the vertex shader and stored as a texture. Because it places all the lighting information into textures, it is difficult to modify the information during runtime, making dynamic lighting possible only if instructions per vertex are given beforehand. Occasionally, the vertex shader applies a matrix transform that converts vertex coordinates to texture coordinates. This usually occurs for spherical and cubical reflection mapping and texture animation.
The types of lighting typically carried out by a vertex shader are positional, directional and spotlight lighting. To add such lighting, mathematical computations, mostly matrix manipulations, change the vertices to reflect a type of lighting defined in an application. Different lights typically mimic reality, where light sources such as sunlight and a street light have different properties. Because each light can be positional, directional or a spotlight, there is a multitude of combinations for the vertex shader stage to compute.
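The three light types can be contrasted with a small per-vertex sketch. It assumes a simple diffuse (Lambert) term with no attenuation and a hard spotlight cone; the function names and parameters are hypothetical, and real vertex shaders may formulate the same computations differently, for example as matrix operations.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)


def directional(normal: Vec3, light_dir: Vec3) -> float:
    """Light with a direction only (e.g. sunlight); light_dir points away from the light."""
    to_light = normalize((-light_dir[0], -light_dir[1], -light_dir[2]))
    return max(0.0, dot(normalize(normal), to_light))


def positional(normal: Vec3, vertex_pos: Vec3, light_pos: Vec3) -> float:
    """Light with a position (e.g. a street light); direction varies per vertex."""
    to_light = normalize(sub(light_pos, vertex_pos))
    return max(0.0, dot(normalize(normal), to_light))


def spotlight(normal: Vec3, vertex_pos: Vec3, light_pos: Vec3,
              spot_dir: Vec3, cutoff_cos: float) -> float:
    """Positional light restricted to a cone around spot_dir."""
    to_light = normalize(sub(light_pos, vertex_pos))
    from_light = (-to_light[0], -to_light[1], -to_light[2])
    if dot(normalize(spot_dir), from_light) < cutoff_cos:
        return 0.0  # vertex lies outside the cone
    return max(0.0, dot(normalize(normal), to_light))


up = (0.0, 0.0, 1.0)
lamp = (0.0, 0.0, 5.0)
cone = math.cos(math.radians(30.0))

print(directional(up, (0.0, 0.0, -1.0)))                             # 1.0
print(positional(up, (0.0, 0.0, 0.0), lamp))                         # 1.0
print(spotlight(up, (0.0, 0.0, 0.0), lamp, (0.0, 0.0, -1.0), cone))  # 1.0
print(spotlight(up, (10.0, 0.0, 0.0), lamp, (0.0, 0.0, -1.0), cone)) # 0.0
```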
Thus, the geometry processing performed using a vertex shader includes some type of transformation to apply to the data, lighting environment for vertices, and material for the texture transformation. Both fixed function and programmable vertex shaders usually function in those ways during that stage in the pipeline. If an application has more information to be processed in these areas, there will be a bottleneck in the vertex shader and other components of the graphics pipeline will sit idle.
With respect to pixel shaders, specialized pixel shading functionality can be achieved by downloading instructions to the pixel shader. For instance, functionality is exposed that provides a linear interpolation mechanism in the pixel shader. Furthermore, the functionality of many different operation modifiers is exposed to developers in connection with instruction sets tailored to pixel shaders. For example, negating, remapping, biasing, and other functionality are extremely useful for many graphics applications for which efficient pixel shading is desirable, yet because they are executed as part of a single instruction, they are best expressed as modifiers to that instruction. In short, the above functionality is advantageous for graphics operations, and its incorporation into already specialized pixel and vertex shader instruction sets adds tremendous value from the perspective of ease of development and improved performance. A variety of functionality can thus be achieved through downloading these instructions, assuming the instruction count limit and other hardware limitations of the pixel shader are not exceeded.
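To make the linear interpolation mechanism and the operand modifiers concrete, here is a small sketch. The interpolation formula is standard; the particular definitions chosen for the modifiers (bias as x - 0.5, remap as a [0, 1] to [-1, 1] mapping) and all function names are assumptions for illustration, since the text names the modifiers without defining them.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t = 0 and b at t = 1."""
    return a + t * (b - a)


def lerp_color(c0: Color, c1: Color, t: float) -> Color:
    """Component-wise interpolation between two colors."""
    return (lerp(c0[0], c1[0], t), lerp(c0[1], c1[1], t), lerp(c0[2], c1[2], t))


# Modifiers: cheap transforms applied to an operand as part of one instruction.
def negate(x: float) -> float:
    return -x


def bias(x: float) -> float:
    """Assumed definition: shift by one half."""
    return x - 0.5


def remap_signed(x: float) -> float:
    """Assumed definition: map the range [0, 1] onto [-1, 1]."""
    return 2.0 * x - 1.0


print(lerp_color((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.5))  # (0.5, 0.0, 0.5)
print(bias(0.75))                                         # 0.25
print(remap_signed(0.75))                                 # 0.5
# A modifier folded into an interpolation's operand rather than issued separately:
print(lerp(negate(1.0), 1.0, 0.5))                        # 0.0
```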
Although the pixel shader does perform some matrix operations (e.g., vector transformations for lighting), it may be useful to think of the functionality of a pixel shader as having more or less straightforward mathematical computational capabilities, as opposed to the more complex matrix calculations performed by the vertex shader to transform vertices, wherein each vertex may be represented by many values that require processing. The math involved with a pixel shader is likely floating point calculations and these calculations shade pixels to create reflectivity, glossiness, and bumpiness. There are limitations to the number of calculations that can be performed as information is passed through the pixel shader. Because of these limitations, some information requires multi-pass operations to create multiple textures on each pixel. And each pass encompasses several clock cycles in the pipeline.
The vertex shader computes matrices on a given vertex while the pixel shader can compute most floating point operations. There are instances when a programmer may want to see the mathematical transformations of the vertex and not the values as a screen display. At this point a programmer would have to read memory from the frame buffer which, as discussed, is prohibitive.
Pixel shaders and vertex shaders are able to operate on pixels and vertices. In graphics programming, primitives are the basic shapes out of which any given three dimensional graphic is created. Regularly used and defined primitives include a vertex, a line, and a triangle. Today, pixel shaders and vertex shaders are able to operate with given instructions on a vertex. Even if the shaders are programmable, the shaders are only able to operate on vertices, or points, as primitives.
When these specific shaders, either the pixel or vertex shaders, operate in a graphics pipeline, there are regular bottlenecks due to the operations that occur in each stage. These bottlenecks can be solved if a programmer tries to limit and balance the instructions sent to each shader for every vertex. However, when designing graphics for an intricate 3-D display, like a game, balancing the number of instructions becomes an overwhelming task. Most programmers do not implement any sort of optimization until the graphics are noticeably slower in a given program. Even in the best optimization schemes that a programmer may use, hardware will be found sitting idle and unused waiting for information to pass through, simply because of the variance associated with different kinds of tasks requested of a graphics subsystem. Furthermore, in order to optimize any graphics program, a programmer first must identify the source of the bottleneck which can be a tedious task. Thus, it would be desirable to be able to dynamically reconfigure the cores of a graphics subsystem so that the core processing of the data is automatically tailored to the task being requested to be performed.
Thus, the rendering of graphics data in a computer system is a collection of resource intensive processes. The process of shading, i.e., the process of performing complex algorithms upon set(s) of specialized graphics data structures, used to determine values for certain primitives, such as color, etc. associated with the graphics data structures, exemplifies such a computation intensive and complex process. Generally the process of shading has been normalized to some degree. By passing source code designed to work with a shader into an application, a shader becomes an object that the application may create/utilize in order to facilitate the efficient drawing of complex video graphics. Vertex shaders and pixel shaders are examples of such shaders.
Thus, the introduction of programmable operations on a per vertex and per pixel basis has become more widespread in modern graphics hardware. This general programmability enables the potential for limited creative algorithms at increased performance levels. However, in addition to those noted above, there are some limitations to what can be achieved today. Typically, with present day rendering pipelines at the vertex and pixel shaders, a stream of geometry data is input to the vertex shader to perform some operation on the vertices, as a result of which the geometry data is transformed to pixel data, outputting a stream of pixel data. The vertex shader may receive instructions which program the vertex shader to perform specialized functionality, but there are limits to the size and complexity of the vertex shader instructions. Similarly, a pixel shader can optionally perform one or more transformations on the data, outputting a stream of pixel data. The pixel shader may also receive instructions which program the pixel shader to perform specialized functionality, but there are limits to the size and complexity of the pixel shader instructions.
Today, programmers regularly use vertex shaders and pixel shaders. The current programmable hardware has somewhat limited programmable mechanisms, however, that do not allow a programmer to specify the re-use of values before they reach the end of the pipeline and are rasterized. Programmers may try to balance their use of different components in the graphics pipeline to avoid bottlenecks in one or more of the shaders; however, the graphics hardware is fixed. While some specialized hardware is built for specialized tasks having specialized combinations and arrangements of procedural shaders for those specialized tasks, the hardware cannot be rearranged for other tasks. Thus, when tasks other than those specialized tasks are performed and information needs to be modified specifically by instructions for only one of the shaders, the modification may have to wait, perhaps unacceptably long depending upon the application. Furthermore, while vertices can be operated upon by shaders, the programmer is unable to specify via graphics APIs operations that work directly on other primitives as such. In other words, with the exception of vertices, a programmer is unable to arbitrarily package primitives for processing in the graphics pipeline as primitives.
Tessellation is a process that typically takes place at the beginning of a graphics pipeline and involves covering a bounded geometric region without gaps or overlaps by congruent plane figures of one type or a few types. While existing tessellators implement a few basic algorithms for creating a grid of new vertices based on a handful of control-point vertices, the process is based on pre-fixed algorithms and is limited to the front end of the pipeline, and is therefore not programmable to create additional arbitrary geometry in the middle of the pipeline after processing by a procedural shader. Moreover, once vertices are generated by the tessellator, there is nowhere else in the pipeline where vertices can be generated. In other words, today's vertex shaders may be able to receive, process and output vertices, but they are unable to generate new vertices.
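One simple way to picture the creation of a grid of new vertices from a handful of control-point vertices is bilinear interpolation over a four-corner patch, sketched below. This is an assumed, illustrative algorithm; the text does not say which algorithms existing tessellators use, and `tessellate_quad` and its parameters are hypothetical names.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def tessellate_quad(p00: Vec3, p10: Vec3, p01: Vec3, p11: Vec3,
                    segments: int) -> List[Vec3]:
    """Return a (segments + 1) x (segments + 1) grid of vertices, row by row."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    grid: List[Vec3] = []
    for j in range(segments + 1):
        v = j / segments
        for i in range(segments + 1):
            u = i / segments
            # Bilinear blend of the four control points, one coordinate at a time.
            x, y, z = (
                (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                for a, b, c, d in zip(p00, p10, p01, p11)
            )
            grid.append((x, y, z))
    return grid


# Four control points of a unit square produce nine vertices at two segments.
vertices = tessellate_quad((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                           (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), 2)
print(len(vertices))  # 9
print(vertices[4])    # (0.5, 0.5, 0.0), the center of the patch
```

The vertex count is fixed entirely by the control points and the segment count, which mirrors the pre-fixed, front-end nature of the process described above.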
It would thus be desirable to implement systems and methods that overcome the shortcomings of present programmability in connection with present graphics pipeline architectures, APIs and hardware due to limitations in instruction count, limitations in form of output and the lack of sharing of data in the pipeline.