1. Field of the Invention
The invention is in the field of data-processing and more specifically in the field of data-processing pipelines relating to graphics processing units.
2. Description of the Background
Current data-processing methods are exemplified by systems and methods developed for computer graphics. This specialized field processes data using a data-processing pipeline that performs a specific sequence of operations on the data and optionally sends its output for further processing. Efficiency is achieved by using dedicated hardware components, each configured to execute specific sequences of operations.
In computer graphics, multi-pass methods generally create image data in a first pass that is then used as input in a subsequent pass. For example, in the first pass an image may be rendered by a graphics data-processing pipeline and stored in a frame buffer as image data. The image data is then used, for example, as a texture map, in a subsequent pass, to generate new image data, which can then be used in another pass through the pipeline, producing new data in a frame buffer, and so on. The end result of the multi-pass process is a final image in a frame buffer for optional display to a user.
Reflection mapping is an example of a multi-pass process of the prior art. In a first pass through a graphics data-processing pipeline, an image is rendered using a viewpoint located at a position occupied by a reflective object in a scene. The rendering results in an intermediate red-green-blue (RGB) image that is stored in a frame buffer. In the second pass, the RGB image generated in the first pass is used as a reflection map, a particular type of texture map. In the second pass, the scene is rendered, and surface normals (normal vectors) of the reflective object, along with vectors from the viewpoint to each point on the reflective surface, are used to compute texture coordinates to index the reflection map to the surface of the reflective object. Hence, this example includes two passes, a first pass to generate a reflection map by rendering an image from a first vantage point; and a second pass to render the scene to produce a final image, using the reflection map to color (texture) the reflective object.
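The texture-coordinate computation described above can be illustrated with a minimal sketch. It assumes a latitude-longitude layout for the reflection map and uses hypothetical helper names (reflect, normalize, reflection_uv); the prior art systems described here are not limited to this layout or to these names.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def reflect(incident, normal):
    """Reflect an incident vector about a unit normal: R = I - 2 (N.I) N."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def reflection_uv(view_to_point, normal):
    """Map the reflected direction to (u, v) in [0, 1].

    The latitude-longitude parameterization is an assumption made for
    illustration; other reflection-map layouts are equally possible.
    """
    r = normalize(reflect(normalize(view_to_point), normalize(normal)))
    u = 0.5 + math.atan2(r[0], r[2]) / (2.0 * math.pi)
    v = 0.5 - math.asin(max(-1.0, min(1.0, r[1]))) / math.pi
    return u, v
```

In this sketch, a surface point viewed head-on (the view vector opposite to the normal) reflects straight back toward the viewer and indexes the center of the map.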
Shadow mapping is another multi-pass method of the prior art. In shadow mapping a depth-only image is first rendered from the vantage point of each light. The resulting image data is then used while rendering an entire scene from the viewpoint of an observer. During the rendering of the scene, the depth-only images are conditionally used to include corresponding lights when computing a color value, including lighting, for each pixel or pixel fragment.
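The conditional inclusion of lights can be sketched as follows. This is a simplified illustration, not a description of any particular implementation: the function names, the nearest-texel lookup, the depth bias, and the scalar light intensities are all assumptions.

```python
def in_shadow(shadow_map, u, v, depth_from_light, bias=1e-3):
    """Return True when the fragment lies farther from the light than the
    depth stored in the light's depth-only image at (u, v)."""
    height = len(shadow_map)
    width = len(shadow_map[0])
    x = min(width - 1, max(0, int(u * width)))
    y = min(height - 1, max(0, int(v * height)))
    return depth_from_light - bias > shadow_map[y][x]

def lit_color(albedo, lights):
    """Accumulate only the lights that are not occluded.

    lights is a list of (intensity, shadow_map, (u, v, depth_from_light))
    tuples; this layout is hypothetical.
    """
    total = 0.0
    for intensity, shadow_map, (u, v, depth) in lights:
        if not in_shadow(shadow_map, u, v, depth):
            total += intensity
    return tuple(c * total for c in albedo)
```

Each light contributes to the pixel color only when the depth comparison against that light's depth image passes.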
FIG. 1 is a block diagram illustrating a prior art General Computing System generally designated 100 and including a Host Computer 110 coupled through a bus disposed on a motherboard of Host Computer 110, such as an External Bus 115, to a Graphics Subsystem 120. Though a direct memory access (DMA) connection between Host Processor 114 and Interface 117 is illustratively shown, Graphics Subsystem 120 may be connected to Host Memory 112 via an input/output (I/O) hub or controller (not shown) as is known. Host Computer 110 is, for example, a personal computer, server, or computer game system, including a Host Processor 114. A Host Memory 112, included in Host Computer 110, is configured to store geometric data representative of one, two, three, or higher-dimensional objects. For example, Host Memory 112 may store x, y, z data representing locations of surface points in “object space.” These x, y, z data are often associated with u, v data relating each surface point to a color or texture map. Host Memory 112 may also include information relating the relative positions of objects and a viewpoint in “world space.” In some instances Host Computer 110 is configured to tessellate the x, y, z data to generate a vertex-based representation of primitives that represent a surface to be rendered.
Graphics Subsystem 120 receives data from Host Memory 112 through an Interface 117. The bandwidth of Interface 117 is limited by External Bus 115, which is typically a peripheral bus, e.g., accelerated graphics port (AGP) or peripheral component interconnect (PCI) coupled to Host Memory 112 of Host Computer 110. A Memory Controller 130 manages requests, initiated by hardware components of Graphics Subsystem 120, to read from or write to a Local Memory 135. Communication between Interface 117 and Memory Controller 130 is through an Internal Bus 145. Geometry Processor 140 is specifically designed to operate on the types of data received from Host Computer 110. For example, Memory Controller 130 receives vertex data via Interface 117 and writes this data to Local Memory 135. Subsequently, Memory Controller 130 receives a request from the Geometry Processor 140 to fetch data and transfers data read from Local Memory 135 to Geometry Processor 140. Alternatively, Geometry Processor 140 may receive data directly from Interface 117. In some prior art graphics subsystems (not shown), a DMA processor or command processor receives or reads data from Host Memory 112 or Local Memory 135, and in some prior art graphics subsystems (not shown) Graphics Subsystem 120 is integrated into an I/O hub or I/O controller, where graphics memory is shared with Host Memory 112, though some Local Memory 135 may be provided.
Geometry Processor 140 is configured to transform vertex data from an object-based coordinate representation (object space) to an alternatively based coordinate system such as world space or normalized device coordinates (NDC) space. Geometry Processor 140 also performs “setup” processes in which parameters, such as deltas and slopes, required to rasterize the vertex data are calculated. In some instances Geometry Processor 140 may receive higher-order surface data and tessellate this data to generate the vertex data.
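The coordinate transformation can be illustrated with a short sketch that carries a vertex from object space through world space to NDC space. The row-major 4x4 matrices, the function names, and the single combined view-projection matrix are assumptions chosen for brevity.

```python
def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def to_ndc(model, view_proj, position):
    """Transform an object-space position to NDC space.

    model maps object space to world space; view_proj (hypothetical name)
    maps world space to clip space. The perspective divide by w then
    yields normalized device coordinates.
    """
    x, y, z = position
    world = mat_vec(model, [x, y, z, 1.0])
    clip = mat_vec(view_proj, world)
    w = clip[3]
    return tuple(c / w for c in clip[:3])
```

For example, with a model matrix that translates by one unit in x and a projection whose last row copies -z into w, the object-space point (1, 2, -2) lands at (1, 1, -1) in NDC space.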
The transformed vertex data is passed from Geometry Processor 140 to Rasterizer 150 wherein each planar primitive (e.g. triangle or quadrilateral) is rasterized to a list of axis-aligned and uniformly distributed grid elements (i.e. discretized) that cover the primitive. The grid elements are usually in NDC space and map onto a region of an array of pixels that represent the complete image to be rendered. Each element of the array covered by a grid element of the primitive is a fragment of the corresponding surface and is therefore referred to as fragment data. Each fragment data element includes associated data characterizing the surface (e.g. position in NDC, colors, texture coordinates, etc.).
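As an illustration of how a planar primitive may be discretized, the following sketch lists the grid elements whose centers fall inside a triangle using edge functions. Edge functions are one common technique, named here as an assumption rather than as the method of Rasterizer 150; the vertices are taken to be already expressed in pixel units, and the function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area test: positive when p lies to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize(v0, v1, v2, width, height):
    """Return the (x, y) grid elements whose centers the triangle covers."""
    fragments = []
    area = edge(v0, v1, v2)
    if area == 0:
        return fragments  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the grid element center
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                fragments.append((x, y))
    return fragments
```

Each returned grid element would then carry the associated surface data (position, colors, texture coordinates) that makes up a fragment.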
An output of Rasterizer 150 is passed to a Texturer 155 and to a Shader 160 wherein the fragment data is modified. In one approach, modification is accomplished using a predefined lookup table stored in Local Memory 135. The lookup table may include several texture or shading maps accessed using texture coordinates as indices. An output of Shader 160 is processed using Raster Operation Unit 165, which receives the fragment data from Shader 160 and, if required, reads corresponding pixel data such as color and depth (z) in the current view for additional processing. Raster Operation Unit 165 is also configured to read stencil data for use in performing the above operations.
After performing the pixel operations involving color and z, Raster Operation Unit 165 writes the modified fragment data into Local Memory 135 through Memory Controller 130. The modified fragment data, written to Local Memory 135, is new or initial pixel data with respect to a first pass. The pixel data is stored subject to modification by one or more subsequent fragment data written to the same pixel (memory) location or delivery to a Display 175 via Scanout 180. Alternatively, pixel data within Local Memory 135 may be read, through Memory Controller 130, out through Interface 117. Using this approach, data in Local Memory 135 may be transferred back to Host Memory 112 for further manipulation. However, this transfer occurs through External Bus 115 and is therefore slow relative to data transfers within Graphics Subsystem 120. In some instances of the prior art, pixel data generated by Raster Operation Unit 165 may be read from Local Memory 135 back into Raster Operation Unit 165 or Texturer 155. However, in the prior art, data generated in the graphics data-processing pipeline (i.e. Geometry Processor 140, Rasterizer 150, Texturer 155, Shader 160, and Raster Operation Unit 165) as output from Raster Operation Unit 165 was not accessible to Geometry Processor 140 without first being converted into a compatible format by Host Computer 110. Likewise, stencil data may not be read by Geometry Processor 140, Rasterizer 150, Texturer 155 or Shader 160, and once generated is only modified by Raster Operation Unit 165 or Host Computer 110.
FIG. 2 is a flow chart illustrating a prior art method of image rendering using the General Computing System 100 of FIG. 1. In a Receive Geometry Data Step 210 data is transferred from Host Memory 112 through Interface 117 to either Local Memory 135, under the control of Memory Controller 130, or directly to Geometry Processor 140. This transfer occurs through External Bus 115, which, in comparison to data busses within Graphics Subsystem 120, has lower bandwidth. In a Process Geometric Data Step 220, performed using Geometry Processor 140, surfaces within the transferred data are tessellated, if needed, to generate vertex data and then transformed. After transformation, primitive “setup” for rasterization is performed.
In a Rasterize Step 230 performed using Rasterizer 150 fragment data is generated from vertex-based data.
In a Process Fragments Step 240 the fragment data is textured and shaded using Texturer 155 and Shader 160. In a typical implementation, per-vertex colors and texture coordinates (among other per-vertex attributes) are bilinearly interpolated per fragment across the primitive to compute color and z (depth) values that are output to Raster Operation Unit 165.
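Per-fragment interpolation of per-vertex attributes can be sketched as below. Note that this sketch substitutes barycentric weights, a common way to interpolate across a triangle, for the bilinear interpolation mentioned above; it also omits perspective correction, and the function names are hypothetical.

```python
def edge(a, b, p):
    """Signed area of the parallelogram spanned by (b - a) and (p - a)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def barycentric(v0, v1, v2, p):
    """Return weights (w0, w1, w2) of p with respect to a non-degenerate
    triangle; the weights sum to 1."""
    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area,
            edge(v2, v0, p) / area,
            edge(v0, v1, p) / area)

def interpolate(weights, a0, a1, a2):
    """Blend three per-vertex attribute tuples with the given weights."""
    w0, w1, w2 = weights
    return tuple(w0 * x + w1 * y + w2 * z for x, y, z in zip(a0, a1, a2))
```

The same weights can be applied to any per-vertex attribute, for instance colors, texture coordinates, or z (depth).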
In a Store Pixel Data Step 250 Raster Operation Unit 165 is used to map the fragment produced in the previous step onto a pixel in Local Memory 135, optionally operating on previously-stored data at that pixel location, and, finally, depending on the result of available tests (e.g., depth test, alpha test, stencil test) in the Raster Operation Unit 165, conditionally storing the fragment data into its corresponding pixel location in Local Memory 135. These available tests include retrieval and use of stencil data stored in Local Memory 135. Storage occurs by writing data through Internal Bus 170. The color data generated by Raster Operation Unit 165 is typically limited to match color depth of supported displays. Data from Local Memory 135 are transferred to a display device in a Display Step 260.
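The conditional store can be illustrated with a minimal sketch. The buffer layout, the specific tests (stencil equal to a reference value, depth less than the stored depth), and the 8-bit color quantization are assumptions for illustration; the class and function names are hypothetical.

```python
class PixelBuffer:
    """Hypothetical stand-in for the pixel data held in local memory."""
    def __init__(self, width, height):
        self.color = [[(0, 0, 0)] * width for _ in range(height)]
        self.depth = [[1.0] * width for _ in range(height)]
        self.stencil = [[0] * width for _ in range(height)]

def quantize(color):
    """Clamp each channel to [0, 1] and convert to 8 bits per channel."""
    return tuple(int(max(0.0, min(1.0, c)) * 255 + 0.5) for c in color)

def store_fragment(buf, x, y, color, z, stencil_ref=0):
    """Write the fragment only if it passes the stencil and depth tests."""
    if buf.stencil[y][x] != stencil_ref:
        return False  # stencil test failed
    if z >= buf.depth[y][x]:
        return False  # depth test failed: an equal or nearer surface is stored
    buf.color[y][x] = quantize(color)
    buf.depth[y][x] = z
    return True
```

In this sketch a fragment that fails either test leaves the previously stored pixel data unchanged.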
FIG. 3 is a flow chart illustrating an advanced method of image rendering known as reflection mapping. In this method pixel data is first rendered for a first scene and viewpoint. A scene consists of one or more objects. The first image is then used as a texture map for shading one or more objects in a second viewpoint of the scene. The final image shows a reflection of the first scene on the surface of the object in the second viewpoint of the scene. As shown in FIG. 3, steps 210 through 250 are performed in a manner similar to that described in relation to FIG. 2. In Store Pixel Data Step 250 the first scene pixel data is stored in a region of Local Memory 135 that can be read by Texturer 155. Instead of immediately being used in Display Step 260, the data stored in Store Pixel Data Step 250 is used in a second pass through the graphics data-processing pipeline. The second pass starts with a Receive Geometry Data Step 310 wherein geometry data representing an object in the second viewpoint of the scene is received from Host Computer 110. This data is processed using Geometry Processor 140 in Process Geometric Data Step 320 and transformed into second fragment data in a Rasterize Step 330.
In a Process Fragments Step 340, the second fragment data is shaded using the first pixel data stored in Store Pixel Data Step 250. This shading results in an image of the first scene on the surface of an object in the second viewpoint of the scene. The shaded second pixel data is stored in a Store Pixel Data Step 350 and optionally displayed in a Display Step 260.
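The two-pass flow of FIG. 3 can be summarized in a toy sketch in which the output of a first pass is kept and then sampled as a texture during a second pass. The function names, the nearest-texel lookup, and the use of scalar pixel values are assumptions made to keep the illustration short.

```python
def render_pass(width, height, shade):
    """Produce pixel data by evaluating a shading function per pixel."""
    return [[shade(x, y) for x in range(width)] for y in range(height)]

def sample(texture, u, v):
    """Nearest-texel lookup with (u, v) in [0, 1]."""
    height = len(texture)
    width = len(texture[0])
    tx = min(width - 1, max(0, int(u * width)))
    ty = min(height - 1, max(0, int(v * height)))
    return texture[ty][tx]

def two_pass(width, height):
    # First pass: render and store pixel data instead of displaying it.
    first = render_pass(width, height, lambda x, y: x * 10 + y)
    # Second pass: shade using the stored data as a texture map; u is
    # mirrored here only to make the reuse visible in the result.
    second = render_pass(
        width, height,
        lambda x, y: sample(first, 1.0 - (x + 0.5) / width, (y + 0.5) / height))
    return first, second
```

The second pass never recomputes the first image; it only reads the stored pixel data.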
Due to the specified dynamic range and precision of the numerical values (i.e. the formats) associated with data busses or graphics data processing elements within Graphics Subsystem 120, heretofore data had to be converted, if feasible, by Host Processor 114 of Host Computer 110 to facilitate data re-use. The limited processing capabilities of Raster Operation Unit 165 prevent significant processing of the stencil data at this point in Graphics Subsystem 120, and other graphics data processing elements within Graphics Subsystem 120 may not be configured to read stencil data. Heretofore, for Texturer 155 or Geometry Processor 140 to process stencil data written by Raster Operation Unit 165, Host Processor 114 read and formatted such data to produce data formatted for input to Texturer 155 or Geometry Processor 140, consuming performance-dependent bandwidth between Host Computer 110 and Graphics Subsystem 120. Therefore, it would be desirable and useful to increase flexibility for data re-use by a graphics subsystem. Additionally, it would be desirable and useful to improve system level performance by providing data re-use with less dependence on such performance-dependent bandwidth.