1. Field of the Invention
One or more aspects in accordance with the invention generally relate to data processing, and more specifically, to multi-pass data-processing pipelines relating to graphics processing.
2. Description of the Related Art
Conventional multi-pass data processing is exemplified in computer graphics systems and methods. In the computer graphics field, data is processed using a multi-pass data-processing pipeline in which each pass performs a sequence of operations on the data. Processed data, for example color or depth values for pixels, may be stored as a texture map for re-use in a conventional multi-pass data-processing system.
In computer graphics, multi-pass methods generally create image data, in a first pass, that are then used as input in a subsequent pass. For example, in the first pass an image may be rendered by a graphics data-processing pipeline and stored in a frame buffer as image data. The image data is then used, for example, as a texture map, in a subsequent pass, to generate new image data, which can then be used in another pass through the pipeline, producing new data in a frame buffer, and so on and so forth. The end result of the multi-pass process is a final image in a frame buffer for optional display to a user. A graphics processor having a graphics pipeline architecture is described further herein for purposes of clarity with respect to a “first pass” through a graphics pipeline to generate initial image data and a “second pass” or subsequent pass through a graphics pipeline to generate display image data. However, it should be understood and appreciated that “passes” involve a sequence of operations which may be done with a multi-processor/multi-engine graphics processor architecture instead of a graphics pipeline processor architecture.
Geometry processors are hardware components configured to accept a specific format for vertex information. More particularly, vertex information may be of a fixed or floating-point data length with sufficient precision for image rendering. However, after vertex information is processed in a graphics processor to provide image data, for example to generate color and depth (z) values for pixels to be rasterized to be scanned out for display or stored in graphics memory, such stored image data is no longer in such a specific format for use by a geometry processor. Additionally, image data to be scanned out for display does not involve the precision associated with geometry processing, and thus data lengths, as associated with data precision, for image data are shorter than vertex information data lengths. Accordingly, as is described further herein, this stored image data is obtained from graphics memory by a host processor and translated into vertex information in such a specified format for re-use by a geometry processor. Such translated image data may be stored in graphics memory for later use by a graphics processor or provided more directly from a host processor to a graphics processor. Though graphics memory is described in terms of local memory or frame buffer memory, it should be appreciated that graphics memory may be shared memory of host memory.
Reflection mapping is an example of a multi-pass process of the prior art. In a first pass through a graphics data-processing pipeline, an image is rendered using a viewpoint located at a position occupied by a reflective object in a scene. The rendering results in an intermediate red-green-blue (RGB) image that is stored in a frame buffer. In the second pass, the RGB image generated in the first pass is used as a reflection map, a particular type of texture map. In the second pass, the scene is rendered, and surface normals (normal vectors) of the reflective object, along with vectors from the viewpoint to each point on the reflective surface, are used to compute texture coordinates to index the reflection map to the surface of the reflective object. Hence, this example includes two passes, a first pass to generate a reflection map by rendering an image from a first vantage point; and a second pass to render the scene to produce a final image, using the reflection map to color (texture) the reflective object.
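By way of illustration only, the texture-coordinate computation of the second pass may be sketched as follows. The function names and the sphere-map parameterization are assumptions made for this example; the description above does not specify how a reflection vector is converted to texture coordinates.

```python
import math

def reflect(incident, normal):
    """Reflect the viewpoint-to-surface vector about a unit-length surface normal."""
    d = sum(i * n for i, n in zip(incident, normal))
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))

def sphere_map_coords(r):
    """Map a reflection vector to (s, t) texture coordinates.

    Assumption: a sphere-map layout is used for the reflection map.
    The mapping is singular when r points along (0, 0, -1).
    """
    rx, ry, rz = r
    m = 2.0 * math.sqrt(rx * rx + ry * ry + (rz + 1.0) ** 2)
    return (rx / m + 0.5, ry / m + 0.5)

# A view ray travelling along -z strikes a surface whose normal faces +z.
r = reflect((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
s, t = sphere_map_coords(r)
```

In this sketch a head-on reflection indexes the center of the reflection map; other parameterizations, such as a cube map, would replace only `sphere_map_coords`.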
Shadow mapping is another multi-pass method of the prior art. In shadow mapping, a depth-only image is first rendered from the vantage point of each light source. The resulting image data is then used while rendering an entire scene from the viewpoint of an observer. During the rendering of the scene, the depth-only images are conditionally used to include each of the corresponding light sources when computing a color value, that includes lighting, for each pixel or pixel fragment.
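By way of illustration only, the conditional use of the depth-only images may be sketched as follows. The data layout, the `bias` parameter, and the additive lighting model are assumptions made for this example and are not taken from the description above.

```python
def is_lit(shadow_map, u, v, depth_from_light, bias=1e-3):
    """Return True when no nearer surface was recorded along this light ray."""
    nearest = shadow_map[v][u]  # depth stored in the first (depth-only) pass
    return depth_from_light <= nearest + bias

def shade(base_color, lights):
    """Accumulate only the light sources that reach the fragment.

    lights: list of (intensity, shadow_map, u, v, depth_from_light) tuples,
    a hypothetical layout chosen for this sketch.
    """
    total = 0.0
    for intensity, shadow_map, u, v, depth in lights:
        if is_lit(shadow_map, u, v, depth):
            total += intensity
    return tuple(c * total for c in base_color)

# One 2x2 depth-only image; the texel at (0, 0) records an occluder at depth 0.5.
depth_image = [[0.5, 1.0], [1.0, 1.0]]
color = shade((1.0, 0.5, 0.0), [(0.5, depth_image, 0, 0, 0.7),
                                (0.25, depth_image, 1, 0, 0.7)])
```

Here the first light source is excluded because a nearer surface occludes the fragment, so only the second light source contributes to the color value.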
FIG. 1 is a block diagram illustrating a prior art General Computing System generally designated 100 and including a Host Computer 110 coupled through a bus disposed on a motherboard of Host Computer 110, such as External Bus 115, to a Graphics Subsystem 120. Though a direct memory access (DMA) connection between Host Processor 114 and Interface 117 is illustratively shown, Graphics Subsystem 120 may be connected to Host Memory 112 via an input/output (I/O) hub or controller (not shown) as is known. Host Computer 110 is, for example, a personal computer, server, computer game system, or computer-based simulator, including a Host Processor 114. A Host Memory 112 of Host Computer 110 may be used to store geometric data representative of one, two, three, or higher-dimensional objects. For example, Host Memory 112 may store x, y, z data representing locations of surface points in “object space.” These x, y, z data are often associated with u, v data relating each surface point to a color or texture map. Host Memory 112 may store information relating the relative positions of objects and a viewpoint in “world space.” In some instances Host Computer 110 is configured to tessellate the x, y, z data to generate a vertex-based representation of primitives that represent a surface to be rendered.
Graphics Subsystem 120 receives data from Host Memory 112 through an Interface 117. The bandwidth of Interface 117 is limited by External Bus 115, which is typically a peripheral bus, e.g., accelerated graphics port (AGP) or peripheral component interconnect (PCI), coupled to Host Memory 112 of Host Computer 110. A Memory Controller 130 manages requests, initiated by hardware components of Graphics Subsystem 120, to read from or write to a Local Memory 135. Communication between Interface 117 and Memory Controller 130 is through an Internal Bus 145. Geometry Processor 140 is designed to operate on the types of data received from Host Computer 110. For example, Memory Controller 130 receives vertex data via Interface 117 and writes this data to Local Memory 135. Subsequently, Memory Controller 130 receives a request from the Geometry Processor 140 to fetch data and transfers data read from Local Memory 135 to Geometry Processor 140. Alternatively, Geometry Processor 140 may receive data directly from Interface 117. In some prior art graphics subsystems (not shown), a DMA processor or command processor receives or reads data from Host Memory 112 or Local Memory 135, and in some prior art graphics subsystems (not shown) Graphics Subsystem 120 is integrated into an I/O hub or I/O controller, where graphics memory is shared with Host Memory 112 though some Local Memory 135 may be provided.
Geometry Processor 140 is configured to transform vertex data from an object-based coordinate representation (object space) to an alternatively based coordinate system such as world space or normalized device coordinates (NDC) space. Geometry Processor 140 also performs “setup” processes in which parameters, such as deltas and slopes, used to rasterize the vertex data are calculated. In some instances Geometry Processor 140 may receive higher-order surface data and tessellate this data to generate the vertex data. Geometry Processor 140 is configured to accept a specific format for vertex information. More particularly, vertex information may be of a fixed or floating-point data length with sufficient precision for image rendering.
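By way of illustration only, the transformation from object space to NDC space may be sketched as a 4x4 matrix multiplication followed by a perspective divide. The function name and the row-major matrix layout are assumptions made for this example; the description above does not specify how the transformation is implemented.

```python
def transform_vertex(matrix, position):
    """Transform an object-space position to NDC with a row-major 4x4 matrix.

    Assumption: `matrix` already combines the model, view, and projection
    transforms. The divide by w is the perspective divide.
    """
    x, y, z = position
    v = (x, y, z, 1.0)  # homogeneous coordinates
    clip = [sum(matrix[r][c] * v[c] for c in range(4)) for r in range(4)]
    w = clip[3]
    return tuple(component / w for component in clip[:3])

# Hypothetical matrix that leaves x, y, z unchanged and sets w = 2.
halving = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 2.0]]
ndc = transform_vertex(halving, (2.0, 4.0, 6.0))
```

The intermediate values in such a computation are typically floating-point, which is consistent with the vertex format precision noted above.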
The transformed vertex data is passed from Geometry Processor 140 to Rasterizer 150 wherein each planar primitive (e.g., a triangle or a quadrilateral) is rasterized to a list of axis-aligned and distributed grid elements (i.e., discretized) that cover an image to be rendered. The grid elements, conventionally in NDC space, are mapped onto a region of an array of pixels that represent the complete image to be rendered. Each element of the array covered by a grid element is a fragment of the corresponding image and is therefore referred to as fragment data; the fragment data is for one or more pixels or pixel fragments. Each fragment data element output by Rasterizer 150 includes associated data characterizing the surface (e.g. position in NDC, colors, texture coordinates, etc.).
Each fragment data element output by Rasterizer 150 is passed to a Texturer 155 and to a Shader 160 wherein the fragment data is modified. In one approach, modification is accomplished using a lookup table stored in Local Memory 135. The lookup table may include several predetermined texture or shading maps that may be accessed using texture coordinates as indices. An output of Shader 160 is processed using Raster Operation Unit 165, which receives the fragment data from Shader 160 and, if required, reads corresponding pixel data such as color and depth (z) in the current view for additional processing.
After performing the pixel operations involving color and z, Raster Operation Unit 165 writes the modified fragment data into Local Memory 135 through Memory Controller 130. The modified fragment data, written to Local Memory 135, is new or initial pixel data with respect to a first pass through a graphics pipeline. The pixel data is stored subject to modification by one or more subsequent fragment data written to the same pixel (memory) location or delivery to a Display 175 via Scanout 180.
Alternatively, pixel data within Local Memory 135 may be read, through Memory Controller 130, out through Interface 117. Using this approach, data in Local Memory 135 may be transferred back to Host Memory 112 for further manipulation. However, this transfer occurs through External Bus 115 and is therefore slow relative to data transfers within Graphics Subsystem 120. In some instances of the prior art, pixel data generated by Raster Operation Unit 165 may be read from Local Memory 135 back into Raster Operation Unit 165 or Texturer 155. However, in the prior art, data generated in the graphics data-processing pipeline (i.e., Geometry Processor 140, Rasterizer 150, Texturer 155, Shader 160, and Raster Operation Unit 165) as output from Raster Operation Unit 165 was not accessible to Geometry Processor 140 without first being converted into a compatible format by Host Computer 110.
FIG. 2 is a flow chart illustrating a prior art method of image rendering using the General Computing System 100 of FIG. 1. In a Receive Geometry Data Step 210 data is transferred from Host Memory 112 through Interface 117 to either Local Memory 135, under the control of Memory Controller 130, or directly to Geometry Processor 140. This transfer occurs through External Bus 115, which, in comparison to data busses within Graphics Subsystem 120, has lower bandwidth. In a Process Geometric Data Step 220, performed using Geometry Processor 140, surfaces within the transferred data are tessellated, if needed, to generate vertex data and then transformed. After transformation, primitive “setup” for rasterization is performed. In a Rasterize Step 230 performed using Rasterizer 150 fragment data is generated from vertex-based data.
In a Process Fragments Step 240 the fragment data is textured and shaded using Texturer 155 and Shader 160. In an exemplary implementation, per-vertex colors and texture coordinates (among other per-vertex attributes) are bilinearly interpolated per fragment across the primitive to compute color and z (depth) values that are output to Raster Operation Unit 165.
In a Store Pixel Data Step 250, Raster Operation Unit 165 is used to map the fragment produced in the previous step onto a pixel in Local Memory 135, optionally operating on previously stored data at that pixel location. Depending on the result of available tests (e.g., depth test, alpha test, stencil test) in Raster Operation Unit 165, the fragment data is then conditionally stored into its corresponding pixel location in Local Memory 135. Storage occurs by writing data through Internal Bus 170. The color data generated by Raster Operation Unit 165 is typically limited to match the color depth of supported displays. Data from Local Memory 135 are transferred to a display device in a Display Step 260.
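By way of illustration only, the conditional storing of fragment data may be sketched with a depth test alone. The buffer layout, the less-than comparison, and the 8-bit-per-channel quantization are assumptions made for this example; alpha and stencil tests are omitted.

```python
def store_fragment(color_buf, depth_buf, x, y, color, z):
    """Store a fragment only if it is nearer than the pixel already stored.

    Colors arrive as floats in [0, 1] and are quantized to 8 bits per
    channel, an assumed display color depth for this sketch.
    """
    if z < depth_buf[y][x]:
        depth_buf[y][x] = z
        color_buf[y][x] = tuple(
            round(max(0.0, min(1.0, c)) * 255) for c in color
        )
        return True
    return False

# A 1x1 frame cleared to black at the far depth of 1.0.
colors = [[(0, 0, 0)]]
depths = [[1.0]]
first = store_fragment(colors, depths, 0, 0, (1.0, 0.2, 0.0), 0.5)   # passes
second = store_fragment(colors, depths, 0, 0, (0.0, 0.0, 1.0), 0.8)  # rejected
```

The second fragment fails the depth test, so the pixel keeps the color and depth written by the first fragment.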
FIG. 3 is a flow chart illustrating an advanced method of image rendering known as reflection mapping. In this method pixel data is first rendered for a first scene and viewpoint. A scene consists of one or more objects. The first image is then used as a texture map for shading one or more objects in a second viewpoint of the scene. The final image shows a reflection of the first scene on the surface of the object in the second viewpoint of the scene. As shown in FIG. 3, Steps 210 through 250 are performed in a manner similar to that described in relation to FIG. 2. In Store Pixel Data Step 250 the first scene pixel data is stored in a region of Local Memory 135 that can be read by Texturer 155. Instead of immediately being used in Display Step 260, the data stored in Store Pixel Data Step 250 is used in a second pass through the graphics data-processing pipeline. The second pass starts with a Receive Geometry Data Step 310 wherein geometry data representing an object in the second viewpoint of the scene is received from Host Computer 110. This data is processed using Geometry Processor 140 in Process Geometric Data Step 320 and transformed into second fragment data in a Rasterize Step 330.
In a Process Fragments Step 340, the second fragment data is shaded using the first pixel data stored in Store Pixel Data Step 250. This shading results in an image of the first scene on the surface of an object in the second viewpoint of the scene. The shaded second pixel data is stored in a Store Pixel Data Step 350 and optionally displayed in a Display Step 260.
Due to the specified dynamic range and precision of the numerical values (i.e., the formats) associated with data busses or graphics data processing elements within Graphics Subsystem 120, heretofore data had to be converted, if feasible, by Host Processor 114 of Host Computer 110 to facilitate data re-use. For example, color values written into Local Memory 135 are 24-bit RGB fixed integer data strings, making those values incompatible with Geometry Processor 140 using values representing vertices or surfaces, where data lengths are conventionally longer than 24 bits and which may use floating-point values. Heretofore, for Geometry Processor 140 to process data written by Raster Operation Unit 165, Host Processor 114 read and formatted such data to produce data formatted for input to Geometry Processor 140. Notably, data formatted for Geometry Processor 140 could be provided to Graphics Subsystem 120 via External Bus 115; however, in addition to the above-mentioned drawbacks, this use consumes performance-dependent bandwidth between Host Computer 110 and Graphics Subsystem 120. Therefore, it would be desirable and useful to increase flexibility for data re-use by a graphics subsystem. Additionally, it would be desirable and useful to improve system-level performance by providing data re-use with less dependence on such performance-dependent bandwidth.
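By way of illustration only, the host-side format conversion described above may be sketched as follows. The channel order, the normalization to the range 0 to 1, and the use of three 32-bit floating-point values as the geometry input format are assumptions made for this example.

```python
import struct

def unpack_rgb24(pixel):
    """Split a 24-bit pixel into 8-bit channels (assumed order: R, G, B)."""
    return ((pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)

def rgb24_to_floats(pixel):
    """Normalize each 8-bit channel to a float in [0, 1]."""
    return tuple(channel / 255.0 for channel in unpack_rgb24(pixel))

def pack_vertex_floats(values):
    """Pack three values as little-endian 32-bit floats (assumed vertex format)."""
    return struct.pack("<3f", *values)

# One 3-byte pixel becomes a 12-byte vertex record after conversion.
channels = unpack_rgb24(0xFF8000)
record = pack_vertex_floats(rgb24_to_floats(0xFF8000))
```

Under these assumptions each converted value occupies four times the storage of the original pixel, and the converted data must still cross External Bus 115 twice, once in each direction.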