1. Field of the Invention
The present invention relates generally to computer graphics architecture and processing. More particularly, it relates to scan conversion of triangle-based polygon data into pixels.
2. Description of Related Art
Three-dimensional (3-D) computer graphics systems display images, which represent real or imaginary objects in real or imaginary settings, on a two-dimensional (2-D) monitor or other output device. As a result, the user "believes" that he is seeing 3-D objects in a 3-D world. A typical computer graphics system stores such objects in one of the many existing object file formats, using 3-D coordinates to represent spheres, vectors, curves, polygons, and other simpler component objects, along with their associated object properties, such as color, texture, intensity, transparency and/or reflectivity. Environmental data such as the number, color, location, intensity, and other properties of illumination sources, as well as atmospheric properties, are included to add richness in detail to a scene containing one or more objects.
To render such a scene from a particular viewing angle onto a 2-D screen, the "front end" of a typical computer graphics system transforms the collection of objects in the scene into a set of primitives (typically polygons, such as triangles, that are independent of scale), taking into account any movement of objects over time, as well as the scene's environmental data and the user's desired viewing angle. Triangles frequently are used as the "building blocks" for 3-D objects with complex curved surfaces, because they are simple primitive objects that effectively can "cover" or represent each surface of virtually any complex object in a tiled manner. Relatively simple images might be represented with a few, relatively large triangles, whereas more complex images might require a greater number of smaller triangles. Regardless of their size, triangles typically are represented as three 3-D (x,y,z) vertices, along with color (RGB) and texture information. Of course, given sufficient memory and computational resources, pixels could be used in lieu of triangles to represent complex images even more precisely.
Front-end processing typically still is handled in software on the host system (e.g., a PC), and does not itself require hardware acceleration for most applications. The host system provides a stream of triangles to the "back end" of the computer graphics system. The order in which the host system provides these triangles does not necessarily bear any relationship to the screen location at which such triangles might be visible.
The back end of the system is responsible for "rasterizing" this set of triangles, i.e., transforming them into the particular pixels that will be displayed on the screen. It projects these 3-D triangles onto a 2-D screen, removes "hidden surfaces" to prevent portions of triangles that are obscured by other triangles from being displayed, and generates individual pixels (to be displayed on the screen) that "fill in" the visible portions of these triangles with their associated color or texture information. Back-end processing typically is relatively time-intensive, and thus often requires hardware acceleration to maintain sufficient performance.
The performance of 3-D graphics systems typically is measured by the number of triangles per second they can process. A key problem therefore is how to architect the back end of a computer graphics system to process a stream of 3-D triangles as quickly as possible. Ideally, the back end of a system will rasterize, within the time required for one frame to be displayed on the screen (e.g., 1/60 of a second for a monitor with a 60 Hz refresh rate), all of the triangles generated by the system's front end. This is not, however, always possible.
For example, even a moderately complex screen object, such as a person, may be represented by a sufficiently large number of triangles to cause the back end of a typical computer graphics system to take multiple "frame times" to render that object completely. If the scene is static and the person is standing still, the back end may, for example, require 120 frames or 2 seconds to render that scene. If, however, the scene changes frequently, e.g., if that person moves across the screen, the back end would have to rasterize a greater number of triangles per second, because it would have to render, within those same 2 seconds, multiple variations of the same object, i.e., the same person in different poses and at different locations on the screen. Alternatively, to render an even more complex static image (e.g., a scene with three people together at one time) within those same few seconds would also require the back end to rasterize a greater number of triangles per second. Thus, by processing a greater number of triangles per second, a system is able to render more complex images and/or update images more frequently to reflect changes over time, even though it may not be able to render every image within a single "frame time."
Many of today's computer graphics applications handle very complex images and/or images that change very frequently. For example, digital imaging applications often require images of near-photographic quality which are represented by a large number of relatively small triangles. A computer graphics system must process many triangles relatively quickly in order to render such images within a reasonable period of time. Computer animation and virtual reality applications, on the other hand, may not require images of such complexity, but they may require that frames be updated very frequently to reflect, for example, the many changes in a scene that result from a slight movement of a user's virtual reality headset. In either case, the system must process a larger number of triangles per second than if the images were less complex or changed less frequently.
To obtain adequate performance and process a sufficient number of triangles per second, most current computer graphics systems employ one of two general types of back-end architectures: (1) frame buffer architectures, which operate on a frame-by-frame basis, generating and writing into a buffer the pixels of each frame of an image to be displayed on the screen, and scanning out those pixels to the screen; and (2) display list architectures, which operate on a scanline-by-scanline basis, generating in scan order (and possibly writing into a buffer) the pixels of each scanline of an image to be displayed on the screen, and scanning out those pixels to the screen.
Systems based on frame buffer architectures, like all back-end systems, receive 3-D triangles from the system's front end. These systems generate pixels to fill in each triangle (or at least the visible portion of each triangle), and store those pixels in a frame buffer that contains memory locations corresponding to each pixel on the screen. Typically, the order in which these systems generate pixels and store them in the frame buffer corresponds to the order in which triangles are received from the system's front end, and not necessarily the location of such triangles on the screen.
Typical frame buffer architectures employ a double-buffered approach, particularly for animation, in which two frame buffers are utilized. While the system is scanning out to the screen the pixels from the first frame buffer (containing the current image), it simultaneously is writing into the second frame buffer the pixels generated by rasterizing each triangle (for the next image). Once the system finishes processing the triangles for this second frame buffer (even if such processing requires multiple "frame times"), the system can switch buffers (on the next vertical retrace) and begin scanning out to the screen this next image from the second frame buffer, while generating a subsequent image in the first frame buffer.
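By way of illustration only, the double-buffered approach described above might be sketched as follows. The class name `DoubleBuffer` and its methods are hypothetical and are not drawn from any particular system; the sketch models only the buffer roles and the swap, not the timing of the vertical retrace.

```python
class DoubleBuffer:
    """Minimal sketch of two frame buffers with swappable roles (hypothetical names)."""

    def __init__(self, width, height):
        # Two buffers, each holding one value per pixel location on the screen.
        self.buffers = [[0] * (width * height) for _ in range(2)]
        self.front = 0  # index of the buffer currently being scanned out

    @property
    def back(self):
        # The other buffer receives pixels for the next image.
        return 1 - self.front

    def write_pixel(self, index, value):
        # Rasterized pixels always go to the back buffer.
        self.buffers[self.back][index] = value

    def scan_out(self):
        # The screen always shows the front buffer (returned as a copy).
        return list(self.buffers[self.front])

    def swap(self):
        # Would occur on the next vertical retrace once the back buffer is complete.
        self.front = self.back


db = DoubleBuffer(2, 1)
db.write_pixel(0, 7)
before = db.scan_out()   # the current image is unaffected by the write
db.swap()
after = db.scan_out()    # the next image becomes visible only after the swap
```

If the back buffer takes several frame times to fill, `swap` is simply called later, and the same front image is scanned out repeatedly in the meantime.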
If the system's back end cannot generate and store pixels in a frame buffer quickly enough (i.e., cannot process a sufficient number of triangles per second), then the system scans out the same image to the screen for too many "frame times" before switching buffers and displaying the next image. As a result, images are not updated frequently enough to produce the desired animation effect.
If only a single buffer is used (e.g., for rendering a complex static 3-D object in a CAD program), the system displays the image as it is being generated. In this case, if the back end processes too few triangles per second, then the system will take too long to fully render the complete image.
Although all computer graphics systems can process only a limited number of triangles per second, systems based on frame buffer architectures are further limited by the nature of their design. Because they do not necessarily generate pixels in scan order, they cannot begin scanning out to the screen a complete image until after they generate all of the pixels representing that image and store those pixels in a frame buffer. Their overall performance therefore is limited by the time required to generate every pixel necessary to fill in each triangle (or at least the visible portion of each triangle), and write each of these pixels into the frame buffer or some other temporary memory. Further exacerbating this problem are the additional memory accesses made on a per-pixel basis, e.g., to a "z buffer" that stores pixel depth information.
Although a computer graphics system must generate a pixel for each location on the screen, it is not necessarily the case that it must write every such pixel (or even every visible pixel from each triangle) into a frame buffer in order to scan out such pixels to the screen. If, for example, a scene contains a large triangle that covers much of the screen, it is wasteful to take the time to store the same pixel value in many locations of the frame buffer memory, merely because that pixel must be displayed at many pixel locations on the screen (as is illustrated below with respect to the present invention).
Moreover, in a typical scene, many triangles may be partially or completely obscured by other triangles. As a result, the system may perform many redundant computations, as well as redundant writes to the frame buffer or other temporary memory, for pixels that ultimately will not be visible on the screen. Some systems, however, implement "hidden surface removal" algorithms to avoid writing these hidden pixels into the frame buffer, which may reduce this additional performance penalty to some extent.
For a description of a typical frame buffer architecture, see Kurt Akeley, "Reality Engine Graphics," Proceedings of SIGGRAPH '93 (Anaheim, Calif.; Aug. 1-6, 1993), published in COMPUTER GRAPHICS Proceedings, Annual Conference Series 1993, pp. 109-116. Although the Reality Engine system dedicates parallel hardware units to selected subsets of its frame buffer pixel locations, it still suffers from the above-mentioned disadvantages within each hardware unit.
The architecture of Oak Technology's 64-bit 3-D "WARP 5" graphics accelerator is a slight variation of a traditional frame buffer architecture. The WARP 5 first sorts the triangles into regions of the screen where they might generate visible pixels. Individual triangles can, of course, affect multiple regions. Upon completion of this "X-Y sort" of the entire set of triangles, the WARP 5 then rasterizes the triangles on a region-by-region basis, one region at a time, generating pixels for the current region and writing them into an on-chip "mini" frame buffer corresponding to that region of memory. It then writes the contents of each "mini" frame buffer into a single external (off-chip) frame buffer.
This process, though performed sequentially for each region, is similar in nature to the process employed by more traditional frame buffer architectures, and thus suffers from many of the same disadvantages. The WARP 5 still does not generate pixels in scan order. Although it implements a "hidden surface removal" algorithm that reduces the redundant pixel computations and writes for obscured triangles, it still generates and writes to a frame buffer (albeit a smaller, on-chip frame buffer) the many pixels necessary to fill in at least the visible portions of every desired triangle within each region before scanning out to the screen any of these pixels. Moreover, it suffers an additional performance penalty by serially (one region at a time) generating and writing pixels. This disadvantage, however, is a tradeoff for the relatively simple hardware necessary to handle only a single region at a time.
As an alternative to frame buffer architectures, display list architectures attempt to reduce the time required to generate and write every pixel (or at least every visible pixel from each triangle) into a frame buffer. Such architectures typically employ a pipeline of massively parallel processors, in which each processor is associated with an individual pixel or triangle (usually within a single scanline), to generate pixels very quickly, and in scan order. These pipelined processors enable the system to generate multiple scanlines in parallel, and thus to begin generating scanlines of a subsequent image before it has finished generating all of the scanlines of the current image, thereby reducing the average number of "frame times" required to generate a complete image.
Display list systems, although they pipeline the pixel generation process, typically cannot generate pixels sufficiently quickly to enable them to be scanned out to the screen "on the fly" (i.e., immediately as they are generated). A temporary frame buffer therefore still is necessary to buffer at least some number of generated scanlines before the process of scanning them out to the screen can begin.
For a description of a typical display list architecture, see Michael Deering, Stephanie Winner, Bic Schediwy, Chris Duffy and Neil Hunt, "The Triangle Processor and Normal Vector Shader: A VLSI System for High Performance Graphics," COMPUTER GRAPHICS, Vol. 22, No. 4, pp. 21-30 (August 1988). This system employs a pipeline of 1024 triangle processors, each associated with a single triangle at any one time, to generate scanlines of pixels in scan order.
The Deering et al. system pre-sorts the triangles into a Y-buffer that associates each scanline with a set of those triangles which intersect that scanline, and thus potentially might include pixels visible on that scanline. Each of these triangles is then assigned to one of the triangle processors in the pipeline, and "blank" pixels (representing actual pixel locations, processed in scanline order) are sent through the pipeline. Each triangle processor determines whether the current pixel location it receives is visible within its associated triangle, i.e., whether the pixel location falls within that triangle, and whether the interpolated depth of that triangle for that pixel location is "closer" than that generated by any previous triangle processor in the pipeline. If not, it merely passes that pixel on to the next triangle processor. If it is visible (thus far in the pipeline), it replaces the pixel with one having its interpolated depth. At the end of this pipeline, the "winning" pixel is sent through a smaller pipeline to generate RGB pixels that are stored in a temporary RGB frame buffer before being scanned out to the screen.
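The per-pixel behavior of such a triangle processor pipeline can be sketched as follows. This is a simplified model under stated assumptions, not the Deering et al. hardware: each hypothetical `TriangleProcessor` holds only the span of its triangle on one scanline (left and right x, depth at the left edge, and depth change per pixel), and smaller z is taken to mean "closer."

```python
import math
from dataclasses import dataclass


@dataclass
class TriangleProcessor:
    """One pipeline stage holding one triangle's span on the current scanline (hypothetical)."""
    tri_id: int
    x_left: int
    x_right: int
    z_left: float
    dz_dx: float

    def process(self, x, best_z, best_id):
        # Pass the pixel through unchanged unless this triangle covers x and is closer.
        if self.x_left <= x <= self.x_right:
            z = self.z_left + (x - self.x_left) * self.dz_dx
            if z < best_z:
                return z, self.tri_id
        return best_z, best_id


def run_scanline(processors, width):
    """Send one 'blank' pixel per location through every processor, in scan order."""
    winners = []
    for x in range(width):
        best_z, best_id = math.inf, None  # a blank pixel: nothing visible yet
        for p in processors:
            best_z, best_id = p.process(x, best_z, best_id)
        winners.append(best_id)
    return winners


procs = [
    TriangleProcessor(tri_id=1, x_left=0, x_right=3, z_left=5.0, dz_dx=0.0),
    TriangleProcessor(tri_id=2, x_left=2, x_right=5, z_left=1.0, dz_dx=0.0),
]
winners = run_scanline(procs, 7)
```

Here triangle 2 is closer wherever the two spans overlap, and the last pixel location is covered by neither triangle.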
By employing a pipeline of massively parallel processors to generate pixels quickly and in scan order, display list systems are able to reduce the average number of "frame times" required to generate a complete image. Yet, such systems typically are "unbounded" in that they cannot guarantee that every scanline will be generated within a predefined period of time, i.e., because the performance of their pixel-generation process is dependent upon the concentration of triangles within particular regions of the screen.
For example, although the system described above has a fixed number of triangle processors, the number of triangles per scanline (in the image to be rendered) is not fixed. Even though a triangle processor can be associated with a new triangle once it has finished processing the last pixel location within its current triangle, there is no guarantee that a triangle processor will be available when a new triangle is ready to be loaded. If this "overflow" condition is detected, one or more additional "passes" through the triangle processor pipeline will be necessary to handle the "overflowed triangles" for a particular scanline. Only when the system completes these additional passes can it generate the correct scanline. Thus, congestion of triangles within a particular region of the screen may impact the overall performance of the system, and effectively increase the average number of "frame times" required to generate a complete image.
Moreover, these pipelined triangle processors cannot necessarily generate pixels sufficiently quickly to enable them to be scanned out to the screen "on the fly" (i.e., immediately as they are generated). In addition, the system's circuitry is made more complex by the fact that the pipeline of triangle processors may be processing pixel locations on multiple scanlines at any given point in time, not to mention the complexity and associated performance penalty of having to detect and handle "overflow" conditions when triangles are congested within a region of the screen.
Display list architectures also have a number of other disadvantages, such as the higher cost and greater complexity of massively parallel hardware. It generally is not feasible, for example, to include a single processor for every pixel on the screen. Moreover, even if the number of processors is limited, for example, to one per pixel on a single scanline, this may result in little overall performance benefit, due to the large number of triangles that have to be processed by each pixel processor, as well as any pre-sorting of triangles by the system.
To approach the ideal of rasterizing all of the triangles generated by the front end of a computer graphics system within a single "frame time," the system's back-end architecture must be optimized to avoid the bottlenecks while leveraging the benefits resulting from current trends in the semiconductor industry. For example, both logic and memory are increasing in density and decreasing in cost at an exponential rate. Based upon current predictions, by the year 2000, a single ASIC logic chip will contain over 100 million transistors, and mass production of 1 Gbit DRAMs will have begun, with each 1 Gbit DRAM (128 Mbytes) chip being capable of storing a 2 Mpixel image having 64 bytes of storage per pixel. Yet, ASIC pin counts are not increasing, instead remaining relatively constant at about 200-500 pins per ASIC. It is thus apparent that inter-chip bandwidth is likely to remain a significant bottleneck.
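The storage figures quoted above can be checked with a few lines of arithmetic. The sketch assumes binary prefixes (1 Gbit = 2^30 bits, 1 Mpixel = 2^20 pixels), which the text does not state explicitly.

```python
# Assumption: binary prefixes, i.e., 1 Gbit = 2**30 bits and 1 Mpixel = 2**20 pixels.
BITS_PER_GBIT = 2**30

dram_bytes = BITS_PER_GBIT // 8          # capacity of one 1 Gbit DRAM, in bytes
dram_mbytes = dram_bytes // 2**20        # the same capacity in Mbytes
pixels = 2 * 2**20                       # a 2 Mpixel image
bytes_per_pixel = dram_bytes // pixels   # storage available per pixel

print(dram_mbytes, bytes_per_pixel)
```

Under that assumption, one chip holds 128 Mbytes, which divides evenly into 64 bytes for each of the 2 Mpixels.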
This bottleneck underscores the disadvantages noted above, particularly with respect to frame buffer architectures, which suffer performance penalties due in part to the many off-chip memory accesses that result from generating and writing many pixels to a frame buffer, and frequently accessing a "z buffer" and other temporary memory. Display list architectures also suffer from similar disadvantages, though they increase overall performance somewhat by pipelining the pixel-generation process. Yet, neither frame buffer nor display list systems can generate pixels sufficiently quickly to enable them to be scanned out to the screen "on the fly" as they are generated, which would eliminate the need for a frame buffer entirely.
The present invention provides a solution to the above-described problems by employing an architecture attuned to the current trends in the semiconductor industry. Various embodiments of this architecture are optimized to utilize one or a small number of ASICs, each containing a large number of transistors with relatively few interconnects. One embodiment of the present invention can be implemented in a single-chip ASIC which includes all the functionality necessary to perform the triangle buffer writing and rasterization/scan-out duties. Other embodiments may provide for two chips. One chip performs triangle buffering, while the other chip performs rasterization/scan-out functions.
One embodiment of this architecture is a real-time system that implements a two-step process. The first step in this process identifies which triangles are in competition to be rendered at a given pixel location, and stores them in a triangle buffer. The number of competing triangles is bounded in this first step to the "closest" N triangles associated with each pixel location to simplify the pipelined pixel generation implementation in the second step. The second step generates pixels based on the contents of the triangle buffer by resolving the competition, and renders each pixel (e.g., scans it out to the screen) "on the fly" as it is generated. Specifically, for each pixel location, this second step selects the relevant competing triangles, determines whether that pixel location is inside or outside these competing triangles, determines z depth values for each triangle, resolves the competition to identify the winning triangle, and generates the pixel color/texture associated with that winning triangle.
By first storing triangle information for each triangle into relatively few key locations in the triangle buffer, the system generally performs far fewer writes per triangle than there are potentially visible pixels within that triangle. It also defers scan conversion until after all triangles have been considered, at which point the system has sufficient information in the triangle buffer to generate each pixel in scan order, and scan that pixel out to the screen "on the fly" immediately as it is generated.
Writing the triangle information into a particular location of the triangle buffer guarantees "coverage competition" within a fixed-size region of the screen proximate to that location; that is, it guarantees that the triangle will compete to be scan-converted at each of the pixel locations within that region. Triangle information may of course be written into multiple locations of the triangle buffer (each associated with a fixed-size region proximate to that location) to ensure sufficient "coverage competition" for at least all pixel locations at which that triangle may be visible. Thus, larger triangles may necessitate more writes to the triangle buffer than will smaller triangles.
In one embodiment, the triangle information includes 3-D coordinates and RGB color or texture information for each of three triangle vertices, as well as certain coefficients of "z-plane" and "slope" equations. This information can be used to determine, for any given pixel location on the screen, whether the triangle is "visible" at that location and, if so, at what depth in the scene.
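One conventional way to obtain such coefficients is sketched below. The exact form of the "z-plane" and "slope" equations is not given in this passage, so the sketch assumes a plane equation z = a·x + b·y + c fitted through the three vertices and standard edge functions for the inside/outside test; the function names and the use of infinity as the "not visible" depth are illustrative assumptions.

```python
import math

Z_MAX = math.inf  # assumed stand-in for a "not visible at this location" depth


def z_plane(v0, v1, v2):
    """Coefficients (a, b, c) of z = a*x + b*y + c through three (x, y, z) vertices."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate triangle (zero screen-space area)")
    a = ((z1 - z0) * (y2 - y0) - (z2 - z0) * (y1 - y0)) / det
    b = ((x1 - x0) * (z2 - z0) - (x2 - x0) * (z1 - z0)) / det
    c = z0 - a * x0 - b * y0
    return a, b, c


def edge(p, q, x, y):
    """Signed edge function: its sign tells which side of edge p->q the point lies on."""
    return (q[0] - p[0]) * (y - p[1]) - (q[1] - p[1]) * (x - p[0])


def inside(v0, v1, v2, x, y):
    """True if (x, y) lies in the triangle (edges included), for either winding order."""
    e = (edge(v0, v1, x, y), edge(v1, v2, x, y), edge(v2, v0, x, y))
    return all(s >= 0 for s in e) or all(s <= 0 for s in e)


def depth_at(tri, x, y):
    """Interpolated depth at (x, y), or Z_MAX when the location is outside the triangle."""
    if not inside(*tri, x, y):
        return Z_MAX
    a, b, c = z_plane(*tri)
    return a * x + b * y + c


tri = ((0, 0, 1.0), (4, 0, 5.0), (0, 4, 9.0))
coeffs = z_plane(*tri)
z_in = depth_at(tri, 1, 1)
z_out = depth_at(tri, 5, 5)
```

For this triangle the plane is z = x + 2y + 1, so the depth at (1, 1) is 4, while (5, 5) lies outside the triangle and yields the maximum value.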
Prior to writing this triangle information into a selected location of the triangle buffer, the system calculates a "z depth" value for the triangle at that location, using an artificial "maximum" value if the triangle is not visible at that location. The system compares the triangle's calculated z depth value to the z depth value stored at the corresponding location in a separate z buffer (e.g., to determine which of two triangles is "closer" at that pixel location). Initially, all locations in the z buffer are set to the artificial maximum value. Assuming, in one embodiment, that no objects are transparent and no anti-aliasing techniques are employed, then there will exist only one visible surface, and thus only one "winning" triangle, at any given pixel location on the screen. Whenever the system writes triangle information into a selected location of the triangle buffer, it also writes this z depth value into the corresponding location of the z buffer.
For each triangle being processed, the system determines how many fixed-size "coverage masks" are needed to sufficiently cover the triangle's bounding box. The system first attempts to store the triangle information for a triangle in the triangle buffer memory locations corresponding to the top left corner of each coverage mask. For each coverage mask, if the triangle information for an existing (previously processed) triangle already has been stored at that selected location in the triangle buffer, and is "closer" than (or at the same depth as) the current triangle, then the system attempts to store the triangle information for the current triangle at the next location within that coverage mask. Alternatively, if the current triangle wins, then its triangle information displaces the triangle information for the existing triangle, and the system attempts to relocate the triangle information for the displaced triangle to the next location within the particular original coverage mask associated with that displaced triangle.
In either case, the same process of comparing z depth values continues at each of these next selected locations until the triangle information for each "losing" triangle has been stored at a selected location within that triangle's particular associated coverage mask in the triangle buffer, or until such triangle "loses" at all such locations. In this latter case, its triangle information need not be stored anywhere within that coverage mask area of the triangle buffer because the triangle is not visible (based upon the prior z depth comparisons) at any pixel location on the screen corresponding to any of the fixed-size "coverage competition" regions associated with each location within that coverage mask area of the triangle buffer, i.e., because the triangle at each such pixel location either is outside the user's viewing angle or is obscured by a "closer" triangle.
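The compare-and-displace procedure just described can be sketched for a single coverage mask as follows. This is a deliberately reduced model: all triangles are assumed to share one mask whose locations are visited in a fixed order, and `depth_at(tri, i)` is a hypothetical callback returning the triangle's z depth at location `i` (or the artificial maximum if the triangle is not visible there). The case in which a displaced triangle belongs to a different original mask is not modeled.

```python
import math

Z_MAX = math.inf  # artificial "maximum" depth; smaller z is assumed to mean closer


def insert_triangle(slots, zbuf, tri, depth_at):
    """Try to store `tri` in one coverage mask, displacing farther triangles.

    slots: list of stored triangles (None if empty), one per mask location.
    zbuf:  list of z depths parallel to `slots`, initialized to Z_MAX.
    Returns the triangle that lost at every remaining location (it is dropped),
    or None if every triangle involved found a location.
    """
    cur, i = tri, 0
    while cur is not None and i < len(slots):
        z = depth_at(cur, i)
        if z < zbuf[i]:
            # The current triangle wins here; whatever it displaces (possibly
            # nothing) continues the competition at the next location.
            slots[i], cur = cur, slots[i]
            zbuf[i] = z
        # On a loss or a tie the existing triangle stays; cur moves on.
        i += 1
    return cur


# Usage with constant per-triangle depths (illustrative values only).
depths = {"A": 5.0, "B": 3.0, "C": 7.0, "D": 4.0}
depth_of = lambda tri, i: depths[tri]

slots, zbuf = [None, None], [Z_MAX, Z_MAX]
r_a = insert_triangle(slots, zbuf, "A", depth_of)  # A takes the first location
r_b = insert_triangle(slots, zbuf, "B", depth_of)  # B displaces A to the second
r_c = insert_triangle(slots, zbuf, "C", depth_of)  # C loses everywhere, is dropped
r_d = insert_triangle(slots, zbuf, "D", depth_of)  # D displaces A, which is dropped
```

Each call performs at most one comparison per mask location, independent of how many pixels the triangle covers.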
This process of writing triangle information into selected locations of a triangle buffer requires far fewer writes, and far less time, than a frame buffer or display list system would require to generate pixels and store them in a frame buffer. This is due in part to the fact that this process is performed on a per-triangle, not a per-pixel, basis. By employing fixed-size "coverage competition" areas, the triangle information for each triangle need only be stored at one or a few selected locations in a triangle buffer, as opposed to the far greater number of frame buffer locations corresponding to the number of pixels necessary to fill in the visible portion of each triangle. Moreover, a great deal of time has been saved by deferring the process of scan-converting triangles into pixels.
Once the system has considered all triangles, and stored all relevant triangle information in the triangle buffer, it then generates a pixel for each pixel location on the screen, one at a time in scan order, and immediately scans each pixel out to the screen "on the fly" as it is generated. This is possible not only because the system's pixel generation process is heavily pipelined, but also because it is "bounded," in that a fixed maximum number of triangles will compete to be visible at each pixel location on the screen. This maximum number of triangles corresponds to the number of memory locations within the fixed-size "coverage competition" region associated with each pixel location on the screen. Those triangles whose triangle information was stored within any such region in the triangle buffer are guaranteed to be the "closest" triangles at the pixel location on the screen associated with that region. The prior z depth comparisons effectively discarded other "losing" triangles having greater depths at that pixel location.
The processes of generating pixels and scanning them out to the screen are performed in parallel via a pipeline that processes the contents of the triangle buffer and generates pixels in scan order. Because this process is "bounded," the system can guarantee that each pixel will be generated in the fixed period of time required to scan that pixel out to the screen (e.g., 1/60 of a second, divided by the number of pixels on the screen). Thus, the system incurs no additional overhead to scan-convert triangles into pixels. Its performance (triangles per second) is limited only by the time required to process each triangle and write triangle information into the triangle buffer.
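To give a sense of scale, the per-pixel time budget can be computed directly from that formula. The 1024 by 768 screen size below is an assumption chosen purely for illustration, and blanking intervals are ignored.

```python
def pixel_time_budget(width, height, refresh_hz=60):
    """Seconds available per pixel: one frame time divided by the pixel count."""
    return (1.0 / refresh_hz) / (width * height)


# Hypothetical 1024 x 768 screen at 60 Hz (the screen size is not from the source).
budget_s = pixel_time_budget(1024, 768)
budget_ns = budget_s * 1e9
print(f"{budget_ns:.1f} ns per pixel")
```

For this assumed screen the budget works out to roughly 21 nanoseconds per pixel, which is the fixed interval within which a bounded pipeline must resolve the competition at each pixel location.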
Moreover, by "bounding" this process, the hardware required to implement this pipeline is greatly simplified. Compared with massively parallel display list architectures, for example, this pipeline uses far fewer and far simpler processors. Yet, it generates pixels faster and at regular intervals, enabling each pixel to be scanned out to the screen "on the fly" as it is generated. This system also can operate in a "double-buffered" manner. In that case, it utilizes the contents of a first triangle buffer and z buffer to generate pixels and scan them out to the screen for the current frame, while simultaneously storing triangle information for the next frame into a second triangle buffer and z buffer.
In either case, the system transfers the contents of the triangle buffer in scan order into a multi-stage pipeline that includes a "triangle cache," a column of "coefficient evaluators," an array of "z interpolation" processors, an "image composition network," and a "shading unit." In effect, this pipeline implements a "sliding coverage competition window," which slides across the triangle buffer determining the "winning" triangle for each pixel location on the screen, in scan order. At each moment in time, the z interpolation processors are calculating z depth values for all competing triangles within that "sliding coverage competition window," and then providing them in parallel to the image composition network, which determines the "winning" triangle.
At the beginning of the pipeline, the triangle cache receives and caches the most recent "N" rows from the triangle buffer, where N is equal, in one embodiment, to the number of rows in a fixed-size "coverage competition" region (e.g., 16 rows). The triangle cache wraps around to overwrite the first row after the last row of the cache is filled.
At the next stage of the pipeline, the triangle cache provides a column of triangle information in parallel to the coefficient evaluators, each of which determines certain depth-related components for each triangle stored in that column. After providing the coefficient evaluators the rightmost column of triangle information in the cache, the triangle cache wraps around to provide the leftmost column for the next N rows from the triangle buffer. Because the pipeline generates pixels in scan order, these depth-related components are limited to the row/scanline of the triangle buffer in which the triangle information for each triangle is stored. They enable the next stage of the pipeline to calculate, for any pixel location within that row/scanline, whether the triangle encompasses that pixel location and, if so, the triangle's interpolated z depth at that pixel location.
These depth-related components include "2-D span" information, which identifies the left and right edges of the triangle intersected by that row/scanline, z depth information for the current pixel being processed on that row/scanline (or for the left edge of the triangle intersected by that row/scanline if the current pixel is not within the triangle), and "dz slope" information which indicates the slope, or change in z depth, of the triangle from left to right.
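As a minimal sketch of how such per-scanline components might be used, the following example tests whether a pixel lies within a triangle's span and, if so, interpolates its z depth linearly from the left edge. The names `ScanlineCoefficients` and `interpolate_z` are hypothetical, and the example assumes for simplicity that the stored z value is always the one at the left edge of the span.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanlineCoefficients:
    """Hypothetical per-triangle, per-scanline components."""
    x_left: float   # left edge of the 2-D span on this scanline
    x_right: float  # right edge of the 2-D span on this scanline
    z_left: float   # z depth at the left edge (assumption of this sketch)
    dz_dx: float    # "dz slope": change in z per pixel, left to right


def interpolate_z(c: ScanlineCoefficients, x: float) -> Optional[float]:
    """Return z at pixel x, or None if x is outside the triangle's span."""
    if x < c.x_left or x > c.x_right:
        return None
    return c.z_left + (x - c.x_left) * c.dz_dx
```

A result of `None` stands for a triangle that does not cover the pixel and therefore does not compete for it.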
At the next stage of the pipeline, the coefficient evaluators provide a column of triangle information in parallel to a "sliding window" or array of z interpolation processors (e.g., M processors, where M is equal to the number of columns in each fixed-size "coverage competition" region, e.g., 32). Each of these z interpolation processors calculates a z depth value, at the current pixel location being processed, for one of the triangles stored within this (e.g., 32×16) sliding window of locations in the triangle buffer. As each new column of triangle information is received from the coefficient evaluators, the sliding window of z interpolation processors calculates z depth values for the next pixel location, using a set of competing triangles within the "coverage competition" region one column to the right of the previous region.
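The column-by-column advance of the window can be sketched, under simplifying assumptions, with a bounded queue: each incoming column enters on the right and, once M columns are held, the oldest column drops off the left. The function name `sliding_windows` and the treatment of a column as an opaque value are assumptions of this example, which does not model how the window is aligned with the current pixel.

```python
from collections import deque


def sliding_windows(columns, m=32):
    """Yield the columns held in the window after each new column arrives.

    Hypothetical model: the window holds at most m columns, and the
    oldest column is discarded automatically when a new one is appended.
    """
    window = deque(maxlen=m)
    for col in columns:
        window.append(col)
        yield list(window)
```

Each yielded list corresponds to the set of columns whose triangles would compete for one pixel location; successive lists differ by one column, matching the one-column shift described above.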
In other words, the coefficient evaluators and z interpolation processors together enable the system to calculate, for the current pixel location being processed, z depth values for all competing triangles within a "coverage competition" region, e.g., the 32×16=512 triangles stored at the locations in the triangle buffer within this region. These z depth values are calculated simultaneously by the array of z interpolation processors for the current pixel location, and provided to an "image composition network" to determine the "winning" triangle.
At the next stage of the pipeline, the array of z interpolation processors provides all of the z depth values in parallel to the image composition network, which includes a "tree" of comparators to compare the z depth values within the current "coverage competition" region, and determine the "winning" triangle that is visible at the current pixel location being processed. A "shading unit" then determines the RGB color or texture for that pixel from the triangle information stored in the triangle buffer for that triangle, e.g., by interpolating from RGB information for each vertex of the triangle.
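A tree of comparators can be illustrated in software as a pairwise reduction in which each level halves the number of surviving candidates. The sketch below assumes that a smaller z value means a nearer triangle and that a non-covering triangle is represented by `None`; both conventions, and the names `nearer` and `comparator_tree`, are assumptions of the example rather than details of the disclosed network.

```python
def nearer(a, b):
    """One comparator: return the candidate with the smaller z depth.

    Candidates are (z, triangle_id) tuples; None means "no coverage".
    """
    if a is None:
        return b
    if b is None:
        return a
    return a if a[0] <= b[0] else b


def comparator_tree(candidates):
    """Reduce candidates pairwise, level by level, to a single winner."""
    level = list(candidates)
    if not level:
        return None
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            right = level[i + 1] if i + 1 < len(level) else None
            next_level.append(nearer(level[i], right))
        level = next_level
    return level[0]
```

For 512 candidates such a reduction needs nine levels of comparators, since 2^9 = 512.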
The calculations at each stage of this pixel generation pipeline are synchronized such that the final pixel data for each pixel is provided by the last stage of the pipeline when the "video clock" actually scans that pixel out to the screen. As noted above, this is possible because these calculations are "bounded" to a relatively small fixed number of triangles. This pipeline also benefits by making efficient use of very wide on-chip "embedded DRAM" busses for parallel data transfers between stages of the pipeline, which improves performance significantly and avoids time-consuming off-chip memory accesses.
Another embodiment of the present invention uses micro-polygons instead of polygons (i.e., triangles). In this embodiment, the front-end graphics system delivers micro-polygons, which can be viewed conceptually as polygons of higher resolution. The vertices of the micro-polygons are associated with samples or sub-pixels, and a micro-polygon is any grouping of a plurality of samples or sub-pixels. A buffer at the output of the image composition network sums the sub-pixel values per pixel, calculates an average of the sub-pixels per pixel, and associates the average with that pixel. This feature results in smoother edges and improved anti-aliasing effects. One embodiment of the present invention uses micro-polygons in a real-time graphics system.
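The sum-and-average step can be sketched as follows, assuming that each pixel is covered by a square block of factor × factor sub-pixels and that each sub-pixel carries an (r, g, b) tuple. The function name `average_subpixels`, the `factor` parameter, and the regular sub-pixel layout are assumptions of this example.

```python
def average_subpixels(samples, factor=2):
    """Average each factor x factor block of (r, g, b) sub-pixels.

    samples is a 2-D list of sub-pixel colors; the result is a 2-D list
    with one averaged (r, g, b) tuple per pixel.
    """
    rows = len(samples) // factor
    cols = len(samples[0]) // factor if samples else 0
    n = factor * factor
    pixels = []
    for py in range(rows):
        out_row = []
        for px in range(cols):
            total = [0.0, 0.0, 0.0]
            # Sum the sub-pixel values belonging to this pixel.
            for sy in range(factor):
                for sx in range(factor):
                    sample = samples[py * factor + sy][px * factor + sx]
                    for ch in range(3):
                        total[ch] += sample[ch]
            # Associate the average with the pixel.
            out_row.append(tuple(t / n for t in total))
        pixels.append(out_row)
    return pixels
```

A pixel whose sub-pixels straddle a triangle edge receives a blend of the colors on either side of the edge, which is the source of the smoother edges noted above.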