The present invention relates generally to systems for computer graphics. More specifically, the present invention includes a method and apparatus for efficiently rasterizing graphics primitives.
Computer systems (and related devices) typically create three-dimensional images using a sequence of stages known as a graphics pipeline. During early pipeline stages, images are modeled using a mosaic-like approach where each image is composed of a collection of individual points, lines and polygons. These points, lines and polygons are known as primitives and a single image may require thousands, or even millions, of primitives. Each primitive is defined in terms of its shape and location as well as other attributes, such as color and texture.
The primitives used in early pipeline stages are transformed, during a rasterization stage, into collections of pixel values. The rasterization stage is often performed by a specialized graphics processor (in low-end systems, rasterization may be performed directly by the host processor) and the resulting pixel values are stored in a device known as a frame buffer. A frame buffer is a memory that includes a series of randomly accessible memory locations. Each memory location in the frame buffer defines a corresponding pixel included in an output device where the image will ultimately be displayed. To define its corresponding pixel, each memory location includes a series of bits. Typically, these bits are divided into separate portions defining red, blue and green intensities. Each memory location may also include depth information to help determine pixel ownership between overlapping primitives.
During the rasterization stage, the graphics processor renders each primitive into the frame buffer. The graphics processor accomplishes this task by determining which frame buffer memory locations are included within the bounds of each primitive. The included memory locations are then initialized to reflect the attributes of the primitive, including color and texture.
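One common way to decide which memory locations fall within the bounds of a triangle is the edge-function test, sketched below in Python. It illustrates the general technique and is not necessarily the test used by the graphics processor described here. The function names are hypothetical, and pixels are sampled at their centers. A pixel lying exactly on an edge is counted as inside; real rasterizers usually apply a tie-breaking fill rule so that shared edges are not drawn twice.

```python
# Illustrative edge-function coverage test (names are hypothetical).
from typing import List, Tuple

Point = Tuple[float, float]

def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of triangle (a, b, p). Its sign tells which
    side of the directed edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covered_pixels(v0: Point, v1: Point, v2: Point,
                   width: int, height: int) -> List[Tuple[int, int]]:
    """Return (x, y) for every pixel whose center lies inside or on the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []  # a degenerate triangle covers no pixels
    out = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            # Inside when all three edge values share the sign of the area,
            # which makes the test independent of vertex winding order.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                out.append((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                out.append((x, y))
    return out
```

For the triangle with vertices (0, 0), (4, 0) and (0, 4) on a 4 by 4 grid, this returns the ten pixels with x + y <= 3, whichever winding order the vertices are given in.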
The rasterization stage is followed by a display stage where a display controller transforms the pixel values stored in the frame buffer into signals that drive the output device being used. The display controller accomplishes this task by scanning the memory locations included in the frame buffer. The red, blue and green portions of each location are converted into appropriate output signals and sent to the output device.
The throughput of a graphics pipeline is highly dependent on frame buffer performance. This follows because the frame buffer functions as a middleman between the rasterization stage and the display stage. As a result, the frame buffer becomes the focus of repeated memory accesses by both the graphics processor and the display controller. The number of these accesses may be quite large. The frame buffer must be able to sustain a high rate of these accesses if it is to avoid becoming a performance bottleneck.
Frame buffers are typically fabricated using arrays of dynamic random access memory (DRAM) components. Compared to other technologies, such as static random access memories (SRAMs), DRAM components represent a better trade-off between performance and cost. At the same time, achieving acceptable frame buffer performance may be far more complicated when DRAM components are used. The complexity stems from the addressing scheme these components use. In this scheme, each memory location is addressed by a combination of a row address and a column address, supplied in sequence: row address first, column address second. Depending on the specific type of DRAM components used, this two-step addressing scheme may be too slow to sustain the memory access rate required for frame buffer use.
Fortunately, many DRAM components also provide a faster page addressing mode. In this mode, a sequence of column addresses may be supplied to a DRAM component after a single row address has been supplied, so each access within that row requires only a column address. The overall effect is that accessing a DRAM component is much faster when a series of accesses is confined to a single row. Accessing a location in a different row, referred to as a page miss, is much slower because a new row address must be supplied first.
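The cost difference between page-mode accesses and page misses can be illustrated with a toy model. The sketch below assumes 256 columns per DRAM row and arbitrary timings of 5 cycles to open a row and 1 cycle per column access. None of these values, nor the function names, come from the text.

```python
# Toy model of DRAM row/column addressing and page-mode access cost.
# COL_BITS, T_ROW and T_COL are hypothetical values chosen for illustration.
from typing import Iterable, Optional, Tuple

COL_BITS = 8   # assume 256 columns (memory locations) per DRAM row
T_ROW = 5      # assumed cycles to supply a new row address (a page miss)
T_COL = 1      # assumed cycles for each column access within the open row

def split_address(addr: int) -> Tuple[int, int]:
    """Split a linear address into (row, column): the high bits select the
    row and the low COL_BITS bits select the column."""
    return addr >> COL_BITS, addr & ((1 << COL_BITS) - 1)

def access_cost(addrs: Iterable[int]) -> Tuple[int, int]:
    """Return (page_misses, cycles) for a sequence of accesses, assuming a
    single open row and that the first access always has to open a row."""
    open_row: Optional[int] = None
    misses = cycles = 0
    for addr in addrs:
        row, _col = split_address(addr)
        if row != open_row:   # page miss: a new row address is needed first
            misses += 1
            cycles += T_ROW
            open_row = row
        cycles += T_COL       # every access supplies a column address
    return misses, cycles
```

Under these assumptions, four accesses within one row (addresses 0 to 3) cost 9 cycles with one page miss, while four accesses alternating between two rows (0, 256, 1, 257) cost 24 cycles with four page misses.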
For this reason, frame buffers are often designed to maximize consecutive accesses within DRAM rows and to minimize page misses. One way in which this is accomplished is to structure the frame buffer so that graphics primitives tend to map to a single DRAM row or a small number of DRAM rows. Memory tiling is an example of this type of frame buffer structuring. In frame buffers that use memory tiling, the memory locations included in a DRAM row map to a rectangular block of pixels. This contrasts with more typical frame buffer construction, where DRAM rows map to lines of pixels. Memory tiling takes advantage of the fact that many primitives fit easily into blocks and that few fit easily into lines. In this way, memory tiling reduces page misses by increasing the chances that a given primitive will be included within a single DRAM row or a small number of DRAM rows.
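The benefit of memory tiling can be seen by counting how many DRAM rows a small block of pixels touches. The sketch below assumes a 1024-pixel-wide display, 256 memory locations per DRAM row, and 16 by 16 tiles, so that one tile fills exactly one row. These sizes and the function names are illustrative only.

```python
# Compare DRAM rows touched by a pixel block under a scanline layout and a
# tiled layout. WIDTH, ROW_SIZE and TILE are assumed values.
from typing import Callable

WIDTH = 1024     # pixels per display line (assumed)
ROW_SIZE = 256   # memory locations per DRAM row (assumed)
TILE = 16        # tile edge length; 16 * 16 == ROW_SIZE

def linear_row(x: int, y: int) -> int:
    """DRAM row holding pixel (x, y) when rows map to runs of a display line."""
    return (y * WIDTH + x) // ROW_SIZE

def tiled_row(x: int, y: int) -> int:
    """DRAM row holding pixel (x, y) when each row maps to one 16x16 tile."""
    tiles_per_line = WIDTH // TILE
    return (y // TILE) * tiles_per_line + (x // TILE)

def rows_touched(row_of: Callable[[int, int], int],
                 x0: int, y0: int, w: int, h: int) -> int:
    """Number of distinct DRAM rows covered by a w-by-h block at (x0, y0)."""
    return len({row_of(x, y)
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w)})
```

An aligned 16 by 16 block touches 16 rows in the scanline layout but only 1 in the tiled layout. Shifted by 8 pixels in each direction, it still touches only 4 rows in the tiled layout versus 16.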
Another way to maximize consecutive accesses within DRAM rows and to minimize page misses is to position a cache memory between the graphics processor and the frame buffer. The cache memory collects accesses performed by the graphics processor and forwards them to the frame buffer on a more efficient row-by-row basis.
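A cache that forwards accesses row by row can be approximated, for illustration, by a write buffer that collects writes and releases them grouped by DRAM row. The class below is a hypothetical simplification with no capacity limit, no reads, and no eviction policy. It assumes that writes to different addresses may be reordered freely, while a later write to the same address replaces the earlier one.

```python
# Hypothetical row-grouping write buffer; a simplified stand-in for a cache.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def page_misses(rows: List[int]) -> int:
    """Count row changes in a sequence of DRAM row indices; the first
    access is counted because it must open a row."""
    return sum(1 for i, r in enumerate(rows) if i == 0 or r != rows[i - 1])

class RowGroupingCache:
    """Collects writes, then releases them grouped by DRAM row."""

    def __init__(self, row_of: Callable[[int], int]) -> None:
        self.row_of = row_of
        # row -> {address: value}
        self.pending: Dict[int, Dict[int, int]] = defaultdict(dict)

    def write(self, addr: int, value: int) -> None:
        # A later write to the same address replaces the earlier one.
        self.pending[self.row_of(addr)][addr] = value

    def flush(self) -> List[Tuple[int, int]]:
        """Return buffered (address, value) writes ordered row by row,
        then empty the buffer."""
        out: List[Tuple[int, int]] = []
        for row in sorted(self.pending):
            out.extend(sorted(self.pending[row].items()))
        self.pending.clear()
        return out
```

With 256 locations per row (assumed), writes to addresses 0, 300, 1 and 301 cause four page misses if forwarded in arrival order but only two after grouping.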
Memory tiling and cache memories are both effective techniques for improving the performance of DRAM-based frame buffers. Unfortunately, the rasterization technique used with most frame buffers does not fully exploit memory tiling, or cache memories used in combination with memory tiling. This is because rasterization is typically performed on a line-by-line basis. When used with a tiled frame buffer, line-by-line rasterization effectively ignores the tiled structure of the frame buffer, so a given rasterization may repeatedly leave and return to the same set of tiles. This increases the number of DRAM page misses and decreases the throughput of the frame buffer and the graphics pipeline. There is therefore a need for rasterization methods that more effectively exploit memory tiling and cache memories used in combination with memory tiling.
An embodiment of the present invention includes a method and apparatus for efficiently rasterizing graphics primitives. In the following description, an embodiment of the present invention will be described within the context of a representative graphics pipeline. The graphics pipeline is a sequence of components included in a host computer system. This sequence of components ends with a frame buffer followed by a display controller.
The frame buffer is a random access memory device that includes a series of memory locations. The memory locations in the frame buffer correspond to pixels included in an output device, such as a monitor. Each memory location includes a series of bits, with the number and distribution of bits being implementation dependent. For the purpose of description, it may be assumed that each memory location includes four eight-bit bytes. Three of these bytes define red, blue and green intensities, respectively. The fourth byte, alpha, defines the pixel's coverage or transparency.
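Packing the four eight-bit channels into a single 32-bit memory location might look like the sketch below. The byte order (red in the most significant byte, then green, blue and alpha) is an assumption; the text specifies only which bytes are present, not where each one sits.

```python
# Pack and unpack four 8-bit channels in one 32-bit word.
# The byte order used here is an illustrative assumption.
from typing import Tuple

def pack_rgba(r: int, g: int, b: int, a: int) -> int:
    """Combine four 8-bit channel values into one 32-bit memory word."""
    for c in (r, g, b, a):
        if not 0 <= c <= 255:
            raise ValueError("channel values must fit in eight bits")
    return (r << 24) | (g << 16) | (b << 8) | a

def unpack_rgba(word: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit memory word back into (red, green, blue, alpha)."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)
```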
The memory locations included in the frame buffer are preferably organized using a tiled addressing scheme. In this scheme, the memory locations included in the frame buffer are organized to correspond to rectangular tiles of pixels included in the output device. The number of pixels (and the number of frame buffer memory locations) included in a single tile may vary between different frame buffer implementations. In most cases, the tile size will be a power of two. This provides a convenient scheme in which more significant address bits choose a specific tile and less significant address bits choose an offset within that tile. Where the frame buffer is fabricated using DRAM or DRAM-like memory components, it is preferable for each tile to map to some portion of a DRAM row. Thus, each DRAM row includes one or more memory tiles.
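The tiled addressing scheme can be sketched with shifts and masks. Assuming square 16 by 16 tiles and a display width of 1024 pixels (both illustrative values), the low eight address bits give the offset within a tile and the remaining high bits give the tile index.

```python
# Tiled address computation with power-of-two tiles. TILE_LOG2 and WIDTH
# are assumed values; WIDTH must be a multiple of the tile size.
TILE_LOG2 = 4                     # 16x16 tiles
TILE = 1 << TILE_LOG2
WIDTH = 1024                      # display width in pixels (assumed)
TILES_PER_LINE = WIDTH >> TILE_LOG2

def tiled_address(x: int, y: int) -> int:
    """High bits select the tile; the low 2 * TILE_LOG2 bits select the
    pixel's offset within that tile (row-major inside the tile)."""
    tile_index = (y >> TILE_LOG2) * TILES_PER_LINE + (x >> TILE_LOG2)
    offset = ((y & (TILE - 1)) << TILE_LOG2) | (x & (TILE - 1))
    return (tile_index << (2 * TILE_LOG2)) | offset
```

If each DRAM row also held 256 locations (again an assumption), `address >> 8` would be both the tile index and the row index, so each tile would map to exactly one row. For example, pixel (17, 1) falls in tile 1 at offset 17, giving address 273.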
The display controller scans the memory locations included in the frame buffer. For each location scanned, the display controller converts the red, blue and green intensities into appropriate output signals. The display controller sends these output signals to the output device being used. The display controller continually repeats this scanning process. In this way, the contents of the frame buffer are continuously sent to the output device.
The graphics processor rasterizes graphics primitives into the frame buffer. To accomplish this task, the graphics processor determines which frame buffer memory locations are included within the bounds of each primitive. The included memory locations are then initialized to reflect the attributes of the primitive, including color and texture. During rasterization, the graphics processor uses a hierarchy of memory tiles. Within this hierarchy, smaller tiles are grouped into larger tiles. These larger tiles may be grouped, in turn, into still larger tiles. For a representative embodiment of the present invention, the tile hierarchy includes three levels. The lowest level of the hierarchy is made up of four pixel by four pixel low-level tiles. These four-by-four tiles are grouped into eight-by-eight mid-level tiles and the eight-by-eight tiles are grouped into sixteen-by-sixteen high-level tiles.
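The position of a pixel within the three-level hierarchy follows directly from its coordinate bits, because each level halves the tile edge length. The sketch below uses the 16, 8 and 4 pixel tile sizes from the text. Numbering the four children of a tile in raster order (0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right) and the `TilePath` name are assumptions for illustration.

```python
# Locate a pixel at every level of the 16x16 / 8x8 / 4x4 tile hierarchy.
# Child numbering in raster order is an illustrative assumption.
from typing import NamedTuple, Tuple

class TilePath(NamedTuple):
    high: Tuple[int, int]  # (column, row) of the 16x16 high-level tile
    mid: int               # 8x8 child of the high-level tile, 0..3
    low: int               # 4x4 child of the mid-level tile, 0..3
    pixel: int             # pixel index inside the 4x4 tile, 0..15

def tile_path(x: int, y: int) -> TilePath:
    """Locate pixel (x, y), with x, y >= 0, in the tile hierarchy."""
    return TilePath(
        high=(x >> 4, y >> 4),                     # divide by 16
        mid=((y >> 3) & 1) * 2 + ((x >> 3) & 1),   # bit 3 picks the 8x8 half
        low=((y >> 2) & 1) * 2 + ((x >> 2) & 1),   # bit 2 picks the 4x4 half
        pixel=(y & 3) * 4 + (x & 3),               # low two bits of each coordinate
    )
```

For example, pixel (21, 6) lies in high-level tile (1, 0), mid-level child 0, low-level child 3, at pixel index 9.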
The graphics processor begins the process of rasterizing a primitive by selecting one of the primitive's vertices as a starting vertex. The graphics processor then rasterizes the low-level tile that includes the starting vertex. When rasterization of the first low-level tile is complete, the graphics processor moves left-to-right, top-to-bottom through the remaining low-level tiles that are included in the same mid-level tile as the first low-level tile. The graphics processor rasterizes each of these low-level tiles that includes pixels within the primitive. When the last of these low-level tiles has been rasterized, the graphics processor has completely rasterized the first mid-level tile.
When rasterization of the first mid-level tile is complete, the graphics processor moves left-to-right, top-to-bottom through the remaining mid-level tiles that are included in the same high-level tile as the first mid-level tile. The graphics processor rasterizes each of these mid-level tiles that includes pixels within the primitive by repeating the method used to rasterize the first mid-level tile (i.e., by rasterizing their component low-level tiles). When the last of these mid-level tiles has been rasterized, the graphics processor has completely rasterized the first high-level tile.
When rasterization of the first high-level tile is complete, the graphics processor moves left-to-right, top-to-bottom through the remaining high-level tiles that span the primitive. The graphics processor rasterizes each of these high-level tiles by repeating the method used to rasterize the first high-level tile (i.e., by rasterizing their component mid-level tiles, which are rasterized, in turn, by rasterizing their component low-level tiles). When the last of these high-level tiles has been rasterized, the graphics processor has completely rasterized the primitive.
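The traversal described above can be sketched as a recursive procedure. This is an illustration under stated assumptions, not the actual hardware implementation. The primitive is given as a hypothetical coverage predicate `covers(x, y)` plus an inclusive bounding box. Pixels inside a 4 by 4 tile are visited in raster order, and child tiles that do not contain the starting vertex are visited left-to-right, top-to-bottom.

```python
# Sketch of the bottom-up, three-level tile traversal. Tile sizes come from
# the text; `covers`, the bounding box and the in-tile pixel order are assumed.
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int]
Covers = Callable[[int, int], bool]
LEVELS = (16, 8, 4)  # high-, mid- and low-level tile edge lengths in pixels

def floor_to(v: int, size: int) -> int:
    """Round v down to a multiple of size (Python's % floors, so negatives work)."""
    return v - v % size

def start_first(tiles: List[Pixel], size: int, start: Optional[Pixel]) -> List[Pixel]:
    """Move the tile containing `start` (if present) to the front and keep
    the remaining tiles in raster order."""
    if start is not None:
        first = (floor_to(start[0], size), floor_to(start[1], size))
        if first in tiles:
            tiles = [first] + [t for t in tiles if t != first]
    return tiles

def rasterize_tile(x0: int, y0: int, level: int, covers: Covers,
                   start: Optional[Pixel], out: List[Pixel]) -> None:
    size = LEVELS[level]
    if level == len(LEVELS) - 1:
        # Low-level tile: emit every covered pixel. Tiles with no covered
        # pixels emit nothing; hardware would reject them up front.
        out.extend((x, y) for y in range(y0, y0 + size)
                   for x in range(x0, x0 + size) if covers(x, y))
        return
    child = LEVELS[level + 1]
    kids = [(x0 + cx, y0 + cy)
            for cy in range(0, size, child) for cx in range(0, size, child)]
    for kx, ky in start_first(kids, child, start):
        # Finish each child tile completely before moving to its sibling.
        rasterize_tile(kx, ky, level + 1, covers, start, out)

def rasterize(bbox: Tuple[int, int, int, int], covers: Covers,
              start: Pixel) -> List[Pixel]:
    """Rasterize a primitive with inclusive bounding box (xmin, ymin, xmax,
    ymax), beginning with the high-level tile that holds the starting vertex."""
    xmin, ymin, xmax, ymax = bbox
    top = LEVELS[0]
    tiles = [(x, y) for y in range(floor_to(ymin, top), ymax + 1, top)
             for x in range(floor_to(xmin, top), xmax + 1, top)]
    out: List[Pixel] = []
    for tx, ty in start_first(tiles, top, start):
        rasterize_tile(tx, ty, 0, covers, start, out)
    return out
```

For the triangle covering x + y <= 20 (with x, y >= 0) started at vertex (20, 0), the first pixel emitted is (20, 0), followed by (16, 0) from the neighboring low-level tile in the same mid-level tile. Every covered pixel is emitted exactly once.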
Effectively, the primitive is rasterized in a bottom-up fashion. The graphics processor rasterizes low-level tiles, mid-level tiles and high-level tiles, completing rasterization at each level before moving up the hierarchy. The use of the tile hierarchy increases the temporal locality of accesses within a given memory tile: once the graphics processor begins accessing a tile, it finishes all accesses to that tile before moving on to another. Increasing temporal locality reduces the number of accesses that switch from one tile to another. For frame buffers that support fast tile-based access, this enhances graphics throughput. The increased temporal locality may also enhance cache memory performance, particularly where the cache memory and frame buffer interact on a tile-by-tile basis.
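The locality gain can be quantified by counting how often a traversal enters a different tile at each level. The sketch below compares line-by-line order with a simplified bottom-up order over a fully covered 32 by 32 pixel region. The simplified order always starts each tile at its top-left child rather than at the starting vertex; each tile is still visited in one contiguous run, so the counts are unaffected. The function names are hypothetical.

```python
# Count tile entries per hierarchy level for two pixel orders.
# Simplified bottom-up order: every tile starts at its top-left child.
from typing import List, Tuple

Pixel = Tuple[int, int]

def tile_entries(order: List[Pixel], size: int) -> int:
    """Times the sequence moves into a different size-by-size tile
    (the first pixel counts as one entry)."""
    entries, prev = 0, None
    for x, y in order:
        tile = (x // size, y // size)
        if tile != prev:
            entries += 1
            prev = tile
    return entries

def scanline_order(w: int, h: int) -> List[Pixel]:
    """Conventional line-by-line rasterization order."""
    return [(x, y) for y in range(h) for x in range(w)]

def bottom_up(x0: int, y0: int, size: int) -> List[Pixel]:
    """Finish each child tile (8x8, then 4x4) before moving to its sibling."""
    if size == 4:
        return [(x, y) for y in range(y0, y0 + 4) for x in range(x0, x0 + 4)]
    half = size // 2
    out: List[Pixel] = []
    for cy in (y0, y0 + half):
        for cx in (x0, x0 + half):
            out += bottom_up(cx, cy, half)
    return out

def hierarchical_order(w: int, h: int) -> List[Pixel]:
    """High-level 16x16 tiles in raster order; w and h must be multiples of 16."""
    out: List[Pixel] = []
    for hy in range(0, h, 16):
        for hx in range(0, w, 16):
            out += bottom_up(hx, hy, 16)
    return out
```

For this region, line-by-line order enters 16 by 16 tiles 64 times, 8 by 8 tiles 128 times and 4 by 4 tiles 256 times, while the bottom-up order enters them 4, 16 and 64 times, that is, once per tile.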
Advantages of the invention will be set forth, in part, in the description that follows and, in part, will be understood by those skilled in the art from the description herein. The advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims and equivalents.