The disclosed invention relates generally to computer systems. More particularly, the present invention relates to a dynamic random access memory (DRAM) frame buffer device, and system based on that device, which provides an architecture for performing accelerated two-dimensional and three-dimensional graphics rendering operations.
The Dual Pixel 3DRAM chip and graphics processing system is used to implement high performance, high capacity frame buffers. Certain aspects of the disclosed Dual Pixel 3DRAM chip, and graphics processing system based upon the Dual Pixel 3DRAM chip, are disclosed in U.S. Pat. No. 5,544,306, issued to Deering et al. on Aug. 6, 1996, which patent is incorporated by reference in its entirety into this disclosure as if it were fully set forth herein.
The disclosed invention presents an alternative to the use of external DRAM frame buffers. To meet near term performance objectives, it is tempting to use embedded DRAM for frame buffer memory, because it might be feasible to fit between 4 and 8 Megabits on a die with surface area remaining to implement an interesting amount of logic. However, in the same time frame, graphics-oriented computing products will require between 10 and 80 Megabits of frame buffer memory. Thus, between 2 and 10 embedded DRAM devices would be necessary to implement a frame buffer that would meet the requirements of graphics processing computing systems. While the fill rate for such a frame buffer would be very high, the cost would be prohibitive for a large segment of the computing market.
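As a rough check on the device counts given above, the following sketch computes how many embedded DRAM devices a frame buffer of a given size would need. The function name `devices_needed` is illustrative and not part of the disclosure; the capacities are the figures quoted in the preceding paragraph.

```python
import math


def devices_needed(frame_buffer_mbits: float, per_die_mbits: float) -> int:
    """Smallest whole number of embedded DRAM devices that can hold the frame buffer."""
    return math.ceil(frame_buffer_mbits / per_die_mbits)


# Figures quoted in the text, in Megabits.
low_end = devices_needed(10, 8)    # smallest frame buffer, 8 Megabits per die
high_end = devices_needed(80, 8)   # largest frame buffer, 8 Megabits per die
print(low_end, high_end)           # prints "2 10"
```

The range of 2 to 10 devices corresponds to the 8 Megabit end of the per-die estimate; at 4 Megabits per die the same arithmetic yields proportionally more devices.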
In processing two-dimensional and three-dimensional graphic images, texture mapping must be accelerated to match fill rate. However, the distributed frame buffer described above makes this difficult to do efficiently. The easiest way to distribute texture data would be for each device to have its own copy of everything; however, this method is a very inefficient use of embedded DRAM bits. An alternate approach would distribute texture data among the devices comprising a frame buffer such that data is not duplicated. This alternative would use embedded DRAM bits efficiently, but would also require the routing of massive amounts of texture data between devices.
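The trade-off between the two texture distribution approaches can be illustrated with a simple model. This is a sketch under stated assumptions, not a description of any particular device: the function names are hypothetical, and the remote fetch estimate assumes texels are spread evenly across devices and accessed uniformly.

```python
def replicated_texture_bits(texture_bits: int, num_devices: int) -> int:
    """Total embedded DRAM consumed when every device holds its own full copy."""
    return texture_bits * num_devices


def distributed_texture_bits(texture_bits: int, num_devices: int) -> int:
    """Total embedded DRAM consumed when the texture is split with no duplication."""
    return texture_bits


def remote_fetch_fraction(num_devices: int) -> float:
    """Fraction of texel fetches that must cross between devices when the
    texture is split evenly and accesses are uniform (an assumption)."""
    return (num_devices - 1) / num_devices


# Example: a 4 Megabit texture set shared by 8 devices.
texture = 4 * 1024 * 1024
print(replicated_texture_bits(texture, 8) // texture)  # prints 8 (copies stored)
print(remote_fetch_fraction(8))                        # prints 0.875
```

Under these assumptions, replication multiplies storage by the device count, while distribution sends most texel fetches over inter-device routing.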
Using embedded DRAM to implement a texture cache on a single device might be more practical. Texture data would have to be paged in from system memory, which would work more efficiently if the texture data were compressed.
Embedded DRAM could also be used to implement a primitive FIFO between the setup unit and the rasterizer. This FIFO would allow geometry and setup processing to continue while big triangles are being rendered. It could also be used to tolerate the latency of paging and decompressing texture data in from system memory.
Region-based rendering architectures, such as Talisman, PixelFlow, or Oak's WARP 5, render a small portion of the frame buffer on the rendering controller and then transfer the final color values to external DRAM. The controller then renders the next region, and the one after that, until the entire frame is covered.
All of the bandwidth used for hidden surface removal and anti-aliasing remains entirely on the rendering controller, so fill rate is not limited by external bandwidth. All of the storage used for hidden surface removal and anti-aliasing needs only to be implemented for a small portion of the frame buffer and can be kept on the rendering controller.
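The region-by-region flow described above can be sketched as follows. This is a minimal illustration, not the method of any of the named architectures: it assumes the geometry has already been reduced to per-region lists of fragments, uses a plain depth test for hidden surface removal, and omits anti-aliasing. The names `render_by_regions`, `Fragment`, and `bins` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Fragment = Tuple[int, int, float, int]  # (x, y, depth, color), screen coordinates


def render_by_regions(width: int, height: int, region: int,
                      bins: Dict[Tuple[int, int], List[Fragment]]) -> List[List[int]]:
    """Render one region at a time; only final colors reach the external frame."""
    external_frame = [[0] * width for _ in range(height)]  # stands in for external DRAM
    for row in range(math.ceil(height / region)):
        for col in range(math.ceil(width / region)):
            x0, y0 = col * region, row * region
            w = min(region, width - x0)
            h = min(region, height - y0)
            # Region-sized depth and color buffers, kept on the rendering controller.
            depth = [[math.inf] * w for _ in range(h)]
            color = [[0] * w for _ in range(h)]
            for x, y, z, c in bins.get((col, row), []):
                lx, ly = x - x0, y - y0
                if z < depth[ly][lx]:  # nearer fragment wins
                    depth[ly][lx] = z
                    color[ly][lx] = c
            # Transfer only the final color values for this region.
            for ly in range(h):
                external_frame[y0 + ly][x0:x0 + w] = color[ly]
    return external_frame
```

In this sketch the depth buffer never leaves the inner loop, which mirrors the point that hidden surface removal traffic stays on the rendering controller.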
The big disadvantage of region-based rendering is that all of the geometry for a frame needs to be sorted into regions and stored somewhere before rendering can begin. This requirement generally places an upper limit on the amount of geometry that can be rendered per frame. This limitation is unacceptable for many applications. Some region-based rendering architectures can still function somewhat correctly when given too much geometry, by writing depth and color values for regions to and from external DRAM. However, this implementation loses all of the benefits of region-based rendering, while retaining all of the disadvantages.
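The sorting requirement and its upper limit can be illustrated with a small binning sketch. It assumes integer, non-negative screen coordinates, assigns each triangle to every region its bounding box touches, and models the finite storage as a hypothetical `capacity` counted in bin entries; none of these details come from the architectures named above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]
Triangle = Tuple[Point, Point, Point]


def bin_triangles(triangles: Sequence[Triangle], region: int,
                  capacity: int) -> Dict[Tuple[int, int], List[Triangle]]:
    """Sort triangles into regions by bounding box before any rendering begins."""
    bins: Dict[Tuple[int, int], List[Triangle]] = defaultdict(list)
    stored = 0
    for tri in triangles:
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        for ry in range(min(ys) // region, max(ys) // region + 1):
            for rx in range(min(xs) // region, max(xs) // region + 1):
                if stored == capacity:
                    # The per-frame geometry limit described in the text.
                    raise OverflowError("bin storage exhausted for this frame")
                bins[(rx, ry)].append(tri)
                stored += 1
    return dict(bins)
```

A triangle that spans several regions consumes one entry per region, so large or numerous primitives exhaust the storage sooner.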
Mechanical CAD and other content creation applications cannot tolerate limits on geometric complexity. Such applications require the ability to smoothly trade off geometric complexity for frame rate. Thus, neither embedded DRAM nor region-based rendering approaches provide adequate solutions to meet the performance demands and practical cost constraints of present graphics processing applications.
Another concern with respect to frame buffer design is the performance trade-off between single-ported and dual-ported frame buffer memories. Dual-ported frame buffers have a dedicated display port which enables the render port to spend more of its time rendering. Typically, a dual-ported frame buffer comprised of video random access memory (VRAM) chips loses only approximately one to two percent of its fill rate to video transfer operations, because its video buffers are quite large. Frame buffers comprised of FBRAM chips (also referred to as 3D-RAM™ chips) lose approximately five to ten percent of their fill rate to video transfer operations, because their video buffers are smaller.
A single-port memory cannot render when it is reading pixel data for display, unless the port supports high speed, bidirectional signaling. If one compares single- and dual-ported memories where both render port bandwidths are identical, then the dual-ported memory will have both a higher fill rate and a higher cost. If one compares single- and dual-ported memories where the bandwidth of the single port is equal to the sum of the dual-port bandwidths, then the single-ported memory's fill rate is likely to be higher than that of the dual-ported memory, because the single-ported memory is more efficient. Thus, to the extent that bandwidth limitations are presently being relaxed due to the emergence of high bandwidth input/output (I/O) capacities, a single-ported memory architecture promises more efficient frame buffer performance.
Dual-ported memories allow a smoother flow of pixels to the frame buffer. A single-ported memory is unavailable for rendering on a periodic basis while it reads bursts of display data. The rendering controller requires a larger pixel FIFO to smooth out pixel flow when interfacing with a single-ported memory. In a lower cost system, the renderer may be idle during such display bursts.
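The need for a larger pixel FIFO can be illustrated with a small cycle-level simulation. This is a sketch under simplifying assumptions: the renderer produces a fixed number of pixels every cycle, the memory drains the FIFO at a fixed rate except during periodic display bursts, and all parameter names are hypothetical.

```python
def min_fifo_depth(cycles: int, produce_per_cycle: int, drain_per_cycle: int,
                   burst_period: int, burst_length: int) -> int:
    """Peak end-of-cycle FIFO occupancy, i.e. the depth needed to avoid stalling."""
    occupancy = 0
    peak = 0
    for cycle in range(cycles):
        occupancy += produce_per_cycle
        # The single port is busy reading display data at the start of each period.
        displaying = (cycle % burst_period) < burst_length
        if not displaying:
            occupancy -= min(occupancy, drain_per_cycle)
        peak = max(peak, occupancy)
    return peak


print(min_fifo_depth(100, 1, 2, 10, 2))  # prints 2
print(min_fifo_depth(100, 1, 2, 10, 4))  # prints 4
```

In this model the required depth grows with the length of the display burst; a system without such a FIFO would instead idle the renderer for the duration of each burst.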
A single-ported memory is cheaper due to savings in die area, pins, packaging, testing, and power consumption. A single-ported memory has a significantly lower cost per bit of storage than a dual-ported memory of the same size. If the cost per bit is lower, storing non-displayable data in the frame buffer is easier to justify.
A dual-ported memory has a fixed display bandwidth. If the required display bandwidth is lower, then bandwidth is being wasted. If the required display bandwidth is higher, then the memory is not suited to the display requirements. A single-ported memory has the flexibility to trade off render bandwidth and display bandwidth. In a pinch, a single-ported memory can actually provide much higher display bandwidth.
The dedicated display port of a dual-ported memory is not used during horizontal and vertical blanking intervals, which means the display port is idle approximately twenty percent of the time.
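The bandwidth comparison made in the preceding paragraphs can be approximated with a simple model. This is an illustrative sketch only: bandwidths are in arbitrary units, the function names are hypothetical, and the model ignores bus turnaround and any other overhead of sharing one port.

```python
def dual_port_render_bandwidth(render_bw: float, video_transfer_loss: float) -> float:
    """Render bandwidth left after the fraction lost to video transfer operations."""
    return render_bw * (1.0 - video_transfer_loss)


def single_port_render_bandwidth(total_bw: float, display_bw_needed: float) -> float:
    """Render bandwidth left on a shared port after display reads are served."""
    if display_bw_needed > total_bw:
        raise ValueError("display demand exceeds the port bandwidth")
    return total_bw - display_bw_needed


# Assumed example: render port 1.0, display port 0.25, so the single port is 1.25.
render_bw, display_port_bw = 1.0, 0.25
# The display port idles about twenty percent of the time during blanking,
# so the average display demand is about eighty percent of its peak.
average_display = display_port_bw * 0.8

dual = dual_port_render_bandwidth(render_bw, 0.05)  # five percent transfer loss
single = single_port_render_bandwidth(render_bw + display_port_bw, average_display)
print(single > dual)  # prints True
```

Under these assumptions the single port reclaims the bandwidth that the dedicated display port would leave unused during blanking, which is one way to read the efficiency argument above.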
A dual-ported memory dictates a fixed mapping of pixels and blocks to the screen. A single-ported memory can map pixels and blocks to the screen with much greater flexibility.
A dual-ported frame buffer memory only makes sense if the render and display ports are connected to different chips. If both ports are connected to the same chip, then a single-port memory, with equivalent bandwidth, would be more efficient for the reasons stated above.
A single-ported memory enables the building of lower cost systems, because the cost per bit of frame buffer storage is cheaper, and because the rendering and display chips can be merged into a single device.
Thus, a single-ported memory enables one to design lower-cost low-end systems than could be designed with a dual-ported memory. The cost per bit of storage is significantly lower with a single-ported memory, which makes the bill of materials significantly lower for high resolution/high pixel depth designs. Due to its greater flexibility, a single-ported memory yields a design which offers a wider range of product capabilities.
The present invention is directed to a single-ported frame buffer access memory (Dual Pixel 3DRAM) chip which provides accelerated rendering of two-dimensional and three-dimensional images in a computer graphics system.
The Dual Pixel 3DRAM chip features a single-ported, high speed memory which is accessed by a rendering controller over a rendering bus. The Dual Pixel 3DRAM chip comprises a DRAM array, an SRAM pixel buffer, at least one pixel arithmetic-logic unit (ALU), and a global bus. The Dual Pixel 3DRAM chip also comprises a number of data buses and data formatters which route and format graphics data as that graphics data is processed, updated, transmitted off of, and stored within the Dual Pixel 3DRAM chip.
In a first aspect of the present invention, the Dual Pixel 3DRAM chip is configurable to process varying pixel sizes and formats, ranging from 8-bit pixels up to 512-bit pixels. The Dual Pixel 3DRAM chip features novel protocol and data packing schemes to implement these capabilities.
In another aspect of the present invention, the Dual Pixel 3DRAM chip supports variable input and output data rates over the rendering bus, which permits both 2-cycle and 3-cycle pixel ALU operations on the chip.
In another aspect of the present invention, the Dual Pixel 3DRAM chip processes two separate pixels or samples per operation simultaneously.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features data compression capabilities which permit higher fill rates and throughput between the chip and the rendering controller.
In another aspect of the present invention, the Dual Pixel 3DRAM chip employs a multi-sampling scheme which employs a novel delta Z algorithm to render antialiased polygons.
In another aspect of the present invention, the Dual Pixel 3DRAM chip employs a novel scheme for retaining DRAM bank and column addresses on-chip to minimize bandwidth requirements over the address and control bus between the rendering controller and the chip.
In another aspect of the present invention, the Dual Pixel 3DRAM chip comprises a relationship between data transfer speed and width of the data buses internal to the chip, such that bandwidth is balanced to optimize the operational efficiency of the chip.
In another aspect of the present invention, the Dual Pixel 3DRAM chip performs multi-precision pixel blend operations such that inputs of any bit width may be blended.
In another aspect of the present invention, the Dual Pixel 3DRAM chip divides address and control information into three separate sets of signals which are simultaneously transmitted to control DRAM bank operations, global bus operations, and pixel ALU operations on the chip.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features a Flash Line operation which writes to multiple buses between DRAM bank column decoders and sense amps, resulting in an increase in the clear rate of the frame buffer by a factor of four or more.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features a novel operation, Change Cache Line, which permits simultaneous transfer of data between different levels of cache due to a bi-directional global bus between the DRAM array and the SRAM pixel buffer.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features a Change Page bank operation in which the precharge page and the access page bank operations are combined into a single operation.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features pixel ALU operations in which data or the contents of certain registers are broadcast over certain buses on the chip.
In another aspect of the present invention, the Dual Pixel 3DRAM chip features innovative means for reading pixel data, in either single or dual pixel format, from the SRAM pixel buffer.
The above-described and other features of the present invention, including various novel details of operation, construction, assembly and combination of parts, will now be more particularly described with reference to the accompanying drawings. It shall be understood that the particular embodiments of the invention are disclosed herein by way of illustration only and shall not impose limitations on the invention as claimed. The principles and features of this invention may be employed in numerous and varying embodiments without departing from the scope of the present invention.