A modern high-performance graphics workstation suitable for solid modelling must incorporate a number of features to provide high speed rendering of objects while at the same time remaining affordable. Experience shows that the tasks to be accomplished are so numerous and often so complicated that special purpose dedicated hardware is a necessity if useful images are to be rendered and manipulated with adequate speed. Furthermore, it would be desirable if the feature set of the dedicated hardware were flexible and reconfigurable according to the firmware and software subtasks arising from the user's high-level activities. The techniques disclosed herein reduce costs, increase performance and add flexibility.
One major aspect of the invention involves the concept of cache memory. This is a technique often used in high-performance computer systems to increase the speed with which the CPU can access data stored in a main memory. The idea is to use a small high speed memory to replicate the contents of a selected region of the main memory. The CPU does its memory accesses to the cache, which either does or does not contain data representing the desired location in main memory. If the cache does contain the data for the desired location the fast access to the cache acts in the place of a slower access to the main memory. This is called a "hit." If, on the other hand, the cache does not contain data for the address to be accessed, then the contents of the cache must be changed to reflect that part of main memory that does contain the desired data. This is called a "miss," and involves writing the current contents of the cache back into main memory (unless the current content of the cache was never modified) and then loading the cache with data in the main memory taken from the vicinity of the new address of interest. To facilitate this architecture there is usually a wide data path between the cache and the main memory.
A hardware pixel processor in a graphics system is essentially a CPU that needs to write data into a memory. In this case the memory is called a frame buffer, and it has an address for each pixel component of the display. The frame buffer is also accessed by another mechanism that reads the contents of the frame buffer to create the corresponding pixel by pixel image upon a monitor. Typically, the monitor will be a color CRT with red, green and blue (RGB) electron guns whose intensities are varied by discrete steps to produce a wide range of colors. Accordingly, the frame buffer is divided into portions containing multi-bit values for each color of every pixel. The preferred way to do this is to organize the frame buffer into "planes" which each receive the same address. Each plane holds one bit at each address. Planes are grouped together to form multi-bit values for the attributes of the pixels they represent. Attributes include the RGB intensities, and in many systems ON and OFF for pixels in an "overlay" plane that is merged with data in other planes. For instance, an overlay plane might contain a cursor, and the presence of a bit in the overlay plane might force saturation intensity for all three electron guns, regardless of the actual RGB values for that pixel. In graphics systems with two-dimensional displays that are intended for use with solid modelling of three-dimensional objects, there is frequently another attribute that is stored for each pixel: its depth. Hardware storage of depth values greatly facilitates hidden surface removal, as it allows the hardware to automatically suppress pixels that are not upon the outer surface facing the viewer.
In accordance with what has been described above, it is not unusual to find graphics systems with between twenty-four and forty planes of frame buffer memory: perhaps three sets of eight for RGB values and sixteen or more for Z, or depth, values. Considering that the monitor could easily be 1280 pixels wide and 1024 pixels high, and that refreshing the display at a power line frequency of sixty Hertz is a requirement, it can be concluded that a new pixel of twenty-four or more bits (and possibly qualified for depth) must be obtained for the monitor from the frame buffer at a rate of approximately one pixel every nine nanoseconds. To some extent the advent of so called "video display RAM's" has made this easier to do. They have special high speed ports that read blocks of data at high speed for use by a shifter that, when grouped with the shifters of other planes for the same color, produce the multi-bit values for color intensity. These multi-bit values are applied to digital-to-analog converters (DAC's) that in turn generate the signals that actually drive the electron guns.
Despite the video RAM's, formidable problems remain concerning the task of getting the data into the frame buffer in the first place. In the long run, the graphics system will not be able to manipulate an image (draw it, rotate it, cut a hole in it, etc.) any faster than the image can be put into the frame buffer. The speed with which this can be done is one important aspect of "high performance" in a graphics system. Recalling the purpose for caching in a conventional computer system, it will be noted that there is a certain similarity. It would be desirable if a way could be found to cache pixels into a high speed memory and reduce the number of write operations made into the frame buffer. If this could be done without sacrificing other desirable features it would significantly increase the rate with which data could be put into the frame buffer. This is indeed desirable, since much work has been done to develop and perfect dedicated hardware to generate at high speed pixel values from a more abstract description of the image to be rendered.
In the invention to be described each plane of frame buffer memory is equipped with a corresponding plane of a pixel cache. The pixel rendering hardware stores computed pixel values into the frame buffer by way of the cache. Those familiar with pixel rendering mechanisms will appreciate that the order in which pixels are calculated is not necessarily related to the order they are accessed for use in driving the monitor, which is typically vertically by horizontal rows for a raster scanned CRT. Instead, pixels are apt to be generated in an order that makes sense in light of the techniques being used to represent the object. A wire frame model would rely heavily on the drawing of arbitrarily oriented vectors, while shaded polygons would rely heavily upon an area fill based on successive horizontal lines of pixels. For a curved surface the successive horizontal lines are apt to be fairly short, may be of varying lengths, and might not line up exactly above or beneath each other. Clearly, the preferred pixel rendering techniques are no respecters of sequentially addressed memory spaces.thrfore. Yet the sequence of generated pixels are still strongly related by just more than being consecutive members in some order of pixel generation; their locations in the final image are physically "close" to each other. That is, sequentially generated pixels are apt to posses a shared "locality." That this is so has been noticed by others, and has been termed the "principle of locality." It seems clear that to maximize the number of hits, a cache for a frame buffer ought to operate in view of the principle of locality. But it is also clear that a different type of locality obtains for area fill operations than does for arbitrary vectors.
A "tile" is a rectangular collection of pixels. Various schemes for manipulating pixels in groups as tiles have been proposed. It would seem that what a pixel cache for a frame buffer ought to do, at least in part, is cache a tile. But again, the tile shape best suited for area fill operations would be one that is one pixel high by some suitably long number of pixels. The optimum tile shape for the drawing of arbitrary vectors can be shown to be a square. So what is needed then, is a pixel cache whose "shape" is adjustable according to the type of tile best suited for use with the type of pixel rendering to be undertaken.
That object can be achieved by a pixel cache, frame buffer controller and frame buffer memory organization that cooperate to implement a cache corresponding to a tile of adjustable rectangular dimensions. The frame buffer memory organization involves dividing the frame buffer into a number of separately addressable groups. Each group is composed of one or more bits. Along the scan lines of the raster groups repeat in a regular order. Successive scan lines have different starting groups in the pattern of repetition. Thus, whether a tile proceeds horizontally along a scan line, or vertically across successive scan lines, different groups are accessed for the pixels in that tile. This allows the entire tile to be fetched with one memory cycle. In such a scheme adjacent pixel addresses do not necessarily map into adjacent frame buffer addresses, as in conventional bit-mapped displays. Instead, an address manipulator within the frame buffer controller converts a pixel address (screen location) into a collection of addresses (one for each group) according to rules determined by the shape of the tile to be accessed.
Each plane of the frame buffer memory includes a sixteen-bit plane of an RGB pixel cache and a sixteen-bit plane of a Z value cache. (It will be understood, of course, that the number sixteen is merely exemplary, and is not the only practical size of pixel cache.) For each bit in a pixel's RGB values, the pixel's (X, Y) location on the monitor is mapped into the proper location of the plane of the RGB cache associated with that bit. If there is a hit, then the pixel is written to the cache. If there is a miss, then the cache is written out to the frame buffer in accordance with a replacement rule similar to those used with so-called "line movers" or "bitblts." The replacement rule uses sixteen-bit registers named SOURCE, DESTINATION and PATTERN. There is one of these registers for each plane of frame buffer memory. At the time of the preceding miss, each DESTINATION, and not the cache, was loaded with a copy of that region (tile) of the frame buffer that the cache was then to represent. Data was then written to the cache until there was a miss. Then the frame buffer controller simultaneously copied all of the bits of each plane in the cache into each SOURCE; this frees the cache for immediate use in storing new pixel values. The frame buffer controller proceeded to combine each SOURCE with its associated DESTINATION according to the desired rule (OR, AND, XOR, etc.). The result was further modified by the associated PATTERN, which can be used to impose special deviations upon the pixel data. For example, PATTERN might suppress a regular succession of pixels to create "holes" into which might later be placed pixels of another object, thus creating the illusion of transparency. However achieved, the result is written, all sixteen bits in parallel, for each plane, to the frame buffer. The mapping of pixel addresses into the cache and the parallel write into the frame buffer (i.e., the mapping of the cache contents back into frame buffer addresses) are automatically adjusted according to the size and shape of the tile being handled. Thus, one aspect of the invention to be disclosed is a pixel cache memory that accepts programmatically variable tile sizes. It will be further understood as the description proceeds that the tiles may be aligned on selected pixel boundaries, and that those boundaries need not be permanently fixed in advance.
A second major aspect of the invention concerns what is commonly referred to as the Z buffer. In a conventional graphics system the Z buffer is a memory, separate from the frame buffer, holding the Z (depth) value of each pixel. In a high-performance graphics system the Z values are typically sixteen-bit integers. Thus the conventional Z buffer would, like the frame buffer, have an address for each pixel. The second major aspect of the invention allows a more efficient use of memory by making each plane of the frame buffer larger than is necessary merely to hold the RGB values for pixels. Each plane of frame buffer memory contributes memory that can be associated with other such contributions to form all or a portion of a Z buffer. Furthermore, entire planes of what might otherwise be frame buffer memory can be allocated to the Z buffer. At root, what is taught is a very flexible division of available frame buffer memory into an RGB buffer portion and a Z buffer portion. Said another way, the Z buffer can be made any size and located anywhere in the frame buffer memory through the use of a Z buffer mapping.
If it should be the case that the amount of available memory for the Z buffer is less than enough to hold a sixteen-bit integer for each pixel (and in a preferred embodiment this is frequently the case), then hidden surface removal is performed in sections. For example, if there has been only enough memory allocated to the Z buffer to correspond to one fourth of the frame buffer, then the rendering of an image is divided into four similar activities. First, an initial fourth of the display is created. This might be a top-most horizontal strip, or a left-most vertical strip, or any suitable fourth of the display. Pixels that are to reside in the selected fourth of the display are rendered. As the RGB values for those pixels are calculated, so are their Z values.
The existence of a hidden surface implies that there are some addresses in the frame buffer to which more than one RGB value corresponds; each pixel is associated with a different surface (or at least a different portion of the same surface). Absent any special control to the contrary, the various pixels will be calculated in some order related to the way the object has been described to the graphics system's software and the rendering algorithms in use. As each of the multiple pixels corresponding to an address is rendered its RGB and Z values would overwrite the previous values. Hidden surface removal at the hardware level with a Z buffer compares the Z values of the conflicting pixels and allows the one with the least depth to prevail. That is, the Z value of a new pixel in hand for a certain address is compared with the Z buffer value for the pixel already in that address. An old pixel's values are overwritten by the new values if the old pixel is on a hidden surface to be removed, as indicated by the comparison of the Z values. An additional feature of the invention in this connection is the ability to programmatically decide what to do in the event the new and previous Z values are equal.
To continue with the example, the above process is carried out for all pixels in the fourth of the display being generated. Following that, the Z buffer is allocated to represent the next fourth of the display, and the process is repeated until the entire display has been created afresh. This process might take several seconds if the image is extremely complicated and there is but a very small Z buffer. On the other hand, it only has to be done once for each presentation of a new image to the frame buffer, and not once for each refresh of the image from the frame buffer to the monitor.
The above described technique for hidden surface removal with a Z buffer that corresponds to less than the full frame buffer is termed "strip Z buffering". Strip Z buffering requires some cooperation from the software that tells the graphics hardware what to draw. It will be appreciated that the image to be rendered is described in a data structure called a display list and resembling a data base. A simplified description of the graphics system software is that it interacts with the user to get into the form of a display list an object he desires to display and then manipulate. The display list describes the object in the abstract. Any particular view of the object must be derived from that abstract description through specifying from were to view it, where the clip limits are, where the light sources are, etc. This information is used to decide what pixels are needed to form the image on the monitor. If strip Z buffering is in use, then the software that makes that decision (the derivation mentioned above) must also know what region of the screen corresponds to the location of the Z buffer (i.e., where the " strip" is). During the traversal of the display list it must decline to generate pixels for regions not in the strip. Then it must traverse the list again with the new strip, and so on until the entire object has been rendered. In a preferred embodiment the software already knows where the Z buffer is because it controls that, too; the Z buffer may be programmatically located anywhere in the frame buffer.
When hidden surface removal is in effect the pixel processing mechanism that creates individual RGB values for pixels also simultaneously creates the Z value. The Z values need to be stored into the frame buffer at the same overall rate as the RGB values. The Z values are stored via a sixteen-bit cache memory (with one plane per plane of frame buffer memory) that are very similar to the one that caches the RGB values. Recalling that the Z values are themselves sixteen-bit values, one might be tempted to conclude that all sixteen bits of a Z value are stored in the same plane of (excess) frame buffer memory, and that when that is full then the next Z value goes into the excess portion of the next plane. That is not done since it would require the addresses of the Z values to map into various planes of the Z cache, which is a major architectural feature not having a counterpart in the RGB cache. Since the cache mechanism is part of a VLSI chip, two instances of the same architecture is far more desirable than two separate ones. Another important consideration involves the number of planes of frame buffer memory fabricated in a frame buffer memory assembly. The available number of planes of frame buffer memory will be a multiple of that number, which in the preferred embodiment is eight. The preferred embodiment to be described adopts a Z buffer mapping into the excess portions of the frame buffer that spreads the sixteen bits of each Z value out across eight planes of frame buffer memory. This mapping must be flexible and programmatically determinable, since the total number of planes of frame buffer memory can vary (in increments of eight) according to the way the user configures the graphics system (recall the discussion of strip Z buffering).
To cache Z values, the one or more groups of eight planes of frame buffer memory are allocated to the Z buffer. Entire planes can be allocated, or just the "excess" not used as RGB buffer. A minimum of one group of eight must be allocated, implying that a minimum system configuration must include eight planes of frame buffer memory. (This is certainly no impractical requirement for a color system; a typical system would have twenty-four planes of frame buffer memory, although less is possible.) A group of eight planes of frame buffer memory used for the Z buffer can be either eight "excess" portions not used as RGB buffer, or eight full planes used solely for the Z buffer.
Each sixteen-bit plane in a Z buffer cache in a group of eight receives two bits at a time from a Z value. In this way the entire sixteen-bit Z value is cached in eight two-bit portions of Z buffer cache memory. When there is a miss each plane of the Z cache is written out to an associated Z WRITE register, from whence it is written to the Z buffer. The Z WRITE registers may have to contend with the SOURCE registers for access to the frame buffer (the Z cache fills at a rate twice that of the RGB cache, so sometimes there will be contention, other times not). Thus, the transfer from the Z cache to the Z WRITE registers frees the Z cache to begin accepting new Z values immediately.
The display list will previously have been divided into portions that correspond to the one or more groups of eight planes of frame buffer memory. The display list portions are required to remain within the boundaries of their associated strips. As the portions of the divided (and perhaps even regrouped) display list are traversed, Z values are written into the Z buffer. The order of these write operations is display list dependent; some Z buffer locations may never be written to, while others may be written into more than once as hidden surface removal proceeds. Eventually, the traversal of the display list portion is complete. If there is another strip to construct, the mapping of the one or more groups of eight is changed to reflect the next strip and the next portion of the display list is traversed; otherwise the entire display has been constructed and the strip Z buffering process has been concluded.
Another aspect of the invention concerns a programmatically variable mapping between the pixel interpolator and the planes of frame buffer memory. It is desirable in a system with three color interpolators but with only a minimum number of planes in the frame buffer to be able to select between one interpolator computing shades of gray and three interpolators independently computing red, green and blue values. In addition, to facilitate double buffering it is desirable to control the mapping of pixel data bits into the frame buffer. Such a controlled mapping can be obtained through the use of a pipelined data shifter controlled by a register partitioned into values encoding the number of shifts to be performed at each level of the pipeline. In a related aspect of the invention the pipelined shifter allows the programmatic selection of the number color intensity bits for each color that will be stored in the frame buffer.
Still another aspect of the invention concerns the way the color map is updated. A color map is used to create an arbitrary (or nearly so) correspondence between the R, G and B values stored in the frame buffer and the digital values actually sent to the R, G and B DAC's. This allows, for example, a four-plane R value to be mapped into any sixteen of the 2.sup.8 values the DAC can accept. The total number of red values has not increased, but they have been dispersed over the range of color resolution available. This would be desirable in systems that either didn't have very much frame buffer memory in the first place, or where not very many different red colors were wanted, and frame buffer memory was de-allocated from R value duty and used to advantage somewhere else.
From time to time the graphics system changes the color map. In conventional systems this is accomplished without any special concern for when it is done. This can cause display artifacts in two ways. First, the read activities of most RAM's are disturbed during write operations. This causes loss of the mapping action during the update, temporarily resulting in arbitrary colors in random locations. Second, some rather peculiar (albeit transient) colorizations can result if the color map is changed within the duration of a raster presentation; part of the screen would be mapped one way while part of the screen would be mapped another way. This is almost sure to be the case because of the difficulty in synchronizing the activities of the graphics system's CPU with raster generation by the monitor. In the preferred embodiment the color map and the overlay map are periodically updated from a shadow RAM during vertical retrace, whether or not the shadow RAM has been changed. The graphics system updates the shadow RAM in place of the conventional updates to the color and overlay map RAM's, which are then subsequently updated from the shadow RAM during vertical retrace.
In a further aspect of the invention the shadow RAM comprises first and second portions, each of which is large enough to update both the color map and the overlay RAM. After a certain number of frames of raster generation, (say, eight) the color map and overlay map are updated from the first portion. After eight more frames they are automatically updated from the second portion. After another eight they are again updated automatically from the first portion, and so on. Now suppose that the first and second portions contain certain symmetrically different information. A cursor in an overlay plane could be made to blink between any two colors by the change to the overlay RAM. Or, some object in the RGB planes could be made to blink by having alternating colors assigned to it by the first and second portions of the color map. If the first and second portions of the shadow RAM are made identical then no blinking is induced by the overlay or color maps.