1. Field of the Invention
The present invention relates to computer graphics systems and, more particularly, to a novel distributed memory structure in a computer graphics system.
2. Discussion of the Related Art
Computer graphics systems are commonly used for displaying graphical representations of objects on a two-dimensional video display screen. Current computer graphics display systems provide highly detailed representations and are used in a variety of applications. A computer graphics display system generally comprises a central processing unit (CPU), system memory, a graphics machine and a video display screen.
In typical computer graphics display systems, an object to be presented on the display screen is broken down into graphics primitives. Primitives are basic components of a graphics display and may include points, lines, vectors and polygons (e.g., triangles and quadrilaterals). Typically, a hardware/software scheme is implemented to render, or draw, the graphics primitives that represent a view of one or more objects being represented on the display screen.
Generally, the primitives of the three-dimensional object to be rendered are defined by the host CPU in terms of primitive data. For example, when the primitive is a triangle, the host computer may define the primitive in terms of the X, Y and Z coordinates of its vertices. Additional primitive data may be used in specific applications. Rendering hardware interpolates the primitive data to compute the display screen pixels that represent each primitive.
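By way of illustration only, the interpolation of primitive data across a triangle may be sketched as follows. The sketch assumes screen-space vertices given as (X, Y, Z) tuples and uses barycentric weights derived from edge functions, which is one common technique and is not asserted to be the method of any particular rendering hardware. The function names `edge` and `rasterize_triangle` are hypothetical.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Return twice the signed area of the triangle a-b-p."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2):
    """Return {(x, y): z} for integer pixel positions covered by the triangle.

    Each vertex is an (x, y, z) tuple in screen space (an assumption of
    this sketch). Z is interpolated with barycentric weights.
    """
    area = edge(v0[0], v0[1], v1[0], v1[1], v2[0], v2[1])
    if area == 0:
        return {}  # degenerate triangle covers no pixels
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    pixels = {}
    # Walk the bounding box and keep positions whose weights are all >= 0.
    for y in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            w0 = edge(v1[0], v1[1], v2[0], v2[1], x, y) / area
            w1 = edge(v2[0], v2[1], v0[0], v0[1], x, y) / area
            w2 = edge(v0[0], v0[1], v1[0], v1[1], x, y) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                pixels[(x, y)] = w0 * v0[2] + w1 * v1[2] + w2 * v2[2]
    return pixels
```

Other per-vertex primitive data (for example color or texture coordinates) could be interpolated with the same weights.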
The graphics machine generally includes a geometry accelerator, a rasterizer, a frame buffer controller and a frame buffer. The graphics machine may also include texture mapping hardware. The geometry accelerator receives vertex data from the host CPU that defines the primitives that make up the view to be displayed. As is known, the operations of the geometry accelerator are computationally very intense. One frame of a three-dimensional (3-D) graphics display may include on the order of hundreds of thousands of primitives. To achieve state-of-the-art performance, the geometry accelerator may be required to perform several hundred million floating point calculations per second. Furthermore, the volume of data transferred between the host computer and the graphics hardware is very large. Additional data transmitted from the host computer to the geometry accelerator includes illumination parameters, clipping parameters and any other parameters needed to generate the graphics display.
As is known, a rasterizer receives data representing figures or objects and then provides the pixel-like representation of the figures. As is also known, texture mapping involves applying one or more point elements (texels) of a texture to each point element (pixel) of the displayed portion of the object to which the texture is being mapped. Texture mapping hardware is conventionally provided with information indicating the manner in which the texels in a texture map correspond to the pixels on the display screen that represent the object. Each texel in a texture map may be defined by S and T coordinates which identify its location in the two-dimensional texture map. For each pixel, the corresponding texel or texels that map to it are accessed from the texture map, and incorporated into the final R, G, B values generated for the pixel to represent the textured object on the display screen. In addition to two-dimensional texture maps, one-dimensional, three-dimensional, and other-dimensional texture maps are also known. In this respect, the two-dimensional texture map has been mentioned for illustrative purposes only.
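As a minimal sketch of the texel lookup described above, the following assumes that S and T are normalized to the range [0, 1] and that the texture map is stored as rows of (R, G, B) tuples. Both are assumptions made for illustration, as is the hypothetical function name `sample_nearest`. The sketch selects the single nearest texel and does not combine multiple texels.

```python
def sample_nearest(texture, s, t):
    """Return the texel nearest to normalized coordinates (s, t).

    texture is a list of rows, each row a list of (R, G, B) tuples.
    s selects the column and t selects the row (assumed convention).
    """
    height = len(texture)
    width = len(texture[0])
    # Scale to texel indices and clamp to the valid range.
    x = max(0, min(int(s * width), width - 1))
    y = max(0, min(int(t * height), height - 1))
    return texture[y][x]
```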
It should be understood that each pixel in an object primitive may not map in one-to-one correspondence with a single texel in the texture map for every view of the object. For example, the closer the object is to the view port represented on the display screen, the larger the object will appear. As the object appears larger on the display screen, the representation of the texture becomes more detailed. Thus, when the object consumes a fairly large portion of the display screen, a large number of pixels is used to represent the object on the display screen, and each pixel that represents the object may map in one-to-one correspondence with a single texel in the texture map, or a single texel may map to multiple pixels. However, when the object takes up a relatively small portion of the display screen, a much smaller number of pixels is used to represent the object, resulting in the texture being represented with less detail, so that each pixel may map to multiple texels. Each pixel may also map to multiple texels when a texture is mapped to a small portion of an object. Resultant texel data is calculated for each pixel that maps to more than one texel, and typically represents an average of the texels that map to that pixel.
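The averaging of multiple texels that map to a single pixel may be sketched as follows. The sketch assumes the texels covered by a pixel form an axis-aligned rectangle in the texture map and that each texel contributes equally. Actual hardware may weight or select texels differently, and the function name `average_texels` is hypothetical.

```python
def average_texels(texture, x0, y0, x1, y1):
    """Average the (R, G, B) texels in the half-open rectangle
    [x0, x1) by [y0, y1) of the texture map.

    texture is a list of rows, each row a list of (R, G, B) tuples.
    """
    texels = [texture[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not texels:
        raise ValueError("pixel footprint covers no texels")
    count = len(texels)
    # Average each color channel independently.
    return tuple(sum(texel[c] for texel in texels) / count for c in range(3))
```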
To more particularly illustrate a conventional graphics system, reference is made to FIG. 1, which is a diagram illustrating a graphics pipeline 10 as is known. It should be noted at the outset that there are a variety of alternative manners to illustrate the graphics pipeline 10 illustrated in FIG. 1, and that the diagram of FIG. 1 is presented for illustration only.
As illustrated, a host computer 20 typically communicates with the graphics hardware across a high-speed bus, such as an AGP (accelerated graphics port) bus or a PCI (peripheral component interconnect) bus. A host interface 22 is typically provided at the front end of the graphics hardware to interface with the high-speed bus. A format block 24 is provided downstream of the host interface 22. One or more geometry accelerators 26 and 27, and one or more rasterizers 30 and 31 are provided downstream of the format block 24. The operation of geometry accelerators and rasterizers, and methods/configurations for operating multiples of these components are known, and therefore need not be described herein.
Downstream of the rasterizers 30 and 31 are texture mapping hardware 34, a fragment processor 36, a Z-buffer 38, and blend hardware 40. The function and operation of each of these components are known and need not be described herein. As is known, however, texture mapping hardware systems typically include a local memory subsystem 50 that stores data representing a texture associated with the object being rendered.
Downstream of the texture mapping hardware are subsystems including display composition 52, display timing 54, digital to analog converter 56, and a display 58. As is known, the display composition hardware 52 processes different object/primitive layers to determine the color of a given pixel to be displayed.
Consistent with the general architecture and data flow of a graphics pipeline 10 like that of FIG. 1, prior art systems were known that provided multiple/parallel rasterizers and multiple/parallel texture mapping subsystems, wherein each of the rasterizers and texture mapping components communicated with a dedicated local memory.
Other systems are known that combine the rasterization hardware and texture mapping hardware. In such systems there may be multiple/parallel combined rasterization/texture mapping subsystems. Such systems typically had dedicated memory for each rasterizer/texture mapping component. As is known, such systems segmented the display and dedicated portions of it to the different rasterizer/texture mapping components for processing. However, in order to process primitives extending across segment borders, systems implementing this type of parallelism often duplicated texture maps and other data across the separate memories, resulting in both capacity and bandwidth problems. Stated another way, the texture maps previously discussed cannot be isolated. Instead, texture data is duplicated in each isolated display segment or domain. The bandwidth problems were further compounded by the granularity in which data is retrieved from memory (i.e., burst reads).
Accordingly, it is desired to provide a graphics system that is scalable while at the same time effectively addressing the bandwidth demands and other shortcomings of prior art graphics systems.
Certain objects, advantages and novel features of the invention will be set forth in part in the description that follows and in part will become apparent to those skilled in the art upon examination of the following or may be learned with the practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
The present invention is broadly directed to a system of integrated circuit components. The system comprises a plurality of nodes that are interconnected by communication links. A random access memory (RAM) is connected to each node. At least one functional unit is integrated into each node, and each functional unit is configured to carry out a predetermined processing function. Finally, each RAM includes a coherency mechanism configured to permit only read access to the RAM by other nodes, the coherency mechanism further configured to permit write access to the RAM only by functional units that are local to the node.
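The access rule of the coherency mechanism described above may be modeled, purely for illustration, as follows. This is a software sketch of the stated policy (read access by any node, write access only from the local node) and is not a description of the actual circuit. The class name `NodeRAM`, the integer node identifiers, and the word-addressed storage are all assumptions.

```python
class NodeRAM:
    """Toy model of a RAM attached to one node.

    Any node may read; only the owning (local) node may write.
    """

    def __init__(self, owner_id, size):
        self.owner_id = owner_id      # identifier of the local node (assumed)
        self._cells = [0] * size      # word-addressed storage (assumed)

    def read(self, requester_id, addr):
        # Reads are permitted regardless of which node is asking.
        return self._cells[addr]

    def write(self, requester_id, addr, value):
        # Writes are rejected unless they originate from the local node.
        if requester_id != self.owner_id:
            raise PermissionError(
                f"node {requester_id} may not write RAM of node {self.owner_id}"
            )
        self._cells[addr] = value
```

Because each location has a single writer under this rule, remote nodes can read shared data such as texture maps without the data being duplicated into every node's memory.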
In a preferred embodiment, the system broadly described above may be implemented in a graphics processing system. In such a system, nodes may include functional units such as geometry accelerators, rasterizers, tilers, shaders, etc. Also, it should be appreciated that nodes need not be coextensive with physical integrated circuit boundaries. That is, a single integrated circuit component may include a single node. Alternatively, however, a single integrated circuit component may include multiple nodes. Further, in the preferred embodiment, each RAM that is connected to a node is segmented such that specified segments of the RAM are allocated to specific functional units of the node. Write access to the RAM is preferably limited to the connected node, while read access may be allowed to all nodes in the system.
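The segmentation of a node's RAM among its functional units may be sketched as a simple allocation table. The unit names, segment sizes, and the contiguous layout below are hypothetical and chosen only for illustration, as are the function names `build_segments` and `owner_of`.

```python
def build_segments(ram_size, allocations):
    """Assign contiguous address ranges of a node's RAM to functional units.

    allocations is a list of (unit_name, size) pairs, laid out in order
    starting at address 0. Returns a dict mapping unit_name to a range.
    """
    segments = {}
    base = 0
    for unit, size in allocations:
        if base + size > ram_size:
            raise ValueError(f"segment for {unit!r} exceeds RAM size")
        segments[unit] = range(base, base + size)
        base += size
    return segments


def owner_of(segments, addr):
    """Return the unit whose segment contains addr, or None if unallocated."""
    for unit, span in segments.items():
        if addr in span:
            return unit
    return None
```

For example, `build_segments(1024, [("rasterizer", 256), ("shader", 512)])` would place the hypothetical rasterizer segment at addresses 0 to 255 and the shader segment at addresses 256 to 767, leaving the remainder unallocated.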