The present invention relates to a scalable, network-based computer system having a distributed texture memory architecture.
Today, computers are used in many different applications. One application suited for computers is that of generating three-dimensional graphics. Computer-generated 3-D graphics is used in business, science, animation, simulation, computer-aided design, process control, electronic publication, etc. In an effort to portray a more realistic real-world representation, three-dimensional objects are transformed into models having the illusion of depth for display on a two-dimensional computer screen. This is accomplished by representing each three-dimensional object with a number of polygons. Complex three-dimensional objects may require hundreds or thousands of polygons to form an accurate model. A three-dimensional object can then be manipulated (e.g., displayed in a different location, rotated, scaled, etc.) by processing the individual polygons corresponding to that object. Next, a scan conversion process is used to determine which pixels of a computer display fall within each of the specified polygons. Texture is then applied only to those pixels residing within the specified polygons. In addition, hidden or obscured surfaces, which would not normally be visible, are eliminated from view. Hence, displaying a three-dimensional object on a computer system is a complicated task that can require a tremendous amount of processing power.
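Scan conversion can be illustrated with edge functions, one common way (not necessarily the one used in this system) of deciding whether a pixel lies inside a triangle. In this minimal sketch, vertices are given in pixel coordinates, each pixel is sampled at its center, and `edge` and `scan_convert` are hypothetical names:

```python
def edge(ax, ay, bx, by, px, py):
    """Signed test: > 0 when point P lies to the left of the directed edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def scan_convert(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers lie inside triangle v0-v1-v2."""
    covered = []
    area = edge(*v0, *v1, *v2)  # twice the signed area of the triangle
    if area == 0:
        return covered  # a degenerate triangle covers no pixels
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when every edge function has the same sign as the area,
            # which handles both clockwise and counter-clockwise winding.
            if area > 0 and w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
            elif area < 0 and w0 <= 0 and w1 <= 0 and w2 <= 0:
                covered.append((x, y))
    return covered
```

Only the pixels returned by `scan_convert` would go on to texturing and depth testing; for the right triangle (0, 0), (4, 0), (0, 4) on a 4 x 4 grid, that is the 10 pixels with x + y <= 3.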
This is especially true for those cases involving dynamic computer graphics for displaying three-dimensional objects that are in motion. In order to simulate smooth motion, the computer system should have a frame rate of at least 30 hertz. In other words, new images should be updated, redrawn and displayed at least thirty times a second. This imposes a heavy processing and computational burden on the computer system. Indeed, even more processing power is required for interactive computer graphics, where displayed images change in response to user input and where there are multiple objects in a richly detailed scene. Each additional object that is added to a scene needs to be modeled, scan converted, textured, Z-buffered for depth, etc., all of which adds to the processing resources that are required. In addition, it would be highly preferable if lighting, shadowing, shading, and fog could be included as part of the 3-D scene. Generating these special effects, again, consumes valuable processing resources. Moreover, the "richer" and more realistic a scene becomes, the more processing power is required to render that scene. Even though the processing power of computer systems continues to improve, there is a demand for even faster, cheaper, and more powerful computer systems.
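The 30 hertz requirement translates directly into a time budget per frame and a pixel rate the renderer must sustain. A minimal sketch of that arithmetic, assuming a 1280 x 1024 display and a hypothetical `depth_complexity` factor (average surfaces drawn per screen pixel); neither figure comes from this document:

```python
def frame_budget_ms(frame_rate_hz):
    """Milliseconds available to model, scan-convert, texture and Z-buffer one frame."""
    return 1000.0 / frame_rate_hz


def pixel_rate(width, height, frame_rate_hz, depth_complexity=1.0):
    """Pixels per second that must be processed.

    depth_complexity is a hypothetical parameter: hidden surfaces that are drawn
    and later obscured push it above 1, multiplying the work per frame.
    """
    return width * height * frame_rate_hz * depth_complexity
```

Under these assumptions, a 30 Hz frame leaves about 33.3 ms, and a 1280 x 1024 screen needs roughly 39.3 million pixels per second with no overdraw, or about 118 million with an average of three surfaces per pixel.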
"Pipelining" is a common approach used for improving the overall performance of a computer system. In a pipelined architecture, a series of interconnected stages is used to render an image. Each stage performs a unique task during each clock cycle. For example, one stage might be used to scan-convert a pixel; a subsequent stage may be used for color conversion; another stage could be used to perform depth comparisons; this is followed by a texture stage for texturing; etc. In practice, it would take several pipeline stages to implement each of these example stages. The advantage of a pipelined architecture is that as soon as one stage has completed its task on a pixel, that stage can immediately proceed to work on the next pixel. It does not have to wait for the processing of a prior pixel to complete before it can begin processing the current pixel. Pixels can thereby flow through the pipeline at a rapid rate. By analogy, a pipelined architecture is similar to a fire brigade in which a bucket is passed from one person to another down the line.
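The benefit of overlapping work can be seen in a small clock-by-clock simulation. This is an idealized sketch, assuming every stage takes exactly one clock and nothing stalls; `run_pipeline` is a hypothetical name:

```python
def run_pipeline(num_stages, num_pixels):
    """Clock pixels through a linear pipeline; return cycles until the last one exits.

    stages[i] holds the pixel currently being worked on by stage i (None if idle).
    """
    stages = [None] * num_stages
    next_pixel = 0
    finished = 0
    cycles = 0
    while finished < num_pixels:
        # Every pixel advances one stage; a new pixel (if any) enters stage 0.
        incoming = next_pixel if next_pixel < num_pixels else None
        if incoming is not None:
            next_pixel += 1
        stages = [incoming] + stages[:-1]
        cycles += 1
        # The pixel in the last stage completes its final task this cycle.
        if stages[-1] is not None:
            finished += 1
    return cycles


def run_unpipelined(num_stages, num_pixels):
    """Without pipelining, each pixel occupies all stages before the next may start."""
    return num_stages * num_pixels
```

For 4 stages and 10 pixels, the pipeline finishes in 13 cycles (stages + pixels - 1) instead of 40, because once the pipeline is full one pixel exits every clock.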
There are limits to how many pipeline stages a task may be broken into in order to increase its performance. Eventually a point is reached at which adding more pipeline stages to a task no longer increases performance, due to the overhead associated with pipelining. In order to increase performance beyond that of a single pipeline, several pipelines can be connected in parallel. This technique is referred to as a parallel-pipelined approach.
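A simple timing model shows why deeper pipelines stop paying off while parallel pipelines keep scaling. This sketch assumes the task's total logic delay is split evenly across stages and that each stage adds a fixed register (latch) overhead; the 40 ns and 2 ns figures are purely illustrative:

```python
def throughput(logic_delay_ns, stages, latch_overhead_ns, pipelines=1):
    """Pixels per nanosecond for `pipelines` identical pipelines running in parallel.

    The clock period is one stage's share of the logic plus the per-stage overhead,
    so it can never drop below latch_overhead_ns however many stages are added.
    """
    period_ns = logic_delay_ns / stages + latch_overhead_ns
    return pipelines / period_ns
```

With 40 ns of logic and 2 ns of overhead, going from 20 to 40 stages only shortens the period from 4 ns to 3 ns, and no depth can beat 2 ns; adding a second 20-stage pipeline, by contrast, doubles throughput from 0.25 to 0.5 pixels per nanosecond.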
There are, however, several disadvantages to a parallel-pipelined approach. One drawback is that, because each of the pipelines operates independently of the other pipelines, each pipeline must have access to its own set of texture data. This is especially the case when several pipelines perform parallel processing together in order to generate a single frame's worth of data. As a result, duplicate copies of texture data must be maintained. In other words, the same set of texture data must be replicated for each of the different pipelines. Furthermore, some computer vendors offer the option of adding extra plug-in cards to increase a computer's performance. Again, these cards operate independently of each other. And because they cannot communicate amongst themselves, each card must have its own dedicated memory. Rather than sharing data between cards, entire data sets are duplicated on each individual card.
This duplication is expensive in terms of the number of memory chips required to store the duplicate information. Many applications today require extremely large texture maps. Although prices for memory chips have been falling, storing the entire texture map in dynamic random access memory chips is prohibitively expensive, especially if numerous duplicate copies of the texture map must be maintained. Moreover, higher-resolution textures consume that much more memory. In addition, the same texture map is often stored at different levels of detail. Due to the extremely large memory requirements, computer manufacturers have resorted to storing entire texture maps on disk. Pieces of the texture map are then loaded into memory chips on an as-needed basis. However, disk I/O operations are extremely slow. Computer designers thus face a dilemma: either limit the amount of texture data that can be stored and suffer visually inferior graphics, or store texture data on disk and suffer much slower graphics display.
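The cost of replication can be estimated for a texture stored at multiple levels of detail. A minimal sketch, assuming a square power-of-two texture whose each level halves the previous one (a mipmap chain), 4 bytes per texel, and one copy per pipeline; all figures are illustrative:

```python
def mip_chain_texels(size):
    """Texels in a size x size texture plus every reduced level down to 1 x 1."""
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total


def texture_memory_bytes(size, bytes_per_texel, copies):
    """Memory needed when `copies` pipelines each hold their own full copy."""
    return mip_chain_texels(size) * bytes_per_texel * copies
```

A 4096 x 4096 texture with all its levels holds 22,369,621 texels, about 85.3 MiB at 4 bytes per texel; replicating it for four pipelines takes about 341 MiB, whereas a shared copy stays at 85.3 MiB.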
The aforementioned problems can be solved by storing the texture data within a distributed texture memory that is accessible by any and all rasterization circuits. Because the texture data can be shared by multiple rasterization circuits, only a single copy of the texture data need be maintained within the computer system.
However, one problem associated with such a distributed memory architecture is that distributed texture memories are not easily cacheable. A typical cache matches requests against its contents and faults when the desired piece of data is not present. It then requests this data (or, typically, a block of data containing the desired data), and the cache is stalled until the data is available. In a system with a distributed texture memory architecture, stalling is unacceptable, since a stall would last on the order of 100 clock cycles. Therefore, what is needed is a cache memory that can avoid the majority of stalling on cache misses.
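The impact of stalling can be estimated with a rough cycle count. This sketch assumes one texture lookup per clock, a 100-cycle miss latency, and, for the prefetching case, a request queue deep enough that the latency is paid only once while the pipeline fills; the 5% miss rate is illustrative:

```python
def blocking_cycles(lookups, misses, miss_latency):
    """A conventional cache stalls for the full latency on every miss."""
    return lookups + misses * miss_latency


def prefetching_cycles(lookups, miss_latency):
    """Requests are issued ahead of use, so miss latency overlaps useful work.

    Assumes at least `miss_latency` requests can be outstanding at once; only the
    initial fill is exposed.
    """
    return lookups + miss_latency
```

For 10,000 lookups with 500 misses, a blocking cache needs 60,000 cycles, about six times the 10,100 cycles of the prefetching model.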
Another problem is that typical caches expect requested data to return in the order in which it was requested. This ordering cannot be guaranteed in a system with a distributed texture memory architecture, because texture data may return in a different order than the one in which it was requested. Therefore, what is also needed is a cache memory that can handle out-of-order return of texture data.
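The difference between assuming in-order returns and tolerating out-of-order returns can be shown with two fill strategies. This sketch assumes, hypothetically, that each texture response carries back the cache address (slot) it was requested for; the source does not specify the response format:

```python
def fill_assuming_order(requests, responses):
    """Naive scheme: pair the i-th response with the i-th request."""
    cache = {}
    for (slot, _tex_addr), (_tag, data) in zip(requests, responses):
        cache[slot] = data
    return cache


def fill_by_tag(responses):
    """Each response names its own slot, so arrival order does not matter."""
    cache = {}
    for slot, data in responses:
        cache[slot] = data
    return cache


# Three requests, each assigned a cache slot and a (hypothetical) texture address.
requests = [(0, 0x100), (1, 0x240), (2, 0x380)]
memory = {0x100: "A", 0x240: "B", 0x380: "C"}
# Responses arrive in the order 2, 0, 1, e.g. because their sources are farther away.
responses = [(2, memory[0x380]), (0, memory[0x100]), (1, memory[0x240])]
```

With this arrival order, `fill_assuming_order` stores "C" in slot 0, "A" in slot 1 and "B" in slot 2, while `fill_by_tag` places every texel in the slot that requested it.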
Accordingly, the present invention provides a cache memory for high latency and out-of-order return of texture data. The present invention includes a texture cache memory that is capable of working efficiently in computer systems where there is a long latency between the time the texture data is requested and the time the texture data is available for use. In addition, the present invention is capable of handling texture responses that enter the texture cache memory in a different order than the one in which they were requested. The present invention significantly improves the performance of a computer system having a distributed texture memory architecture.
In the currently preferred embodiment, the present invention is practiced within a computer system having an internal transmission network which is used to transmit packets between a host processor and a number of subsystems. Three basic types of subsystems are coupled to the network: a geometry subsystem is used to process primitives; a rasterization subsystem is used to render pixels; and a display subsystem is used to drive a computer monitor. Texture and/or frame buffer data is stored in memory chips associated with the rasterization subsystems. A rasterization subsystem can access texture data from its associated memory chips or can request texture data residing within any of the other memory chips. A texture request is sent over the internal network; the requested texture data is packetized and then sent over the internal network to the requesting rasterization subsystem.
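The request/response exchange over the internal network can be sketched as two packet types and a routing rule. Everything here is a hypothetical illustration: the packet fields, the `BLOCK_BYTES` interleave size, and the block-interleaved `owner` mapping are assumptions, since the source does not say how texture addresses are assigned to memory chips:

```python
from dataclasses import dataclass

BLOCK_BYTES = 256  # hypothetical interleave granularity across subsystems


@dataclass
class TextureRequest:
    src: int         # requesting rasterization subsystem
    dst: int         # subsystem whose memory chips hold the texel
    address: int     # global texture memory address
    cache_addr: int  # slot reserved in the requester's texture cache


@dataclass
class TextureResponse:
    dst: int         # routed back to the original requester
    cache_addr: int  # echoed so the requester knows where to store the data
    data: int


def owner(address, num_subsystems):
    """Hypothetical block-interleaved mapping from texture address to owning memory."""
    return (address // BLOCK_BYTES) % num_subsystems


def make_memories(num_subsystems, size):
    """Store each texel only in its owner's memory (texel value = address, for checking)."""
    memories = [{} for _ in range(num_subsystems)]
    for address in range(size):
        memories[owner(address, num_subsystems)][address] = address
    return memories


def make_request(src, address, cache_addr, num_subsystems):
    return TextureRequest(src, owner(address, num_subsystems), address, cache_addr)


def service(request, memories):
    """The owning subsystem reads its local memory and packetizes the reply."""
    data = memories[request.dst][request.address]
    return TextureResponse(dst=request.src, cache_addr=request.cache_addr, data=data)
```

Because each texel lives in exactly one memory, only one copy of the texture exists; a request from subsystem 0 for address 300 is routed to subsystem 1 under this mapping, and the reply returns to subsystem 0.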
Significantly, in the currently preferred embodiment, the rasterization subsystem includes a texture cache memory that caches the distributed texture data. The rasterization subsystem conceals, or minimizes, latency in the transmission network by prefetching the distributed texture data and storing the prefetched texture data within the texture cache memory. The rasterization subsystem further includes a cache address queue that stores cache addresses according to the order in which the texture requests are sent. A texture filter of the rasterization subsystem is coupled to the address queue to receive the cache addresses. The texture filter, upon receiving the cache addresses, retrieves the prefetched texture data from the texture cache memory. In this way, the prefetched texture data can be retrieved independently of the order in which the texture data enters the texture cache memory.
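The interplay between the cache address queue and the texture filter can be simulated cycle by cycle. In this sketch, `simulate_filter` and its `arrivals` schedule are hypothetical; presence in the `cache` dict stands in for a per-slot valid bit, and slot replacement is omitted:

```python
from collections import deque


def simulate_filter(request_slots, arrivals):
    """
    request_slots: cache addresses in the order texture requests were sent
                   (the contents of the cache address queue).
    arrivals:      dict mapping cycle -> list of (cache_addr, texel) responses
                   that land in the texture cache during that cycle, in any order.
    Returns the (cycle, texel) pairs consumed by the texture filter.
    """
    queue = deque(request_slots)
    cache = {}
    consumed = []
    last_arrival = max(arrivals, default=0)
    cycle = 0
    while queue:
        for slot, texel in arrivals.get(cycle, []):
            cache[slot] = texel  # responses are written by slot, in any order
        head = queue[0]
        if head in cache:
            # The filter reads the oldest outstanding cache address first.
            consumed.append((cycle, cache[head]))
            queue.popleft()
        elif cycle > last_arrival:
            raise RuntimeError(f"cache address {head} never received its texture data")
        cycle += 1
    return consumed
```

If the responses for slots 0, 1 and 2 arrive at cycles 5, 6 and 3 respectively, the filter still receives "A", "B", "C" in request order at cycles 5, 6 and 7: the early response for slot 2 simply waits in the cache until its turn.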
Embodiments of the present invention include the above and further include a method of rendering pixels with texture data stored in distributed texture memories. The method includes the steps of: receiving texture memory addresses that correspond to cache addresses of a texture cache memory; sending texture requests for addresses not currently in the cache to the distributed texture memories; receiving texture responses from the distributed texture memories and storing the texture responses within the texture cache memory; and retrieving the texture responses from the texture cache memory according to the order in which the texture addresses are sent and independent of the order in which the texture responses enter the texture cache memory.
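The four method steps can be sketched as a small direct-mapped texture cache. This is a hypothetical illustration under stated assumptions, not the claimed design: one texel per cache line, the cache address is the texture address modulo the number of lines, and a per-line reference count (an assumption) prevents a line from being reassigned while an earlier queued lookup still needs it:

```python
from collections import deque


class TextureCache:
    """Direct-mapped texture cache front end (hypothetical sketch)."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # texture address held (or pending) in each line
        self.data = [None] * num_lines  # None until the texture response arrives
        self.refs = [0] * num_lines     # queued lookups still waiting to use each line
        self.address_queue = deque()    # cache addresses in the order lookups were made
        self.requests = []              # (cache_addr, tex_addr) sent to the distributed memories

    def lookup(self, tex_addr):
        """Steps 1-2: map a texture address to a cache address; request it on a miss.

        Returns False if the line is still needed by an earlier lookup (caller retries).
        """
        line = tex_addr % self.num_lines
        if self.tags[line] != tex_addr:
            if self.refs[line] > 0:
                return False  # cannot evict data the filter has not consumed yet
            self.tags[line] = tex_addr
            self.data[line] = None
            self.requests.append((line, tex_addr))
        self.refs[line] += 1  # hits, including hits on pending lines, send no request
        self.address_queue.append(line)
        return True

    def receive(self, cache_addr, texel):
        """Step 3: store a texture response in its line, whatever its arrival order."""
        self.data[cache_addr] = texel

    def retrieve(self):
        """Step 4: return texels in lookup order; None means the head is not ready yet."""
        line = self.address_queue[0]
        if self.data[line] is None:
            return None
        self.address_queue.popleft()
        self.refs[line] -= 1
        return self.data[line]
```

For example, looking up texture addresses 10, 11, 10, 12 in a 4-line cache sends only three requests (the second lookup of 10 hits the pending line); even if the responses come back in reverse order, `retrieve` returns the texels for 10, 11, 10, 12 in that order.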