The present invention relates generally to caching processed data, and, more particularly, to a method and apparatus for distributing the functions of a cache across different stages in an execution pipeline such that the stages can operate independently while saving processing time and preventing data hazards.
The function of a 3D graphics processing pipeline for a computer is to represent a 3-dimensional scene on a 2-dimensional computer display screen. In order to change view-point within the 3-dimensional scene, the graphics processing pipeline must redraw the scene according to the new view-point; this can affect perspective, lighting, reflections, the obscuring or revealing of objects and so forth. This requires a large amount of calculation.
Consider the polygon shown in FIG. 1a. The symbols a, b, c, and d represent the vertices of the polygon. The characteristics of each vertex include such information as its position in 3-dimensional space (x,y,z Cartesian co-ordinates), its color, its transparency, and its texture.
When drawing a scene, the 3D graphics processor works with flat surfaces. The most complex surface that is guaranteed to be flat is a triangle. Therefore, part of the process of displaying the polygon abcd involves tessellating it into triangles. This process is shown in FIG. 1b. The polygon abcd is now represented by the two triangles abc and adc. For more complex shapes, tessellation can lead to the introduction of new vertices, as shown in the tessellation of FIG. 1c. 
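The fan-style tessellation described above can be sketched as follows. This is a minimal illustration for convex polygons only; the function name is illustrative, the triangle ordering and winding may differ from the figures, and tessellations that introduce new vertices (as in FIG. 1c) would require a different scheme.

```python
def tessellate_fan(vertex_ids):
    """Split a convex polygon into triangles by fanning out from its first
    vertex; polygon abcd yields the two triangles abc and acd."""
    first = vertex_ids[0]
    return [(first, vertex_ids[i], vertex_ids[i + 1])
            for i in range(1, len(vertex_ids) - 1)]
```

An n-sided convex polygon thus yields n - 2 triangles, each sharing the first vertex.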
The task of drawing the polygon is now replaced by the task of drawing the two triangles abc and adc. One way of representing the task to the 3D graphics pipeline is to produce a sequence of triplets representing the vertices to be drawn: abcadc. Each identifier a, b, c, and d must uniquely define a particular vertex within a scene, but a vertex can be shared by several triangles. In addition, the identifier allows the characteristics of the vertex (the position, color, transparency, etc.) to be retrieved from storage elsewhere in the system.
Part of the functions performed by a 3D-graphics processor are shown in FIG. 2. The sequence of triplets is passed to data producer 20. Producer 20 is responsible for translating a vertex identifier "a" into the vertex characteristics "A". "a" represents a relatively small data item (for example, an 8-bit value), while "A" represents a number of relatively large data items (for example, sixteen 32-bit values). The translation process is costly in terms of time and occupied system resources; for example, it might require a number of memory accesses and data conversion operations.
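The producer's translation step can be sketched as a lookup that expands a small identifier into a much larger record. The store layout and field names below are hypothetical; in the system described, this lookup would involve memory accesses and data conversions rather than a simple dictionary.

```python
# Hypothetical vertex store: an 8-bit identifier maps to a large record of
# vertex characteristics (position, color, transparency, and so on).
VERTEX_STORE = {
    0x61: {"pos": (0.0, 1.0, 0.0), "color": (255, 0, 0), "alpha": 1.0},
    0x62: {"pos": (1.0, 0.0, 0.0), "color": (0, 255, 0), "alpha": 1.0},
}

def translate(vertex_id):
    """Producer step: expand a small identifier 'a' into the full vertex
    characteristics 'A'. In hardware this is costly, potentially requiring
    several memory accesses and format conversions per vertex."""
    return VERTEX_STORE[vertex_id]
```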
The vertex characteristics "A" are processed by a processor, and the processed vertex A″ is passed to the consumer (the next stage in the pipeline). The processing performed by the processor is costly in terms of time.
The vertices "a" and "c" are each processed twice for polygon abcd: once for triangle abc and once for triangle acd. The result of processing a given vertex is identical, irrespective of the triangle it is being processed for. It wastes resources to translate the vertex identifier and process the vertex multiple times, so performance would be improved by maintaining a cache of transformed vertices.
The rates at which producer 20, processor 21, and consumer 22 handle data differ and may vary on a triangle-by-triangle or vertex-by-vertex basis. With the system shown in FIG. 2, the slowest unit determines the rate at any time, and faster units are stalled. This is an inefficient use of resources.
One way to more efficiently utilize the resources of a 3D-graphics pipeline is by using the cache as shown in FIG. 3. Producer 30 translates a vertex identifier "a" and writes the translated vertex data "A" to data FIFO 31. Data FIFO 31 is a first-in, first-out queue. Data is sent from producer 30 to data FIFO 31, where processor 32 can access it when ready. When producer 30 translates a vertex faster than processor 32 processes a vertex, multiple data units can be stored in data FIFO 31. Similarly, when processor 32 processes data units at a faster rate than producer 30 translates vertices, processor 32, rather than stalling immediately, continues to read translated vertices from data FIFO 31 until it is empty.
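The rate-decoupling behavior of data FIFO 31 can be sketched with a simple queue. The function names are illustrative; the point is that the processor drains buffered work instead of stalling the moment the two stages' rates diverge.

```python
from collections import deque

# Minimal sketch of the FIFO between producer and processor: the producer
# enqueues translated vertices as they become available, and the processor
# keeps reading until the queue is empty.
fifo = deque()

def producer_step(translated_vertex):
    fifo.append(translated_vertex)   # producer runs ahead when it is faster

def processor_step():
    if fifo:                         # processor drains buffered vertices
        return fifo.popleft()
    return None                      # only an empty FIFO stalls the processor
```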
Processor 32 maintains a cache tag storage 36 with tags containing the minimum amount of information A′, B′, C′, D′ required to uniquely identify a data unit incoming to processor 32, i.e., data A, B, C, D. The minimum amount of information required to uniquely identify a data unit that is stored in cache tag storage 36 can be the data unit itself, a shortened form of the data unit, or an identifier derived from the data unit. For each element A′, B′, C′, D′ of tag storage, there is a corresponding element of processed data A″, B″, C″, D″ stored in cache data storage 37. Thus, cache data storage 37 contains processed data A″, B″, C″, D″ corresponding to input data A, B, C, D previously processed by processor 32. The tags A′, B′, C′, D′ must be stored as well as the data A″, B″, C″, D″, so that the tags can be compared against incoming data units before they are processed, allowing processor 32 to determine whether there is cached data A″, B″, C″, D″ corresponding to incoming data A, B, C, D.

When processor 32 removes a data unit from data FIFO 31, it checks whether the data unit has a corresponding tag A′, B′, C′, D′ currently in cache tag storage 36. If there is no corresponding tag in cache tag storage 36, that is, if processor 32 gets a cache "miss," processor 32 stores a tag for the incoming data unit from FIFO 31 in cache tag storage 36, either by using a currently unused storage location or by reallocating a location that currently holds an old tag. Processor 32 also processes the incoming data unit and stores the newly processed data unit as cache data in the corresponding location in cache data storage 37.
The processed data is also passed through multiplexor 33, under control of processor 32, to a FIFO 34 for processed data and from there to a data consumer 35. If processor 32 finds a cache tag in tag storage 36, that is, if it gets a cache "hit," then processor 32 operates multiplexor 33 and cache data storage 37 so that the cache data corresponding to the cache tag for which there was a "hit" is passed through FIFO 34 to consumer 35. Consumer 35 can then take each processed data unit from data FIFO 34 in the correct order. For a 3D graphics processing pipeline, processor 32 might transform vertex data "A" according to the point of view for which the scene is to be rendered and according to the way in which the scene is lit to produce processed data A″.
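The processor-side caching of FIG. 3 can be sketched as follows. The class and the use of a hash as the tag are illustrative assumptions; the scheme is simply "check the tag storage, reuse the stored result on a hit, process and store on a miss, reallocating the oldest slot when full."

```python
from collections import OrderedDict

class CachingProcessor:
    """Sketch of FIG. 3: tags (A', B', ...) are kept alongside processed
    results (A'', B'', ...), and a data unit is only processed on a miss."""
    def __init__(self, transform, capacity=4):
        self.transform = transform          # costly per-vertex processing
        self.cache = OrderedDict()          # tag -> processed data
        self.capacity = capacity
        self.misses = 0

    def process(self, data_unit):
        tag = hash(data_unit)               # minimal identifying information
        if tag in self.cache:               # cache "hit": reuse stored result
            return self.cache[tag]
        self.misses += 1                    # cache "miss": process and store
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # reallocate the oldest tag's slot
        result = self.transform(data_unit)
        self.cache[tag] = result
        return result
```

For the triplet stream abcadc, the repeated vertices a and c hit the cache, so only four of the six data units are actually processed.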
There are, however, still some problems with the method discussed above. While a cache "hit" saves the latency associated with processor 32 having to reprocess previously processed data, it does not save the time taken by producer 30 to generate the stream ABCD from the incoming data stream abcd, or the time taken by processor 32 to check for cache hits and misses. In addition, the values A′, B′, C′, D′ are likely to be larger than a, b, c, d, requiring greater storage capacity in cache tag storage 36.
Thus, there exists a desire and need for a system and method for more efficiently performing cache functions.
The present invention mitigates the problems associated with the prior art and provides a unique method and system of distributed caching.
In accordance with an exemplary embodiment of the present invention, a cache tag storage is provided which is maintained by a producer that sends data to a processor. A cache data storage is also provided which holds data output from the processor. The producer determines cache hits and misses without performing any translations on the data, and passes cache hit information through a control path to the consumer without any involvement from the processor. A cache miss causes the producer to fetch and translate data and pass that data to the processor for processing. The producer implements a cache replacement algorithm and manages the cache tag storage in accordance with that algorithm.
The consumer uses information from the control path to allow it to read data from the output of the processor and the cache storage in the correct data order. The consumer also manages the cache data storage, in accordance with a cache replacement algorithm compatible with that used by the producer, by writing certain values that it has read from the output of the processor into the cache storage.
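The distributed scheme described above can be sketched end to end. The function and variable names are illustrative, and a simple round-robin replacement stands in for whatever compatible replacement algorithm the producer and consumer share; the essential points are that the producer checks tags against the untranslated identifiers, hit/miss records travel down a control path, and the consumer itself fills the cache data storage from the processor's output.

```python
from collections import deque

def run_distributed(stream, translate, transform, slots=4):
    """Sketch of the distributed cache: producer manages tags, the consumer
    manages cached data, and a control path carries hit/miss records."""
    tags = {}                       # producer-side: identifier -> cache slot
    cache_data = [None] * slots     # consumer-side: slot -> processed data
    control, processed = deque(), deque()

    # Producer: check tags on the raw identifier; only on a miss does it
    # translate the identifier and forward the data for processing.
    next_slot = 0
    for ident in stream:
        if ident in tags:
            control.append(("hit", tags[ident]))
        else:
            slot = next_slot % slots            # round-robin replacement
            next_slot += 1
            tags = {k: v for k, v in tags.items() if v != slot}  # evict
            tags[ident] = slot
            control.append(("miss", slot))
            processed.append(transform(translate(ident)))  # processor stage

    # Consumer: replay the control path in order, pulling from the processor
    # output on a miss (and caching it) and from cache data storage on a hit.
    out = []
    for kind, slot in control:
        if kind == "miss":
            cache_data[slot] = processed.popleft()
        out.append(cache_data[slot])
    return out
```

Because the producer and consumer apply the same replacement decisions, the slot named in each control record always holds the right processed value by the time the consumer reads it.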