The present invention pertains to a method and apparatus for reordering memory requests. More particularly, the present invention pertains to a method of improving the performance of graphics texture memory fetches through the implementation of a reordering device.
As is known in the art, the system cache in a computer system serves to enhance system performance. For example, in an integrated graphics chipset, the cache sits between several clients, such as the central processing unit (CPU) or the graphics texture engine, and the relatively slower system memory, holding recently accessed memory locations in case they are needed again.
A 3-D graphics texture engine requires large amounts of texture data. With the cache, some of the needed texture data can be retrieved as a result of a “cache hit.” A cache hit is a request to read from memory that can be satisfied from the cache without accessing main (or another) memory.
Within some integrated chipset designs, the cache can service a client working on several data streams at any one time. In particular, a 3-D graphics texture engine requires constant access to various data streams in order to perform a technique known as MIP (Multum In Parvo, Latin for “many things in a small place”) mapping. The graphics texture fetches to the cache occur in a round-robin fashion, such that sets of requests to a particular stream are ungrouped (i.e., the requests from one stream become intermixed with requests from other streams).
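The intermixing of requests described above can be illustrated with a minimal Python sketch. The stream names, request labels, and the `round_robin` helper are hypothetical and serve only as an illustration; they do not describe any particular texture engine or chipset.

```python
from itertools import zip_longest

def round_robin(streams):
    """Interleave requests by taking one request from each stream in turn."""
    skip = object()  # sentinel marking streams that have run out of requests
    order = []
    for group in zip_longest(*streams.values(), fillvalue=skip):
        order.extend(req for req in group if req is not skip)
    return order

# Hypothetical example: three MIP-level streams, each with its own requests.
streams = {
    "mip0": ["mip0:a", "mip0:b", "mip0:c"],
    "mip1": ["mip1:a", "mip1:b"],
    "mip2": ["mip2:a"],
}

interleaved = round_robin(streams)
print(interleaved)
# ['mip0:a', 'mip1:a', 'mip2:a', 'mip0:b', 'mip1:b', 'mip0:c']
```

In this sketch, no two consecutive requests in the first pass belong to the same stream, which is the ungrouped ordering that round-robin fetching produces.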
Furthermore, the various data streams are frequently found in separate areas of physical memory (i.e., each data stream is found in a separate memory “page”). In the event of “cache misses” among the various data streams, the requests are sent out to memory to be fulfilled. As mentioned above, these requests to memory are inherently out of order. Because the requests from different streams become intermixed, a certain amount of latency results from the ensuing page “breaks.” A page break occurs when consecutive requests are from different data streams and therefore require accesses to different memory pages. Servicing such requests requires opening and closing one memory page and then opening and closing another, which induces latency.
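The effect of intermixed requests on page breaks can be sketched as follows. This is a simplified illustration under stated assumptions: the 4096-byte page size, the addresses, and the `count_page_breaks` helper are hypothetical, and a page break is modeled simply as two consecutive requests that fall in different pages.

```python
PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only

def count_page_breaks(addresses, page_size=PAGE_SIZE):
    """Count consecutive request pairs that fall in different memory pages."""
    pages = [addr // page_size for addr in addresses]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev != cur)

# Hypothetical miss addresses: stream A lies in page 0, stream B in page 1.
stream_a = [0x0000, 0x0040, 0x0080]
stream_b = [0x1000, 0x1040, 0x1080]

# Round-robin order alternates between the two pages on every request.
interleaved = [addr for pair in zip(stream_a, stream_b) for addr in pair]
# Grouped order, shown only for comparison, keeps each stream together.
grouped = stream_a + stream_b

print(count_page_breaks(interleaved))  # 5
print(count_page_breaks(grouped))      # 1
```

Under these assumptions, the same six requests incur five page breaks when intermixed but only one when requests to the same page are kept together.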
When several separate streams of data are requested by a client, page coherency between requests diminishes. As the number of page breaks grows as a result of lost page coherency, the amount of latency increases, thereby reducing overall system performance.
In view of the above, there is a need for a method and apparatus for reordering memory requests for page coherency of client data requests in an integrated graphics chipset environment.