In current graphics subsystems, the speed and number of graphical processing elements has increased enough to make the graphics memory subsystem a barrier to achieving high performance. FIG. 1 illustrates a graphics memory controller 100 of the prior art. The memory controller 10 acts as a switch to determine which of several graphics processing clients 12, 14, 16, 18, 20 can access the memory storage array 22, which is organized as a single, i.e., monolithic, partition. Typical graphics processing elements that are the memory clients include the host processor, a texture engine, a z-buffer engine, a color engine, a 2D-graphics engine, a 3D-graphics engine and a display engine. Each client 12, 14, 16, 18, and 20 requests one or more cycles of the memory storage array 22 which transfers in each cycle a data quantity equal to the size of the data bus 15 of the array.
The size of the memory data bus 15 sets the size of the minimum access that may be made to the graphics memory subsystem. Monolithic memory subsystems for the various graphics clients have evolved to use a wider memory data bus for increased throughput. However, this leads to inefficient accesses for some of the graphics processing elements of the graphics subsystem that may not need to access data requiring the full size of data bus 15.
Memory buses in current architectures are now typically 128 bits physically, and 256 bits (32 bytes), effectively, when the minimum data transfer requires both phases of single clock cycle. (Hereinafter, when referring to the size of the data bus, the effective size, rather than physical size is meant.) This size of the memory data bus sets the size of the minimum access that may be made to the graphics memory subsystem.
For some devices that make use of the graphics memory, 32 bytes is an acceptable minimum. However, the conventional minimum size of 32 bytes is inefficient because there are memory accesses for some of the clients 12, 14, 16, 18, and 20 that do not require the full minimum memory access size. In particular, as geometry objects become smaller and finer, a minimum access of 32 bytes has more data transfer bandwidth than is needed by the various graphics engines used to process graphical objects. One measure of inefficiency of the access is the ratio of used pixels to fetched pixels. As the size of the memory bus increases or the minimum access increases, this ratio becomes smaller. A small ratio implies a large amount of wasted memory throughput in the graphics memory subsystem. It is desirable to avoid this wasted throughput without altering the view to the memory clients of memory as a single unit.
In one proposed solution to the problems of this wasted throughput, it has been suggested to provide a memory system that includes a plurality of memory partitions. Such a multiple partition memory system is described in detail in U.S. patent application Ser. No. 09/687,453, entitled “Controller For A Memory System Having Multiple Partitions,” commonly assigned to the assignee of the present invention, the contents of which are hereby incorporated by reference.
FIG. 2 shows a high-level block diagram of the system 200 proposed in U.S. patent application Ser. No. 09/687,453. A memory array 24 has a number of independently operable partitions 26a, 26b, 26c, 26d, each with a respective bus 28a, 28b, 28c, and 28d and a bus 27a, 27b, 27c, and 27d having a width w that is preferably a smaller transfer size than the single prior art bus 15 in FIG. 1. In one embodiment, there are four independent partitions P0, P1, P2, and P3 (elements 26a, 26b, 26c, 26d) each with a bus width one quarter the size of the non-partitioned bus, i.e., each with a 64 bit bus. Each of the memory system clients 12, 14, 16, 18, and 20 is connected to memory controller 30. Memory controller 30 includes a number of queues 32, 34, 36, 38, 40, 42, 44, 46 that connect to the partitions 26a, 26b, 26c, and 26d of memory array 24. Control logic (not shown in FIG. 2) determines the one or more partitions to which a request should be routed and the one or more partitions from which a response (read data) to a request should be obtained to maintain the appearance of a non-partitioned memory for the clients. Additionally, the control logic in the memory controller arbitrates among the various clients according to a priority assigned to each of the clients.
A drawback of the partitioned memory system proposed in U.S. patent application Ser. No. 09/687,453 is that the hardware requirements are larger than desired. Monolithic memory subsystems for the various graphics clients are evolving to use an increasingly larger memory data bus for increased throughput. For example, there is a trend in the graphics industry to increase the burst transfer length (BL) (e.g., from BL 4 to BL 8), which has the effect of increasing the minimum access size. Moreover, as described in the embodiment in U.S. patent application Ser. No. 09/687,453, each memory partition has its own hardware for controlling read and write operations to that partition in a coordinated fashion with the read/write operations to the remaining partitions. Thus, for each new partition added in the system the hardware implementation requirements increase as well. It would therefore be beneficial to have a partitioned memory system providing efficient memory access while not requiring a large increase in the hardware to implement a virtually unified memory architecture with high throughput.