Various systems are available for storing data in memory devices and retrieving data from those memory devices. FIG. 1 illustrates a computer architecture 100 in which a discrete graphics controller supports a local graphics memory. Computer architecture 100 includes a central processing unit (CPU) 102 coupled to a memory controller 104. Memory controller 104 is coupled to a main memory 108, an I/O controller 106, and a graphics controller 110. The main memory is used to store program instructions which are executed by the CPU and data which are referenced during the execution of these programs. The graphics controller 110 is coupled to a discrete graphics memory 112. Graphics controller 110 receives video data through a video input 114 and transmits video data to other devices through a display interface 116.
The architecture of FIG. 1 includes two separate memories (main memory 108 and graphics memory 112), each controlled by a different controller (memory controller 104 and graphics controller 110, respectively). Typically, graphics memory 112 includes faster and more expensive memory devices, while main memory 108 has a larger storage capacity, but uses slower, less expensive memory devices.
Improvements in integrated circuit design and manufacturing technologies allow higher levels of integration, thereby allowing an increasing number of subsystems to be integrated into a single device. This increased integration reduces the total number of components in a system, such as a computer system. As subsystems with high memory performance requirements (such as graphics subsystems) are combined with the traditional main memory controller, the resulting architecture may provide a single high-performance main memory interface.
Another type of computer memory architecture is referred to as a unified memory architecture (UMA). In a UMA system, the graphics memory is statically or dynamically partitioned off from the main memory pool, thereby saving the cost associated with dedicated graphics memory. UMA systems often employ less total memory capacity than systems using discrete graphics memory to achieve similar levels of graphics performance. UMA systems typically realize additional cost savings due to the higher levels of integration between the memory controller and the graphics controller.
FIG. 2 illustrates another prior art memory system 200 that uses a unified memory architecture. A CPU/Memory Controller subsystem 202 includes a CPU 208 and a single device 210 that combines a memory controller and a graphics controller. The subsystem 202 represents an increased level of integration as compared to the architecture of FIG. 1. Subsystem 202 is coupled to a shared memory 204, which is used as both the main memory and the graphics memory. Subsystem 202 is also coupled to an I/O controller 206, a video input 212, and a display interface 214.
The memory controller/graphics controller 210 controls all memory access, both for data stored in the main memory portion of shared memory 204 and for data stored in the graphics memory portion of the shared memory. The shared memory 204 may be partitioned statically or dynamically. A static partition allocates a fixed portion of the shared memory 204 as “main memory” and the remaining portion is the “graphics memory.” A dynamic partition allows the allocation of shared memory 204 between main memory and graphics memory to change depending on the needs of the system. For example, if the graphics memory portion is full, and the graphics controller needs additional memory, the graphics memory portion may be expanded if a portion of the shared memory 204 is not currently in use or if the main memory allocation can be reduced.
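The dynamic partitioning described above can be illustrated with a minimal sketch. The class name `SharedMemory`, the method `grow_graphics`, the `main_reclaimable_mb` parameter, and all sizes are hypothetical and chosen only for illustration; the sketch is not a description of how any particular memory controller/graphics controller is implemented.

```python
class SharedMemory:
    """Toy model of a dynamically partitioned shared memory (sizes in MB).

    All names and sizes are illustrative assumptions.
    """

    def __init__(self, total_mb, main_mb, graphics_mb):
        if main_mb + graphics_mb > total_mb:
            raise ValueError("partitions exceed shared memory capacity")
        self.total_mb = total_mb
        self.main_mb = main_mb          # portion currently used as main memory
        self.graphics_mb = graphics_mb  # portion currently used as graphics memory

    @property
    def unused_mb(self):
        """Shared memory that belongs to neither partition right now."""
        return self.total_mb - self.main_mb - self.graphics_mb

    def grow_graphics(self, extra_mb, main_reclaimable_mb=0):
        """Try to expand the graphics partition by extra_mb.

        Unused memory is taken first; any remainder is taken from main
        memory, but only up to main_reclaimable_mb. Returns True if the
        expansion succeeded and False if the partition was left unchanged.
        """
        if extra_mb < 0:
            raise ValueError("extra_mb must be non-negative")
        from_unused = min(extra_mb, self.unused_mb)
        from_main = extra_mb - from_unused
        if from_main > min(main_reclaimable_mb, self.main_mb):
            return False
        self.main_mb -= from_main
        self.graphics_mb += extra_mb
        return True


# Example: 1024 MB shared memory, 768 MB main, 128 MB graphics, 128 MB unused.
mem = SharedMemory(total_mb=1024, main_mb=768, graphics_mb=128)
grew_from_unused = mem.grow_graphics(64)           # satisfied from unused memory
blocked = mem.grow_graphics(128)                   # needs main memory, none reclaimable
grew_from_main = mem.grow_graphics(128, main_reclaimable_mb=100)
```

A static partition corresponds to never calling `grow_graphics`: the two portions keep the sizes assigned at construction.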
Regardless of the system architecture, graphics rendering performance is often constrained by the memory bandwidth available to the graphics subsystem. In the system of FIG. 1, graphics controller 110 interfaces to a dedicated graphics memory 112. Cost constraints for the graphics subsystem generally dictate that a limited capacity of dedicated graphics memory 112 must be used. This limited amount of memory, in turn, dictates a maximum number of memory devices that can be supported. In such a memory system, the maximum graphics memory bandwidth is the product of the number of memory devices and the bandwidth of each memory device. Device-level cost constraints and technology limitations typically set the maximum memory device bandwidth. Consequently, graphics memory bandwidth, and therefore graphics performance, are generally bound by the small number of devices that can reasonably be supported in this type of system configuration.
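As a rough illustration of the product relationship stated above, the following sketch computes the maximum graphics memory bandwidth for a dedicated graphics memory. The function name and the device count and per-device bandwidth are hypothetical values, not figures for any actual system.

```python
def dedicated_graphics_bandwidth(num_devices, device_bandwidth_mb_s):
    """Maximum graphics memory bandwidth in MB/s.

    Modeled as the number of memory devices multiplied by the bandwidth
    of each device, as described in the text.
    """
    return num_devices * device_bandwidth_mb_s


# Hypothetical example: 4 devices, each providing 1600 MB/s.
small_config = dedicated_graphics_bandwidth(4, 1600)

# Doubling bandwidth at a fixed per-device rate requires doubling the
# device count, which cost constraints generally do not allow.
large_config = dedicated_graphics_bandwidth(8, 1600)
```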
Unified memory architectures, such as that shown in FIG. 2, help alleviate the cost constraints described above and generally provide lower cost relative to systems such as that shown in FIG. 1. However, memory bandwidth for the system of FIG. 2 is generally bound by cost constraints on the memory controller/graphics controller 210. Peak memory bandwidth for this system is the product of the number of conductors on the memory data interface and the communication bandwidth per conductor. The communication bandwidth per conductor is often limited by the choice of memory technology and the topology of the main memory interconnect. The number of conductors that can be used is generally bound by cost constraints on the memory controller/graphics controller package or system board design. The system of FIG. 2 does allow the theoretical aggregate bandwidth to and from the memory devices to scale linearly with system memory capacity, which is typically much larger than the capacity of dedicated graphics memory. This aggregate bandwidth cannot be exploited, however, because of the bandwidth limitations at the memory controller/graphics controller 210 described above.
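The gap between the theoretical aggregate bandwidth and the bandwidth actually available at the controller can be illustrated with a small sketch. All function names and numeric values below are hypothetical, and all bandwidths are assumed to be expressed in the same unit (here Mbit/s).

```python
def controller_peak_bandwidth(num_conductors, bandwidth_per_conductor):
    """Peak bandwidth at the memory data interface of the controller:
    number of conductors times communication bandwidth per conductor."""
    return num_conductors * bandwidth_per_conductor


def aggregate_device_bandwidth(num_devices, device_bandwidth):
    """Theoretical aggregate bandwidth of the memory devices, which
    grows linearly as devices (and thus capacity) are added."""
    return num_devices * device_bandwidth


def usable_bandwidth(controller_peak, device_aggregate):
    """Bandwidth that can actually be exploited: the smaller of the two."""
    return min(controller_peak, device_aggregate)


# Hypothetical interface: 64 data conductors at 800 Mbit/s each.
peak = controller_peak_bandwidth(64, 800)

# Hypothetical devices, each capable of 12800 Mbit/s.
with_8_devices = usable_bandwidth(peak, aggregate_device_bandwidth(8, 12800))
with_16_devices = usable_bandwidth(peak, aggregate_device_bandwidth(16, 12800))
```

In this sketch, doubling the number of devices doubles the aggregate device bandwidth, but the usable bandwidth stays fixed at the controller peak.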
It would be advantageous to provide a system architecture that offers the cost savings of a unified memory architecture while also providing options for scaling to higher levels of aggregate memory bandwidth (and therefore graphics performance) relative to systems using dedicated graphics memory.