Computer systems use memory devices, such as dynamic random access memory (“DRAM”) devices, to store instructions and data that are accessed by a processor. These memory devices are normally used as system memory in a computer system. In a typical computer system, the processor communicates with the system memory through a processor bus and a memory controller. The processor issues a memory request, which includes a memory command, such as a read command, and an address designating the location from which data or instructions are to be read or to which data or instructions are to be written. The memory controller uses the command and address to generate appropriate command signals as well as row and column addresses, which are applied to the system memory. In response to the commands and addresses, data are transferred between the system memory and the processor. The memory controller is often part of a system controller, which also includes bus bridge circuitry for coupling the processor bus to an expansion bus, such as a PCI bus.
Although the operating speed of memory devices has continuously increased, this increase in operating speed has not kept pace with increases in the operating speed of processors. As a result, the data bandwidth between a processor and memory devices to which it is coupled is significantly lower than the data bandwidth capabilities of the processor. The data bandwidth between the processor and memory devices is limited to a greater degree by the even lower data bandwidth between the memory controller and the memory devices.
In addition to the limited bandwidth between processors and memory devices, the performance of computer systems is also limited by latency problems that increase the time required to read data from the memory devices. More specifically, when a memory device read command is coupled to a memory device, such as a synchronous DRAM (“SDRAM”) device, the read data are output from the SDRAM device only after a delay of several clock periods. Therefore, although SDRAM devices can synchronously output burst data at a high data rate, the delay in initially providing the data can significantly slow the operating speed of a computer system using such SDRAM devices.
One approach to alleviating the memory latency problem is illustrated in FIG. 1. As shown in FIG. 1, a computer system 10 includes a processor 14 coupled to several memory modules 20a-f, although a lesser or greater number of memory modules 20 may be used. Each of the memory modules 20 includes a memory hub 24 coupled to several memory devices 28, which may be SDRAM devices. The memory modules 20 are shown in FIG. 1 as being coupled to the processor 14 and to each other through unidirectional input buses 30 and unidirectional output buses 38. However, it will be understood that the memory modules 20 may be coupled to the processor 14 and to each other by bi-directional buses (not shown).
The memory modules 20 are shown in FIG. 1 as being coupled in a point-to-point arrangement in which each bus 30, 38 is coupled only between two points. However, other bus systems may alternatively be used. For example, a switched bus system as shown in FIG. 2A, a shared bus system as shown in FIG. 2B, or some other bus system may also be used. The switched bus system shown in FIG. 2A includes a processor 40 coupled to a switching circuit 42. The switching circuit 42 is coupled to several memory modules 44a-d, a graphics processor 46 and an I/O device 48. In operation, the switching circuit 42 couples the processor 40 to any one of the memory modules 44a-d, the graphics processor 46 or the I/O device 48. The shared bus system shown in FIG. 2B includes a processor 50 coupled to several memory modules 54a-c through a shared bus system 58.
Any of the above-described architectures may also be used to couple multiple processors to multiple memory modules. For example, as shown in FIG. 3, a pair of processors 60, 62 are coupled through respective bi-directional bus systems 64 to respective sets of memory modules 66a-e, 68a-e. Each of the memory modules 66a-e, 68a-e includes a memory hub 24 coupled to several memory devices 28.
A memory hub architecture as shown in FIGS. 1 and 3 can provide performance that is far superior to architectures in which a processor is coupled to several memory devices, either directly or through a system or memory controller. However, such architectures nevertheless suffer from several limitations. For example, the architecture shown in FIG. 1 does not provide a great deal of flexibility in the manner in which the processor 14 can access the memory modules 20a-f. If, for example, the buses 30, 38 include a 32-bit data bus, all accesses to the memory modules 20a-f will be in 32-bit double words even if a lesser number of data bits are being read from or written to the memory modules 20a-f.
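By way of illustration only, the cost of fixed-width accesses can be sketched as follows. The sketch assumes the 32-bit data bus of the example above; the function names `bus_transfers` and `bus_utilization` are hypothetical and are not part of any described system.

```python
BUS_WIDTH_BITS = 32  # data bus width taken from the example in the text


def bus_transfers(requested_bits: int, bus_width_bits: int = BUS_WIDTH_BITS) -> int:
    """Number of full-width transfers needed to move `requested_bits`."""
    if requested_bits <= 0:
        raise ValueError("requested_bits must be positive")
    # Ceiling division: every access occupies a whole bus word.
    return -(-requested_bits // bus_width_bits)


def bus_utilization(requested_bits: int, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Fraction of the transferred bits that the processor actually asked for."""
    transferred = bus_transfers(requested_bits, bus_width_bits) * bus_width_bits
    return requested_bits / transferred
```

Under these assumptions, an 8-bit read still occupies one full 32-bit transfer, so only one quarter of the transferred bits are useful.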
The flexibility of the architectures shown in FIGS. 1 and 3 is also limited in other respects. For example, the architecture shown in FIG. 3 does not provide a great deal of flexibility in the manner in which the processors 60, 62 can access the memory modules 66a-e, 68a-e, respectively. Although the processor 60 can access any of the memory modules 66a-e, and the processor 62 can access any of the memory modules 68a-e, the processor 60 cannot access any of the memory modules 68a-e nor can the processor 62 access any of the memory modules 66a-e. As a result, if the processor 60 writes sufficient data to the memory modules 66a-e to reach the storage capacity of the modules 66a-e, the processor 60 will be unable to store any further data even though there may be substantial unused capacity in the memory modules 68a-e. Finally, the memory modules 66, 68 cannot be used to allow the processors 60, 62 to communicate with each other.
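The stranded-capacity limitation of FIG. 3 can be sketched with a small model. The reachability map mirrors the reference numerals in the text; the capacity figures used with it are arbitrary, hypothetical units chosen only for illustration.

```python
# Hypothetical reachability map for the FIG. 3 arrangement: each processor
# is wired only to its own set of memory modules.
REACHABLE = {
    "processor_60": ["66a", "66b", "66c", "66d", "66e"],
    "processor_62": ["68a", "68b", "68c", "68d", "68e"],
}


def free_capacity(processor: str, free_by_module: dict) -> int:
    """Free space the given processor can actually use."""
    return sum(free_by_module[m] for m in REACHABLE[processor])


def stranded_capacity(processor: str, free_by_module: dict) -> int:
    """Free space in the system that the given processor cannot reach."""
    reachable = set(REACHABLE[processor])
    return sum(free for module, free in free_by_module.items()
               if module not in reachable)
```

If the modules 66a-e are full and each of the modules 68a-e has 100 free units, this model reports no usable space for the processor 60 and 500 stranded units that it cannot reach.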
Another limitation of the memory architectures shown in FIGS. 1 and 3 is the relatively high latency that the processors 14, 60, 62 incur in accessing their respective memory modules 20, 66, 68. Insofar as each memory module is accessed through any memory module that is between it and the processor, substantial delays may be incurred in coupling address, data and control signals through the intervening memory modules. Further, if any of the memory modules 20, 66, 68 becomes defective, the memory modules that must be accessed through the defective memory module become unusable.
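Both effects can be sketched with a simple model of a chain of modules, assuming modules are indexed from 0 at the processor end and that each intervening hub adds a fixed delay. The delay values and the function names `access_latency` and `usable_modules` are hypothetical and serve only to illustrate the reasoning.

```python
def access_latency(module_index: int, hub_delay: int, device_delay: int) -> int:
    """Latency to reach a module, in arbitrary time units.

    `module_index` is the number of intervening modules between the
    processor and the target (0 for the module nearest the processor).
    """
    if module_index < 0:
        raise ValueError("module_index must be non-negative")
    # Each intervening hub forwards the request, then the target responds.
    return module_index * hub_delay + device_delay


def usable_modules(chain_length: int, defective_index: int) -> list:
    """Indices of modules still reachable when one module is defective.

    The defective module and every module behind it in the chain are lost.
    """
    if not 0 <= defective_index < chain_length:
        raise ValueError("defective_index must lie within the chain")
    return list(range(defective_index))
```

With an assumed hub delay of 2 units and a device delay of 10 units, the nearest module responds in 10 units while the sixth module responds in 20. A defect in the third module of a six-module chain leaves only the first two modules usable.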
There is therefore a need for a memory system architecture that is relatively fault-tolerant, that provides relatively low latency memory accesses, and that allows multiple processors to have a great deal of flexibility in the manner in which they access hub-based memory modules.