The present invention relates to a computer system, and more particularly, to a computer system having a memory module with a memory hub coupling several memory devices to a processor or other memory access devices.
Computer systems use memory devices, such as dynamic random access memory (“DRAM”) devices, to store instructions and data that are accessed by a processor. These memory devices are normally used as system memory in a computer system. In a typical computer system, the processor communicates with the system memory through a processor bus and a memory controller. The processor issues a memory request, which includes a memory command, such as a read command, and an address designating the location from which data or instructions are to be read. The memory controller uses the command and address to generate appropriate command signals as well as row and column addresses, which are applied to the system memory. In response to the commands and addresses, data is transferred between the system memory and the processor. The memory controller is often part of a system controller, which also includes bus bridge circuitry for coupling the processor bus to an expansion bus, such as a PCI bus.
Although the operating speed of memory devices has continuously increased, this increase in operating speed has not kept pace with increases in the operating speed of processors. Even slower has been the increase in operating speed of memory controllers coupling processors to memory devices. The relatively slow speed of memory controllers and memory devices limits the data bandwidth between the processor and the memory devices.
In addition to the limited bandwidth between processors and memory devices, the performance of computer systems is also limited by latency problems that increase the time required to read data from system memory devices. More specifically, when a memory device read command is coupled to a system memory device, such as a synchronous DRAM (“SDRAM”) device, the read data are output from the SDRAM device only after a delay of several clock periods. Therefore, although SDRAM devices can synchronously output burst data at a high data rate, the delay in initially providing the data can significantly slow the operating speed of a computer system using such SDRAM devices.
One approach to alleviating the memory latency problem is to use multiple memory devices coupled to the processor through a memory hub. In a memory hub architecture, a system controller or memory hub controller is coupled to several memory modules, each of which includes a memory hub coupled to several memory devices. The memory hub efficiently routes memory requests and responses between the controller and the memory devices. Computer systems employing this architecture can have a higher bandwidth because a processor can access one memory device while another memory device is responding to a prior memory access. For example, the processor can output write data to one of the memory devices in the system while another memory device in the system is preparing to provide read data to the processor. The operating efficiency of computer systems using a memory hub architecture can make it more practical to vastly increase memory capacity in computer systems.
Despite the advantages of utilizing a memory hub for accessing memory devices, the semiconductor technology used by memory devices often results in defective memory locations, which make the memory devices unreliable. The degree to which defective locations in a memory device impair the performance of a computer system using such a device depends on the nature of the computer system and the application it is performing. Computer systems may vary from simple computers, such as those contained in telephone answering machines, to highly complex supercomputers employed for complicated scientific projects. In simple computers used for telephone answering machines, for example, errors in one or more of the memory locations of the memory may not be fatal. For example, a mistake in the memory of the telephone answering machine likely would only cause the synthesized voice stored in the memory to be imperceptibly altered. However, one or more defective memory locations in the memory of a computer used to perform scientific calculations may cause substantial problems.
Although current manufacturing techniques have substantially reduced the number of defective memory locations, computer memory is still susceptible to such defective memory locations. Those defective memory locations can be caused by any of numerous steps taken during manufacture of the memory chips, semiconductor crystallinity defects, electrical connector discontinuities, etc. Although memory chips with such defective memory locations typically represent a small portion (less than 1%) of the total number of memory chips produced, the actual number of such defective memory chips is substantial.
In the past, extra rows of memory cells, known as “redundant rows,” were provided to replace rows having defective memory cells. While the use of redundant rows is successful in salvaging otherwise defective memory chips, the number of defective rows that can be replaced is limited to the number of redundant rows that are provided on the memory chip. The number of defective rows sometimes exceeds the number of available redundant rows, thus preventing repair of some defective rows. In some cases, such defective memory chips could be sold at a greatly reduced price for applications that do not require perfect memory, such as for telephone answering machines. However, it could be beneficial if some of those memory chips could be employed in more critical applications, such as in personal computers.
One way to enable such defective memory chips to be incorporated into personal computers is to employ error correction schemes to compensate for defective memory locations. Error correction schemes add to each data word plural error correction bits that enable the data word to be reconstituted in the event of an erroneous data bit within the data word. However, such prior art error correction schemes typically can reconstitute a data word only if a single bit of the data word is erroneous. Moreover, such error correction schemes add several extra data bits to each data word, which results in high memory overhead. Such error correction schemes could be extended to detect multiple erroneous data bits, but the memory overhead that would result likely would be unacceptable.
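The single-bit limitation and the overhead described above can be illustrated with a small sketch. The text does not name a particular code; the Hamming(7,4) code used here is chosen only as a familiar example, and the function names hamming74_encode and hamming74_decode are hypothetical. Three check bits are added to every four data bits, any single flipped bit is corrected, and a second flipped bit causes the decoder to return wrong data.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (int 0..15) as a 7-bit codeword (list, positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8                             # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = d     # data bits
    code[1] = code[3] ^ code[5] ^ code[7]      # check bit covering 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # check bit covering 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # check bit covering 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (data, syndrome); a nonzero syndrome is the position that was flipped back."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions holding a 1
    if syndrome:
        code[syndrome] ^= 1                    # assume exactly one bit is wrong
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, syndrome


if __name__ == "__main__":
    word = 0b1011
    stored = hamming74_encode(word)

    one_error = list(stored)
    one_error[2] ^= 1                          # one defective bit: corrected
    print(hamming74_decode(one_error)[0] == word)

    two_errors = list(stored)
    two_errors[0] ^= 1
    two_errors[4] ^= 1                         # two defective bits: miscorrected
    print(hamming74_decode(two_errors)[0] == word)
```

In this example the stored word grows from 4 bits to 7, which is the kind of memory overhead the passage refers to; wider data words need proportionally fewer check bits, but the overhead rises again if multiple-bit errors must be handled.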
Another method of correcting defective memory bits is through a commonly known remapping scheme. Remapping schemes utilize a predefined error map and remapping table to redirect defective memory locations. The error map is usually created in the factory based on well-known tests that determine which memory locations of the memory block are defective. Although these remapping schemes address double bit error problems and high memory overhead, they present various drawbacks. For example, creating the error map at the factory does not allow locations that become defective later to be corrected, and it adds time and cost to the manufacturing process. Creating the error map in the system controller requires each computer manufacturer to develop unique testing systems for each type of memory device accessed by the computer system.
Regardless of the type of memory repair or correction technique that is used, it is generally necessary to detect the location of defective memory cells. Defective memory cells are commonly detected by writing a pattern of known data, such as a checkerboard pattern of 1s and 0s, to an array of memory cells, and then reading data from the memory cells to determine if the read data match the write data. Testing memory devices in this manner is normally performed at several stages during the manufacture of the memory devices and by a computer or other system using the memory devices. For example, a computer system normally tests system memory devices, which are normally dynamic random access memory (“DRAM”) devices, at power-up of the computer system.
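The write-then-read comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any actual test circuit: the FaultyMemory class simulates a small bit array with cells stuck at a fixed value, and the names FaultyMemory and checkerboard_test are hypothetical. Running the pattern a second time with the bits inverted is an added assumption, so that a cell stuck at either 0 or 1 is exposed.

```python
class FaultyMemory:
    """Simulated bit array; cells listed in stuck_at always read back a fixed value."""

    def __init__(self, rows, cols, stuck_at=None):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.stuck_at = dict(stuck_at or {})   # {(row, col): forced bit}

    def write(self, row, col, bit):
        self.cells[row][col] = bit

    def read(self, row, col):
        return self.stuck_at.get((row, col), self.cells[row][col])


def checkerboard_test(mem):
    """Write a checkerboard and its inverse; return the cells whose read data mismatch."""
    defective = set()
    for phase in (0, 1):                       # phase 1 is the inverted pattern
        for r in range(mem.rows):
            for c in range(mem.cols):
                mem.write(r, c, (r + c + phase) & 1)
        for r in range(mem.rows):
            for c in range(mem.cols):
                expected = (r + c + phase) & 1
                if mem.read(r, c) != expected:
                    defective.add((r, c))
    return sorted(defective)


if __name__ == "__main__":
    mem = FaultyMemory(4, 4, stuck_at={(1, 2): 0, (3, 3): 1})
    print(checkerboard_test(mem))
```

Every cell is written and read in each pass, so the work done by this loop grows in direct proportion to the number of cells under test.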
The time required to test memory devices by writing known data to the memory devices, reading data from the memory devices, and comparing the read data to the write data is largely a function of the storage capacity of the memory devices. For example, doubling the number of memory cells in a memory device normally doubles the time to test the memory device. While the time required to test memory devices used in conventional memory architectures may be acceptably short, the time required to test memory devices using other architectures can be unacceptably long. For example, the vast memory capacity that a memory hub architecture can provide can result in an unacceptably long period of time for a processor to test the memory devices in the memory hub architecture system.
One approach to decreasing the time required to test memory devices by comparing read data to write data is to move the memory testing function “on chip” by incorporating self-test circuits in memory devices. Although this approach can reduce the time required to test memory devices, the pass/fail status of each memory device must nevertheless be reported to a processor or other memory access device. In a memory hub architecture using a large number of memory devices, it may require a substantial period of time for all of the memory devices to report their pass/fail status.
There is therefore a need for a memory module that combines the advantages of a memory hub architecture with the advantages of testing and repairing memory devices on the memory module.
The present invention is directed to a computer system and method for testing and repairing defective memory locations of memory devices located on a memory module. The computer system includes a plurality of memory modules coupled to a memory hub controller. Each of the memory modules includes a plurality of memory devices and a memory hub. The memory hub comprises a self-test module and a repair module. The self-test module is coupled to the memory devices, and in response to a request to test the memory devices, the self-test module executes one or more self-test routines. The self-test routines determine the locations of defective memory on the memory devices. The repair module uses the locations of defective memory to create a remapping table. The remapping table redirects the defective memory locations of the memory devices to non-defective memory locations of memory located on the memory module, such as in the memory devices, or in cache memory or scratch memory located within the memory hub. Thus, each time the memory hub receives a memory request from one of the memory access devices, such as the computer processor, the memory hub utilizes the repair module to check the memory location for defective memory and if necessary, redirect the memory request to a non-defective location.
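The redirection performed by the repair module can be sketched in software as a simple lookup. This is a minimal illustration under stated assumptions rather than the disclosed hardware: the class names RepairModule and MemoryHub, the flat integer addressing, and the use of a Python list as storage are all hypothetical, and the spare addresses stand in for whatever non-defective locations (memory devices, cache memory, or scratch memory) are available on the module.

```python
class RepairModule:
    """Builds a remapping table from defective addresses to spare addresses."""

    def __init__(self, defective_addresses, spare_addresses):
        defects = sorted(set(defective_addresses))
        spares = list(spare_addresses)
        if len(defects) > len(spares):
            raise ValueError("not enough spare locations to repair every defect")
        self.remap = dict(zip(defects, spares))

    def translate(self, address):
        """Return the spare address for a defective location, else the address itself."""
        return self.remap.get(address, address)


class MemoryHub:
    """Routes every read and write through the repair module before touching storage."""

    def __init__(self, size, repair):
        self.storage = [0] * size
        self.repair = repair

    def write(self, address, value):
        self.storage[self.repair.translate(address)] = value

    def read(self, address):
        return self.storage[self.repair.translate(address)]


if __name__ == "__main__":
    # Addresses 5 and 9 were reported defective by a self-test; 1000 and 1001 are spares.
    hub = MemoryHub(1002, RepairModule([5, 9], [1000, 1001]))
    hub.write(5, 42)
    print(hub.read(5), hub.storage[1000], hub.storage[5])
```

Because the lookup happens inside the hub, the device issuing the request keeps using the original address and does not need to know that a location was redirected.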
As will be apparent, the invention is capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.