This invention relates generally to computer memory, and more particularly to providing collision detection in a memory system.
Contemporary high performance computing main memory systems are generally composed of one or more dynamic random access memory (DRAM) devices, which are connected to one or more processors via one or more memory control elements. Overall computer system performance is affected by each of the key elements of the computer structure, including the performance/structure of the processor(s), any memory cache(s), the input/output (I/O) subsystem(s), the efficiency of the memory control function(s), the main memory device(s), and the type and structure of the memory interconnect interface(s).
Extensive research and development efforts are invested by the industry, on an ongoing basis, to create improved and/or innovative solutions for maximizing overall system performance and density by improving the memory system/subsystem design and/or structure. High-availability systems present further challenges related to overall system reliability due to customer expectations that new computer systems will markedly surpass existing systems in regard to mean-time-between-failure (MTBF), in addition to offering additional functions, increased performance, increased storage, lower operating costs, etc. Other frequent customer requirements further exacerbate the memory system design challenges, and include such items as ease of upgrade and reduced system environmental impact (such as space, power and cooling).
FIG. 1 relates to U.S. Pat. No. 5,513,135 to Dell et al., of common assignment herewith, and depicts an early synchronous memory module. The memory module depicted in FIG. 1 is a dual in-line memory module (DIMM). This module is composed of synchronous DRAMs 8, buffer devices 12, an optimized pinout, and an interconnect and capacitive decoupling method to facilitate high performance operation. The patent also describes the use of clock re-drive on the module, using such devices as phase-locked loops (PLLs).
FIG. 2 relates to U.S. Pat. No. 6,173,382 to Dell et al., of common assignment herewith, and depicts a computer system 10 which includes a synchronous memory module 20 that is directly (i.e. point-to-point) connected to a memory controller 14 via a bus 40, and which further includes logic circuitry 24 (such as an application specific integrated circuit, or “ASIC”) that buffers, registers or otherwise acts on the address, data and control information that is received from the memory controller 14. The memory module 20 can be programmed to operate in a plurality of selectable or programmable modes by way of an independent bus, such as an inter-integrated circuit (I2C) control bus 34, either as part of the memory initialization process or during normal operation. When utilized in applications requiring more than a single memory module connected directly to a memory controller, the patent notes that the resulting stubs can be minimized through the use of field-effect transistor (FET) switches to electrically disconnect modules from the bus.
Relative to U.S. Pat. No. 5,513,135, U.S. Pat. No. 6,173,382 further demonstrates the capability of integrating all of the defined functions (address, command, data, presence detect, etc.) into a single device. The integration of functions is a common industry practice that is enabled by technology improvements and, in this case, enables additional module density and/or functionality.
FIG. 3, from U.S. Pat. No. 6,510,100 to Grundon et al., of common assignment herewith, depicts a simplified diagram and description of a memory system 10 that includes up to four registered DIMMs 40 on a traditional multi-drop stub bus. The subsystem includes a memory controller 20, an external clock buffer 30, registered DIMMs 40, an address bus 50, a control bus 60 and a data bus 70 with terminators 95 on the address bus 50 and the data bus 70. Although only a single memory channel is shown in FIG. 3, systems produced with these modules often included more than one discrete memory channel from the memory controller, with each of the memory channels operated singly (when a single channel was populated with modules) or in parallel (when two or more channels were populated with modules) to achieve the desired system functionality and/or performance.
FIG. 4, from U.S. Pat. No. 6,587,912 to Bonella et al., depicts a synchronous memory module 210 and system structure in which the repeater hubs 320 include local re-drive of the address, command and data to the local memory devices 301 and 302 via buses 321 and 322; generation of a local clock (as described in other figures and the patent text); and the re-driving of the appropriate memory interface signals to the next module or component in the system via bus 300.
FIG. 5 depicts a contemporary system composed of an integrated processor chip 500, which contains one or more processor elements and an integrated memory controller 510. In the configuration depicted in FIG. 5, multiple independent cascade interconnected memory interface busses 506 are logically aggregated together to operate in unison to support a single independent access request at a higher bandwidth with data and error detection/correction information distributed or “striped” across the parallel busses and associated devices. The memory controller 510 attaches to four narrow/high speed point-to-point memory busses 506, with each bus 506 connecting one of the several unique memory controller interface channels to a cascade interconnect memory subsystem 503 (or memory module) which includes at least a hub device 504 and one or more memory devices 509. Some systems further enable operations when a subset of the memory busses 506 are populated with memory subsystems 503. In this case, the one or more populated memory busses 508 may operate in unison to support a single access request.
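By way of illustration only, the striping of data across parallel busses described above may be sketched as follows. The function names `stripe` and `unstripe`, the byte-level granularity, and the round-robin ordering are assumptions made for this example; the actual distribution of data and error detection/correction information across the busses 506 is not specified here.

```python
def stripe(data: bytes, num_buses: int) -> list[bytes]:
    """Distribute the bytes of one access round-robin across num_buses lanes.

    Byte granularity and round-robin order are assumptions for illustration.
    """
    if num_buses < 1:
        raise ValueError("num_buses must be at least 1")
    # Lane i carries bytes i, i + num_buses, i + 2 * num_buses, ...
    return [data[i::num_buses] for i in range(num_buses)]


def unstripe(lanes: list[bytes]) -> bytes:
    """Reassemble the original byte sequence from the per-bus lanes."""
    num_buses = len(lanes)
    out = bytearray(sum(len(lane) for lane in lanes))
    for i, lane in enumerate(lanes):
        # Put each lane's bytes back at the positions they were taken from.
        out[i::num_buses] = lane
    return bytes(out)


lanes = stripe(b"ABCDEFGH", 4)
restored = unstripe(lanes)
```

In this sketch each of the four lanes carries one quarter of the access, which is the sense in which the busses operate in unison to serve a single request at a higher bandwidth.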
FIG. 6 depicts a memory structure with cascaded memory modules 503 and unidirectional busses 506. One of the functions provided by the hub devices 504 in the memory modules 503 in the cascade structure is a re-drive function to send signals on the unidirectional busses 506 to other memory modules 503 or to the memory controller 510. FIG. 6 includes the memory controller 510 and four memory modules 503, on each of two memory busses 506 (a downstream memory bus with 24 wires and an upstream memory bus with 25 wires), connected to the memory controller 510 in either a direct or cascaded manner. The memory module 503 next to the memory controller 510 is connected to the memory controller 510 in a direct manner. The other memory modules 503 are connected to the memory controller 510 in a cascaded manner. Although not shown in this figure, the memory controller 510 may be integrated in the processor 500 and may connect to more than one memory bus 506 as depicted in FIG. 5.
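By way of illustration only, the re-drive function in a cascade of memory modules may be modeled as follows. The `Hub` class, the `send_downstream` function, and the assumption that a hub re-drives only those commands not addressed to itself are hypothetical simplifications for this example; an actual hub device 504 may re-drive information according to a different memory system protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical stand-in for a hub device in one cascaded memory module."""
    position: int                      # 0 = module directly attached to the controller
    handled: list = field(default_factory=list)


def send_downstream(hubs: list, target: int, command: str) -> int:
    """Pass a command from hub to hub until it reaches the target module.

    Returns the number of re-drives performed by intermediate hubs.
    """
    redrives = 0
    for hub in hubs:
        if hub.position == target:
            hub.handled.append(command)   # the addressed hub acts on the command
            return redrives
        redrives += 1                      # this hub re-drives to the next module
    raise ValueError(f"no module at position {target}")


hubs = [Hub(position=i) for i in range(4)]
direct_redrives = send_downstream(hubs, 0, "READ")     # directly connected module
cascade_redrives = send_downstream(hubs, 3, "WRITE")   # last module in the cascade
```

Under these assumptions, the module next to the controller is reached without any re-drive, while the fourth module is reached only after three intermediate hubs have re-driven the command.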
FIG. 7 depicts a block diagram of a memory hub device 504 including a link interface 704 for providing the means to re-synchronize, translate and re-drive high speed memory access information to associated DRAM devices 509 and/or to re-drive the information downstream on memory bus 506 as applicable based on the memory system protocol. The information is received by the link interface 704 from an upstream memory hub device 504 or from a memory controller 510 (directly or via an upstream memory hub device 504) via the memory bus 506. The memory device data interface 715 manages the technology-specific data interface with the memory devices 509 and controls the bi-directional memory data bus 708. The memory hub control 713 responds to access request packets by responsively driving the memory device 509 technology-specific address and control bus 714 (for memory devices in RANK0 501) or address and control bus 714′ (for memory devices in RANK1 716) and directing the read data flow 707 and write data flow 710 selectors.
The link interface 704 in FIG. 7 decodes the packets and directs the address and command information directed to the local hub device 504 to the memory hub control 713. Memory write data from the link interface 704 can be temporarily stored in the write data queue 711 or directly driven to the memory devices 509 via the write data flow selector 710 and internal bus 712, and then sent via internal bus 709 and memory device data interface 715 to memory device data bus 708. Memory read data from memory device(s) 509 can be queued in the read data queue 706 or directly transferred to the link interface 704 via internal bus 705 and read data selector 707, to be transmitted on the upstream bus 506 as a read reply packet.
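By way of illustration only, the two write paths and two read paths described above (queued or direct) may be modeled as follows. The class `HubDataPath`, its method names, and the use of a dictionary to stand in for the memory devices are assumptions for this example; the sketch does not model timing, the link interface packet format, or the selectors as hardware.

```python
from collections import deque


class HubDataPath:
    """Hypothetical model of the queued and direct data paths in a hub device."""

    def __init__(self):
        self.write_queue = deque()   # stands in for the write data queue 711
        self.read_queue = deque()    # stands in for the read data queue 706
        self.memory = {}             # stands in for the memory devices 509
        self.upstream = []           # read reply data sent toward the controller

    def write(self, addr, data, queue_it=False):
        """Store write data in the queue, or drive it directly to memory."""
        if queue_it:
            self.write_queue.append((addr, data))
        else:
            self.memory[addr] = data

    def drain_writes(self):
        """Send all queued write data to memory, oldest first."""
        while self.write_queue:
            addr, data = self.write_queue.popleft()
            self.memory[addr] = data

    def read(self, addr, queue_it=False):
        """Queue read data, or transfer it directly to the upstream side."""
        data = self.memory[addr]
        if queue_it:
            self.read_queue.append(data)
        else:
            self.upstream.append(data)

    def drain_reads(self):
        """Send all queued read data upstream, oldest first."""
        while self.read_queue:
            self.upstream.append(self.read_queue.popleft())


hub = HubDataPath()
hub.write(0x10, b"a")                  # direct path to memory
hub.write(0x20, b"b", queue_it=True)   # held in the write queue
pending = 0x20 not in hub.memory
hub.drain_writes()
hub.read(0x10)                         # direct path upstream
hub.read(0x20, queue_it=True)          # held in the read queue
hub.drain_reads()
```

The queues decouple the time at which data arrives at the hub from the time at which it is driven onto the memory device data bus or the upstream bus.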
During the operation of a memory system, the memory controller is responsible for scheduling all memory accesses and other operations such that resource conflicts do not arise in memory system elements such as the interface bus(es), the interface logic (e.g., the hub device) and/or the memory devices themselves. At the same time, overall system performance is best optimized by using available resources efficiently, that is, by minimizing the idle and/or standby time associated with each of these system elements. Problems arise when the controller improperly schedules these resources, for example, by scheduling memory commands that will result in an invalid memory operation sequence or interface timing, by not allowing sufficient time for an interface bus to settle after an information transfer, and/or by scheduling a sequence of operations that results in excessive device temperature excursions. Each of these examples would likely result in a memory system failure, which may appear to the computer system and/or operator to be intermittent in nature given the difficulty in repeating the set of events that results in the failure. Further, bus scheduling or resource conflicts can be difficult to identify because the continuing increase in bus data rates, coupled with the adoption of point-to-point buses, impedes and/or prevents the use of external test equipment to “snoop” bus activity and thereby predict possible conflicts during system bring-up, stress testing, and qualification, and in response to field failures.
Currently, verifying correct resource scheduling and/or correlating actual failures is often performed using system simulation to verify that the logic is behaving as expected under a specified configuration. A drawback of this approach is that chip sequences in hardware are very difficult to reproduce exactly. Another manner of identifying conflicts is to manually add delay cycles between commands to uncover resource scheduling issues. Drawbacks of this approach are that the addition of delay cycles is time consuming to debug, does not guarantee that the collision will be found and isolated in a reasonable time frame, and may even result in new resource conflicts due to the new command sequences/intervals.
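By way of illustration only, a resource conflict of the kind described above may be understood as two operations occupying the same resource during overlapping cycles. The following sketch checks a list of scheduled operations for such overlaps in software. The function name `find_collisions`, the resource names, and the cycle-window representation are assumptions for this example, and the sketch is not the mechanism of any of the patents cited above.

```python
from collections import defaultdict


def find_collisions(ops):
    """Return (resource, first_op, second_op) for each overlapping pair.

    ops is an iterable of (name, resource, start_cycle, end_cycle) tuples,
    where end_cycle is exclusive. All field meanings are illustrative.
    """
    by_resource = defaultdict(list)
    for name, resource, start, end in ops:
        by_resource[resource].append((start, end, name))

    collisions = []
    for resource, spans in by_resource.items():
        spans.sort()  # order by start cycle
        for i, (start_a, end_a, name_a) in enumerate(spans):
            for start_b, _end_b, name_b in spans[i + 1:]:
                if start_b >= end_a:
                    break  # later operations start even later; no overlap with a
                collisions.append((resource, name_a, name_b))
    return collisions


schedule = [
    ("rd0", "data_bus", 0, 4),
    ("rd1", "data_bus", 3, 7),    # starts one cycle before rd0 releases the bus
    ("wr0", "data_bus", 7, 11),
    ("act0", "cmd_bus", 0, 1),
]
conflicts = find_collisions(schedule)
```

In this example only `rd0` and `rd1` overlap on the hypothetical `data_bus` resource; `wr0` begins exactly when `rd1` ends and is therefore not reported.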
It would be desirable to have a memory subsystem that avoids the above drawbacks while monitoring operations such as memory accesses and reporting resource conflicts with adequate information to permit root cause analysis and determination of corrective actions.