System controller chips, sometimes known as “North Bridge” chips, are used to interface a memory with a central processing unit (CPU) and other components, such as a graphics processing unit (GPU). The North Bridge chipset architecture is a well-known architecture for interfacing a CPU, memory, and other components using a dedicated North Bridge chip and a corresponding South Bridge chip. Recently, however, the functionality of North Bridge chips has been expanded. For example, the function of a North Bridge chip can be included within chips providing other functions. Some references use the term “system controller” to denote a more generic application of the function conventionally provided by a North Bridge chip. Consequently, as used in this application, a system controller is a controller that provides the function of a North Bridge chip with regard to interfacing a CPU and a memory.
FIG. 1 illustrates a prior art system having a CPU 100, North Bridge chip 105 including a memory controller interface 110, and dynamic random access memory (DRAM) memory 115. DRAM memory 115 may, for example, be comprised of dual inline memory modules (DIMMs) including synchronous dynamic random access memory (SDRAM). The JEDEC Solid State Technology Association has several standards describing protocols and standard signal definitions for the operation of SDRAM memory, such as JEDEC Standard “Double Data Rate (DDR) SDRAM Specification,” published January 2004, and JEDEC Standard “DDR2 SDRAM Specification,” published January 2004, the contents of each of which are hereby incorporated by reference in their entirety. DDR DRAM typically includes a number of pins with specified functions. For example, the DQ pins are bidirectional input/output data bus pins that are used for input and output of data. During a read operation, data read from the selected memory cell appears at the DQ pins when the access is complete and the output is enabled. The DQS pin is a data strobe pin that is output with read data and input with write data. The DQS is commonly edge aligned with read data and center aligned with write data. A chip select (CS) pin masks all commands when the complement of CS has a logical high value. DRAM command inputs typically include a row address strobe (RAS) input, a write enable (WE) input, and a column address strobe (CAS) input. CAS is an active low signal. The CAS# signal is the inverted version of the CAS signal. The CAS# input is used to latch the column address and is one of the command signals used to initiate the read or write operation. A CAS latency (CL) is associated with the delay between the receipt of a read command at the DRAM and the output of data.
Thus, for example, after a DRAM receives command signals for a read operation, there will be a read latency (RL), corresponding to the sum of the CL and any additive latency (AL), before valid data appears on the DQ pins.
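By way of illustration only, the relationship between RL, CL, and AL described above can be sketched as follows. The function name `read_latency_ns` and the example values (a CL of 4 cycles, an AL of 0 cycles, a 333 MHz memory clock) are assumptions chosen for the example and are not taken from any particular device or data sheet.

```python
def read_latency_ns(cas_latency: int, additive_latency: int, clock_mhz: float) -> float:
    """Return RL = AL + CL, converted from memory clock cycles to nanoseconds.

    All arguments are illustrative assumptions; real values come from the
    data sheet of the specific DRAM device and its mode register settings.
    """
    rl_cycles = additive_latency + cas_latency   # RL in memory clock cycles
    cycle_ns = 1000.0 / clock_mhz                # one clock period in ns
    return rl_cycles * cycle_ns


if __name__ == "__main__":
    # Hypothetical example: CL = 4, AL = 0, 333 MHz memory clock.
    print(f"RL = {read_latency_ns(4, 0, 333.0):.3f} ns")
```

Under these assumed values, the read latency is four memory clock periods, or roughly 12 ns, before valid data appears on the DQ pins.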
Other components, such as a GPU 120 and South Bridge chip 125, may also be coupled to North Bridge chip 105. CPU 100 is coupled to North Bridge chip 105 via a front side bus (FSB) that includes respective bus interface units (BIU) in CPU 100 and North Bridge chip 105. North Bridge chip 105 is coupled to DRAM memory 115 via a memory bus 130.
The FSB typically operates according to an FSB protocol. An exemplary FSB protocol is described in the book by Tom Shanley, Unabridged Pentium 4: IA32 Processor Genealogy, MindShare, Inc. (2004), the contents of which are hereby incorporated by reference. FSB protocols typically include a sequence of transaction phases that proceed in a predefined order, such as an arbitration phase, request phase, error phase, snoop phase, response phase, and data phase. For example, for a read request issued from CPU 100, a request agent in the BIU of CPU 100 issues a read request to a response agent in the BIU of North Bridge chip 105. FSB protocols include a response that indicates that the response agent will provide the data. The data phase of a transaction cannot be completed until the request and response agents in the FSB are ready to transfer data.
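The ordering of transaction phases described above may be pictured, in highly simplified form, as a small state machine. The following is a minimal sketch only: the names `Phase` and `advance` are hypothetical, a single transaction is modeled in isolation, and the signaling details of any actual FSB protocol are not represented.

```python
from enum import IntEnum
from typing import Optional


class Phase(IntEnum):
    """Transaction phases in the predefined order listed above."""
    ARBITRATION = 0
    REQUEST = 1
    ERROR = 2
    SNOOP = 3
    RESPONSE = 4
    DATA = 5


def advance(phase: Phase,
            request_agent_ready: bool = True,
            response_agent_ready: bool = True) -> Optional[Phase]:
    """Return the next phase, or None once the transaction has completed.

    The data phase stalls until both the request agent and the response
    agent are ready to transfer data.
    """
    if phase is Phase.DATA:
        if request_agent_ready and response_agent_ready:
            return None          # data transferred, transaction complete
        return Phase.DATA        # remain in the data phase
    return Phase(phase + 1)      # earlier phases proceed in fixed order


if __name__ == "__main__":
    # The response agent is not yet ready, so the data phase stalls.
    print(advance(Phase.DATA, response_agent_ready=False).name)
```

The sketch makes explicit that any delay before the response agent has data available holds the transaction in the data phase.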
Arrow 135 illustrates a data read request path. The data read request path corresponds to a read request issued from CPU 100 that is passed through memory controller interface 110 in North Bridge chip 105 to DRAM memory 115. Arrow 140 illustrates a data return path from DRAM memory 115 back to CPU 100 through North Bridge chip 105. Conventionally, the time delay along the data return path includes several factors, which are illustrated in simplified form in FIG. 2, which is not to scale. First, there are analog delays associated with the latency of the memory bus for transmitting a request from North Bridge chip 105 to DRAM memory 115 and for returning data from DRAM memory 115 to North Bridge chip 105. Second, there is a CAS latency within DRAM memory 115 to respond to a data request. Additionally, there is a synchronization delay associated with the handover between the memory clock domain and the CPU clock domain. The DRAM data must be synchronized to a clock edge of the FSB in order for the FSB response agent to receive the data for transfer. As an example, the FSB clock may have a first clock rate associated with the CPU clock domain, such as 266 MHz, whereas the DRAM memory clock operates at a different clock rate, such as 333 MHz. Conventionally, a synchronization step is required to perform a handover between the clock domains. For example, a sync handoff module 150 may be used to identify a crossover between an edge of the memory clock (operating at a memory clock rate) and an edge of the front side bus clock (operating at a different clock rate associated with the CPU clock domain) to perform the handover between the clock domains. Additionally, once the data becomes available in the CPU clock domain, there may be a number of cycles required to prepare to send the data. For example, a data scheduler (not shown) may need to verify that the data has arrived and prepare to send the data out according to a bus protocol.
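The synchronization delay described above can be illustrated with a simple timing model. The following sketch assumes, purely for illustration, that both clocks have ideal edges that coincide at time zero, and that data arriving on a memory clock edge is handed over at the first FSB clock edge at or after that time. The function name `sync_delay_ns` is hypothetical, setup and hold margins are ignored, and the sketch does not represent the internal operation of sync handoff module 150.

```python
from fractions import Fraction
from math import ceil


def sync_delay_ns(mem_edge_index: int,
                  mem_mhz: int = 333,
                  fsb_mhz: int = 266) -> Fraction:
    """Return the wait, in ns, from a memory clock edge to the next FSB edge.

    Assumes ideal clocks whose edges coincide at t = 0 (an illustrative
    simplification). Fractions are used so the result is exact.
    """
    mem_period = Fraction(1000, mem_mhz)    # memory clock period in ns
    fsb_period = Fraction(1000, fsb_mhz)    # FSB clock period in ns
    arrival = mem_edge_index * mem_period   # data valid in memory domain
    # First FSB edge at or after the arrival time.
    next_fsb_edge = ceil(arrival / fsb_period) * fsb_period
    return next_fsb_edge - arrival


if __name__ == "__main__":
    for n in range(5):
        print(f"memory edge {n}: wait {float(sync_delay_ns(n)):.3f} ns")
```

In this model the wait varies from one memory clock edge to the next but is always less than one FSB clock period; it adds to the memory bus delays, the CAS latency, and any scheduling cycles on the data return path.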
As a result of all the combined latencies, the read return latency is greater than desired for many applications. The read return latency, for example, introduces CPU clock cycles in which the CPU is waiting for data to return before it can complete an operation.
Therefore, what is desired is an improved system, apparatus, and method for a fast data return memory controller.