This invention relates to computer systems and more particularly to memory control mechanisms and techniques employed within computer systems. This invention also relates to performance enhancement and optimization of memory control mechanisms for computer systems.
A variety of techniques have been developed to increase the overall processing speed of computer systems. While improvements in integrated circuit processing technologies such as sub-micron processing capabilities have made it possible to dramatically increase the speed of the integrated circuitry itself, other developments in the architectures and bus transfer mechanisms of computer systems have also led to improvements in performance. Exemplary developments include the incorporation of cache memory subsystems as well as code pre-fetching mechanisms within computer systems.
In a typical computer system, memory accesses (reads or writes) are actually composed of discrete operations. An exemplary memory access to a dynamic random access memory (DRAM) (or, alternatively, synchronous DRAM (SDRAM or SynchDRAM)) takes place as follows. The CPU determines that it needs to read data from or write data to the memory. Note that DRAM based memory is organized by chip select (CS), bank and row. The CS signal is a unique signal that activates a particular group of memory chips in the memory for access. The bank and row refer to the physical design/organization of the chips themselves. Any access must be made by selecting a particular CS, bank and row (this combination is also known as a page). Further, DRAM type memory chips provide a row buffer (one per bank) which holds the data currently being accessed. Continuing with the example, the CPU will dispatch a request along with an address to the memory control logic to retrieve the desired data. The memory control logic will convert the address into a physical memory location consisting of a CS, bank, and row and then initiate the memory access as described below.
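By way of illustration only, the conversion of an address into a CS, bank and row may be sketched as follows. The field widths, the bit ordering and the names `PageAddress`, `decode_address` and `same_page` are hypothetical and are not taken from any particular memory controller; an actual controller would derive the mapping from the installed memory configuration.

```python
from typing import NamedTuple

# Hypothetical field widths, chosen only for this example.
COLUMN_BITS = 10
ROW_BITS = 12
BANK_BITS = 2
CS_BITS = 2


class PageAddress(NamedTuple):
    cs: int
    bank: int
    row: int
    column: int


def decode_address(addr: int) -> PageAddress:
    """Split a flat address into CS, bank, row and column fields.

    Assumed layout, from least to most significant bits:
    column, row, bank, CS.
    """
    column = addr & ((1 << COLUMN_BITS) - 1)
    addr >>= COLUMN_BITS
    row = addr & ((1 << ROW_BITS) - 1)
    addr >>= ROW_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    cs = addr & ((1 << CS_BITS) - 1)
    return PageAddress(cs, bank, row, column)


def same_page(a: PageAddress, b: PageAddress) -> bool:
    """Two accesses hit the same page when CS, bank and row all match."""
    return (a.cs, a.bank, a.row) == (b.cs, b.bank, b.row)
```

Under these assumed widths, two addresses that differ only in their low ten bits decode to the same page and differ only in the column field.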
In order to access a particular row in the memory, if this row is not already active (see below), the bank containing that row must be pre-charged. Effectively, pre-charging raises all of the bit lines (the wires that connect the rows in each bank to the row buffer) to a voltage that represents a logical 1. When the page is activated (or connected to the bit lines), any bits in the page containing logical zeroes cause the respective bit lines to drop to logical zero. This saves time versus initializing the bit lines to logical zero and waiting for the bits in the page representing a logical 1 to charge up the respective bit lines. A pre-charge operation also causes any currently active row, from a previous access to the bank, to be written back to the memory array from the row buffer so that the data is not lost (see below). A CS or bank can be pre-charged in several ways. Pre-charging occurs upon initialization of the memory, whenever there is a refresh to that CS or whenever the memory control logic dispatches a pre-charge operation to that CS or bank. If the bank is not currently pre-charged, the memory control logic will issue a pre-charge operation to the desired CS in order to pre-charge the bit lines of the desired bank (or possibly all the banks) on that CS.
Next, an activate operation is sent to the desired CS and bank along with the row address in order to activate the particular page onto the bit lines and transfer the page of data into the bank's row buffer. Note that, due to the nature of DRAM memory, an activate operation destroys the contents of that row in the memory array in the process of moving those contents to the row buffer. In order to replace the contents back in the memory array and ensure they are not lost, a pre-charge operation (as discussed earlier) is necessary before activating another row into the row buffer. Once the page is in the row buffer, the appropriate read or write operation can be dispatched along with the column address identifying the bits to read or write. These operations initiate the memory request. The memory request is then completed by transferring the data to or from the memory and sending the appropriate feedback to the unit within the computer system that generated the memory request. Note that once a row is activated and in the row buffer, the memory control logic can perform many reads and writes to that row without performing an additional pre-charge or activate operation.
As can be seen from the example, the initiation of an access to the memory can be broken down into the primitive operations of pre-charge, activate and read/write. Once initiated, the data transfer must then be completed to or from the memory. That is, for a read, the data must be taken in from the memory and passed back to the requestor and for a write, the data to be written must be sent to the memory. Further, the unit that generated the memory request must be informed of its completion or provided with the data it requested.
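The selection of primitive operations for a given access may be sketched as follows. This is a simplified model offered under stated assumptions, not a description of any actual controller: the class `BankTracker` and its method `plan` are hypothetical names, every bank is assumed to be pre-charged with no active row after initialization, and refresh and timing constraints are ignored.

```python
from typing import Dict, List, Optional, Tuple

BankKey = Tuple[int, int]  # (cs, bank)


class BankTracker:
    """Remembers which row, if any, is active in each bank's row buffer."""

    def __init__(self) -> None:
        # A missing entry or None means the bank is pre-charged (assumption:
        # this is the state of every bank after initialization).
        self.open_row: Dict[BankKey, Optional[int]] = {}

    def plan(self, cs: int, bank: int, row: int, is_write: bool) -> List[str]:
        """Return the primitive operations needed to initiate one access."""
        key = (cs, bank)
        ops: List[str] = []
        current = self.open_row.get(key)
        if current != row:
            if current is not None:
                # A different row is active: write it back before activating.
                ops.append("PRECHARGE")
            ops.append("ACTIVATE")
            self.open_row[key] = row
        # The requested row is now in the row buffer.
        ops.append("WRITE" if is_write else "READ")
        return ops
```

In this model, a second access to an already active row requires only the read or write operation, whereas an access to a different row in the same bank requires all three primitive operations.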
It is well known in the art that state machine logic can be constructed to efficiently decode accesses, dispatch primitive operations, and control the completion of data transfers to optimize the use of the memory. However, the state machine logic needed to perform these operations, track dependencies among operations and dispatch and complete operations in parallel is often complex. This results in a complex design that requires more gates to implement and is harder to understand and verify.
Further, a complex design usually operates more slowly. Computer logic is typically designed around a clock signal which keeps operations within the computer synchronized. A typical design has logic stages, each stage of which includes input latches, output latches and combinational logic. The input latches are connected to the inputs of the combinational logic. The input latches latch and hold the input signals steady while the combinational logic operates on them. The output latches latch the output of the combinational logic. The input latches and output latches are also connected to the clock signal. The combinational logic consists of logic gates such as NAND or NOR gates arranged and connected to perform a logic function.
On each pulse of the clock signal (each "clock cycle"), the input latches latch the input signals and make them available to the combinational logic and the output latches latch the output of the combinational logic. The logic stage takes advantage of the fact that the circuits that make up the gates of the combinational logic have propagation delays which introduce a delay between the time the input signals are latched and the time that the result of the combinational logic function is computed. The logic stage is designed so that the combinational logic finishes its computation (that is, all the signals have propagated through) before the next clock pulse hits the output latches. In this way, on each clock pulse/cycle, the inputs to the combinational logic change, and the output latches latch the result of the previous inputs. Since the output latches also form the input latches for the next logic stage, data is thereby moved from one stage of logic to the next.
Notice that the number of gates that can be put in a logic stage between the input and output latches is partly a function of the clock frequency of the computer. A faster clock frequency leaves less time for signals to propagate through the gates. A more complex design may require more gates between the input and output latches, necessitating a slower clock. Therefore, the designer must often make a trade-off between a fast clock and a complex logic design.
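The relationship between clock frequency and the number of gates in a logic stage may be illustrated with a rough calculation. The function `max_gate_levels`, the uniform per-gate delay and the latch overhead parameter are simplifying assumptions made for this example; real gate delays vary with the gate type, loading and wiring.

```python
import math


def max_gate_levels(clock_mhz: float,
                    gate_delay_ns: float,
                    latch_overhead_ns: float = 0.0) -> int:
    """Estimate how many gate levels fit in one logic stage.

    Assumes every gate contributes the same propagation delay and that a
    fixed latch overhead is subtracted from each clock period.
    """
    period_ns = 1000.0 / clock_mhz          # one clock period in nanoseconds
    usable_ns = period_ns - latch_overhead_ns
    if usable_ns <= 0:
        return 0
    return math.floor(usable_ns / gate_delay_ns)
```

With an assumed gate delay of 0.5 ns and a latch overhead of 1 ns, a 100 MHz clock leaves room for 18 gate levels per stage, while a 200 MHz clock leaves room for only 8.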
Accordingly, there is a need to optimize and enhance the performance of accesses to the memory while simplifying the design of the memory control logic. Further, there is a need to reduce the logical complexity of the memory control logic, which will in turn result in a reduction of the gate counts, the design time/cost and the number of design errors. This will further allow for a decrease in the number of gate delays between logic stages, which will result in overall faster operation.
The problems outlined above are solved by an apparatus and method to complete memory requests in a computer memory according to the present invention. In one aspect of the invention, a memory controller is provided which includes a read write control queue for completing memory requests from the memory controller to a computer memory where the read write control queue includes a queue controller coupled to the memory controller and operative to detect when the memory controller initiates a memory request to the memory. The queue controller is also operative to generate memory control data for completing the memory request. The read write control queue also includes at least one queue comprising at least one top portion coupled to the queue controller and operative to receive the generated memory control data for the initiated request. Further, the queues also have a bottom portion coupled to the computer memory and operative to provide the memory control data to control the transfer of data between said memory controller and the computer memory. Between the top and bottom portions is a shift mechanism which is operative to shift the memory control data from the top portion to the bottom portion.
The present invention further contemplates a method for completing at least one data transfer between a memory controller and a computer memory using at least one queue comprising a top portion and a bottom portion. The data transfer comprises at least one transmission of at least one unit of data. This method comprises the steps of: initiating the data transfer in the computer memory; loading the top portion of the queues with control data to control each transmission; controlling each transmission using the control data from said bottom portion; and shifting the control data from the top portion to the bottom portion after each transmission.
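The steps of the method may be sketched in software as a simple shift queue. This is a minimal model under stated assumptions and is not the disclosed hardware: the class `ControlQueue`, its methods, the queue depth, the string control words and the caller-chosen loading slot are all hypothetical, and the bottom portion is modeled as a single slot.

```python
from typing import List, Optional


class ControlQueue:
    """Fixed-depth queue: slots[0] is the bottom, higher indices the top."""

    def __init__(self, depth: int = 8) -> None:
        self.slots: List[Optional[str]] = [None] * depth

    def load(self, words: List[str], start: int = 1) -> None:
        """Load one control word per transmission into the top portion.

        `start` is the first slot to fill (assumption: the caller picks it
        to stand in for the delay before the first transmission).
        """
        if start < 1 or start + len(words) > len(self.slots):
            raise ValueError("control words do not fit in the top portion")
        if any(self.slots[start + i] is not None for i in range(len(words))):
            raise ValueError("slot already occupied")
        for i, word in enumerate(words):
            self.slots[start + i] = word

    def step(self) -> Optional[str]:
        """Use the bottom entry to control one transmission, then shift."""
        bottom = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return bottom
```

For example, loading two control words at slot 2 of a four-deep queue and stepping five times yields two idle steps, then the two control words in order, then an idle step. An empty slot returned by `step` represents a cycle in which no transmission is controlled.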
As a result of the present invention, memory accesses are optimized and the performance of the main memory is enhanced. These advantages are achieved while simplifying the design of the memory access control logic and reducing its logical complexity. This, in turn, results in a reduction of the gate counts, the design time/cost and the number of design errors. In addition, the decrease in the number of gate delays between logic stages results in overall faster operation. The present invention also provides an easily adaptable structure that can be used with a variety of memory types.