This invention relates to computer systems and, more particularly, to memory control mechanisms and techniques employed within computer systems. This invention also relates to performance enhancement and optimization of memory control mechanisms for computer systems.
A variety of techniques have been developed to increase the overall processing speed of computer systems. While improvements in integrated circuit processing technologies such as sub-micron processing capabilities have made it possible to dramatically increase the speed of the integrated circuitry itself, other developments in the architectures and bus transfer mechanisms of computer systems have also led to improvements in performance. Exemplary developments include the incorporation of cache memory subsystems as well as code pre-fetching mechanisms within computer systems.
In a typical computer system, memory accesses (reads or writes) are actually composed of discrete operations. An exemplary memory access to a dynamic random access memory (DRAM) (or, alternatively, a synchronous DRAM (SDRAM or SynchDRAM)) takes place as follows. The CPU determines that it needs to read data from, or write data to, the memory. Note that DRAM-based memory is organized by chip select (CS), bank and row. The CS signal is a unique signal that activates a particular group of memory chips in the memory for access. The bank and row refer to the physical design and organization of the chips themselves. Any access must be made by selecting a particular CS, bank and row (this combination is also known as a page). Further, DRAM-type memory chips provide a row buffer (one per bank) that holds the data currently being accessed. Continuing with the example, the CPU dispatches a request, along with an address, to the memory control logic to retrieve the desired data. The memory control logic converts the address into a physical memory location consisting of a CS, bank and row and then initiates the memory access as described below.
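The address conversion described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the controller's actual decode logic: the field order (column, bank, row, chip select, from least to most significant bit), the bit widths, and the names `decode_address` and `DramLocation` are all hypothetical, and real controllers often interleave these fields differently. A column field is included because the read or write operation that follows activation carries a column address.

```python
from typing import NamedTuple

# Hypothetical field widths, least significant field first.
COL_BITS = 10
BANK_BITS = 2
ROW_BITS = 13
CS_BITS = 1


class DramLocation(NamedTuple):
    cs: int    # chip select
    bank: int
    row: int   # (cs, bank, row) together identify the page
    col: int   # column within the page


def decode_address(addr: int) -> DramLocation:
    """Split a flat physical address into column, bank, row and CS fields."""
    def take(value: int, bits: int) -> tuple[int, int]:
        # Return the low `bits` bits and the remaining high bits.
        return value & ((1 << bits) - 1), value >> bits

    col, rest = take(addr, COL_BITS)
    bank, rest = take(rest, BANK_BITS)
    row, rest = take(rest, ROW_BITS)
    cs, rest = take(rest, CS_BITS)
    if rest:
        raise ValueError("address is beyond the modelled memory")
    return DramLocation(cs=cs, bank=bank, row=row, col=col)
```

For example, under this assumed layout the address `0x2005C07` decodes to chip select 1, bank 3, row 5, column 7.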
In order to access a particular row in the memory, if this row is not already active (see below), the bank containing that row must first be pre-charged. Effectively, pre-charging raises all of the bit lines (the wires that connect the rows in each bank to the row buffer) to a voltage that represents a logical 1. When the page is activated (that is, connected to the bit lines), any bits in the page holding logical zeroes cause the respective bit lines to drop to logical zero. This saves time compared with initializing the bit lines to logical zero and waiting for the bits in the page that represent a logical 1 to charge up the respective bit lines. A pre-charge operation also causes any currently active row, from a previous access to the bank, to be written back from the row buffer to the memory array so that its data is not lost (see below). A CS or bank can be pre-charged in several ways: pre-charging occurs upon initialization of the memory, whenever there is a refresh to that CS, or whenever the memory control logic dispatches a pre-charge operation to that CS or bank. If the bank is not currently pre-charged, the memory control logic issues a pre-charge operation to the desired CS in order to pre-charge the bit lines of the desired bank (or possibly of all the banks) on that CS.
Next, an activate operation is sent to the desired CS and bank, along with the row address, in order to activate the particular page onto the bit lines and transfer the page of data into the bank's row buffer. Note that, due to the nature of DRAM, an activate operation destroys the contents of that row in the memory array in the process of moving those contents to the row buffer. To restore the contents to the memory array and ensure they are not lost, a pre-charge operation (as discussed earlier) is necessary before another row is activated into the row buffer. Once the page is in the row buffer, the appropriate read or write operation can be dispatched along with the column address identifying the bits to read or write. These operations initiate the memory request. The memory request is then completed by transferring the data to or from the memory. Note that once a row is activated and in the row buffer, the memory control logic can perform many reads and writes to that row without an additional pre-charge or activate operation.
As can be seen from the example, the initiation of an access to the memory can be broken down into the primitive operations of pre-charge, activate and read/write. Once initiated, the data transfer must then be completed to or from the memory. That is, for a read, the data must be taken in from the memory and passed back to the requester and for a write, the data to be written must be sent to the memory.
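The decomposition into primitive operations can be illustrated with a small model of one bank. In this hypothetical sketch, `Bank.open_row` records which row sits in the row buffer (`None` meaning the bank is pre-charged), and `primitive_ops` returns the pre-charge, activate and read/write operations needed to initiate one access. DRAM timing constraints between the operations are deliberately ignored.

```python
from enum import Enum
from typing import Optional


class Op(Enum):
    PRECHARGE = "precharge"
    ACTIVATE = "activate"
    READ = "read"
    WRITE = "write"


class Bank:
    """Hypothetical model of one DRAM bank's row-buffer state."""

    def __init__(self) -> None:
        # None means the bank is pre-charged and no row is active.
        self.open_row: Optional[int] = None


def primitive_ops(bank: Bank, row: int, is_write: bool) -> list[Op]:
    """Return the primitives that initiate one access to `row`, and update
    the modelled bank state as if they had been issued."""
    ops: list[Op] = []
    if bank.open_row != row:
        if bank.open_row is not None:
            # Another row is active: write it back and pre-charge the bank.
            ops.append(Op.PRECHARGE)
        # Move the requested page onto the bit lines and into the row buffer.
        ops.append(Op.ACTIVATE)
        bank.open_row = row
    # The row is now open: issue the read or write with the column address.
    ops.append(Op.WRITE if is_write else Op.READ)
    return ops
```

In this model, a first access to a pre-charged bank needs an activate followed by the read or write, a second access to the same open row needs only the read or write, and an access to a different row in the same bank needs all three primitives.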
It is well known in the art that state machine logic can be constructed to efficiently decode accesses, dispatch primitive operations, and control the completion of data transfers so as to optimize the use of the memory. However, the state machine logic needed to perform these operations, track dependencies among operations, and dispatch and complete operations in parallel is often complex. The resulting design requires more gates to implement and is harder to understand and verify.
Further, a complex design usually operates more slowly. Computer logic is typically designed around a clock signal that keeps operations within the computer synchronized. A typical design is divided into logic stages, each of which includes input latches, output latches and combinational logic. The input latches are connected to the inputs of the combinational logic; they latch and hold the input signals steady while the combinational logic operates on them. The output latches latch the output of the combinational logic. Both the input latches and the output latches are also connected to the clock signal. The combinational logic consists of logic gates, such as NAND or NOR gates, arranged and connected to perform a logic function.
On each pulse of the clock signal, the input latches latch the input signals and make them available to the combinational logic, and the output latches latch the output of the combinational logic. The logic stage takes advantage of the fact that the circuits that make up the gates of the combinational logic have propagation delays, which introduce a delay between the time the input signals are latched and the time the result of the combinational logic function is computed. The logic stage is designed so that the combinational logic finishes its computation (that is, all the signals have propagated through it) before the next clock pulse reaches the output latches. In this way, on each clock pulse the inputs to the combinational logic change while the output latches capture the result computed from the previous inputs. Since the output latches of one stage also form the input latches of the next logic stage, data moves from one stage of logic to the next on each clock pulse.
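The latch-to-latch movement of data can be modelled in a few lines. The sketch below is a hypothetical behavioural model, not a circuit description: `latches[i]` is the input latch of stage `i` and `latches[i + 1]` its output latch, and `clock_pulse` updates every latch from the old values at once, as a shared clock edge would. Propagation delay is abstracted away by assuming the combinational logic has always settled before the next pulse.

```python
from typing import Callable, Optional, Sequence

Logic = Callable[[int], int]  # a stage's combinational logic function


def clock_pulse(latches: list[Optional[int]], stages: Sequence[Logic],
                new_input: Optional[int]) -> list[Optional[int]]:
    """Return the latch contents after one clock pulse.

    latches[i] is the input latch of stages[i] and latches[i + 1] is its
    output latch, which doubles as the next stage's input latch. Every
    latch captures on the same pulse, so each new value is computed from
    the values held before the pulse. None marks an empty latch.
    """
    if len(latches) != len(stages) + 1:
        raise ValueError("need exactly one more latch than stages")
    updated: list[Optional[int]] = [new_input]
    for logic, held in zip(stages, latches):
        # Assume the logic has fully settled on `held` before this pulse.
        updated.append(None if held is None else logic(held))
    return updated
```

With a two-stage pipeline that adds 1 and then doubles, an input of 3 appears as 8 at the final latch two pulses after it is latched, while later inputs follow one stage behind it.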
Notice that the number of gates that can be placed in a logic stage between the input and output latches is partly a function of the clock frequency of the computer. A faster clock frequency leaves less time for signals to propagate through the gates. A more complex design may require more gates between the input and output latches, necessitating a slower clock. Therefore, the designer must often make a trade-off between a fast clock and a complex logic design.
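The trade-off can be made concrete with simple arithmetic: the clock period, minus the time consumed by the latches themselves, bounds the number of gate delays on the longest path through a stage. The figures used here (100 ps per gate, 500 ps of latch overhead) and the function name `max_gate_delays` are hypothetical, chosen only to show the trend.

```python
import math


def max_gate_delays(clock_mhz: float, gate_delay_ps: float,
                    latch_overhead_ps: float) -> int:
    """Most gate delays that fit on the longest path of one logic stage.

    The clock period, minus the time the latches themselves consume
    (e.g. setup and clock-to-output time), is the budget left for signals
    to propagate through the combinational logic.
    """
    period_ps = 1_000_000 / clock_mhz  # 1 MHz corresponds to 1,000,000 ps
    budget_ps = period_ps - latch_overhead_ps
    if budget_ps <= 0:
        return 0
    return math.floor(budget_ps / gate_delay_ps)
```

Under these assumed figures, doubling the clock from 100 MHz to 200 MHz cuts the budget from 95 to 45 gate delays per stage, more than halving it because the latch overhead does not shrink with the period.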
Accordingly, there is a need to optimize and enhance the performance of accesses to the memory while simplifying the design of the memory control logic. Further, there is a need to reduce the logical complexity of the memory control logic, which will in turn reduce the gate count, the design time and cost, and the number of design errors. This will further allow for fewer gate delays between logic stages, which will result in overall faster operation.
The problems outlined above are solved by an apparatus and method for sending memory requests to a computer memory according to the present invention. In one aspect of the invention, a memory controller is provided which includes a request decoder, which receives a memory request and decodes it into primitive memory operations, and operation queues, which are coupled to the request decoder and operative to store the primitive memory operations. The memory controller further includes a multiplexor, coupled to the queues and to the computer memory, which is operative to select one primitive memory operation from the queues and transmit it to the computer memory in order to initiate the memory request. The queues are further operative to clear the selected primitive memory operation once it has been transmitted by the multiplexor. The memory controller also includes control queues, which are coupled to the operation queues and the computer memory and complete the memory requests in the computer memory once they have been initiated.
The present invention further contemplates a method for executing memory requests to a computer memory using a memory controller, comprising the steps of: accepting a memory request from a memory request generator; decoding the memory request into one or more primitive memory operations; queuing the primitive memory operations into one or more operation queues; selecting one of the queued primitive memory operations for transmission to the memory; transmitting the queued primitive memory operation to the memory to initiate the memory request; dequeuing the queued primitive memory operation when the primitive memory operation has been transmitted to the memory; queuing control data into one or more control queues, which then complete the memory request in the computer memory; and dequeuing the control data as the memory request completes.
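The method steps can be walked through with a toy model. This sketch is an illustration under several assumptions, not the claimed apparatus: it uses one operation queue per bank, a round-robin policy for the multiplexor, a single control queue that completes transfers in issue order, and string names for the primitives; all class and method names are hypothetical. Keeping each bank's operations in a FIFO lets the decoder update its open-row state at decode time, because those operations are transmitted in the order they were queued. Timing constraints between primitives are ignored, and returning an operation from `issue` stands in for transmitting it to the memory.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Request:
    req_id: int
    bank: int
    row: int
    is_write: bool


@dataclass
class PrimOp:
    kind: str  # "PRE", "ACT", "RD" or "WR" (hypothetical encoding)
    req: Request


class MemoryController:
    """Toy model: decoder -> per-bank operation queues -> multiplexor ->
    memory, plus a control queue that completes initiated transfers."""

    def __init__(self, num_banks: int) -> None:
        self.op_queues: List[Deque[PrimOp]] = [deque() for _ in range(num_banks)]
        # Row each bank will have open once its queued ops are transmitted.
        self.open_rows: List[Optional[int]] = [None] * num_banks
        self.control_queue: Deque[int] = deque()
        self.next_bank = 0  # round-robin pointer used by the multiplexor

    def accept(self, req: Request) -> None:
        """Accept a request, decode it into primitives and queue them."""
        kinds = []
        if self.open_rows[req.bank] != req.row:
            if self.open_rows[req.bank] is not None:
                kinds.append("PRE")
            kinds.append("ACT")
            self.open_rows[req.bank] = req.row
        kinds.append("WR" if req.is_write else "RD")
        self.op_queues[req.bank].extend(PrimOp(k, req) for k in kinds)

    def issue(self) -> Optional[PrimOp]:
        """Multiplexor: select one queued op, return it as 'transmitted',
        and dequeue it."""
        n = len(self.op_queues)
        for i in range(n):
            bank = (self.next_bank + i) % n
            if self.op_queues[bank]:
                op = self.op_queues[bank].popleft()  # cleared once transmitted
                self.next_bank = (bank + 1) % n
                if op.kind in ("RD", "WR"):
                    # The request is now initiated; queue its control data.
                    self.control_queue.append(op.req.req_id)
                return op
        return None  # nothing to transmit this cycle

    def complete(self) -> Optional[int]:
        """Complete the oldest initiated transfer; dequeue its control data."""
        return self.control_queue.popleft() if self.control_queue else None
```

Queuing two requests to bank 0 and one to bank 1 lets the multiplexor interleave the banks, so bank 1's activate is transmitted while bank 0 still has operations waiting, and the control queue then completes the three transfers in the order they were initiated.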
As a result of the present invention, memory accesses are optimized and the performance of the main memory is enhanced. These advantages are achieved while simplifying the design of the memory access control logic and reducing its logical complexity. This, in turn, results in a reduction of the gate counts, the design time/cost and the number of design errors. In addition, the decrease in the number of gate delays between logic stages results in overall faster operation. The present invention also provides an easily adaptable structure that can be used with a variety of memory types.