This invention relates generally to memory devices used in computer systems and, more particularly, to a method and apparatus for the pipelined processing of memory commands.
Conventional computer systems include a processor (not shown) coupled to a variety of memory devices, including read-only memories (“ROMs”) which traditionally store instructions for the processor, and a system memory to which the processor may write data and from which the processor may read data. The processor may also communicate with an external cache memory, which is generally a static random access memory (“SRAM”). The processor also communicates with input devices, output devices, and data storage devices.
Processors generally operate at a relatively high speed. Processors such as the Pentium® and Pentium II® microprocessors are currently available that operate at clock speeds of at least 400 MHz. However, the remaining components of existing computer systems, with the exception of SRAM cache, are not capable of operating at the speed of the processor. For this reason, the system memory devices, as well as the input devices, output devices, and data storage devices, are not coupled directly to the processor bus. Instead, the system memory devices are generally coupled to the processor bus through a memory controller, bus bridge or similar device, and the input devices, output devices, and data storage devices are coupled to the processor bus through a bus bridge. The memory controller allows the system memory devices to operate at a clock frequency that is substantially lower than the clock frequency of the processor. Similarly, the bus bridge allows the input devices, output devices, and data storage devices to operate at a substantially lower frequency. Currently, for example, a processor having a 400 MHz clock frequency may be mounted on a motherboard having a 66 MHz clock frequency for controlling the system memory devices and other components.
Access to system memory is a frequent operation for the processor. The time required for the processor, operating, for example, at 400 MHz, to read data from or write data to a system memory device operating at, for example, 66 MHz, greatly slows the rate at which the processor is able to accomplish its operations. Thus, much effort has been devoted to increasing the operating speed of system memory devices.
System memory devices are generally dynamic random access memories (“DRAMs”). Initially, DRAMs were asynchronous and thus did not operate at even the clock speed of the motherboard. In fact, access to asynchronous DRAMs often required that wait states be generated to halt the processor until the DRAM had completed a memory transfer. However, the operating speed of asynchronous DRAMs was successfully increased through such innovations as burst and page mode DRAMs, which did not require that an address be provided to the DRAM for each memory access. More recently, synchronous dynamic random access memories (“SDRAMs”) have been developed to allow the pipelined transfer of data at the clock speed of the motherboard. However, even SDRAMs are incapable of operating at the clock speed of currently available processors. Thus, SDRAMs cannot be connected directly to the processor bus, but instead must interface with the processor bus through a memory controller, bus bridge, or similar device. The disparity between the operating speed of the processor and the operating speed of SDRAMs continues to limit the speed at which processors may complete operations requiring access to system memory.
A solution to this operating speed disparity has been proposed in the form of a computer architecture known as “SLDRAM.” In the SLDRAM architecture, the system memory may be coupled to the processor either directly through the processor bus or through a memory controller. Rather than requiring that separate address and control signals be provided to the system memory, SLDRAM memory devices receive command packets that include both control and address information. The SLDRAM memory device then outputs or receives data on a data bus that may be coupled directly to the data bus portion of the processor bus.
An example of a computer system 10 using the SLDRAM architecture is shown in FIG. 1. The computer system 10 includes a processor 12 having a processor bus 14 coupled to three packetized dynamic random access memory or SLDRAM devices 16a-c. The computer system 10 also includes one or more input devices 20, such as a keypad or a mouse, coupled to the processor 12 through a bus bridge 22 and an expansion bus 24, such as an industry standard architecture (“ISA”) bus or a peripheral component interconnect (“PCI”) bus. The input devices 20 allow an operator or an electronic device to input data to the computer system 10. One or more output devices 30 are coupled to the processor 12 to display or otherwise output data generated by the processor 12. The output devices 30 are coupled to the processor 12 through the expansion bus 24, bus bridge 22 and processor bus 14. Examples of output devices 30 include printers and video display units. One or more data storage devices 38 are coupled to the processor 12 through the processor bus 14, bus bridge 22, and expansion bus 24 to store data in or retrieve data from storage media (not shown). Examples of storage devices 38 and storage media include fixed disk drives, floppy disk drives, tape cassettes and compact-disk read-only memory drives.
In operation, the processor 12 communicates with the memory devices 16a-c via the processor bus 14 by sending the memory devices 16a-c command packets that contain both control and address information. Data is coupled between the processor 12 and the memory devices 16a-c through a data bus portion of the processor bus 14. Although all the memory devices 16a-c are coupled to the same conductors of the processor bus 14, only one memory device 16a-c at a time reads or writes data, thus avoiding bus contention on the processor bus 14. Bus contention is avoided because each of the memory devices 16a-c and the bus bridge 22 has a unique identifier, and the command packet contains an identifying code that selects only one of these components.
A typical command packet for an SLDRAM is shown in FIG. 2. The command packet is formed by 4 packet words, each of which contains 10 bits of data. The first packet word W1 contains 7 bits of data identifying the packetized DRAM 16a-c that is the intended recipient of the command packet. As explained below, each of the packetized DRAMs is provided with a unique ID code that is compared to the 7 ID bits in the first packet word W1. Thus, although all of the packetized DRAMs 16a-c will receive the command packet, only the packetized DRAM 16a-c having an ID code that matches the 7 ID bits of the first packet word W1 will respond to the command packet.
The remaining 3 bits of the first packet word W1 as well as 3 bits of the second packet word W2 comprise a 6-bit command. Typical commands are read and write in a variety of modes, such as accesses to pages or banks of memory cells. The remaining 7 bits of the second packet word W2 and portions of the third and fourth packet words W3 and W4 comprise a 20-bit address specifying a bank, row and column address for a memory transfer or the start of a multiple-bit memory transfer. In one embodiment, the 20-bit address is divided into 3 bits of bank address, 10 bits of row address, and 7 bits of column address.
Although the command packet shown in FIG. 2 is composed of 4 packet words each containing up to 10 bits, it will be understood that a command packet may contain a lesser or greater number of packet words, and each packet word may contain a lesser or greater number of bits.
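By way of illustration only, the field widths given above for the four-word packet may be modeled in software. The following Python sketch is not taken from FIG. 2: the text gives the field widths but not the bit positions, so the placement of each field within a word (ID in the upper 7 bits of W1, command split across the low 3 bits of W1 and the upper 3 bits of W2, and address bits drawn from the low 7 bits of W2, the low 7 bits of W3 and the low 6 bits of W4) is an assumption, as are the names `decode_packet` and `is_addressed_to`.

```python
from typing import NamedTuple

WORD_BITS = 10  # each packet word carries 10 bits


class Command(NamedTuple):
    device_id: int  # 7-bit ID of the intended recipient
    opcode: int     # 6-bit command
    address: int    # 20-bit bank/row/column address


def decode_packet(w1: int, w2: int, w3: int, w4: int) -> Command:
    """Unpack a 4-word packet. Bit positions are ASSUMED, not from FIG. 2."""
    for word in (w1, w2, w3, w4):
        if not 0 <= word < (1 << WORD_BITS):
            raise ValueError("each packet word must fit in 10 bits")
    device_id = w1 >> 3                          # upper 7 bits of W1
    opcode = ((w1 & 0b111) << 3) | (w2 >> 7)     # 3 bits of W1 + 3 bits of W2
    # 7 bits of W2 plus 13 bits from W3 and W4 (the 7/6 split is assumed)
    address = ((w2 & 0x7F) << 13) | ((w3 & 0x7F) << 6) | (w4 & 0x3F)
    return Command(device_id, opcode, address)


def is_addressed_to(cmd: Command, my_id: int) -> bool:
    """Only the device whose ID matches the 7 ID bits responds."""
    return cmd.device_id == my_id
```

Every device on the bus would run the same decode, and only the one for which `is_addressed_to` is true would act on the command.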
The computer system 10 also includes a number of other components and signal lines that have been omitted from FIG. 1 in the interests of brevity. For example, as explained below, the memory devices 16a-c also receive a master clock signal to provide internal timing signals, a data clock signal clocking data into and out of the memory device 16, and a FLAG signal signifying the start of a command packet.
One of the memory devices 16a is shown in block diagram form in FIG. 3. The memory device 16a includes a clock divider and delay circuit 40 that receives a master clock signal 42 and generates a large number of other clock and timing signals to control the timing of various operations in the memory device 16a. The memory device 16a also includes a command buffer 46 and an address capture circuit 48 which receive an internal clock signal CLK, a command packet CA0-CA9 on a command bus 50, and a FLAG signal on line 52. As explained above, the command packet contains control and address information for each memory transfer, and the FLAG signal identifies the start of a command packet. The command buffer 46 receives the command packet from the bus 50, and compares at least a portion of the command packet to identifying data from an ID register 56 to determine if the command packet is directed to the memory device 16a or some other memory device 16b-c. If the command buffer 46 determines that the command is directed to the memory device 16a, it then provides the command to a command decoder and sequencer 60. The command decoder and sequencer 60 generates a large number of internal control signals to control the operation of the memory device 16a during a memory transfer corresponding to the command.
The address capture circuit 48 also receives the command packet from the command bus 50 and outputs a 20-bit address corresponding to the address information in the command. The address is provided to an address sequencer 64 which generates a corresponding 3-bit bank address on bus 66, an 11-bit row address on bus 68, and a 6-bit column address on bus 70.
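The division of the 20-bit address into bank, row and column fields may be sketched as follows. The Python function `split_address` is hypothetical, and the ordering of the fields (bank in the most significant bits, column in the least significant) is an assumption not stated in the text. The field widths are parameters, defaulting to the 3-bit, 11-bit and 6-bit widths of buses 66, 68 and 70.

```python
def split_address(address: int, bank_bits: int = 3,
                  row_bits: int = 11, col_bits: int = 6):
    """Split an address into (bank, row, column).

    Field order (bank high, column low) is ASSUMED for illustration.
    """
    total_bits = bank_bits + row_bits + col_bits
    if not 0 <= address < (1 << total_bits):
        raise ValueError(f"address must fit in {total_bits} bits")
    column = address & ((1 << col_bits) - 1)
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    bank = address >> (col_bits + row_bits)
    return bank, row, column
```

Other width combinations that total 20 bits can be modeled by passing different `row_bits` and `col_bits` values.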
One of the problems of conventional DRAMs is their relatively low speed resulting from the time required to precharge and equilibrate circuitry in the DRAM array. The packetized DRAM 16a shown in FIG. 3 largely avoids this problem by using a plurality of memory banks 80, in this case eight memory banks 80a-h. After a read from one bank 80a, the bank 80a can be precharged while the remaining banks 80b-h are being accessed. Each of the memory banks 80a-h receives a row address from a respective row latch/decoder/driver 82a-h. All of the row latch/decoder/drivers 82a-h receive the same row address from a predecoder 84 which, in turn, receives a row address from either a row address register 86 or a refresh counter 88 as determined by a multiplexer 90. However, only one of the row latch/decoder/drivers 82a-h is active at any one time as determined by bank control logic 94 as a function of bank data from a bank address register 96.
The column address on bus 70 is applied to a column latch/decoder 100 which, in turn, supplies I/O gating signals to an I/O gating circuit 102. The I/O gating circuit 102 interfaces with columns of the memory banks 80a-h through sense amplifiers 104. Data is coupled to or from the memory banks 80a-h through the sense amplifiers 104 and I/O gating circuit 102 to a data path subsystem 108 which includes a read data path 110 and a write data path 112. The read data path 110 includes a read latch 120 receiving and storing data from the I/O gating circuit 102. In the memory device 16a shown in FIG. 3, 64 bits of data are applied to and stored in the read latch 120. The read latch then provides four 16-bit data words to a multiplexer 122. The multiplexer 122 sequentially applies each of the 16-bit data words to a read FIFO buffer 124. Successive 16-bit data words are clocked through the FIFO buffer 124 by a clock signal generated from an internal clock by a programmable delay circuit 126. The FIFO buffer 124 sequentially applies the 16-bit words and two clock signals (a clock signal and a quadrature clock signal) to a driver circuit 128 which, in turn, applies the 16-bit data words to a data bus 130 forming part of the processor bus 14. The driver circuit 128 also applies the clock signals to a clock bus 132 so that a device such as the processor 12 reading the data on the data bus 130 can be synchronized with the data.
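The serialization performed by the read latch 120 and the multiplexer 122 may be illustrated with a short Python sketch. The function name `read_latch_to_words` is hypothetical, and the order in which the four words leave the multiplexer (least significant word first) is an assumption, since the text states only that the words are applied sequentially.

```python
def read_latch_to_words(latched: int) -> list:
    """Break one 64-bit latched value into four 16-bit words.

    Word order (least significant first) is ASSUMED for illustration.
    """
    if not 0 <= latched < (1 << 64):
        raise ValueError("read latch holds exactly 64 bits")
    return [(latched >> (16 * i)) & 0xFFFF for i in range(4)]
```

Each element of the returned list corresponds to one 16-bit transfer on the data bus 130.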
The write data path 112 includes a receiver buffer 140 coupled to the data bus 130. The receiver buffer 140 sequentially applies 16-bit words from the data bus 130 to four input registers 142, each of which is selectively enabled by a signal from a clock generator circuit 144. Thus, the input registers 142 sequentially store four 16-bit data words and combine them into one 64-bit data word applied to a write FIFO buffer 148. The write FIFO buffer 148 is clocked by a signal from the clock generator 144 and an internal write clock WCLK to sequentially apply 64-bit write data to a write latch and driver 150. The write latch and driver 150 applies the 64-bit write data to one of the memory banks 80a-h through the I/O gating circuit 102 and the sense amplifiers 104.
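The assembly of four 16-bit words into one 64-bit write word by the input registers 142 may be sketched in Python as follows. The function name `assemble_write_word` is hypothetical, and the assumption that the first word received occupies the least significant 16 bits is made only for illustration.

```python
def assemble_write_word(words) -> int:
    """Combine four 16-bit words into one 64-bit word.

    The first word received is ASSUMED to land in the low 16 bits.
    """
    words = list(words)
    if len(words) != 4:
        raise ValueError("exactly four 16-bit words are required")
    result = 0
    for i, word in enumerate(words):
        if not 0 <= word <= 0xFFFF:
            raise ValueError("each word must fit in 16 bits")
        result |= word << (16 * i)  # register i holds bits 16*i .. 16*i+15
    return result
```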
As mentioned above, an important goal of the SLDRAM architecture is to allow data transfer between a processor and a memory device to occur at a significantly faster rate. However, the operating rate of a packetized DRAM, including the packetized DRAM shown in FIG. 3, is limited by the time required to receive and process command packets applied to the memory device 16a. More specifically, not only must the command packets be received and stored, but they must also be decoded and used to generate a wide variety of signals. However, in order for the memory device 16a to operate at a very high speed, the command packets must be applied to the memory device 16a at a correspondingly high speed. As the operating speed of the memory device 16a increases, the command packets are provided to the memory device 16a at a rate that can exceed the rate at which the command buffer 46 can process the command packets.
One solution that has been developed to increase the operating speed of the command buffer 46 is to use a queue and multiple command units, also known as a pipeline or pipelining. Pipelining is “[a] method of fetching and decoding instructions (preprocessing) in which, at any given time, several program instructions are in various stages of being fetched or decoded . . . ” Computer Dictionary, Microsoft Press, copyright 1991. A command unit is a portion of the command buffer 46 that initially processes the command packets received from the microprocessor. Each of the command units retrieves a single command from a command packet, processes the command, and transmits the processed command to another portion of the command buffer 46 for further processing and execution. After the command unit processes a first command, another command unit processes a second command, etc. By using multiple command units, multiple commands can be processed simultaneously.
This method of operation allows the memory device 16 to continue to receive command packets even though the prior command packet has not yet been processed. In fact, the command packets can be received as long as the average rate at which the command packets are received is less than the average rate at which the command packets can be processed and the corresponding memory transfer operations completed. As a result, memory devices using the packetized command buffer 46, described above, are able to operate at a relatively high speed. A memory device 16 is described in greater detail in U.S. patent application Ser. No. 08/994,461, “Method and System For Processing Pipelined Memory Commands,” herein incorporated by reference.
The command buffer 46 includes a column command unit 228, shown in FIG. 4, having a plurality of command units 500a-h, shown in FIG. 6. Each of the command units 500 stores a plurality of command bits of a command packet, and with the use of a counter 550, described below, subsequently outputs the stored command bits and a command start signal to a command processor 508. In response to receiving the command start signal, the command processor 508 processes the command bits to generate at least one command signal.
Each of the command units 500 uses the counter 550 and a start command generator 560, shown in FIG. 7, to generate the command start signal. The counter is loaded with an initial count corresponding to bits in the command packet. The counter 550 receives a timing signal and then counts from the initial count to a terminal count responsive to a clock signal. The start command generator 560 produces the command start signal at one of several of the counts that are each a function of the command indicated by the stored command bits, and/or the frequency of the clock signal. As mentioned above, in response to receiving the command start signal, the command processor 508, shown in FIG. 6, processes the command bits received from the command unit 500 to generate at least one command signal.
Once the counter 550 reaches the terminal count, the counter stops counting and generates a signal indicating that the command unit 500 is available to receive a new command from the next command packet in the queue/pipeline. Only after the counter 550 reaches the terminal count is the command unit 500 made available to receive the new command, even though it had completed its task of generating the command start signal when the counter reached the appropriate count. Thus, during the period between the generation of the command start signal and the counter reaching the terminal count, the command unit 500 is needlessly idle, since it has completed its function yet is unable to accept a new command packet. This wait state reduces the rate at which the command unit 500 may receive and process commands.
For example, the command start signal for a read command with a 400 MHz clock signal may be generated on count 12. The counter may have been loaded with an initial count of 63. Thus, when the counter is activated, it begins counting down to the terminal count, typically zero, from 63. When the counter reaches count 12, the command start signal is generated, initiating the processing of the command bits by the command processor 508 to perform a memory transfer operation. However, the counter 550 continues to decrement until it reaches zero. The counter 550 then sends a signal to the start command generator 560 indicating that the command has been processed and transmitted to the command processor 508. Upon receiving the signal, the command unit 500 becomes available to receive a new command. The command unit 500 thus remains idle while the counter decrements from 11 to zero.
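The idle period described in this example may be reproduced with a small Python model. The function `idle_counts` and its parameter names are hypothetical, and the model assumes one decrement per clock, which is a simplification of the counter 550.

```python
def idle_counts(initial_count: int, start_count: int,
                terminal_count: int = 0) -> int:
    """Return the number of counts between the command start signal and
    the release of the command unit at the terminal count."""
    if not terminal_count <= start_count <= initial_count:
        raise ValueError(
            "expected terminal_count <= start_count <= initial_count")
    idle = 0
    started = False
    count = initial_count
    while count > terminal_count:
        if count == start_count:
            started = True   # command start signal produced at this count
        count -= 1           # one decrement per clock (assumed)
        if started:
            idle += 1        # unit has finished its work but is still held
    return idle
```

With an initial count of 63, a start count of 12 and a terminal count of zero, the model reports 12 idle counts, corresponding to the decrements that bring the counter from 12 down to zero.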
One solution to this problem might be to change the terminal count. Instead of the terminal count being zero, it could be 10, for example. However, as mentioned above, the count at which the command unit 500 generates the command start signal will vary with the nature of the command, e.g., read, write, initiate equilibration, and with the frequency of the clock signal. Thus, the terminal count could be raised only to a value just below the lowest count used to generate the command start signal. For example, for some SLDRAMs this would be a count of 4, since a count of 5 generates a command start signal for a write operation at a clock frequency of 400 MHz. In contrast, the command unit 500 generates the start signal for a read operation at a clock frequency of 400 MHz at a count of 12. Thus, raising the terminal count to 4 would still result in a wait of 8 counts for a read operation at a clock frequency of 400 MHz.
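The limitation of a single fixed terminal count may be illustrated with the following Python sketch. The start counts used (12 for a read and 5 for a write, both at 400 MHz) are the example values from the text; the function names and the dictionary representation are hypothetical.

```python
def highest_safe_terminal_count(start_counts: dict) -> int:
    """The terminal count must lie below every start count, so the best
    a single fixed value can do is one less than the lowest start count."""
    return min(start_counts.values()) - 1


def wait_per_command(start_counts: dict, terminal_count: int) -> dict:
    """Counts each command still waits between its start signal and
    the terminal count."""
    return {name: start - terminal_count
            for name, start in start_counts.items()}


# Example start counts at 400 MHz, taken from the text above.
START_COUNTS_400MHZ = {"read": 12, "write": 5}
TERMINAL = highest_safe_terminal_count(START_COUNTS_400MHZ)
WAITS = wait_per_command(START_COUNTS_400MHZ, TERMINAL)
```

Here `TERMINAL` evaluates to 4 and `WAITS` shows a residual wait of 8 counts for the read command, so the command with the earliest start signal still pays most of the original penalty.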
Similarly, the terminal count for a DRAM may vary depending on the clock frequency of the memory device. For example, as seen above, the terminal count for a write operation using a clock frequency of 400 MHz could be a count of 4. The same DRAM using a clock frequency of 800 MHz may execute the write command on a count of 10, and thus could have a terminal count of 9. However, a DRAM whose terminal count is fixed for one clock frequency could not be used efficiently, if at all, with a different clock frequency.
The present invention provides a method and apparatus for processing memory commands applied to a memory device. A command unit is enabled to store a command packet upon receipt of an acknowledgment signal. When enabled, the command unit receives and processes the command packet and generates at least one command signal corresponding to the command packet. A command processor receives at least one of the command signals from the enabled command unit, and processes the command signal or signals to generate at least one control signal responsive thereto. The command processor generates the acknowledgment signal as soon as the control signal is generated, and transmits the acknowledgment signal to the command unit. The command unit can then receive the next command packet as soon as the control signal is generated.
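The acknowledgment handshake summarized above may be modeled, purely for illustration, as two cooperating Python objects. The class and method names are hypothetical, the control signal is represented by a simple string, and the assumption that a command unit starts out enabled is made only so the sketch can run. It is a behavioral sketch of the handshake, not a description of the circuitry.

```python
class CommandUnit:
    """Stores one command packet and is re-enabled by an acknowledgment."""

    def __init__(self):
        self.enabled = True   # ASSUMED initial state
        self.stored = None

    def store(self, packet):
        if not self.enabled:
            raise RuntimeError("command unit is still holding a command")
        self.stored = packet
        self.enabled = False  # busy until acknowledged

    def command_signal(self):
        return self.stored

    def acknowledge(self):
        # Acknowledgment received: free the unit for the next packet.
        self.stored = None
        self.enabled = True


class CommandProcessor:
    """Generates a control signal, then acknowledges the command unit."""

    def process(self, unit: CommandUnit) -> str:
        control_signal = f"control:{unit.command_signal()}"
        unit.acknowledge()    # sent as soon as the control signal exists
        return control_signal
```

In this model the command unit becomes available at the moment the control signal is produced, rather than after a counter runs down to a fixed terminal count.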