1. Field of the Invention
The present invention relates generally to packet-oriented interfaces for synchronous DRAMs, and more particularly to a method and apparatus for supporting a plurality of orderings for data block transfers within a burst sequence.
2. Background Information
A microprocessor-based system utilizes a memory system having several memory devices for storing information. One such memory device common in standard memory systems is the dynamic random access memory (DRAM). In certain microprocessor-based systems, a high-speed, synchronous bus such as RAMBUS is used to transfer information to the DRAMs. Typically, the synchronous bus is capable of transferring eight data bits simultaneously. In order to facilitate high-speed data exchange between the microprocessor and the DRAMs, a memory controller acts as a master for the bus and manages a high-level packet protocol on the synchronous bus.
In order to minimize access to the memory system, a high-speed data cache holds blocks of recently used data. A typical data cache has a plurality of data cache lines, where each data cache line holds a plurality of data blocks for the microprocessor. In the following specification, a data block is the number of bits which can be transferred to the microprocessor in a single transfer. In some situations, a data block may be equivalent to a word or a long word. In most modern processors, the data block is 64 bits, reflecting the size of the data bus.
When the needed data block is not within the data cache, a cache miss occurs and the memory system is accessed in order to retrieve the corresponding data block. In order to further improve performance, the data cache loads an entire data cache line including the requested data block.
Conventional microprocessor-based systems use microprocessors which operate on 64-bit data blocks from the data cache. Furthermore, some such systems utilize data caches with a 32-byte-wide data cache line organized as four 64-bit data blocks. Therefore, when a cache miss occurs in a conventional system, four 64-bit data blocks are transferred in a burst sequence from the memory system to the data cache in order to fill the data cache line. Since the memory bus in synchronous memory-based systems is typically capable of transferring 8 bits of data for each clock cycle, 32 clock cycles are needed to fill the entire data cache line, 8 clock cycles for each data block.
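The cycle counts above follow directly from the line size, block size, and bus width. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are used here only for illustration.

```python
def clocks_per_block(block_bits: int, bus_bits: int) -> int:
    """Clock cycles needed to move one data block over a bus that
    carries bus_bits per clock cycle (hypothetical helper)."""
    if block_bits % bus_bits:
        raise ValueError("block size must be a multiple of the bus width")
    return block_bits // bus_bits


def clocks_per_line(line_bytes: int, bus_bits: int) -> int:
    """Clock cycles needed to fill a whole data cache line."""
    return clocks_per_block(line_bytes * 8, bus_bits)


# Conventional case from the text: 8-bit bus, 64-bit blocks, 32-byte line.
block_clocks = clocks_per_block(64, 8)   # 8 clock cycles per data block
line_clocks = clocks_per_line(32, 8)     # 32 clock cycles per cache line
```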
Conventional DRAMs store information in an array format which is accessed by a row and a column. In order to accelerate data transfer for non-sequential accesses, some synchronous DRAMs include a separate line over which a tap address is issued serially while the data block of eight bytes is being received. In this manner, the memory addressing can jump to any of 256 locations within a particular row, and the next non-sequential address is ready before the next block of data is accessed. The transfer rate of the data blocks of 8 bytes is therefore improved since the subsequent data block can begin transferring immediately upon the finishing of the previous data block.
As faster microprocessors are continually developed, memory systems need to increase data throughput in order to avoid problematic delays. One manner of increasing the data transfer rate of a memory system having DRAMs is to increase the capacity of the memory bus. For example, the capacity of the memory bus could be increased such that 16 bits of data could be transferred simultaneously, thus providing a substantial increase in data throughput. A data block of 8 bytes could then be transferred in 4 clock cycles instead of the 8 clock cycles required in conventional systems. Reducing the number of clock cycles needed to transfer a single data block, however, reduces the size of the tap address which can be transferred. For example, if the bus capacity were doubled to allow 16 bits of data, only four clock cycles would be available to transfer the serial offset for the column address. Thus, only sixteen tap points would be available within an open row. This severely limits efficient transfer of non-sequential data blocks and therefore reduces the data throughput of the memory system. By transmitting the next address serially while a data block is being received, conventional protocols impede an increase in the bus capacity. Therefore, there exists a need in the art for a method and apparatus which facilitates increased bus capacity without restricting the ability to jump to any column within an open row of the DRAM.
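The relationship between bus width and the number of reachable tap points can be sketched as follows. The sketch assumes, consistent with the figures given above, that a single serial line carries one tap address bit per clock cycle; the function name is hypothetical.

```python
def tap_points(block_bits: int, bus_bits: int) -> int:
    """Number of column locations addressable by a tap address sent
    serially during the transfer of one data block.

    Assumption: one address bit is sent per clock cycle on one line.
    """
    if block_bits % bus_bits:
        raise ValueError("block size must be a multiple of the bus width")
    cycles = block_bits // bus_bits   # clocks available while the block transfers
    return 2 ** cycles                # one address bit per clock cycle


narrow = tap_points(64, 8)    # 8 cycles  -> 256 tap points
wide = tap_points(64, 16)     # 4 cycles  -> 16 tap points
```

Doubling the bus width halves the number of address bits that fit in one block transfer, so the number of tap points falls from 256 to 16 rather than merely halving.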
Conventional microprocessors, such as microprocessors produced by Intel, require data block transfers in a particular interleaved format. Supporting the particular order required by an Intel processor is traditionally accomplished by transmitting the starting address of the next data block during the transfer of the previous data block. As described earlier, the starting address of the next data block is transmitted serially. Several other conventional microprocessors, however, require data block transfers to be sequential. Synchronous DRAMs exist which support both interleaved and sequential data block ordering; however, these SDRAMs must be reprogrammed each time a different ordering is selected, requiring the memory bus to become inactive and therefore disrupting system activity.
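The two orderings can be illustrated for a four-block burst. The text does not define them numerically, so this sketch assumes the commonly used definitions: a sequential (linear) burst increments the block index and wraps within the burst, while an interleaved burst exclusive-ORs the starting block index with a counter. The function names are hypothetical.

```python
def linear_order(start: int, burst_len: int = 4) -> list[int]:
    """Sequential ordering: count up from start, wrapping within the burst."""
    return [(start + i) % burst_len for i in range(burst_len)]


def interleaved_order(start: int, burst_len: int = 4) -> list[int]:
    """Interleaved ordering: XOR the starting index with a burst counter.

    Assumes burst_len is a power of two so every XOR result stays in range.
    """
    if burst_len <= 0 or burst_len & (burst_len - 1):
        raise ValueError("burst_len must be a power of two")
    return [start ^ i for i in range(burst_len)]


# Same starting block, different orderings:
seq = linear_order(1)        # [1, 2, 3, 0]
ilv = interleaved_order(1)   # [1, 0, 3, 2]
```

Both orderings deliver the requested block first and visit each block in the cache line exactly once; they differ only in the order of the remaining blocks.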
In addition, advanced microprocessor-based systems may contain multiple subsystems which require different data block transfer orderings. For example, a conventional processor may require interleaved data block ordering while another device on the PCI bus may require a linear ordering. This presents a particular problem with conventional DRAMs since conventional DRAMs require reprogramming to switch between block transfer orderings. Reprogramming the DRAM mandates the memory bus be idle over a period of time and therefore introduces delays in the system. There is a need in the art for a method and apparatus which supports various data block transfer orderings, such as interleaved and linear, without requiring reprogramming and which is configurable to support various bus capacities and block transfer sizes.
A system and method of transferring data between a packet-oriented synchronous memory device and a processor are described. The synchronous memory device includes a column address decoder. The column address decoder includes a sequencer which controls one or more bits of column address. According to one aspect of the invention, request packets are used to access data in the synchronous memory device. Each request packet controls a particular burst sequence and each request packet includes one or more data ordering bits which define a data ordering for the particular burst sequence. One of the plurality of request packets is transferred to the synchronous memory device, where information from the request packet is used to load the sequencer as a function of the data ordering bits in the transferred request packet. Data is then transferred to the processor in the data ordering defined for the burst sequence controlled by the transferred request packet.
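As a rough illustration of this aspect, the sketch below models a request packet that carries a data ordering field and a sequencer that is loaded from it for each burst. The field names, the single ordering bit, the burst length of four, and the XOR and wrap-around rules are all assumptions made for the example; they are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Ordering(Enum):
    """Hypothetical one-bit encoding of the data ordering field."""
    LINEAR = 0
    INTERLEAVED = 1


@dataclass(frozen=True)
class RequestPacket:
    """Hypothetical request packet controlling one burst sequence."""
    row: int
    column: int            # starting column (data block) address
    ordering: Ordering     # data ordering bit(s) for this burst only
    burst_len: int = 4     # assumed power of two


class Sequencer:
    """Controls the low-order column address bits during a burst."""

    def load(self, packet: RequestPacket) -> None:
        # Loaded per packet, so no device reprogramming is modeled.
        self._packet = packet

    def column_addresses(self) -> list[int]:
        p = self._packet
        mask = p.burst_len - 1
        base, start = p.column & ~mask, p.column & mask
        if p.ordering is Ordering.INTERLEAVED:
            lows = [start ^ i for i in range(p.burst_len)]
        else:
            lows = [(start + i) & mask for i in range(p.burst_len)]
        return [base | low for low in lows]


sequencer = Sequencer()

# Two consecutive bursts with different orderings, selected per packet.
sequencer.load(RequestPacket(row=3, column=0x11, ordering=Ordering.INTERLEAVED))
first = sequencer.column_addresses()    # [0x11, 0x10, 0x13, 0x12]

sequencer.load(RequestPacket(row=3, column=0x11, ordering=Ordering.LINEAR))
second = sequencer.column_addresses()   # [0x11, 0x12, 0x13, 0x10]
```

Because the ordering travels with each request packet, the second burst uses a different ordering than the first without any intervening mode change on the bus.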
According to another aspect of the invention, a random access memory module is described which provides a burst sequence of data in response to a request packet. The burst sequence of data includes a plurality of data blocks and the request packet includes one or more data ordering bits which define a data ordering. The random access memory module includes a memory array having a plurality of memory cells organized in an array of rows and columns, a row address decoder connected to the memory array, wherein the row address decoder generates a row address which addresses one of the rows of the memory array, and a column address decoder connected to the memory array, wherein the column address decoder generates a column address which addresses one of the columns of the memory array. The column address decoder includes data ordering control logic for placing data blocks retrieved from the memory array within the burst sequence as a function of the data ordering bits in the request packet such that consecutive burst sequences can be transferred in different data orderings.
According to yet another aspect of the invention, a memory system is described which provides a plurality of data words in response to a read request, wherein the read request includes an address. The memory system includes a controller, a synchronous random access memory, a collector and a synchronous memory bus connected to the controller, the random access memory and the collector. The controller generates a request packet having one or more data ordering bits which define a data ordering in response to the read request and transfers the request packet over the synchronous memory bus to the random access memory. The synchronous random access memory provides a burst sequence of data over the synchronous memory bus to the collector in the data ordering defined in the request packet and the collector receives the burst sequence of data from the random access memory and forms the burst sequence of data into the plurality of data words.
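The role of the collector can be sketched as a function that gathers the bytes of a burst sequence into data words. The 8-byte word size matches the 64-bit data blocks discussed in the background; the little-endian byte order and the function name are assumptions made only for this example.

```python
def collect_words(burst: bytes, word_bytes: int = 8) -> list[int]:
    """Form a burst sequence of bytes into fixed-size data words.

    Assumption: bytes arrive least-significant byte first (little-endian).
    """
    if len(burst) % word_bytes:
        raise ValueError("burst length must be a multiple of the word size")
    return [
        int.from_bytes(burst[i:i + word_bytes], "little")
        for i in range(0, len(burst), word_bytes)
    ]


# A 32-byte burst (one cache line) becomes four 64-bit data words.
burst = bytes(range(32))
words = collect_words(burst)
```

The words are produced in the order in which their blocks arrive, so the ordering selected in the request packet carries through to the collected data words.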
According to yet another aspect of the invention, a computer system is described which replaces cache lines in a cache with data transferred from a synchronous memory module in any of a number of desired orders.
These and other features and advantages of the invention will become apparent from the following description of the preferred embodiments of the invention.