The discussion relating to the present invention will be better understood with reference to the following terms, which are generally defined as indicated.
Arbiter—digital circuit whose function includes arbitrating amongst multiple simultaneous read and/or write access requests from multiple processors for access to a shared memory system.
Atomic—an operation in which a block of data is fully read from or written to memory without any other intervening process reading from or writing to the same block of data.
Data block—data of a particular size that is written to or is read from a memory system. The size of a block is defined by two parameters, width and height. The width denotes the number of consecutive banks in which the block is stored. The height denotes the number of consecutive address locations or rows on a single memory module at which the block is stored.
Bank—partition of data within a DRAM module; an address and a bank number must be applied when reading or writing a DRAM module. Typical DRAM modules are comprised of multiple (four or eight) banks.
FIFO buffer—first in and first out buffer queue.
Memory system—a collection of discrete memory modules, such as DRAM and SRAM, each module having a plurality of addresses or locations in which blocks of data can be accessed. The modules may be attached to or integrated into a network processing chip or other type of processor chip. Multiple DRAM modules can be used in parallel to construct a memory system containing many banks.
Window—a number of clock cycles during which the banks of a DRAM module may be accessed. Since only one bank of a DRAM module can be accessed at a given time (the address and bank number must be applied), the banks of a module are accessed consecutively over time in order to maximize the bandwidth. For a four-bank DRAM module, for instance, the banks are accessed in the order of A B C D A B C D. One set of accesses (A B C D) is a “window”, and takes a fixed number of clock cycles, depending on the type and speed of the DRAM. A window is typically between 10 and 12 clock cycles in duration.
A network processor incorporates multiple general purpose processors and specialized logic. The memory systems that are generally used with network processors are comprised of memory modules, which often include SRAM and DRAM. Each memory module has a plurality of memory addresses and, in the case of DRAM modules, the data for a given address is partitioned into multiple banks. When a DRAM module is accessed, a memory address and a bank number must be supplied. In many cases, a block of data being accessed by one of the general purpose processors cannot fit into a single memory location or bank within a memory module, thereby necessitating the allocation of the data to different memory locations within the same module or in another module. Some of these locations can be in DRAM whereas others may, for example, be located in SRAM.
As noted above in the definitions, since only one bank can be accessed at a time in a given DRAM, bandwidth to and from the DRAMs is improved by accessing the banks consecutively in a Time-Division Multiplexed (TDM) fashion. In the case of four-bank DRAM modules, banks A, B, C and D can be accessed during one TDM window. It should be noted that during the TDM window, the address associated with each bank is independent. So, in a given window, one could read address 0 of bank A and address 5 of bank B. In addition, each window is designated as a “preferred read” or a “preferred write” window, meaning that bank accesses within the window are all read accesses or all write accesses (not mixed). If the banks were accessed randomly, rather than in a TDM fashion, the bandwidth would be reduced due to the insertion of additional cycles between accesses to meet the timing requirements of the DRAM module(s). Similarly, if read and write operations were mixed within a window, the DRAM timing requirements necessitate insertion of additional cycles between read and write accesses, thereby reducing bandwidth. It should be noted that it is not required that every bank be accessed during every window. For example, if banks A, C and D need to be read during a certain window, they will be accessed, but bank B will remain idle.
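By way of illustration only, the window rules described above (at most one access per bank per window, all accesses in a window of the same type, and idle banks permitted) can be sketched in Python. The names `Access` and `fill_window` and the sample request list are hypothetical and are not part of the described system; the sketch models only the selection of accesses, not DRAM timing.

```python
from collections import namedtuple

BANKS = ("A", "B", "C", "D")  # four-bank DRAM module, as in the text

# Hypothetical request record: op is "R" (read) or "W" (write).
Access = namedtuple("Access", "op bank address")

def fill_window(pending, preferred_op):
    """Select at most one pending access per bank, all of the preferred type.

    Returns (window, remaining), where window maps each bank to the
    selected Access, or to None if the bank stays idle in this window.
    """
    window = {bank: None for bank in BANKS}
    remaining = []
    for acc in pending:
        if acc.op == preferred_op and window[acc.bank] is None:
            window[acc.bank] = acc      # bank slot is free and type matches
        else:
            remaining.append(acc)       # deferred to a later window
    return window, remaining

pending = [
    Access("R", "A", 0),   # address 0 of bank A
    Access("R", "B", 5),   # address 5 of bank B (addresses are independent)
    Access("W", "C", 7),   # a write cannot join a "preferred read" window
    Access("R", "D", 2),
    Access("R", "A", 9),   # bank A is already taken in this window
]
window, remaining = fill_window(pending, "R")
```

In this example, banks A, B and D are read during the window, bank C remains idle, and the write to bank C and the second read of bank A are carried over to later windows.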
In a network processing environment, multiple processors may be independently accessing the memory module(s) attached to a network processing chip. The memory system contains routing information which the general purpose processors use to determine how to route information. Periodically, one of the general purpose processors updates this routing information by writing a block of data to the memory system. Since these processors operate independently, it is imperative that the digital circuits of the arbiter (see above definition) preclude one processor from writing to a given block of memory while another processor is attempting to read all or part of the same memory block. Otherwise, the processor attempting the read access could receive partially updated routing information.
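The hazard described above can be illustrated with a minimal Python sketch in which a read is serviced between the two halves of a block write. The memory contents, the field values and the helper `read_block` are hypothetical and serve only to show how a partially updated block arises.

```python
# Hypothetical routing entry stored as one data block that spans
# banks A and B at address 0. Keys are (bank, address) pairs.
memory = {("A", 0): "old-next-hop", ("B", 0): "old-port"}
block = [("A", 0), ("B", 0)]

def read_block(mem, cells):
    """Read every cell of a data block in order."""
    return [mem[cell] for cell in cells]

# The writing processor updates the bank A portion first...
memory[("A", 0)] = "new-next-hop"

# ...and a reading processor is serviced before the write completes.
torn = read_block(memory, block)

# The writing processor then finishes the update.
memory[("B", 0)] = "new-port"
consistent = read_block(memory, block)
```

Here `torn` mixes a new value with an old one, which is the partially updated result that the arbiter must prevent, while `consistent` reflects the completed write.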
The problem is exacerbated by the fact that the data block may be spread across multiple banks of a DRAM module, and across multiple addresses (rows) in the module. The same challenges are present in any system in which multiple processors attempt to independently access data blocks which are stored in a memory system comprised of multiple modules, DRAM and/or SRAM. Thus, an “atomic” operation means that a block of data is completely read from or written to a memory system without any other process writing or reading any portion of the same data block at the same time.
Two common methods of achieving atomic operations are as follows. In some multi-processor systems, semaphores are used to lock very large areas of memory, referred to as “pages”. Pages may contain hundreds or even thousands of data blocks. Therefore, when a page is locked by one processor, access by any other processor to any data block within the page is prohibited, even if the data blocks being requested by the two processors are different. This method incurs a latency penalty, because a processor must wait for the page to be unlocked even when the data block it requests is not the one being accessed.
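The page-locking method can be sketched as follows. This is a single-threaded Python illustration under stated assumptions: the page size `BLOCKS_PER_PAGE`, the numbering of data blocks, and the helpers `try_lock_block` and `unlock_block` are hypothetical, and a `threading.Lock` stands in for the semaphore.

```python
import threading

BLOCKS_PER_PAGE = 1000   # assumed; the text says hundreds or thousands

page_locks = {}          # page number -> lock guarding the whole page

def page_of(block_id):
    """Page that contains the given (hypothetically numbered) data block."""
    return block_id // BLOCKS_PER_PAGE

def try_lock_block(block_id):
    """Try to lock the page holding block_id; True if the lock was obtained."""
    lock = page_locks.setdefault(page_of(block_id), threading.Lock())
    return lock.acquire(blocking=False)

def unlock_block(block_id):
    """Release the page lock taken for block_id."""
    page_locks[page_of(block_id)].release()

first = try_lock_block(17)     # processor 1 locks page 0 to access block 17
second = try_lock_block(842)   # processor 2 wants a different block, same page
third = try_lock_block(1500)   # a block on page 1 is unaffected
unlock_block(17)
retry = try_lock_block(842)    # succeeds only after page 0 is unlocked
```

The second request is refused although blocks 17 and 842 are distinct, which is the source of the latency penalty noted above.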
A second method of achieving atomic operations is to allow all of the read or write accesses required to service a particular processor's request for a data block without allowing any other read or write operations to occur. This method degrades bandwidth to and from the memory system, since a data block may not include all banks of a DRAM module, and the banks outside the block remain idle while the block is being serviced. For example, if a data block spans banks A, B and C, and a second data block spans banks B, C and D, this method would require reading or writing the first data block completely, then reading or writing the second data block completely, with bank D idle during the first operation and bank A idle during the second.
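The bandwidth cost of this second method can be estimated with a short Python sketch. It assumes, for illustration only, that each row (address) of a data block is serviced in its own window and that both blocks in the example have a height of 1; the function `serialized_cost` is hypothetical.

```python
NUM_BANKS = 4  # four-bank DRAM module, as in the text

def serialized_cost(blocks):
    """Cost of servicing blocks one at a time, with no other accesses allowed.

    blocks is a list of (width, height) pairs. Assumes one window per row
    of each block. Returns (windows, used_slots, idle_slots).
    """
    windows = sum(height for _, height in blocks)
    used = sum(width * height for width, height in blocks)
    idle = windows * NUM_BANKS - used
    return windows, used, idle

# Block spanning banks A, B, C, then block spanning banks B, C, D
# (width 3 each; height 1 is an assumption).
windows, used, idle = serialized_cost([(3, 1), (3, 1)])
utilization = used / (windows * NUM_BANKS)
```

Under these assumptions the two blocks occupy two windows and leave two of the eight bank slots unused, for a utilization of 75 percent.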
FIG. 1 shows examples of various sizes of data blocks and how they are mapped to the addresses and banks of a four-bank DRAM module. As noted above in the definitions, the size of each data block is defined by its height and width. Because different types of information may be stored in the memory system, data blocks vary in size. Some data blocks are limited to one bank and one address, while other data blocks span several banks and several addresses. A data block having a width greater than 1 must be stored in multiple banks within a DRAM module. Similarly, if the data block has a height greater than 1, it occupies more than one address within a bank.
Data block 106 is stored in a DRAM memory module 110. It has a height of 3 and a width of 1; thus, it occupies three consecutive memory addresses 1, 2 and 3. In addition, only bank A is included in this data block. Data block 107 is stored in memory module 112; it has a height of 2 and a width of 4. It occupies two consecutive memory addresses 12 and 13, and spans all memory banks (A, B, C and D).
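The mapping of data blocks 106 and 107 to addresses and banks can be reproduced with a small Python sketch. The function `footprint` and its parameter names are hypothetical; the sketch assumes that a block occupies consecutive banks starting from a given bank without wrapping past bank D.

```python
BANKS = "ABCD"  # four-bank DRAM module, as in FIG. 1

def footprint(start_address, start_bank, width, height):
    """List the (address, bank) locations occupied by a data block.

    width is the number of consecutive banks; height is the number of
    consecutive addresses. Wrapping past the last bank is not modeled.
    """
    first = BANKS.index(start_bank)
    if first + width > len(BANKS):
        raise ValueError("block would extend past the last bank")
    return [(address, BANKS[first + offset])
            for address in range(start_address, start_address + height)
            for offset in range(width)]

# Data block 106: height 3, width 1, addresses 1 to 3, bank A only.
block_106 = footprint(1, "A", width=1, height=3)

# Data block 107: height 2, width 4, addresses 12 and 13, banks A to D.
block_107 = footprint(12, "A", width=4, height=2)
```

Data block 106 yields three locations, all in bank A, whereas data block 107 yields eight locations covering every bank at addresses 12 and 13.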