This invention relates to processors, such as network processors that use bank access in a multiple bank dynamic random access memory (DRAM) chip. More particularly, it relates to improvements in network processors that use multiple bank synchronous DRAMs as data buffers.
Digital computers commonly use DRAM chips for the storage of retrievable data. In synchronous DRAMs, the memory is typically organized into several (usually 4) memory banks per memory module within the chip. In this embodiment, each bank holds 16 bytes of data corresponding to one quadword. Thus, the buffer size for 4 banks is 64 bytes of data stored per memory module. These banks provide a comprehensive collection of the addressable storage space in a processing unit used to execute instructions. Logic can access one bank by overlapping with another bank. Thus, if two ‘reads’ are required, one ‘read’ command can be executed from a first bank followed by one from a second bank without waiting for the first ‘read’ to be completed.
Structures called frames contain control and user data that is transferred between the network processor and the DRAM. These frames are of random size and may range from 64 bytes up to 1500 bytes or more. Data is transferred from the frames to the memory banks in chunks or segments of 64 bytes each.
There is an ongoing impetus toward achieving greater access speed and reducing latency in the transfer of data into and out of memory. This is even more evident in a DDR DRAM (Double Data Rate Dynamic Random Access Memory) device wherein the interface between a chip and the DRAM transmits data on both the rising and the falling edges of a cycle clock. One environment in which this problem becomes particularly manifest is in a microprocessor that functions as a network processor (NP).
A problem that adversely affects the ability of a network processor to access multiple memory banks of a DRAM chip is a phenomenon referred to as polarization. This phenomenon occurs most often when each new frame is started in the same bank. For example, when new frames are always started in bank A of a four bank (A, B, C, D) memory module, this tends to underuse bank D because frames are of random length and, therefore, statistically can end in any bank. In addition, the processing of the header (typically 32 bytes) of each frame requires additional access to those banks in which the header is stored. Since the header information typically is stored at the beginning of the frame, banks A and B will be accessed more frequently than banks C and D, thereby adding to the underutilization of banks C and D. This can be exemplified by visualizing the writing of several frames, each containing 72 bytes of data. If they are written into memory according to conventional practice, 16 bytes at a time would be loaded sequentially into banks A, B, C, D, and A, in that order. Thus, bank A gets used twice while the other banks get used once, contributing to a waste of bandwidth. If there are multiple frames of this size, the problem is further exacerbated each time an additional frame is loaded.
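The 72-byte example can be checked with a short calculation. The following Python sketch is illustrative only and is not part of the described hardware; the names `bank_accesses` and `total_accesses` are hypothetical, and the model assumes that every frame starts in bank A and that each 16-byte quadword goes to the next bank in the order A, B, C, D.

```python
BANKS = "ABCD"
QUADWORD_BYTES = 16  # one bank access stores one 16-byte quadword


def bank_accesses(frame_bytes, start_bank=0):
    """Count the quadword writes each bank receives for a single frame."""
    quadwords = -(-frame_bytes // QUADWORD_BYTES)  # ceiling division
    counts = {bank: 0 for bank in BANKS}
    for i in range(quadwords):
        counts[BANKS[(start_bank + i) % len(BANKS)]] += 1
    return counts


def total_accesses(frame_sizes, start_bank=0):
    """Sum the per-bank writes over several frames that all start in the same bank."""
    totals = {bank: 0 for bank in BANKS}
    for size in frame_sizes:
        for bank, n in bank_accesses(size, start_bank).items():
            totals[bank] += n
    return totals


if __name__ == "__main__":
    print(bank_accesses(72))          # one 72-byte frame
    print(total_accesses([72] * 10))  # ten such frames, all starting in bank A
```

Under these assumptions a single 72-byte frame touches bank A twice and banks B, C and D once each, and the imbalance grows linearly with the number of frames.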
In network processor (NP) designs that are not limited by bandwidth (BW) requirements, memory bank polarization is not a concern because the latency associated with this phenomenon can be tolerated. High performance processors cannot tolerate this latency if they are to satisfy high bandwidth requirements (e.g. 10 Gbps). Although this problem did not require a solution in previous generations of network processors, a solution to this problem for high bandwidth network processors will help alleviate system latencies, thereby increasing performance. Greater speeds have been achieved by overlapping the bank accesses whereby one ‘read’ or one ‘write’ is sent to bank A and another ‘read’ or ‘write’ is sent to bank B. This second bank is programmed to start operation before bank A has completed its transfer. Greater bandwidth would be achievable by increasing the degree of overlap.
The present invention relates to a method of randomly choosing the memory bank in which storage of a new frame begins, thereby minimizing DRAM polarization latency effects in a network processor. The method comprises the steps of a) receiving a frame from a network processor, said frame consisting of one or more segments; b) randomly assigning the segments of the frame to one of a plurality of banks within a memory module, and c) storing the first data byte pointer and the last data byte pointer of the frame segments in the memory module. Each segment typically contains up to 64 bytes of data and has a first or start data byte pointer (SBP) and a last or end data byte pointer (EBP). The segments are randomly assigned by quadword rotation, performed within a FIFO. Accordingly, the first data byte pointer (SBP) is determined according to the formula SBP = [(FR × 16) + RP] mod 64, and the last data byte pointer (EBP) is determined according to the formula EBP = [(FR × 16) + (IBC − 1) + RP] mod 64. In these formulas, FR = a Quadword Rotation value between 0 and 3; RP = the Relative Position of incoming data related to a given buffer; IBC = the Incoming Byte Count, and ‘mod 64’ means that the summation is divided by 64, with the remainder being kept and the integer being disregarded.
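As a worked illustration of the two pointer formulas, the following Python sketch evaluates SBP and EBP for given values of FR, RP and IBC. The function name `byte_pointers` is hypothetical, and the range check on FR simply restates the stated range of 0 to 3; the text does not specify the permitted ranges of RP and IBC, so none are enforced here.

```python
BUFFER_BYTES = 64    # four banks of 16 bytes each
QUADWORD_BYTES = 16  # one quadword per bank


def byte_pointers(fr, rp, ibc):
    """Return (SBP, EBP) for quadword rotation fr, relative position rp
    and incoming byte count ibc."""
    if fr not in range(4):
        raise ValueError("FR must be between 0 and 3")
    sbp = (fr * QUADWORD_BYTES + rp) % BUFFER_BYTES
    ebp = (fr * QUADWORD_BYTES + (ibc - 1) + rp) % BUFFER_BYTES
    return sbp, ebp


if __name__ == "__main__":
    # A full 64-byte segment with no rotation occupies bytes 0 through 63.
    print(byte_pointers(fr=0, rp=0, ibc=64))  # (0, 63)
    # The same segment rotated by two quadwords starts at byte 32 and wraps to 31.
    print(byte_pointers(fr=2, rp=0, ibc=64))  # (32, 31)
    # A 40-byte segment rotated by three quadwords starts at 48 and wraps to 23.
    print(byte_pointers(fr=3, rp=0, ibc=40))  # (48, 23)
```

With RP = 0, the start pointer falls on a quadword boundary, so the rotation value FR directly selects which of the four 16-byte banks receives the first quadword of the segment.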
The invention also includes a structure for minimizing DRAM polarization latency effects in a network processor. The structure includes means for receiving a frame from a network processor, said frame consisting of one or more segments, each segment typically being 64 bytes in size. Each segment has a start data byte pointer (SBP) and a last or end data byte pointer (EBP). Means are included for randomly assigning the segments of the frame to one of a plurality of banks within a memory module. Further means are included for storing the first data byte pointer and the last data byte pointer of the segments of the frame in the memory module. The segments are randomly assigned by quadword rotation within a FIFO. The first data byte pointer (SBP) is determined according to the formula SBP = [(FR × 16) + RP] mod 64, and the last data byte pointer (EBP) is determined according to the formula EBP = [(FR × 16) + (IBC − 1) + RP] mod 64.
The invention also relates to a network processing system and a method of writing multiple frames of data from a network processor into a memory bank of a DRAM. Frames of data are transferred into a FIFO as they are received by the network processor. Each frame is separated into segments of 64 bytes to match the size of one access to each memory module within the DRAM. The frames are delivered to banks within the DRAM on a random basis, whereby the start of frames is evenly distributed between the banks. The randomized delivery of segments is performed by quadword rotation within a FIFO buffer. The first data byte pointer (SBP) is determined according to the formula SBP = [(FR × 16) + RP] mod 64, and the last data byte pointer (EBP) is determined according to the formula EBP = [(FR × 16) + (IBC − 1) + RP] mod 64.
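The segmentation and the random choice of starting bank can be sketched in a few lines of Python. This is a software model under stated assumptions, not the FIFO hardware: the helper names `split_into_segments` and `random_start_banks` are hypothetical, the rotation value is drawn from a seeded pseudo-random generator, and the starting bank is taken to be the bank selected by the rotation value when the relative position is zero.

```python
import random
from collections import Counter

SEGMENT_BYTES = 64  # one access to a memory module (4 banks x 16 bytes)
BANKS = "ABCD"


def split_into_segments(frame_bytes):
    """Return the byte count of each segment of a frame of the given length."""
    full, rest = divmod(frame_bytes, SEGMENT_BYTES)
    return [SEGMENT_BYTES] * full + ([rest] if rest else [])


def random_start_banks(num_frames, rng):
    """Draw one quadword rotation value (0..3) per frame and count
    how often each bank is chosen as the starting bank."""
    starts = Counter()
    for _ in range(num_frames):
        fr = rng.randrange(len(BANKS))  # assumed uniform rotation value
        starts[BANKS[fr]] += 1
    return starts


if __name__ == "__main__":
    print(split_into_segments(150))  # [64, 64, 22]
    print(random_start_banks(10_000, random.Random(0)))
```

Over many frames, each bank is selected as the starting bank in roughly a quarter of the cases, which is the even distribution of frame starts that the method aims for.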
The invention also relates to a system and a method of randomly selecting a memory bank among multiple banks in a DRAM module in which to begin to write data from a frame. This involves a step of, or means for, dividing the data into segments, each of which has a start data byte pointer (SBP) and an end data byte pointer (EBP). This is then followed by rotating the segments by using the formula SBP = [(FR × 16) + RP] mod 64, and using the formula EBP = [(FR × 16) + (IBC − 1) + RP] mod 64 for determining the relative position of the incoming data relative to the memory banks. In these equations, FR represents a Quadword Rotation value between 0 and 3, RP represents the Relative Position of incoming data related to a given buffer, IBC is the Incoming Byte Count, and ‘mod 64’ means that the summation is divided by 64, with the remainder being kept and the integer being disregarded.
The quadword rotation of the present invention applies to the ‘write’ of data into the DRAM buffer or module. When the data is ‘read’ out of the DRAM buffer, the same rotation must be followed as that used for the ‘write’ command, to ensure that the data is read in the proper sequence.
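The need to reuse the rotation value on a ‘read’ can be illustrated with a small software model. In this Python sketch, which is an assumption-laden simplification rather than the actual FIFO logic, a 64-byte buffer is modeled as a byte array, the hypothetical helper `write_rotated` places data starting at byte offset FR × 16 with wrap-around, and `read_rotated` retrieves it using a given rotation value.

```python
BUFFER_BYTES = 64
QUADWORD_BYTES = 16


def write_rotated(data, fr):
    """Write up to 64 bytes into a buffer, starting at quadword fr and wrapping."""
    if len(data) > BUFFER_BYTES:
        raise ValueError("a buffer holds at most 64 bytes")
    buf = bytearray(BUFFER_BYTES)
    for i, byte in enumerate(data):
        buf[(fr * QUADWORD_BYTES + i) % BUFFER_BYTES] = byte
    return buf


def read_rotated(buf, fr, count):
    """Read count bytes back, starting at quadword fr and wrapping."""
    return bytes(buf[(fr * QUADWORD_BYTES + i) % BUFFER_BYTES] for i in range(count))


if __name__ == "__main__":
    data = bytes(range(64))
    buf = write_rotated(data, fr=2)
    print(read_rotated(buf, fr=2, count=64) == data)  # same rotation: True
    print(read_rotated(buf, fr=0, count=64) == data)  # different rotation: False
```

Reading with the rotation value used for the write returns the bytes in their original order, whereas reading with a different value returns the quadwords out of sequence.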