In communication networks, many devices, such as routers, switches, modems, network interface cards and other elements, store packets in queues, e.g. packet buffers, and schedule packets from queues by means of queuing devices such as traffic managers, network processors, switching elements and framers. Many prior art devices implement packet buffers using a random access memory (RAM), such as a dynamic random access memory (DRAM).
Common objectives in packet buffer design are low cost, low power dissipation, high storage capacity and elastic sharing of buffer memory among queues. Elastic sharing means that a queue may grow beyond a fixed portion of the total buffer memory if sufficient memory space is left unused by other queues.
Low cost and low power dissipation are often attained by using a DRAM as the main buffer memory. Elastic sharing of memory among queues is attained by using a page-based memory, e.g. a DRAM, in which queues allocate new pages on demand and unused pages are tracked in a free-list.
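The page-based elastic sharing described above can be illustrated with a minimal sketch, assuming a single shared pool of fixed-size pages. The class and method names (`PagedBuffer`, `allocate`, `release`) are hypothetical; the sketch shows only the free-list bookkeeping, not how any particular device stores packet data.

```python
from collections import deque


class PagedBuffer:
    """Hypothetical page-based buffer: queues draw pages from one shared free-list."""

    def __init__(self, num_pages: int, page_size: int) -> None:
        self.page_size = page_size
        self.free_list = deque(range(num_pages))  # indices of currently unused pages
        self.queues = {}                          # queue id -> deque of page indices (FIFO)

    def allocate(self, queue_id: int) -> int:
        """Give `queue_id` a new page on demand; fails only when no page is free."""
        if not self.free_list:
            raise MemoryError("buffer full: free-list is empty")
        page = self.free_list.popleft()
        self.queues.setdefault(queue_id, deque()).append(page)
        return page

    def release(self, queue_id: int) -> int:
        """Return the oldest page of `queue_id` to the free-list."""
        page = self.queues[queue_id].popleft()
        self.free_list.append(page)
        return page


buf = PagedBuffer(num_pages=4, page_size=512)
for _ in range(4):
    buf.allocate(queue_id=0)   # queue 0 alone may take every page
print(len(buf.queues[0]), len(buf.free_list))  # 4 0
```

Because every queue draws from the same free-list, one busy queue can occupy all pages while the others are idle, which is the elastic sharing defined above. A practical design might add per-queue limits; those are deliberately left out of this sketch.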
However, a difficulty in the design of a packet buffer system having a DRAM as the main packet buffer is the random access properties of the DRAM. For example, a DRAM may have an access time of 50 ns. If a buffer system comprising such a DRAM is used in a communication network with a high line rate, e.g. 10 Gb/s or more, the minimum time between two consecutive accesses to arbitrary addresses in the DRAM exceeds the maximum time permitted between two minimum-size packets transmitted at the line rate.
In order to provide packet buffers for communication networks having a high bit rate and to overcome the slow access time of a DRAM, fast packet buffers have been designed using fast static random access memories (SRAMs) instead of slow DRAMs. Today, fast SRAMs having access times below 4 ns are available, and such SRAMs are suitable for a 40 Gb/s packet buffer. However, SRAMs are small (i.e. have a small storage capacity), expensive and power-hungry. Therefore they are used only in networking components that require small packet buffers. If large packet buffers are required, the number of SRAMs needed will be high, making the packet buffer very expensive and power-consuming.
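The timing argument in the two preceding paragraphs can be checked with simple arithmetic. The sketch below assumes 64-byte minimum-size packets (the packet size used in the utilization example at the end of this section), ignores framing overhead, and assumes each packet costs one write and one read of the buffer memory; the function name is hypothetical.

```python
def per_access_budget_ns(line_rate_bps: float, packet_bytes: int,
                         accesses_per_packet: int = 2) -> float:
    """Time available for each random memory access at the given line rate.

    Assumes back-to-back packets of `packet_bytes` bytes and that every packet
    needs `accesses_per_packet` memory accesses (default: one write plus one read).
    """
    packet_time_ns = packet_bytes * 8 / line_rate_bps * 1e9
    return packet_time_ns / accesses_per_packet


for rate in (10e9, 40e9):
    print(f"{rate / 1e9:.0f} Gb/s: {per_access_budget_ns(rate, 64):.1f} ns per access")
```

Under these assumptions the budget is 25.6 ns per access at 10 Gb/s, well below a 50 ns DRAM access time, and 6.4 ns at 40 Gb/s, which an SRAM with an access time below 4 ns can meet. Counting an assumed 20 bytes of per-packet framing overhead (84 bytes per packet) only relaxes the 10 Gb/s budget to 33.6 ns, still below 50 ns.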
The article “Designing packet buffers for router linecards” by Iyer et al., IEEE/ACM Transactions on Networking, vol. 16, no. 3, June 2008, discloses a packet buffer comprising a hierarchy of SRAM and DRAM in order to provide a packet buffer having the speed of an SRAM and the cost of a DRAM. Data that is likely to be needed soon is held in the fast SRAM, while the rest of the data is held in the slower DRAM. The disclosed packet buffer comprises two SRAM caches. One of the SRAM caches, the tail SRAM cache, is configured to hold packets at the tail of each FIFO queue, and the other SRAM cache, the head SRAM cache, is configured to hold packets at the head of each FIFO queue. The majority of the packets in each queue, i.e. the packets that are close to neither the tail nor the head of the queue, are held in the slow DRAM. When packets arrive at the packet buffer, they are written to the tail SRAM cache. When enough data has arrived for a queue (either from multiple small packets or from a single large packet), but before the tail SRAM cache overflows, the data are gathered together in a large block and written to the DRAM. Similarly, in preparation for when packets need to depart, blocks of packets are read from the DRAM into the head SRAM cache to ensure that packets to be read will be in the head SRAM cache in time for the reading operation.
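The data flow described for the Iyer et al. buffer can be sketched as follows. This is a simplified, byte-granular illustration under stated assumptions, not the authors' algorithm: the class name `HybridQueueBuffer` and the block size are hypothetical, and the head cache is refilled on demand at read time rather than in advance as described above.

```python
from collections import deque


class HybridQueueBuffer:
    """Simplified sketch of a tail-SRAM / DRAM / head-SRAM FIFO buffer.

    Hypothetical illustration: data is tracked byte by byte, DRAM transfers are
    whole blocks of `block_size` bytes, and the head cache is refilled on demand.
    """

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.tail = {}   # queue id -> bytearray, models the tail SRAM cache
        self.dram = {}   # queue id -> deque of full blocks, models the DRAM
        self.head = {}   # queue id -> bytearray, models the head SRAM cache

    def write(self, queue_id: int, data: bytes) -> None:
        tail = self.tail.setdefault(queue_id, bytearray())
        tail.extend(data)
        # As soon as a full block has gathered in the tail cache, move it to
        # DRAM in one large write.
        while len(tail) >= self.block_size:
            block = bytes(tail[:self.block_size])
            self.dram.setdefault(queue_id, deque()).append(block)
            del tail[:self.block_size]

    def read(self, queue_id: int, n: int) -> bytes:
        head = self.head.setdefault(queue_id, bytearray())
        blocks = self.dram.get(queue_id, deque())
        # Refill the head cache from DRAM, one whole block at a time.
        while len(head) < n and blocks:
            head.extend(blocks.popleft())
        # If DRAM holds nothing more for this queue, the newest bytes are still
        # in the tail cache; take them directly so FIFO order is preserved.
        if len(head) < n:
            tail = self.tail.get(queue_id, bytearray())
            take = min(n - len(head), len(tail))
            head.extend(tail[:take])
            del tail[:take]
        out = bytes(head[:n])
        del head[:n]
        return out


buf = HybridQueueBuffer(block_size=4)
buf.write(0, b"abcdef")                    # "abcd" goes to DRAM, "ef" stays in tail
print(buf.read(0, 3), buf.read(0, 3))      # b'abc' b'def'
```

A real design would also have to size the two SRAM caches and schedule the DRAM transfers; this sketch only shows the order in which data moves between the three stores.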
A drawback of the packet buffer disclosed by Iyer et al. is that it stores the tail and head data in on-chip SRAM caches. If many queues exist, the required amount of cache SRAM becomes large, making it infeasible to fit this SRAM on-chip.
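To give a feel for this scaling problem, the following back-of-the-envelope sketch assumes, purely for illustration, that each queue may need about one DRAM block of space in each SRAM cache. The 512-byte block size, the queue counts and the function name are assumed values, not figures from the article.

```python
def approx_cache_bytes(num_queues: int, block_bytes: int) -> int:
    """Rough SRAM cache size if each queue may need about one DRAM block of
    cache space (an illustrative assumption, not a bound from the article)."""
    return num_queues * block_bytes


for queues in (1_024, 65_536):
    size = approx_cache_bytes(queues, 512)
    print(f"{queues} queues: {size / 2**20:.1f} MiB per cache")
```

Under this assumption the cache grows linearly with the number of queues: 0.5 MiB for 1,024 queues but 32 MiB for 65,536 queues, for each of the two caches.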
U.S. Pat. No. 6,470,415 B1 to Starr et al. discloses a device for queuing information. The device combines the speed of SRAM with the low cost and low power consumption of DRAM. The device comprises a first queue and a second queue formed of a combination of SRAM and DRAM storage units. Each of the first and second queues has an SRAM head and an SRAM tail, which can be used as an SRAM FIFO. Further, each of the first and second queues has the ability to queue information in a DRAM body.
If a device, such as a processor, wants to store data in a queue, information regarding that data is sent to a queue manager which manages entries in multiple queues, such as the first and second queues. Data from the processor is entered in the head of the queue, which is composed of SRAM and referred to as an SRAM head. Should the information be needed shortly by the processor, the entry can be read directly from the SRAM head and sent back to the processor. If the entry is not needed shortly, the entry is moved from the SRAM head to the DRAM body in order to provide room for another entry in the SRAM head. Entries are dequeued to the processor from the queue in a similar fashion as entries are enqueued, with the processor requesting the next entry from the queue and receiving that entry from the SRAM tail. Entries in the DRAM are sequentially moved from the DRAM body to the SRAM tail so that entries are immediately available for dequeuing to the processor.
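The queue organization described for Starr et al. can be sketched as a three-part FIFO. Note that, following the paragraph above, new entries enter at the SRAM head and leave from the SRAM tail, the reverse of the head/tail naming used for the Iyer et al. buffer. The class name and the capacities are hypothetical, and the sketch is an illustration under those assumptions rather than the patented design.

```python
from collections import deque


class SramDramQueue:
    """Hypothetical sketch: SRAM head (enqueue side), DRAM body, SRAM tail (dequeue side)."""

    def __init__(self, head_capacity: int, tail_capacity: int) -> None:
        if head_capacity < 1 or tail_capacity < 1:
            raise ValueError("SRAM capacities must be at least 1")
        self.head = deque()   # SRAM head: newest entries
        self.body = deque()   # DRAM body: bulk storage for older entries
        self.tail = deque()   # SRAM tail: oldest entries, ready for dequeue
        self.head_capacity = head_capacity
        self.tail_capacity = tail_capacity

    def enqueue(self, entry) -> None:
        if len(self.head) == self.head_capacity:
            # Make room in the SRAM head by moving its oldest entry to the DRAM body.
            self.body.append(self.head.popleft())
        self.head.append(entry)
        self._refill_tail()

    def dequeue(self):
        self._refill_tail()
        if not self.tail:
            raise IndexError("queue is empty")
        entry = self.tail.popleft()
        self._refill_tail()
        return entry

    def _refill_tail(self) -> None:
        # Move entries in order from the DRAM body to the SRAM tail. When the body
        # is empty, the oldest SRAM head entry moves to the tail directly, so an
        # entry needed shortly never has to pass through DRAM.
        while len(self.tail) < self.tail_capacity:
            if self.body:
                self.tail.append(self.body.popleft())
            elif self.head:
                self.tail.append(self.head.popleft())
            else:
                break


q = SramDramQueue(head_capacity=2, tail_capacity=2)
for i in range(6):
    q.enqueue(i)
print(list(q.tail), list(q.body), list(q.head))  # [0, 1] [2, 3] [4, 5]
```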
A drawback of the device disclosed in U.S. Pat. No. 6,470,415 B1 to Starr et al. is that this device, too, stores the tail and head data in an on-chip SRAM cache. If many queues exist, the required amount of cache SRAM becomes large, making it infeasible to fit this SRAM on-chip.
Another method for overcoming the access limitation of the DRAM is to store linked-list elements comprising packet information, e.g. a packet length and a next-element pointer, in a linked list held in an SRAM arranged externally to a queuing device. By means of the information in the linked list, data in the DRAM can be accessed much faster than if the DRAM data were accessed without it. One linked-list element corresponds to a piece of packet data in the DRAM, e.g. by direct mapping between a linked-list element and a DRAM memory page. In this case the amount of DRAM memory storing a packet is rounded up to an integer number of pages.
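The linked-list arrangement can be illustrated with a minimal sketch, assuming direct mapping (element index i describes DRAM page i) and a hypothetical element layout holding a length and a next pointer. The type and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListElement:
    """One linked-list element held in SRAM (hypothetical layout)."""
    length: int            # bytes of packet data stored in the corresponding DRAM page
    next: Optional[int]    # index of the next element; None marks the end of the list


def pages_of_queue(sram: dict[int, ListElement], head: Optional[int]) -> list[int]:
    """Follow the list from `head` and return the DRAM page indices in queue order.

    Assumes direct mapping: element index i describes DRAM page i.
    """
    pages = []
    index = head
    while index is not None:
        element = sram[index]   # dependent read: `next` is unknown until this returns
        pages.append(index)
        index = element.next
    return pages


def pages_needed(packet_bytes: int, page_bytes: int) -> int:
    """Whole pages occupied by a packet when storage is rounded up to pages."""
    return -(-packet_bytes // page_bytes)   # integer ceiling division


# A 1500-byte packet in 512-byte pages, scattered over pages 5, 9 and 2.
sram = {5: ListElement(512, 9), 9: ListElement(512, 2), 2: ListElement(476, None)}
print(pages_of_queue(sram, 5), pages_needed(1500, 512))  # [5, 9, 2] 3
```

The comment in the loop marks the property that matters for the next paragraph: each element must be fetched before the address of the following one is known.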
However, a problem with storing the linked list in an SRAM is that the SRAM, even though it can handle higher packet rates than a DRAM, cannot handle packet rates higher than a maximum packet rate of 1/(memory round-trip time). The memory round-trip time is the minimum time between two packets and is given by the time for issuing a read for a linked-list element, receiving that linked-list element from the memory, extracting the pointer and length of the next linked-list element, and issuing a read for the next linked-list element. Thus the packet rate is limited by the random access time of the SRAM.
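The bound of 1/(memory round-trip time) can be compared with the packet rate a link demands. The sketch below assumes a round-trip time of 20 ns for an external SRAM and back-to-back 64-byte packets without framing overhead; both values and the function names are illustrative assumptions.

```python
def max_packet_rate_pps(round_trip_ns: float) -> float:
    """Upper bound on packets per second when each packet needs one dependent
    linked-list read: at most one packet per memory round trip."""
    return 1e9 / round_trip_ns


def required_packet_rate_pps(line_rate_bps: float, packet_bytes: int) -> float:
    """Packets per second arriving back to back at line rate (no framing overhead)."""
    return line_rate_bps / (packet_bytes * 8)


round_trip_ns = 20.0  # assumed round-trip time for an external SRAM read
print(max_packet_rate_pps(round_trip_ns) / 1e6, "Mpps sustainable")    # 50.0
print(required_packet_rate_pps(40e9, 64) / 1e6, "Mpps needed at 40 Gb/s")  # 78.125
```

With these assumed numbers the list traversal sustains 50 Mpps, enough for 10 Gb/s (about 19.5 Mpps) but not for 40 Gb/s (78.125 Mpps), no matter how the DRAM itself is organized.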
Another drawback of the prior art devices is the time that must elapse between two scheduling decisions taken by the packet scheduler, since this time limits the packet rate.
Yet another drawback is low DRAM memory utilization due to packet quantization. For example, if the DRAM page size is 512 bytes and the buffer is filled with 64-byte packets only, the memory utilization is only 64/512 = 12.5%. A further drawback is the cost of the SRAM.
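The quantization loss in the example above follows from rounding each packet up to whole pages. A minimal sketch, with a hypothetical function name:

```python
def page_utilization(packet_bytes: int, page_bytes: int) -> float:
    """Fraction of allocated DRAM that holds packet data when each packet
    is rounded up to a whole number of pages."""
    pages = -(-packet_bytes // page_bytes)  # integer ceiling division
    return packet_bytes / (pages * page_bytes)


print(page_utilization(64, 512))    # 0.125
print(page_utilization(513, 512))   # 0.5009765625
```

The second case shows that quantization also wastes memory for packets just over a page boundary, not only for small packets.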