In communication networks today, store-and-forward devices, such as packet switches and routers, support throughputs as high as tens of Gigabits per second per port. A key operation in such store-and-forward devices is the queuing of incoming data into memory, followed by the subsequent de-queuing of the data before it is sent to its destination. In a high-speed switch or router, the queuing operation can be implemented in hardware, including digital logic such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA), in conjunction with semiconductor memory that holds the packet data and control information for the queues.
Many packet-based network protocols send data and control messages in packets that may be as small as 40 bytes in size. To maintain full throughput, the router or switch must be designed to handle the smallest packet generated by these protocols. Each packet passing through the switch or router may need to be queued into an associated queue, or de-queued from that queue. This places stringent demands on the performance of the switch or router. For example, at a throughput of 10 Gigabits/second per port, the time interval between the arrivals of consecutive 40-byte packets is only 32 nanoseconds (32×10⁻⁹ seconds), since 40 bytes is 320 bits. Therefore, the switching system should be designed to support one queuing operation and one de-queuing operation within 32 nanoseconds.
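The 32-nanosecond figure above can be checked with a short calculation. The following is a minimal sketch; the function name `packet_interarrival_ns` is hypothetical, and the calculation assumes back-to-back packets with no framing overhead (no preamble or inter-packet gap), matching the simplified arithmetic in the text.

```python
def packet_interarrival_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time between arrivals of back-to-back packets, in nanoseconds.

    Assumption: no per-packet framing overhead is counted.
    """
    bits = packet_bytes * 8
    # 1 Gigabit/second equals 1 bit/nanosecond, so dividing bits by the
    # rate in Gbit/s yields nanoseconds directly.
    return bits / line_rate_gbps


# 40-byte packets on a 10 Gigabit/second port
print(packet_interarrival_ns(40, 10))  # 32.0
```

Under the same assumption, doubling the port rate halves the budget, so the time available for one queuing and one de-queuing operation shrinks in proportion to line rate.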
The processing needed to queue a packet includes the following basic operations. First, the queue number associated with the incoming packet is determined from identifying information present in the header of the packet. The control information for that particular queue is then read from a control memory, using the queue number as the index. The control information is then used to link the incoming packet to the linked list corresponding to the queue. The control information is modified to reflect the addition of the new packet. Finally, the updated control information needs to be written back into the control memory.
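The queuing steps above can be illustrated in software. This is a minimal sketch, not the hardware implementation: the names `QueueControl`, `control_memory`, `next_pointer`, `classify`, and `enqueue` are hypothetical, the control information is assumed to consist of a head pointer, a tail pointer, and a length, and the header field `flow_id` is an invented stand-in for whatever identifying information the packet header carries.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

NUM_QUEUES = 4    # assumed sizes, for illustration only
NUM_BUFFERS = 8


@dataclass
class QueueControl:
    """Assumed per-queue control information."""
    head: Optional[int] = None   # buffer index of the first packet
    tail: Optional[int] = None   # buffer index of the last packet
    length: int = 0


# Control memory, indexed by queue number.
control_memory: List[QueueControl] = [QueueControl() for _ in range(NUM_QUEUES)]
# Link memory: next_pointer[i] is the buffer that follows buffer i in its queue.
next_pointer: List[Optional[int]] = [None] * NUM_BUFFERS


def classify(header: dict) -> int:
    """Hypothetical mapping from header fields to a queue number."""
    return header["flow_id"] % NUM_QUEUES


def enqueue(header: dict, buffer_index: int) -> int:
    q = classify(header)                      # 1. determine the queue number
    ctrl = replace(control_memory[q])         # 2. read control info (local copy)
    next_pointer[buffer_index] = None         # 3. link the packet at the tail
    if ctrl.length == 0:
        ctrl.head = buffer_index
    else:
        next_pointer[ctrl.tail] = buffer_index
    ctrl.tail = buffer_index                  # 4. modify the control info
    ctrl.length += 1
    control_memory[q] = ctrl                  # 5. write the control info back
    return q


q = enqueue({"flow_id": 5}, buffer_index=3)
enqueue({"flow_id": 1}, buffer_index=6)
print(q, control_memory[q])
```

The explicit local copy in step 2 and the write-back in step 5 mirror the read-modify-write sequence against control memory; between those two steps, the stored control information for that queue is stale.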
The processing operations needed for de-queuing a packet from the head position of a specific queue are similar. As before, it involves reading the control information from control memory, un-linking the packet (resulting in the modification of the control information), and then writing back the updated control information.
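The de-queuing sequence can be sketched in the same way. This is an illustrative example under stated assumptions: the names `QueueControl`, `control_memory`, `next_pointer`, and `dequeue` are hypothetical, the control information is assumed to be a head pointer, a tail pointer, and a length, and the queue state is pre-populated by hand so the example stands alone.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class QueueControl:
    """Assumed per-queue control information."""
    head: Optional[int] = None
    tail: Optional[int] = None
    length: int = 0


# Pre-populated state: queue 1 holds buffer 3 followed by buffer 6.
control_memory: List[QueueControl] = [QueueControl() for _ in range(4)]
control_memory[1] = QueueControl(head=3, tail=6, length=2)
next_pointer: List[Optional[int]] = [None] * 8
next_pointer[3] = 6


def dequeue(queue_number: int) -> Optional[int]:
    """Remove and return the buffer at the head of the queue, or None if empty."""
    ctrl = replace(control_memory[queue_number])   # read control info (local copy)
    if ctrl.length == 0:
        return None
    buffer_index = ctrl.head
    ctrl.head = next_pointer[buffer_index]         # un-link the head packet
    ctrl.length -= 1
    if ctrl.length == 0:
        ctrl.tail = None                           # queue is now empty
    control_memory[queue_number] = ctrl            # write control info back
    return buffer_index


first = dequeue(1)
second = dequeue(1)
third = dequeue(1)
print(first, second, third)  # 3 6 None
```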
To achieve full throughput in a high-speed switch or router, the queuing and de-queuing operations are often executed in a pipeline, so that one queuing operation and one de-queuing operation can be initiated in every clock cycle. Modern memory technologies, such as the quad data rate (QDR) family of static random-access memories (SRAMs), support such pipelined operation. QDR memory devices have two data ports, one for reads and the other for writes, which enable a read and a write operation to be performed in parallel. Each port also operates in a double data rate (DDR) fashion, transferring two words of data in every cycle of the memory clock.
Although pipelined memory devices such as QDR SRAMs support very high throughputs, they have long latencies. That is, a read operation must wait several clock cycles after it is started before data becomes available from the device. Similarly, a write operation takes several cycles for the data to be updated in memory. This long latency may be the result of pipelining within the memory device, of pipeline stages introduced to tolerate the delay in the data path between the memory and the processing logic, or both. The pipeline allows a new operation to be started every cycle when the new operation does not depend on the results of any of the pending operations already in the pipeline. When two operations are dependent, however, starting one of them before the previous one has completed can lead to inconsistency of the queue state and data corruption. To avoid inconsistency in the queue state, a queuing or de-queuing operation acting on a specific queue must wait until the previous operation on the same queue has been completed. This results in long delays and reduced throughput when multiple operations (for example, a queuing followed by a de-queuing) take place on the same queue close together in time: the second operation must wait for the full latency of the memory device after the first operation is started.
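The throughput penalty described above can be illustrated with a toy scheduling model. This is a sketch under simplifying assumptions, not a model of any particular device: operations issue in order, at most one per cycle, and an operation on a queue may not issue until `latency` cycles after the previous operation on the same queue was issued. The function name `issue_cycles` and the latency value of 4 cycles are hypothetical.

```python
from typing import Dict, List


def issue_cycles(ops: List[int], latency: int) -> List[int]:
    """Return the cycle in which each operation can be issued.

    ops     -- queue number targeted by each operation, in arrival order
    latency -- assumed cycles before a later operation on the same queue
               may safely start
    """
    last_issue: Dict[int, int] = {}   # queue number -> cycle of its last issue
    cycle = 0                         # earliest free issue slot
    schedule: List[int] = []
    for q in ops:
        earliest = cycle
        if q in last_issue:
            # Dependent operation: stall until the previous one has completed.
            earliest = max(earliest, last_issue[q] + latency)
        schedule.append(earliest)
        last_issue[q] = earliest
        cycle = earliest + 1
    return schedule


# Operations on four different queues: one issue per cycle.
print(issue_cycles([0, 1, 2, 3], latency=4))  # [0, 1, 2, 3]
# Four operations on the same queue: each waits for the full latency.
print(issue_cycles([0, 0, 0, 0], latency=4))  # [0, 4, 8, 12]
```

In this model, independent operations sustain one issue per cycle, while a burst of operations on a single queue is limited to one issue every `latency` cycles, which is the reduced throughput the paragraph describes.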