1. Field of the Invention
The present invention relates to network devices that allow for data to be routed and moved in data communication or computer networks. More specifically, the present invention provides for a packet buffer implemented through balanced linked lists that allow for high performance and a method of using such a packet buffer to maximize the utilization of the data storage memory.
2. Description of Related Art
As computer performance has increased in recent years, the demands on data communication and computer networks has significantly increased; faster computer processors and higher memory capabilities need networks with high bandwidth capabilities to enable high speed transfer of significant amounts of data. One aspect of the network devices used in computer networks is the temporary storage and retrieval of information passing through the network. Such data storage mechanisms are used in network switches or routers, packet or frame processing devices, or network traffic management devices. Most of these applications require large data storage memories that are implemented in semiconductor devices. These memories are often operated as a data queue, which store and forward in the order that data comes into the device.
A simple but widely employed mechanism of storing data packets in a queue is to use a First-In First-Out (FIFO), which can be implemented using a Static Random-Access Memory (SRAM) with a set of read and write address pointers. Such a structure 100 is illustrated in FIG. 1a. An advantage of using a FIFO is its easy manipulation of writing, reading, deleting a sequence of data to or from the memory. However, when many types of packet data have to be stored in a single structure, that can impede the ability to store and retrieve that data effectively.
In many communications devices, different data packets have different requirements of quality of service (QoS) or class of service (CoS). In other words, the different data packets need to be stored in different queues, and forwarded by different scheduling methods to meet their required bandwidth or latency. Such a system is illustrated in FIG. 1b. Packet data is received and stored in the input FIFOs 101-110. The packet data is transferred 120 to output FIFOs 121-130 based on data contained in the packet data and eventually sent out 140 to their forwarding destinations.
When a large number queues is required, the FIFO based mechanism partitions the memory or uses a separate memory for each queue (Q). In the case of network devices, data packets having different destination ports should to be stored in different locations; thus the number of queues required is often the number of destination ports times the number of CoS queues in addition to the number of source ports or source channels. In such applications, the FIFO-based packet memories are very inefficient in their memory utilization, because each queue tends to be very small. Often times, a queue can overflow even if there is plenty of space available in other queues.
To avoid the problem of small queue size or low utilization of memory, a more complex mechanism, a linked list, is often employed. An embodiment of a linked list system is illustrated in FIG. 2a. An advantage of linked lists over FIFOs is that it allows one queue to occupy the entire memory by allowing all the queues to share a single memory 200. A linked list 201-210 stores a data packet in a sequence of memory elements, where each element stores a data portion of a fixed size, and all the elements are linked sequentially by means of pointers. The addresses or pointers of all the memory elements are initially stored in a linked list called a Free Pointer Queue (FreeQ) 215. Each free or unused pointer is used one at a time to form a linked list of a packet by storing each portion of a data packet to the memory element pointed by its pointer. When the data packet is forwarded, its memory elements are read out from the memory one at a time in the order the element is linked up in the linked list 220, and the pointers of the elements are freed and returned to the FreeQ.
One of the drawbacks of linked list mechanisms, however, is a long delay between accesses of two consecutive elements of a linked list. This drawback often prevents high-performance packet buffer designs from using linked list mechanisms. Every time storing a new element of a packet data, they retrieve each free pointer from the free Q. This is shown schematically in FIG. 2b, where a control logic circuit 230 is used to access an address register 240 to read from a pointer memory 250 configured in an asynchronous SRAM. The pointer obtained from the pointer memory is used as the address for reading the next pointer in the memory 250, where the access is recorded in the read data register 260.
If they need to store more than one new element of a packet data, however, it takes them a certain memory read access time after retrieving one free pointer before the next free pointer can be retrieved. This is shown in FIG. 2b with a read latency of 2 clock cycles. This latency of two cycles leads to a low throughput of one pointer per two clock cycles. This is due to the fact that the address of the next element in a linked list is the contents of the previous element of the linked list, so the next element cannot be retrieved until the previous element is completely read out from the memory and goes through certain analyses and processes. The read latency is usually two or more clock cycles due to the following fact. Most high-performance pipelined designs require input registers and output pipeline registers of memories for timing and testing purposes. Also high speed synchronous SRAMs 270 are increasingly used, which incorporate an internal pipeline register and so require additional clock cycle for read accesses, as shown in FIG. 2c. 
When there are input and output registers of a memory, even if the memory allows asynchronous read, which takes less than one clock cycle to read data, the delay between two consecutive read accesses to the free Q is at least two cycles, resulting in a low throughput of pointer memory access. Therefore, conventional linked list mechanisms can receive and properly store only one new element of packet data every two or more clock cycles.
As such, there is a need for a method or mechanism that can be used by a network device to increase the pointer memory throughput. There is also a need for a mechanism or method that maximizes the performance and utilization of the data storage memory.
It is an object of this invention to overcome the drawbacks of the above-described conventional network devices and methods. The present invention relates to the storing and retrieving of data arriving at a communications device, such as a network switch that directs the flow of that data, and a frame processor that classifies, filters, or modifies the data. More specifically, the architecture and apparatus allows for data frames of any size to be stored in a large number of queues, forwarded to a destination device, or removed from the storage instantly in a way that maximizes the performance and utilization of the data storage memory.
According to one aspect of this invention, a process of handling packet data in a packet buffer is disclosed. When an initial portion of a data packet is received, one or more free pointer is read from a prefetch memory structure in each clock cycle. The one or more free pointer is stored in a start pointer register connoting a start of the data packet. When the last portion of the packet is received, no new free pointer is used, but the first free pointer stored in the start pointer register is linked with an end of packet portion of a previously received packet in an output queue memory. Portions of the data packet are read out, or the entire packet is dropped, from the output queue memory when selected by a transmission scheduler. The more than one free pointer is released to a plurality of free pointer queue memories, wherein the plurality of free pointer queue memories supply free pointers to the prefetch memory structure.
Alternatively, the one or more free pointer can be released to at least one free pointer queue memory of the plurality of free pointer queue memories having a shortest length of free pointers. Also, free pointers can be provided to the prefetch memory structure from the plurality of free pointer queue memories in advance of those pointers being fetched from the prefetch memory structure. The process of providing free pointers to the prefetch memory structure can be performed such that the throughput of the read access to the free pointer queue memories is once every clock cycle of a clock used to maintain the timing of the function of the packet buffer whose read latency is greater than one clock cycle. Additionally, the portions of the data packet may be dropped when the transmission scheduler indicates that the data packet should be dropped and the entire process of releasing the more than one free pointer to a plurality of free pointer queue memories is performed within one cycle. The process may also be performed in a network device that receives, handles and transmits or drops a data packet.
According to another embodiment of this invention, a packet buffer for handling packet data is disclosed. The packet buffer includes receiving means for receiving an initial portion of a data packet, first reading means for reading one or more free pointers from a prefetch memory structure and storing means for storing one of the more than one free pointer in a start pointer register connoting a start of the data packet, configured to store subsequent portions of the data packet until an end of packet portion is reached. The packet buffer also includes linking means for linking the one of the more than one free pointer with an end of packet portion of a previously received packet in an output queue memory, second reading means for reading out portions of the data packet from the output queue memory when selected by a transmission scheduler and releasing means for releasing the more than one free pointer to a plurality of free pointer queue memories. The plurality of free pointer queue memories are configured to supply more than one free pointers to the prefetch memory.
In another embodiment, a packet buffer for handling packet data is disclosed. The buffer includes a plurality of free pointer queue memories, configured to supply more than one free pointers and a prefetch memory structure, configured to supply one or more free pointer. The buffer also has a plurality of assembly linked lists, configured to store one of the more than one free pointer in a start pointer register connoting a start of the data packet, and configured to store subsequent portions of the data packet until an end of packet portion is reached and at least one output queue memory; configured to link the first free pointer with an end of packet portion of a previously received packet. The packet buffer is configured to read out portions of the data packet from the output queue memory when selected by a transmission scheduler and to release the more than one free pointer to a plurality of free pointer queue memories.
These and other objects of the present invention will be described in or be apparent from the following description of the preferred embodiments.