1. Field of the Invention
The present invention relates generally to packet switching systems and methods, and more specifically to a shared buffer architecture for packet switching devices.
2. Description of the Prior Art
A wide variety of architectures may be employed in the design of packet switching devices and packet switching fabrics. Examples of common packet switching architectures include cross-bar architectures, ring topology architectures, and shared buffer architectures. Each of the different types of architectures provides different advantages for use in different types of networks. Traditionally, the shared buffer switching architecture has been used in networks supporting the propagation of fixed length packets, commonly referred to as cells. Packet switching devices designed in accordance with conventional shared buffer architectures provide peak bandwidth performance when designed specifically to switch cells of a predetermined length as further explained below. For example, shared buffer switching devices used in asynchronous transfer mode (ATM) networks are typically designed to provide optimal utilization of memory space of the shared buffer, as well as optimal bandwidth performance in an ATM network wherein the cell size is fixed at 53 bytes. Although conventional shared buffer packet switching devices may be used for switching packets of varying lengths, the bandwidth performance of shared buffer switching devices suffers when switching variable length packets because a large amount of memory space of the shared buffer is wasted as further explained below.
FIG. 1 shows a schematic block diagram of a conventional shared buffer packet switching device at 10 which is commonly employed in networks supporting the propagation of cells (e.g., an ATM network). The device 10 includes: a plurality of N serial receive ports 12 designated RX0, RX1, RX2, . . . , RXN−1 providing serial reception of bits of cells received via associated links (not shown) of a network; and a plurality of N serial transmission ports 14 designated TX0, TX1, TX2, . . . , TXN−1 providing serial transmission of bits of cells via associated links of the network. The serial receive ports RX0, RX1, RX2, . . . , RXN−1 and associated ones of the serial transmission ports TX0, TX1, TX2, . . . , TXN−1 are typically formed by bi-directional network ports communicatively coupled with associated network links.
The shared buffer switching device 10 further includes: a source managing unit 18 having a plurality of N ports 20 each for receiving cells from an associated one of the receive ports 12 via an associated one of a plurality of N receive buffers 22; a shared buffer 26 having a port 28 communicatively coupled with the source managing unit 18 via a bus 30 as further explained below; and a destination managing unit 34 having a plurality of N ports 36 each being communicatively coupled with an associated one of the transmission ports 14 of the device via an associated one of a plurality of N transmit buffer queues 38. Typically, the shared buffer 26 is implemented using static random access memory (SRAM) technology, and is addressable by the source managing unit 18 and destination managing unit 34 via memory address values as further explained below.
The source managing unit 18 includes: a packet forwarding module 50 having a port 52 for receiving cells from each of the receive buffers 22 via a bus 54, and a port 56 as further explained below; and a buffer managing unit 60 having a port 62 communicatively coupled with each of the receive buffers 22 via the bus 54, and with port 52 of the packet forwarding module 50 via the bus 54, a port 64 communicatively coupled with port 28 of the shared buffer 26 via the memory bus 30, a port 66 communicatively coupled with port 56 of the packet forwarding module, and a port 68 communicatively coupled with port 42 of the destination managing unit 34. Operation of the device 10 is further explained below.
FIG. 2 shows a generalized table diagram illustrating a memory space at 72 of the shared buffer 26 (FIG. 1). The memory space 72 includes a plurality of word locations 74 of the shared buffer memory space, each word location being addressable via a corresponding memory address value 76, and having a word storage space 78 for storing an associated word of data having a word length of B bits. The shared buffer 26 (FIG. 1) is said to have a “width” of B bits, and a “height” equal to the total number of addressable word locations 74. As further explained below, because hardware requirements dictate that the shared buffer 26 have a fixed word length, or width, a bandwidth problem arises in using a shared buffer memory for switching variable length packets.
Referring back to FIG. 1, in operation of the switching device 10, cells are received serially via associated network links at each one of the receive ports 12 and temporarily stored in the associated receive buffers 22 which are used in converting the received cells from the serial data format to a parallel data format for storage in the shared buffer. The packet forwarding module 50 is responsive to address values (e.g., MAC address values) carried by the received cells, and operative to determine destination port information associated with each of the received cells by reading a cell forwarding table (not shown), the destination port information indicating a destination one of the transmission ports 14 associated with the received cell. The packet forwarding module 50 provides the destination port information associated with each one of the received cells to port 66 of the buffer managing unit 60 via its port 56.
The buffer managing unit 60 is operative to determine a memory address value 76 (FIG. 2) associated with each of the received cells, the associated memory address values indicating word locations 74 (FIG. 2) for storing the received cells. The buffer managing unit 60 is then operative to store (write) the received cells in the associated word locations 74 (FIG. 2), and is also operative to provide the destination information and the memory address values associated with each of the cells to port 42 of the destination managing unit 34, which uses the information to perform output queuing operations.
The destination managing unit 34 receives and temporarily stores the destination information and memory address values associated with each of the cells. The destination managing unit 34 includes output queuing logic (not shown) for arbitrating between requests on behalf of received cells for access to associated destination ones of the transmit buffer queues 38. After resolving requests and selecting a received cell for access to an associated one of the transmit buffer queues 38, the destination managing unit 34 reads the selected cell from the associated word location 74 (FIG. 2) of the shared buffer 26 using the associated memory address value, and forwards the cell to the associated one of the transmit buffer queues 38.
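The store-and-forward path described above may be sketched, for illustration only, as follows. This is a minimal model under stated assumptions, not the circuitry of the device 10: the class name SharedBufferSketch, the free-address list, and the first-in first-out service order per transmission port are all hypothetical, since the text does not specify how addresses are chosen or how the output queuing logic arbitrates.

```python
from collections import deque


class SharedBufferSketch:
    """Toy model: one cell per word location, one address queue per TX port."""

    def __init__(self, num_ports, height):
        self.words = [None] * height          # word locations of the shared buffer
        self.free = deque(range(height))      # assumed free-address list (hypothetical)
        # Destination-manager state: memory address values queued per TX port.
        self.pending = [deque() for _ in range(num_ports)]

    def write_cell(self, cell, dest_port):
        """Buffer-manager role: choose an address, store the cell, report the address."""
        addr = self.free.popleft()            # raises IndexError if the buffer is full
        self.words[addr] = cell
        self.pending[dest_port].append(addr)  # destination info + address handed over
        return addr

    def read_cell(self, dest_port):
        """Destination-manager role: read the next cell queued for dest_port, if any."""
        if not self.pending[dest_port]:
            return None
        addr = self.pending[dest_port].popleft()
        cell, self.words[addr] = self.words[addr], None
        self.free.append(addr)                # word location becomes reusable
        return cell
```

In this sketch, only memory address values move between the two managing roles; the cell data itself stays in the shared buffer until it is read out toward a transmit buffer queue.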
Note that one cycle is required to access (that is, read or write) a word of data in the shared buffer 26, and therefore the shared buffer 26 may serve only one of the receive ports 12 or one of the transmission ports 14 at a time for writing (storing) and reading (retrieving) cells. The switching device 10 is generally synchronous in that cells are received serially by the receive buffers 22, converted from serial to parallel format, and stored in the shared buffer.
The buffer manager 60 accesses word locations 74 (FIG. 2) of the shared buffer 26 in accordance with allocated time slots associated with each of the receive ports 12, and with each of the transmission ports 14. Typically, the access operations are synchronized in accordance with a write cycle in which the buffer manager 60 stores a cell received by each of the N receive ports 12 during each of N write time slots, and a read cycle in which the buffer manager 60 reads a cell to be transmitted from each of the N transmission ports 14 during each of N read time slots. Any time slot allocated for a receive port which has not received a cell is wasted during an associated write cycle. Likewise, any time slot allocated for a particular transmission port is wasted during an associated read cycle if no cell is to be transmitted from the particular transmission port.
As an example of operation of the switching device 10, assume that the device includes N=4 bi-directional ports. In the present example, consider that RX0 receives a cell determined to be destined for TX1, RX1 receives a first cell determined to be destined for TX2, RX2 does not receive any cells, and RX3 receives a second cell determined to be destined for TX2. In this example, during an associated write cycle, the buffer manager 60 stores the cells received by RX0, RX1, and RX3 in associated ones of the word locations 74 (FIG. 2) of the shared buffer. The allocated time slot for storing a cell received by RX2 is wasted in this case because RX2 did not receive any cells. As mentioned, during the read cycle, the buffer manager 60 reads cells stored in associated ones of the word locations 74 (FIG. 2) which are associated with each of the transmission ports 14. In the present example, during the associated read cycle, the buffer manager 60: wastes a first read time slot associated with TX0 as no cells destined for TX0 have been received; reads the cell destined for TX1 during a second read time slot; reads the first cell destined for TX2 during the third read time slot; and wastes a fourth read time slot associated with TX3 as no cells destined for TX3 have been received. After this round of read and write cycles, the second cell destined for TX2 is left in the shared buffer, and will be retrieved during a subsequent read cycle.
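The four-port example above may be traced with the following short sketch. The function name run_round and its data layout are hypothetical and are introduced only to illustrate the time slot accounting; per-port first-in first-out ordering is an assumption.

```python
def run_round(arrivals, backlog, num_ports=4):
    """Simulate one write cycle followed by one read cycle.

    arrivals: dict mapping a receive port index to the destination TX port
              of the cell it received in this round.
    backlog:  list of per-TX-port lists of stored cells (mutated in place).
    Returns (sent, wasted_write_slots, wasted_read_slots).
    """
    wasted_write = []
    for rx in range(num_ports):               # N write time slots, one per RX port
        if rx in arrivals:
            backlog[arrivals[rx]].append(f"cell from RX{rx}")
        else:
            wasted_write.append(rx)           # nothing received: slot is wasted

    sent, wasted_read = {}, []
    for tx in range(num_ports):               # N read time slots, one per TX port
        if backlog[tx]:
            sent[tx] = backlog[tx].pop(0)     # at most one cell per port per cycle
        else:
            wasted_read.append(tx)            # nothing to transmit: slot is wasted
    return sent, wasted_write, wasted_read


backlog = [[] for _ in range(4)]
sent, wasted_write, wasted_read = run_round({0: 1, 1: 2, 3: 2}, backlog)
```

With these inputs, the write slot for RX2 and the read slots for TX0 and TX3 are wasted, and the cell from RX3 remains queued for TX2 until a subsequent read cycle, matching the narrative above.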
In order for a switching device, of any architectural type, to support N ports each having a line rate, R (defined in units of bits per second), the switching device must provide total switching bandwidth performance equal to NR, that is the product of N and R. The bandwidth performance of the shared buffer switching device 10 is a function of clock rate (which is defined by the time required to access the contents of one word location), and the width, B, of the shared buffer. As mentioned, the bandwidth performance of a shared buffer switching device determines the number N of ports which can be served by the device. Therefore, the number of ports which may be supported by the shared buffer device is also a function of the width, B, of the shared buffer. For the shared buffer switching device 10 to provide a total switching bandwidth performance of NR, the memory bandwidth for accessing the packet buffer must be equal to 2NR, thereby providing a write bandwidth of NR and a read bandwidth of NR.
The number N of ports, of a uniform line rate, which may be supported by the shared buffer switching device 10 may be determined in accordance with Relationship (1), below:

N = (Clock_Rate * B) / (2 * R)    (1)

where Clock_Rate is defined in units of cycles per second, B is the width of the shared buffer 26 in bits, and R is the line rate of each of the ports of the switching device.
As an example, assume that the clock rate of the shared buffer switching device is 125 MHz, which provides for accessing (reading or writing) the contents of a word location 74 (FIG. 2) of the shared buffer in 8 nanoseconds. Also assume that the width, B, of the shared buffer 26 is 512 bits. This provides a total memory bandwidth of 512 bits every 8 nanoseconds, which is equivalent to 64 Gbit/s. A total memory bandwidth of 64 Gbit/s provides a 32 Gbit/s write bandwidth and a 32 Gbit/s read bandwidth. Assuming that each port has a line rate of 1 Gbit/s, the switching device 10, having a total memory bandwidth of 64 Gbit/s, can support N=32 ports in this example.
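The arithmetic of Relationship (1) for this example may be checked with a few lines of code. The function name supported_ports is hypothetical; the calculation assumes the ideal case in which every word location is fully utilized.

```python
def supported_ports(clock_rate_hz, width_bits, line_rate_bps):
    """Relationship (1): N = (Clock_Rate * B) / (2 * R), rounded down to whole ports."""
    total_memory_bandwidth = clock_rate_hz * width_bits   # bits per second
    # Half of the memory bandwidth serves writes and half serves reads.
    return total_memory_bandwidth // (2 * line_rate_bps)


# 125 MHz clock, 512-bit wide shared buffer, 1 Gbit/s ports.
n_ports = supported_ports(125_000_000, 512, 1_000_000_000)
```

Here the total memory bandwidth evaluates to 64 Gbit/s and n_ports evaluates to 32, in agreement with the example.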
As mentioned above, the shared buffer 26 is implemented using SRAM technology. In practice, the size of the shared buffer 26 may be varied by interconnecting a plurality of commercially available standard size memory units. The width, B, of the shared buffer 26 may be varied by interconnecting a plurality of memory units in parallel, and the height of the shared buffer may be varied by interconnecting a plurality of memory units in series. One commercially available standard size memory unit is 1K×16 bits, that is 1024 words in height and 16 bits wide, and therefore provides for storing 1024 16-bit words. As an example, thirty two of the 1K×16 bits memory units may be arranged in parallel to form a shared buffer having a width, B, of 512 bits, wherein each word storage space 78 (FIG. 2) of the shared buffer provides a 512 bit word length.
As mentioned above, because hardware requirements dictate that the shared buffer 26 have a fixed word length, or width, bandwidth performance decreases where a shared buffer memory is used for switching variable length packets. Each of the above calculations of bandwidth performance, based on Relationship (1), assumes an ideal case wherein the entire word storage space 78 (FIG. 2) of each word location of the shared buffer 26 is utilized for storing a data packet, or a portion of a data packet. The overall bandwidth performance of the switching device 10 decreases if less than the entire word storage space 78 (FIG. 2) of each word location is utilized.
As mentioned above, packet switching devices having a shared buffer architecture have traditionally been used only in networks wherein the data packets are fixed length data packets. The overall bandwidth performance of the switching device 10 is maximized where the width of the shared buffer is equal to the fixed length of the cells being switched. However, packet switching devices having a shared buffer architecture have not traditionally been applied for switching variable length data packets because bandwidth performance suffers in such applications. If the length of the packets varies, the bandwidth provided by the switching device 10 is decreased. For example, in an Ethernet network, packet lengths vary in a range between 64 bytes and 1522 bytes, each of the packets having an integer number of bytes within the defined range. In a worst case scenario, the bandwidth performance of a packet switching device having a shared buffer architecture is most adversely affected where a received packet has a length which is one byte greater than the width, B, of the shared buffer.
Again assuming the above example wherein the shared buffer 26 (FIG. 1) has a width, B, of 512 bits, or 64 bytes, and wherein the device 10 operates at a clock rate of 125 MHz, the bandwidth provided by the switching device 10 is maximized at 64 Gbits/s if each of the received packets has a fixed length of 64 bytes. Bandwidth performance of the packet switching device 10 may be expressed by the product of the width, B, of the shared buffer 26 and the clock rate of the device only if the full memory width is utilized. However, decreased bandwidth performance of the packet switching device, as well as wasted memory space, occurs in response to receiving a packet having a length slightly greater (e.g., one byte greater) than the width of the shared buffer 26 (FIG. 1). For the present example, a worst case bandwidth performance of the packet switching device occurs where a received packet has a length equal to 65 bytes. This problem arises because the bandwidth performance of the packet switching device 10 is dependent upon the portion of each word location of the shared buffer 26 which is actually utilized.
As a packet having a length equal to 65 bytes is received at one of the receive ports 12, the first 64 bytes of the received packet are written to a first one of the word locations 74 (FIG. 2) designated 80, and a last byte of the received packet is written to a second one of the word locations 74 designated 82. Only a very small portion of the word storage space 78 of the second word location 82 is used for storing the last byte of the received packet. The remaining portion of the storage space of the second word location 82 cannot be used for storing a next packet, or a portion of a next packet, because the output queuing logic of the destination managing unit 34 (FIG. 1) of the shared buffer architecture requires that the buffer managing unit 60 provide the destination managing unit with one or more memory address values uniquely identifying each packet stored in the shared buffer. Therefore, each one of the word locations 74 (FIG. 2) of the shared buffer may store data of only one of the received packets so that the memory address value associated with the word location only identifies data of a single stored packet. Because of the wasted storage space of the second word location 82, the bandwidth performance of the switching device suffers.
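The storage waste described above follows from rounding each packet up to a whole number of word locations. A minimal sketch, assuming a 64 byte wide shared buffer and the one-packet-per-word-location constraint stated above (the function name word_usage is hypothetical):

```python
def word_usage(packet_bytes, width_bytes=64):
    """Return (word locations occupied, bytes left unused) for one packet."""
    words = -(-packet_bytes // width_bytes)        # ceiling division
    wasted = words * width_bytes - packet_bytes    # unusable remainder of last word
    return words, wasted


usage_64 = word_usage(64)      # exactly fills one word location
usage_65 = word_usage(65)      # spills one byte into a second word location
usage_1522 = word_usage(1522)  # longest Ethernet packet mentioned in the text
```

For the 65 byte packet, two word locations are occupied and 63 of the 64 bytes of the second word location are left unused.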
Assuming a constant clock rate, in order to support greater bandwidth performance of the switch, and a greater number of ports, the width of the shared buffer 26 may be increased. However, there is a practical limit to how much the width of the shared buffer 26 may be increased, and it is not practical to increase the width of the shared buffer 26 to 1522 bytes in order to accommodate the longest Ethernet packet.
When rating the bandwidth performance of a switching device, it is necessary to specify the worst case bandwidth performance of the switching device. In the above example, the worst case bandwidth performance of the packet switching device, occurring when a packet having a length of 65 bytes is received, is approximately 32 Gbits/s, which is about one half of the maximum bandwidth achieved for 64 byte packets. Therefore, only sixteen 1 Gbit/s ports may be supported by the prior art switching device 10 for switching variable length data packets in the example presented.
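The worst case figure may be reproduced with the following sketch, which scans the Ethernet packet lengths mentioned above. It assumes a simplified model in which useful bandwidth equals the packet bits transferred divided by the time to access all word locations the packet occupies; the function names are hypothetical. Under this model the 65 byte case evaluates to 32.5 Gbit/s, which the text rounds to one half of the 64 Gbit/s peak.

```python
def effective_bandwidth_bps(packet_bytes, width_bytes=64, clock_rate_hz=125_000_000):
    """Useful memory bandwidth when every packet has the given length."""
    words = -(-packet_bytes // width_bytes)          # word locations (cycles) per packet
    return packet_bytes * 8 * clock_rate_hz / words  # useful bits per second


def supported_ports(bandwidth_bps, line_rate_bps=1_000_000_000):
    """Whole ports supportable when half the bandwidth is for writes, half for reads."""
    return int(bandwidth_bps // (2 * line_rate_bps))


# Search the Ethernet length range (64 to 1522 bytes) for the worst case.
worst_len = min(range(64, 1523), key=effective_bandwidth_bps)
worst_bw = effective_bandwidth_bps(worst_len)
worst_ports = supported_ports(worst_bw)
```

Under these assumptions the search selects a length of 65 bytes, and the resulting bandwidth supports sixteen 1 Gbit/s ports, consistent with the figure given above.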