The present invention relates to flow control in network devices, and more specifically, defines virtual input queueing as a tool to provide systems and methods of static and dynamic flow control in shared memory Ethernet switching devices.
Recent advances in computing and network interface technologies and hardware have created an environment in which single Personal Computers (PCs or workstations) are capable of bursting out data at the capacity of a traditional Local Area Network (LAN). These advances, when coupled with the growing interest in bandwidth-intensive multimedia applications, have served to increase the prominence of new high-speed switching technologies like Asynchronous Transfer Mode (ATM) and, more recently, fully-duplex, switched Ethernet.
An Ethernet switch is a frame switch that handles variable-length Ethernet frames. The Ethernet switch is fully duplexed and makes forwarding decisions based on the destination addresses contained in Ethernet frame headers. Existing standards provide for link speeds of up to 100 Mbps, and an Institute of Electrical and Electronics Engineers working group (the IEEE 802.3z Working Group) has specified a standard for 1 Gbps operation (known as Gigabit Ethernet).
Fully duplexed Ethernet switches can be divided into two broad categories based on their memory architecture: input queued switches and output queued switches. Most output queued switches are implemented using a shared memory pool to host the output queues; this architecture is referred to as a shared memory switch architecture. FIG. 1 illustrates a block diagram of an input queue-based Ethernet switch 10 having 1-N input lines 22 connected to corresponding 1-N input ports 20, which electrically connect the 1-N input lines 22 to the Ethernet switch 10. A data frame can flow through any one of the 1-N input ports 20 into the Ethernet switch 10 and simultaneously enter one of the 1-N receive channels 40 of the switch architecture, each of which couples an input port 20 to an input queue 70. The frames are then transferred to, and stored by, the input queue 70 in the order received. One input queue 70 is assigned to each input port, for a total of N input queues.
1-N transmit channels 60 are electrically connected to the 1-N input queues 70, and each of the 1-N transmit channels 60 communicates with a corresponding one of the 1-N output ports 30. Thus, each input queue 70 stores the received frames on an input-port basis until the output port can send the frames downstream, at which point the frames are transferred to the appropriate transmit channel 60, which communicates with the corresponding output port 30. 1-N output lines 32 are connected to the 1-N output ports 30 and provide the path for the data frames to travel from the appropriate transmit channels 60 to the correct downstream ports. Input queue-based Ethernet switches can monitor and direct the flow of traffic on a port-by-port basis. However, because of head-of-line blocking, input queue-based Ethernet switches achieve only about 58% of the throughput of shared memory Ethernet switches.
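The 58% figure matches the classic saturation throughput of a FIFO input-queued switch under uniform random traffic, which approaches 2 - √2 (about 0.586) as the number of ports grows. The Monte Carlo sketch below illustrates head-of-line blocking under simplifying assumptions that are not part of the original description: fixed-size cells, slotted time, every input permanently backlogged, uniformly random destinations, and random selection among inputs contending for the same output. The function name is hypothetical, and the model is not meant to describe any particular Ethernet switch.

```python
import random


def hol_saturation_throughput(num_ports: int, num_slots: int, seed: int = 1) -> float:
    """Estimate the saturation throughput of a FIFO input-queued switch.

    Every input always has a cell waiting. Each head-of-line (HOL) cell targets a
    uniformly random output. In each slot, every output that has contenders serves
    exactly one of them, chosen at random; the losers stay blocked at the head of
    their queues, even if the cells behind them are destined to idle outputs.
    """
    rng = random.Random(seed)
    hol_dest = [rng.randrange(num_ports) for _ in range(num_ports)]
    delivered = 0
    for _ in range(num_slots):
        # Group inputs by the output their HOL cell is waiting for.
        contenders: dict[int, list[int]] = {}
        for inp, out in enumerate(hol_dest):
            contenders.setdefault(out, []).append(inp)
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            # The winner's next cell moves to the head with a fresh destination.
            hol_dest[winner] = rng.randrange(num_ports)
    # Fraction of the switch's total output capacity actually used.
    return delivered / (num_ports * num_slots)
```

For small port counts the estimate comes out somewhat above 0.586 and approaches it as `num_ports` increases; a single-port switch trivially reaches 1.0 because it has no contention.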
FIG. 2 is a block diagram of a shared memory Ethernet switch 10 having 1-N input lines 22 connected to corresponding 1-N input ports 20, which electrically connect the 1-N input lines 22 to the Ethernet switch 10. A data frame can flow through the 1-N input ports 20 into the Ethernet switch 10 and simultaneously enter the switch architecture at 1-N receive channels 40, which couple each of the 1-N input ports 20 to a memory 50 that temporarily stores the frames and buffers them into output queues (not shown). The memory 50 also communicates with the 1-N transmit channels 60. 1-N output lines 32 are coupled to 1-N output ports 30, which are in turn coupled to the 1-N transmit channels 60, and carry each data frame from the appropriate transmit channel 60 to the various downstream ports. Although the shared memory Ethernet switch achieves higher throughput than the input queue-based Ethernet switch, it provides no mechanism for port-based flow control.
In addition, existing shared Carrier Sense Multiple Access with Collision Detection (CSMA/CD) networks can gracefully handle periods of temporary congestion in bridges and routers through collisions and random back-off mechanisms. However, in a point-to-point, full-duplex (non-shared) Ethernet LAN switch, CSMA/CD methods of congestion control are no longer available. Thus, in a full-duplex Ethernet switch, periods of congestion result in switch buffer overflows and frame losses.
Specifically, network congestion caused by overloading an Ethernet switch is one of the new challenges associated with fully duplexed Ethernet switches. Overload occurs when an Ethernet switch receives more frames than it can forward. Ethernet switches are equipped with buffers to absorb overload over a short time period. However, if the overload condition persists, the switch buffer becomes full, causing the switch to discard frames. This condition is referred to as congestion.
Standardization efforts for full-duplex operation of Ethernet (switched Ethernet) have focused attention on the need for flow control at the MAC (Media Access Control) sublayer. The IEEE 802.3x standard defines an optional MAC Control sublayer (MAC Control). The scheme is intended to provide a vehicle for flow control on a hop-by-hop basis by allowing a port to "turn off" or "turn on" the transmitter of the upstream device for a certain period of time. The basic vehicle for transmitting flow control information from an Ethernet port to the upstream device is the MAC Control frame, a special MAC frame. MAC Control frames are of the minimum legal size (64 bytes). The MAC Control Opcode field of the MAC Control frame specifies a Pause opcode, and the MAC Control Parameters field specifies a Pause Time, which indicates how long the upstream link transmitter should stop transmitting data frames. Upon receiving a new MAC Control frame, the port of the upstream device stops transmission for the time period specified in the new frame, regardless of any previous MAC Control frame. Conversely, sending a MAC Control frame with the Pause Time set to zero will "turn on" a paused link.
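A minimal sketch of the PAUSE mechanism described above follows. The field values are those defined by IEEE 802.3x: destination address 01-80-C2-00-00-01, EtherType 0x8808 for MAC Control, Pause opcode 0x0001, and a 16-bit Pause Time counted in quanta of 512 bit times. The function `build_pause_frame` and the class `UpstreamTransmitter` are hypothetical illustrations. The frame is built without its 4-byte FCS, which the MAC is assumed to append. The toy transmitter shows the "new frame replaces the old one" rule and the zero-Pause-Time "turn on" behavior.

```python
import struct

PAUSE_DEST_MAC = bytes.fromhex("0180c2000001")  # reserved multicast address for PAUSE
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001
MIN_FRAME_NO_FCS = 60  # 64-byte minimum frame minus the 4-byte FCS
PAUSE_QUANTUM_BITS = 512  # one unit of Pause Time = 512 bit times


def build_pause_frame(src_mac: bytes, pause_quanta: int) -> bytes:
    """Build a MAC Control PAUSE frame (without FCS).

    pause_quanta is the Pause Time in units of 512 bit times; 0 turns a paused
    link back on.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_quanta must fit in 16 bits")
    header = PAUSE_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, pause_quanta)
    frame = header + payload
    # Zero-pad up to the minimum legal frame size; the MAC appends the FCS.
    return frame + bytes(MIN_FRAME_NO_FCS - len(frame))


class UpstreamTransmitter:
    """Toy model of how an upstream port reacts to received PAUSE frames."""

    def __init__(self, link_bps: float) -> None:
        self.link_bps = link_bps
        self.resume_at = 0.0  # time (seconds) at which transmission may restart

    def on_pause(self, now: float, pause_quanta: int) -> None:
        # A new PAUSE frame replaces any earlier one rather than adding to it.
        self.resume_at = now + pause_quanta * PAUSE_QUANTUM_BITS / self.link_bps

    def may_transmit(self, now: float) -> bool:
        return now >= self.resume_at
```

For example, `build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)` requests the longest possible pause, and a later frame built with `pause_quanta=0` releases the link immediately.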
IEEE 802.3x flow control was developed with the input queued switch architecture in mind. Implementing flow control in an input queued Ethernet switch is straightforward because the buffer occupancy of an input queue provides a good indication of the overload status of the port. However, this is not the case for the shared memory switch. FIG. 3 graphically illustrates the problem. In FIG. 3, switch memory is represented on the vertical axis, with the total available switch memory denoted M, and time is represented on the horizontal axis. The buffer occupancy of the output queue of output port A is represented by the dash-dot line, and the total buffer occupancy is represented by the solid line.
At time t0 (ta), input port A receives frames requiring ma of memory, and at time tb, input port B receives frames requiring mb of memory, for a total memory use of ma+mb. At time tc, input port C receives frames requiring mc of memory. At time t1, input port C begins receiving more and more frames, and by time t2 it is receiving data at a rate that causes the total switch memory demanded (ma+mb+mc) to exceed the available memory M. Because shared memory Ethernet devices do not monitor traffic on the input ports, the switch is forced to pause all upstream devices to deal with this situation. For simplicity, the pause is shown as taking effect instantaneously at t2. As the frames in the switch memory exit the switch downstream, the total memory used by the switch decreases until the pause is over and frames are again introduced into the switch, here shown at time t3.
Thus, from t0 to t2, the memory used by the output queue of output port A increases while the memory used by the other output ports stays small, indicating that output port A is overloaded by excessive frames destined to it, even though the other ports (input and output) could still handle the frames destined to them. If output port A uses the MAC Control frame to pause upstream transmission, it must send "pause" MAC Control frames from every input port to that port's upstream device, since frames arriving on any port may be destined to output port A. While the switch pauses all the upstream devices, from t2 to t3, the total buffer occupancy decreases. During this period, switch performance is impaired because all output ports other than port A have no frames to send.
As FIG. 3 shows, in this example only input port C exceeded a reasonable rate of frame transfer, so it was only necessary to pause the upstream device feeding input port C. However, a shared memory device cannot tell which input port a frame arrived on, so all upstream devices must be paused. As a result, data transfers are interrupted on the upstream devices feeding input ports A and B as well as input port C. When transmission resumes, data that should have been received between t2 and t3 may have been lost and may have to be retransmitted. Overflows and frame losses therefore often require retransmissions that take time and degrade network performance.
As can also be seen from FIG. 3, statistics of the output queues do not provide much useful information to identify which input port is receiving excessive frames destined to the overloaded output queue. Blindly pausing all ports when an output queue is overloaded will block frames which otherwise can be handled by the switch.
Flow control schemes for Ethernet switches must use as little state information as possible so that they are easy to implement, both because Ethernet switches have to be cost competitive and because the current flow control vehicle does not allow fine-grained control. Furthermore, it is highly desirable that such schemes maintain loss-free transmission in Ethernet switches.
To minimize frame loss, it would be advantageous to provide a flow control tool for shared memory, fully-duplexed Ethernet switches. It would also be advantageous to provide systems and methods of using such a tool that maximize system simplicity and efficiency. The present invention provides such a tool and such systems and corresponding methods.
The present invention provides a system for monitoring frame traffic at the input port of an Ethernet switch having a shared memory and an input port that couples an input line to the shared memory. The input port monitoring is accomplished using a virtual input queue disposed in the Ethernet switch.
The system may further comprise a receive channel coupled between the input port and the shared memory, an output port coupled between the shared memory and an output line, a transmit channel that communicates with the shared memory and the output port, or all of the above. The virtual input queue is preferably in communication with both the receive channel and the transmit channel. In addition, this configuration accommodates an Ethernet switch which has a plurality of receive channels, a plurality of virtual input queues, and a plurality of transmit channels.
The present invention also provides a method of monitoring frame traffic at an input port in an Ethernet switch having a shared memory. The method assigns a virtual input queue to monitor the input port, monitors the input port (for detecting the arrival of a frame at the input port), and increments the virtual input queue when the arrival of the frame is detected. The method may further comprise the steps of monitoring an output port, for detecting the departure of the frame whose arrival was previously detected, and decrementing the virtual input queue when the departure of the frame is detected.
The incrementing step may increase the value of the virtual input queue an amount based on the size of the frame. Likewise, the decrementing step may decrease the value of the virtual input queue an amount based on the size of the frame.
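The monitoring method above can be sketched as a set of per-input-port byte counters. In this sketch, a virtual input queue does not store frames (they remain in the shared memory). It only records how many bytes that arrived on a given input port are still held in the switch. The sketch assumes, hypothetically, that the switch tags each buffered frame with the input port it arrived on, so that its departure through any output port can be charged back to the correct virtual input queue. The names `Frame` and `VirtualInputQueues` are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    input_port: int  # port the frame arrived on (assumed to be recorded at ingress)
    size: int  # frame length in bytes


class VirtualInputQueues:
    """Per-input-port byte counters for a shared memory switch (hypothetical sketch)."""

    def __init__(self) -> None:
        # input port -> bytes from that port currently held in shared memory
        self.value: defaultdict[int, int] = defaultdict(int)

    def on_arrival(self, frame: Frame) -> int:
        # Increment by the frame size when the frame is detected at its input port.
        self.value[frame.input_port] += frame.size
        return self.value[frame.input_port]

    def on_departure(self, frame: Frame) -> int:
        # The frame leaves through some output port, but the decrement is charged
        # to the virtual input queue of the port it originally arrived on.
        self.value[frame.input_port] -= frame.size
        return self.value[frame.input_port]
```

Counting bytes rather than frames follows the size-based increment and decrement described above, so that one maximum-size frame weighs more than one minimum-size frame.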
The present invention is also a method of static memory allocation in an Ethernet switch having a shared memory. The method of static memory allocation partitions the shared memory among a plurality of virtual input queues to create a memory value for a virtual input queue which monitors an input port, increments the virtual input queue when a frame arrives at the input port, and pauses an upstream device if the incremented value of the virtual input queue exceeds the memory value for the virtual input queue.
The partitioning step may divide the memory evenly among the plurality of virtual input queues, or may divide the memory among the virtual input queues proportionately to the data rate of each of the plurality of virtual input queues.
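The two partitioning options above can be sketched as follows, with memory values in bytes. The function names are hypothetical. The rounding policy is also an assumption. Even partitioning hands leftover bytes to the first queues so the shares sum to the total. Rate-proportional partitioning floors each share and leaves any rounding remainder unallocated. If no rates have been observed yet, it falls back to an even split.

```python
def partition_evenly(total_memory: int, num_queues: int) -> list[int]:
    """Split total_memory bytes as evenly as possible among num_queues VIQs."""
    share, remainder = divmod(total_memory, num_queues)
    # The first `remainder` queues get one extra byte so the shares sum exactly.
    return [share + (1 if i < remainder else 0) for i in range(num_queues)]


def partition_by_rate(total_memory: int, rates_bps: list[float]) -> list[int]:
    """Split total_memory bytes among VIQs in proportion to their data rates."""
    total_rate = sum(rates_bps)
    if total_rate == 0:
        # No traffic information yet: fall back to an even split.
        return partition_evenly(total_memory, len(rates_bps))
    # Floor each proportional share; any rounding remainder stays unallocated.
    return [int(total_memory * r / total_rate) for r in rates_bps]
```

For instance, two 100 Mbps ports and one 1 Gbps port sharing 1000 bytes receive 83, 83, and 833 bytes respectively under the rate-proportional split.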
The pausing step may pause the upstream device for a predetermined period of time or for a predetermined number of data transfers. The static memory allocation method may also monitor an output port and decrement the value of the virtual input queue when the frame departs the Ethernet switch. Furthermore, the static memory allocation method may activate the upstream device when the value of the virtual input queue is less than the memory value.
Accordingly, the static memory allocation method may take the form of the following steps: partitioning the shared memory among a plurality of virtual input queues to create a memory value for a virtual input queue, monitoring frame traffic at an input port with the virtual input queue, detecting the arrival of a frame at the input port, incrementing the virtual input queue, testing whether the incremented value of the virtual input queue exceeds the memory value for the virtual input queue, and pausing an upstream device if the incremented value of the virtual input queue exceeds the memory value for the virtual input queue.
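The static steps above, together with the decrement and reactivation described in the preceding paragraph, can be sketched as a small state machine keyed by input port. `StaticFlowControl` is a hypothetical name. Its return values only signal when a caller would send a PAUSE (or a zero-Pause-Time release) to the upstream device. Transmitting the frame and choosing the pause duration are left out.

```python
class StaticFlowControl:
    """Static allocation: each input port's VIQ has a fixed memory value (sketch)."""

    def __init__(self, memory_values: dict[int, int]) -> None:
        self.limit = dict(memory_values)  # input port -> memory value (bytes)
        self.viq = {port: 0 for port in memory_values}  # input port -> VIQ value
        self.paused: set[int] = set()  # ports whose upstream device is paused

    def on_arrival(self, port: int, size: int) -> bool:
        """Increment the VIQ; return True when a PAUSE should be sent upstream."""
        self.viq[port] += size
        if port not in self.paused and self.viq[port] > self.limit[port]:
            self.paused.add(port)
            return True
        return False

    def on_departure(self, port: int, size: int) -> bool:
        """Decrement the VIQ; return True when the upstream device may be reactivated."""
        self.viq[port] -= size
        if port in self.paused and self.viq[port] < self.limit[port]:
            self.paused.discard(port)
            return True
        return False
```

Frames already in flight when the PAUSE is sent can still arrive and push a VIQ past its memory value, so a practical memory value would leave some latency headroom below the port's true share of the shared memory.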
The present invention is also a method of dynamic memory allocation in an Ethernet switch having a shared memory. The method partitions the shared memory among a plurality of virtual input queues to create a first set of memory values, such that a virtual input queue that monitors an input port has a memory value. The method then increments the virtual input queue when a frame arrives at the input port and, after a control epoch, repartitions the shared memory among the plurality of virtual input queues to create a second set of memory values that updates the memory value of the virtual input queue, the second set of memory values being based on the data rate of the input port.
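The repartitioning step needs a per-port data rate at the end of each control epoch. One simple way to obtain it, sketched below under the assumption that the rate is the number of bytes that arrived during the epoch divided by the epoch length, is a set of counters that are read and cleared at each epoch boundary. The class name and the fixed-length epoch are hypothetical; other rate estimators (for example, smoothed averages across epochs) would serve equally well.

```python
class EpochRateMeter:
    """Measures each input port's arrival rate over a control epoch (sketch)."""

    def __init__(self, num_ports: int, epoch_s: float) -> None:
        self.epoch_s = epoch_s  # control epoch length in seconds
        self.bytes_this_epoch = [0] * num_ports

    def on_arrival(self, port: int, size: int) -> None:
        self.bytes_this_epoch[port] += size

    def end_epoch(self) -> list[float]:
        """Return per-port rates in bits per second and start a new epoch."""
        rates = [8 * b / self.epoch_s for b in self.bytes_this_epoch]
        self.bytes_this_epoch = [0] * len(self.bytes_this_epoch)
        return rates
```

The rates returned by `end_epoch` are what a rate-proportional repartitioning of the shared memory would consume at each epoch boundary.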
The dynamic memory allocation method may pause an upstream device connected to the input port if the incremented value of the virtual input queue exceeds the memory value for the virtual input queue. Furthermore, the partitioning step may divide the memory evenly among the plurality of virtual input queues, or proportionately to the data rate of each of the plurality of virtual input queues.
The repartitioning step of the dynamic memory allocation method typically classifies data rates into a plurality of data rate zones, such as an underutilization zone, a normal zone, and an overutilization zone. An underutilization threshold is typically defined as a boundary between the underutilization zone and the normal zone, and an overutilization threshold is typically defined as a boundary between the normal zone and the overutilization zone. Note that the underutilization threshold is a lower data rate than the overutilization threshold.
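The three-zone classification above can be sketched as a comparison of a measured level against two thresholds. Because the repartitioning step compares the virtual input queue value against the thresholds, and the underutilization threshold is often set three maximum data frame sizes below the overutilization threshold, this sketch treats the level and thresholds as byte counts. That is an interpretation, not a statement from the source. The 1518-byte maximum (untagged) Ethernet frame size, the `Zone` enum, and the helper names are assumptions.

```python
from enum import Enum

MAX_FRAME_BYTES = 1518  # assumed maximum untagged Ethernet frame size


class Zone(Enum):
    UNDER = "underutilization"
    NORMAL = "normal"
    OVER = "overutilization"


def default_under_threshold(over_threshold: int) -> int:
    """Place the underutilization threshold three maximum frames below the overutilization one."""
    return over_threshold - 3 * MAX_FRAME_BYTES


def classify(level: int, under_threshold: int, over_threshold: int) -> Zone:
    """Map a VIQ level onto the underutilization, normal, or overutilization zone."""
    if under_threshold >= over_threshold:
        raise ValueError("underutilization threshold must be below overutilization threshold")
    if level < under_threshold:
        return Zone.UNDER
    if level < over_threshold:
        return Zone.NORMAL
    return Zone.OVER
```

A level exactly equal to a threshold falls into the higher zone, matching the "at least equal to" and "equal to or higher than" boundaries used in the repartitioning step.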
The repartitioning step may assign the virtual input queue a minimum memory if its value is less than the underutilization threshold, and a normal memory if its value is at least equal to the underutilization threshold but below the overutilization threshold. Likewise, if the virtual input queue is at a level equal to or higher than the overutilization threshold, the repartitioning step assigns the virtual input queue a proportioned memory, that is, a share of the portion of the shared memory that is not assigned to other virtual input queues. In this way, all excess memory is allocated to the queues monitoring the highest levels of activity. The minimum memory may include a data latency memory portion. The underutilization threshold is often set three maximum data frame sizes below the overutilization threshold.
The proportioned memory value may be obtained by subtracting from the total memory the memory allocated to virtual input queues having minimum memory, the memory allocated to virtual input queues having normal memory, and the latency memory requirements, and then distributing the difference among the remaining virtual input queues based on their data transfer rates.
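The repartitioning rule and the proportioned memory computation above can be sketched in a single function. Queues below the underutilization threshold receive the minimum memory, queues between the thresholds receive the normal memory, and the remainder (total memory minus those allocations and minus the latency memory) is split among the overutilized queues in proportion to their data rates. The function name, the keyword parameters, the floor rounding, and the even-split fallback when the overutilized queues report zero rate are assumptions made for illustration.

```python
def repartition(
    viq_values: list[int],
    rates_bps: list[float],
    *,
    total_memory: int,
    latency_memory: int,
    under_threshold: int,
    over_threshold: int,
    min_memory: int,
    normal_memory: int,
) -> list[int]:
    """Return a new memory value (bytes) for each VIQ (hypothetical sketch)."""
    alloc = [0] * len(viq_values)
    over: list[int] = []  # indices of VIQs in the overutilization zone
    for i, v in enumerate(viq_values):
        if v < under_threshold:
            alloc[i] = min_memory
        elif v < over_threshold:
            alloc[i] = normal_memory
        else:
            over.append(i)
    # Memory left for the busiest queues after fixed allocations and latency.
    remaining = total_memory - latency_memory - sum(alloc)
    if remaining < 0:
        raise ValueError("minimum and normal allocations exceed available memory")
    if over:
        total_rate = sum(rates_bps[i] for i in over)
        for i in over:
            if total_rate > 0:
                alloc[i] = int(remaining * rates_bps[i] / total_rate)
            else:
                alloc[i] = remaining // len(over)
    return alloc
```

For example, with 100,000 bytes of total memory, 4,000 bytes of latency memory, one underutilized queue (2,000 bytes) and one normal queue (8,000 bytes), the 86,000 remaining bytes are split 1:3 between two overutilized queues running at 100 Mbps and 300 Mbps.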
The pausing step may pause an upstream device if its virtual input queue value exceeds a proportioned memory value, if its virtual input queue value exceeds an overutilization threshold value, or if its virtual input queue value exceeds an underutilization threshold value, depending on the current status of the upstream devices feeding frames to the input ports.
The pausing step may pause the upstream device for a predetermined number of data transfers or for a predetermined amount of time. In addition, the dynamic memory allocation method may monitor an output port and decrement the value of the virtual input queue when a frame departs the Ethernet switch. The upstream device may be activated when the value of the virtual input queue is less than the memory value.
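When the pause is carried by an IEEE 802.3x MAC Control frame, a predetermined pause duration must be expressed as a Pause Time in quanta of 512 bit times, capped at the 16-bit maximum of 65535. The sketch below shows that conversion. It also shows one possible, hypothetical reading of "a predetermined number of data transfers": pausing for as long as it would take to send that many back-to-back frames of a given size, where each frame also occupies 8 bytes of preamble and start-of-frame delimiter and a 12-byte inter-frame gap. Because quanta are counted in bit times, that second conversion does not depend on the link speed. Both function names are illustrative.

```python
import math

PAUSE_QUANTUM_BITS = 512  # one pause quantum = 512 bit times
MAX_PAUSE_QUANTA = 0xFFFF  # Pause Time is a 16-bit field
PER_FRAME_OVERHEAD_BYTES = 8 + 12  # preamble/SFD plus inter-frame gap


def pause_quanta_for(duration_s: float, link_bps: float) -> int:
    """Convert a predetermined pause duration into a Pause Time value."""
    quanta = math.ceil(duration_s * link_bps / PAUSE_QUANTUM_BITS)
    return min(quanta, MAX_PAUSE_QUANTA)


def pause_quanta_for_transfers(num_frames: int, frame_bytes: int) -> int:
    """Hypothetical: pause for the time num_frames frames would take at line rate."""
    bits = num_frames * (frame_bytes + PER_FRAME_OVERHEAD_BYTES) * 8
    return min(math.ceil(bits / PAUSE_QUANTUM_BITS), MAX_PAUSE_QUANTA)
```

Rounding up ensures the upstream device is paused for at least the requested interval. On a 1 Gbps link the 16-bit cap limits a single PAUSE frame to roughly 33.5 ms, so a longer pause would have to be renewed with further frames.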