Increasingly, the world relies on data communication over networks of computers wherein two or more computer systems are connected and communicate with each other. There are large networks that span the globe, such as the Internet, and there are smaller networks, such as one between adjacent desks in the same room. There are many types of networks, but some of the most common classifications of networks are based on the distances between the computers or microprocessor-driven devices in the network. A home-area network (HAN) is a network contained within a user's home that connects a person's digital devices. A local-area network (LAN) is one wherein the computers are geographically close together, for instance, in the same building. Most LANs connect workstations and personal computers. Each individual computer in a LAN may have its own CPU with which it executes programs, but it is also able to access data and devices anywhere on the LAN. This means that many users can share expensive devices and resources, such as laser printers, storage resources, software applications, etc. Users also use the LAN to communicate with each other and share data. Storage area networks (SANs) are those networks connected to massive digital data storage banks. A campus-area network (CAN) is one wherein the computers are within a limited geographic area, such as a campus or military base. A metropolitan-area network (MAN) is a data network designed for a town or city. A wide-area network (WAN) is one wherein the computers are farther apart and are connected by telephone lines or radio waves.
Characteristics other than distance are also used to categorize different types of networks. Computers on a network are sometimes called nodes. Computers and devices that allocate resources for a network are called servers. The topology refers to the geometric arrangement of the nodes of a network, such as a bus, star, or ring. A bus topology is like a long street along which the computers or nodes each have access to the server. A star is like a cul-de-sac where the nodes access the center, like a hub. A ring topology is one wherein data flows through the nodes between the source and destination. A protocol defines a common set of rules and signals that computers on the network use to communicate; one of the most popular protocols for LANs is called Ethernet. A network can use either a peer-to-peer architecture, wherein each node has direct access to another node, or a client/server architecture, wherein communication is through a server and then to another node. Networks also vary from one another by the media connecting the nodes, such as twisted-pair wire, coaxial cables, fiber optic cables, wireless, satellite, etc.
A challenging problem in networks is the effect of delay. The time to read data and to send a control signal to a transmitter through the network depends on the characteristics of the network described above. The overall performance of a networked system can be significantly affected by network delays, and the severity of the problem is aggravated when data loss occurs during a transmission. Delays not only degrade the performance of a network, but can also destabilize the network.
One mode of data transmission is a full-duplex mode that allows two nodes to simultaneously exchange data over a point to point link, or peer-to-peer link, connecting exactly two stations to avoid contention for the medium. Full duplex further provides independent transmit and receive paths. Because each node can simultaneously transmit and receive data, the aggregate throughput of the link is effectively doubled. Thus, a 10 megabit per second (Mbps) node operating in full-duplex mode provides a maximum bandwidth of 20 Mbps, i.e., 10 Mbps going out and 10 Mbps coming in. Full-duplex operation requires a physical medium capable of supporting simultaneous transmission and reception without interference.
The addition of full-duplex mode to the Ethernet protocol standard includes an optional flow control operation known as PAUSE frames. PAUSE frames permit one end station to temporarily stop all traffic from the other end station, except for certain control frames. For example, a full-duplex link connects two devices called “A” and “B”. Suppose A transmits frames at a rate faster than B can either process or receive them because B has no remaining buffer space to receive additional frames. B now transmits a PAUSE frame to A requesting that A stop transmitting frames for a specified period of time. Upon receiving the PAUSE frame, A suspends further frame transmission until the specified time period has elapsed to allow B time to catch up and/or recover from the congestion state. At the end of the specified time period, A resumes normal transmission of data frames. Note that the PAUSE frame protocol is bidirectional; A may send frames to pause B, and B may send frames to pause A. A PAUSE frame is the one type of frame that a node is allowed to send even if it is currently in the paused state.
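The behavior of station A in the example above can be sketched as a small timer model. This is only an illustration under simplifying assumptions: the class name `PausableSender`, its method names, and the use of abstract bit times as the clock are hypothetical and do not come from the Ethernet standard.

```python
class PausableSender:
    """Toy model of end station A reacting to PAUSE frames sent by B."""

    def __init__(self):
        # Time (in bit times) at which data transmission may resume.
        self.paused_until = 0

    def on_pause(self, now, duration):
        # Suspend data frames until now + duration (both in bit times).
        # A later PAUSE simply overwrites the timer, so duration == 0
        # lets data transmission resume immediately.
        self.paused_until = now + duration

    def can_send_data(self, now):
        return now >= self.paused_until

    def can_send_pause(self, now):
        # A PAUSE frame may be sent even while this station is paused.
        return True
```

For instance, after `on_pause(now=100, duration=512)` the sender refuses data frames until bit time 612, while `can_send_pause` stays true throughout, reflecting the bidirectional nature of the protocol.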
The format of a PAUSE frame conforms to the standard Ethernet frame format and includes a unique type field and other parameters. The destination address of the frame may be set either to the unique address of a node, or to the globally assigned multicast address reserved by the IEEE 802.3 standard for use in MAC Control PAUSE frames and by the IEEE 802.1D bridging standard as an address that will not be forwarded. This ensures the frame will not propagate beyond the local link segment. The Type field of the PAUSE frame indicates the frame is a MAC Control frame. The MAC Control opcode field indicates that the type of MAC Control frame being used is a PAUSE frame, it being the only type of MAC Control frame currently defined. The MAC Control Parameters field contains a 16-bit (2-byte) value that specifies the duration of the PAUSE event in units of 512-bit times. If an additional PAUSE frame arrives before the current PAUSE time has expired, its parameter replaces the current PAUSE time, so a PAUSE frame with parameter zero allows traffic to resume immediately. A 42-byte reserved field (transmitted as all zeros) is required to pad the length of the PAUSE frame to the minimum Ethernet frame size.
Field                     Size       Value
Preamble                  7 bytes
Start Frame Delimiter     1 byte
Dest. MAC Address         6 bytes    01-80-C2-00-00-01 or unique DA
Source MAC Address        6 bytes
Length/Type               2 bytes    802.3 MAC Control (88-08)
MAC Control Opcode        2 bytes    PAUSE (00-01)
MAC Control Parameters    2 bytes    00-00 to FF-FF
Reserved                  42 bytes   all zeros
Frame Check Sequence      4 bytes
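As an illustration of the frame layout, the following minimal sketch assembles the fields from the destination address through the reserved padding. It assumes the preamble, start frame delimiter, and frame check sequence are added by hardware. The function and constant names are hypothetical, and `pause_seconds` simply converts the 2-byte parameter to a duration for an assumed link speed.

```python
import struct

# Field values as given in the frame format above.
PAUSE_MULTICAST_DA = bytes.fromhex("0180C2000001")
MAC_CONTROL_TYPE = 0x8808
PAUSE_OPCODE = 0x0001
RESERVED_LEN = 42


def build_pause_frame(source_mac, pause_quanta, dest_mac=PAUSE_MULTICAST_DA):
    """Return the 60 bytes from destination address through reserved field."""
    if len(source_mac) != 6 or len(dest_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause parameter must fit in 2 bytes")
    # Type, opcode, and parameter are each 2 bytes in network byte order.
    control = struct.pack("!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_quanta)
    return dest_mac + source_mac + control + bytes(RESERVED_LEN)


def pause_seconds(pause_quanta, link_bps):
    """Convert a pause parameter (units of 512 bit times) to seconds."""
    return pause_quanta * 512 / link_bps
```

With the 4-byte frame check sequence appended, the 60 bytes produced here reach the 64-byte minimum Ethernet frame size, which is why the reserved field is 42 bytes long.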
Predominantly, in the 10 Gigabit Ethernet protocol, network processor units, framers, MACs, etc., use either SPI4.2, SPI4.1/CSIX, or SFI interfaces to go between chips. SPI4.2 interfaces are parallel interfaces; SPI4.2 specifically has sixteen differential data lines, each running at ~700 Mbps. While not trivial to design, SPI4.2 is a relatively stable technology and is available in field programmable gate arrays (FPGAs) today. Thus, SPI4.2 is a flexible system-level interface, suitable for point-to-point connections between MACs and network processor units (NPUs), or switch fabric devices for converged systems in LAN/WAN/MAN/SAN environments.
Flow control refers to the techniques to throttle the flow of data to minimize data loss. In a node, the receiving and transmitting integrated circuit or chip may have several functions, one of which is to generate and/or receive the signals through hardware referred to as a PHY. A PHY is the actual transceiver. After the PHY is a media access controller (MAC). For Ethernet, the MAC sublayer is required to perform two main functions: data encapsulation, and media access management. To perform these functions, first-in, first-out (FIFO) queues are created and used in the MAC to store frame data. In an ideal world there would be no data loss resulting from the limitations in network performance or FIFO sizing but in the real world data is lost without a mechanism to limit frame transmission. Data is lost when packet data is received faster than it is transmitted, resulting in filling of the FIFOs. MAC devices either overwrite data in the FIFOs to lose the oldest data or stop writing to the FIFOs and lose the newest data. To avoid data loss, MAC devices must slow down the receive-data stream until the transmit stream has caught up. Flow control is the methodology used to throttle the receive data stream to keep from completely filling the FIFOs. This has created an industry challenge to balance the amount of throughput loss due to flow control versus the amount of data loss without flow control.
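The two loss behaviors described above, overwriting the oldest data or refusing the newest, can be illustrated with a toy bounded queue. This is a sketch only: the class `BoundedFifo` and its `drop_oldest` flag are hypothetical names, and a real MAC FIFO is a hardware structure measured in bytes rather than whole frames.

```python
from collections import deque


class BoundedFifo:
    """Toy receive FIFO that loses frames once it is full."""

    def __init__(self, capacity, drop_oldest):
        self.capacity = capacity
        self.drop_oldest = drop_oldest
        self.frames = deque()
        self.dropped = 0

    def push(self, frame):
        if len(self.frames) < self.capacity:
            self.frames.append(frame)
            return
        # The FIFO is full, so one frame is lost either way.
        self.dropped += 1
        if self.drop_oldest:
            self.frames.popleft()      # overwrite: lose the oldest frame
            self.frames.append(frame)
        # Otherwise stop writing: the newest (incoming) frame is lost.

    def pop(self):
        return self.frames.popleft() if self.frames else None
```

Either policy loses exactly one frame per overflow; flow control aims to keep the queue from ever reaching that point.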
In current technology, the flow control mechanism is accomplished within the MAC control sublayer. The FIFO fills as packets are received. Once the FIFO has reached a preprogrammed threshold, the MAC control sublayer signals an internal state machine to transmit a PAUSE frame. This signal informs the link partner to halt transmission for a specified length of time, referred to as “TxOFF” where Tx is an abbreviation for transmit and Rx is an abbreviation for receive. The MAC continues to transmit PAUSE frames with the programmed idle time as long as the threshold has been exceeded. If the FIFO level falls below the threshold prior to the expiration of this time, another PAUSE frame is sent with a zero time specified to re-enable transmission, referred to as “TxON.”
To determine the FIFO threshold, prior art MAC devices have per-port programmable FIFO high and low thresholds. The high threshold is the threshold above which flow control is implemented, and the low threshold is the threshold below which the flow control is terminated. Proper FIFO threshold selection determines the effectiveness of the implemented flow control. To ensure no data is lost, the FIFO threshold should be set low enough to allow for storage of the maximum amount of data that could be received prior to the flow control taking effect. To ensure maximum throughput, the FIFO threshold should be set high enough to not empty the FIFO prior to the flow control being released, and high enough to limit the percentage of time that flow control needs to be activated. Many system constraints must be considered when implementing flow control, such as packet size, duplex mode, link speed, media link segment length and type, and the MAC-PHY latency.
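The high and low thresholds form a simple hysteresis loop, which the following sketch models. It is an assumption-laden illustration rather than any particular device's logic: the class `PauseController`, the `update` method, and the default pause value of 0xFFFF are hypothetical, and a real MAC would typically re-send the PAUSE frame on a timer rather than on every fill-level sample.

```python
class PauseController:
    """Toy receive-side flow control with high/low FIFO thresholds."""

    def __init__(self, high, low, pause_quanta=0xFFFF):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.high = high
        self.low = low
        self.pause_quanta = pause_quanta
        self.flow_control_on = False

    def update(self, fill_level):
        """Return the pause parameter to transmit, or None to send nothing."""
        if fill_level > self.high:
            # Above the high threshold: (re)send PAUSE with the programmed time.
            self.flow_control_on = True
            return self.pause_quanta
        if self.flow_control_on and fill_level < self.low:
            # Drained below the low threshold: send a zero-time PAUSE.
            self.flow_control_on = False
            return 0
        return None
```

Feeding the fill levels 100, 900, 500, 150 through `update` with `high=800` and `low=200` yields no frame, a PAUSE with the programmed time, no frame, and then a zero-time PAUSE.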
To avoid an overflow condition, the FIFO typically must be able to store the amount of data that can be received prior to the flow control taking effect. This amount is determined by how long data transmitted by the link partner continues to reach the MAC receiver after the FIFO threshold is exceeded, and it includes data currently traveling along the media, i.e., data that has been transmitted but not yet received. All media have an inherent time delay, and the length of the delay is dependent upon the type and length of the media. Also to be considered is the preparation time, i.e., the latency of the MAC in responding to the over-threshold condition and sending out a PAUSE frame, which is dependent upon duplex mode and supported packet size. There are a number of small time periods to be considered in order to calculate the appropriate thresholds: the time to wait between transmissions, the time to transmit the pause packet, the time for the pause packet to reach the link partner, the time to transfer the pause packet through the PHY to the MAC, the time for the receiving MAC to react to the PAUSE frame, and the packet delay, in that the receiving MAC could have just started transmission of another packet, which has to be completed before flow control takes effect, etc.
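The delay components listed above can be combined into a rough headroom estimate. The sketch below is a back-of-the-envelope model, not a design formula: the function name `headroom_bytes`, the assumed propagation delay of 5000 ns per kilometer (a common approximation for optical fiber), and the lumping of all PHY and MAC latencies into a single `response_time_s` parameter are all assumptions made for illustration.

```python
def headroom_bytes(link_bps, length_km, max_frame_bytes,
                   pause_frame_bytes=64, response_time_s=0.0,
                   ns_per_km=5000.0):
    """Estimate bytes that can still arrive after the threshold is crossed."""
    # One-way propagation delay over the media (assumed ns_per_km).
    one_way_s = length_km * ns_per_km * 1e-9
    # The PAUSE frame travels to the partner, and data already sent by the
    # partner keeps arriving for one more one-way delay: a full round trip.
    # response_time_s lumps together PHY/MAC latencies at both ends.
    in_flight = (2 * one_way_s + response_time_s) * link_bps / 8
    # While the local MAC finishes a frame and sends the PAUSE frame, and
    # while the partner finishes a frame it had just started, data of about
    # the same size keeps arriving (assuming equal speed in both directions).
    return in_flight + 2 * max_frame_bytes + pause_frame_bytes
```

Under these assumptions, the round-trip term alone for a 40 kilometer link at 10 gigabits per second comes to about half a megabyte, which shows how quickly the propagation delay dominates the other terms as distance grows.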
To avoid an underflow condition, the FIFO must have enough stored data to continue transmitting for the time it takes to terminate the flow control. The amount of stored data must exceed the combination of the following: the preparation time to send out the new PAUSE frame with zero time, the time to wait for the current transmission to end, the required time to wait between transmissions, the time to transmit the pause packet, the time for the pause packet to reach the link partner, the time to transfer the pause packet through the PHY to the MAC, the time for the receiving MAC to react to the PAUSE frame, and the delay through the media, once the link partner has decided to restart transmission, before the data reaches the other end. For example, for an optical fiber or other media of a distance of two kilometers and a 1030 byte frame, the FIFO depth would be on the order of 10 kilobytes. For a distance of approximately 40 kilometers, a FIFO might have to accommodate one-half megabyte of data. Thus, as distances between nodes increase in a network, the size of this FIFO also gets larger, and the cost of memory becomes prohibitive. There is thus a need in the industry to accommodate larger distances in networks without fear of losing data during the transmission of a PAUSE frame while still maintaining low costs.