1. Field of the Invention
This invention generally relates to data networking technologies and, more particularly, to a method and apparatus for striping packets over parallel communication links.
2. Description of the Related Art
Increasing reliability and availability of high speed networks have fueled the growth of many new telecommunications based services. The Internet is one network being used to deliver electronic commerce (e-commerce), telecommuting, and interactive entertainment services around the world. Predicted growth in these network based businesses will continue to outpace the bandwidth that current telecommunication vendors can provide.
Telecommunication systems used on the Internet and other worldwide networks typically include local area networks (LANs) coupled to very high speed wide area networks (WANs). Many businesses use LANs because they are low in cost and the 10 Mbps to 100 Mbps bandwidth capacity they provide meets their networking needs. Other businesses, in contrast, set up WANs because they need higher bandwidth capacities and the benefits of high speed communication outweigh the increased costs.
In operation, companies design and configure WANs to operate in many different configurations. WANs can operate at a wide range of bandwidth capabilities ranging from tens of kilobits to gigabits per second. They also can transport variable size packets of data such as those generated by different types of LANs.
Synchronous digital hierarchy (SDH) is one networking protocol used to carry data generated by a variety of communication systems including voice, data, and video. Presently, three different versions of SDH exist: SDH-Europe, SDH-Japan, and SONET for North America. These systems are essentially compatible and are referred to collectively as SONET.
Many WANs use SONET because it can accommodate different protocols and bandwidths such as T-1, T-3, and E-1. Network systems implementing SONET can strip the bandwidth off at geographically distant locations with little difficulty. SONET-based networks use add-drop multiplexers (ADMs) to distribute high speed data at various geographic locations rather than conventional multiplexers that demultiplex and reaggregate bandwidth at each drop point. This design allows SONET to deliver high-speed data efficiently. Thus, SONET is desirable in video systems, interactive gaming, e-commerce, and other high bandwidth low-latency applications.
High speed SONET networks can transmit data at approximately 10 Gbps, or OC-192. Essentially, OC-192 is 192 times faster than OC-1 (51.84 Mbps). SONET and SDH operate at multiples of 51.84 Mbps to allow for efficient conversion from one data rate to the other.
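The rate relationship described above can be checked with a short calculation. The following is a minimal sketch that assumes the standard OC-1 base rate of 51.84 Mbps; the function name `oc_rate_mbps` is a hypothetical helper introduced only for illustration.

```python
OC1_MBPS = 51.84  # assumed OC-1 base line rate in Mbps


def oc_rate_mbps(n: int) -> float:
    """Return the line rate of an OC-n signal in Mbps (n times the OC-1 rate)."""
    return n * OC1_MBPS


# Print a few common levels of the hierarchy.
for level in (1, 3, 12, 48, 192):
    print(f"OC-{level}: {oc_rate_mbps(level):.2f} Mbps")
```

Under this assumption OC-192 works out to roughly 9.95 Gbps, which is why it is commonly described as a 10 Gbps rate, and four OC-48 channels together carry the same line rate as one OC-192 channel.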
Many companies have technical difficulties implementing high-speed OC-192 networks in practice. For example, OC-192 may not work well in networks with older transmission media because of geometric irregularities or impurities in the transmission medium material. These irregularities or impurities can cause a signal frequency to shift over long distances and, through a phenomenon known as polarization-mode dispersion, introduce noise and distortion on the OC-192 communication link. Moreover, even with new cables, implementing OC-192 may be difficult without developing expensive optical transmitters that operate at very high frequencies and are prone to failure.
To overcome these difficulties, many WANs have achieved OC-192 and higher bandwidth by aggregating multiple lower speed optical or copper channels. These WANs combine many OC-48 channels using a technology known as wavelength division multiplexing or WDM. On a fiber optic network, WDM takes advantage of the inherent high bandwidth capacity of an optical fiber by transmitting data in parallel at different wavelengths. Lasers emitting different wavelengths allow these different channels to coexist on a shared optical medium. WDM uses each wavelength to establish a separate sublink between the transmitter-receiver pair. The system receives the WDM transmission with optical receivers sensitive to the different wavelengths used during the transmission. Transmitting information in parallel over multiple sublinks increases the overall capacity of a SONET system.
Many WDM networks connect multiple parallel sublinks to a single communication link at a network junction. Specially designed network interconnect devices, such as routers or switches, pass data back and forth between the networks connected to this junction. These network interconnect devices can take data from the single communication link and distribute it in a predetermined manner over the multiple sublinks. Conversely, the network interconnect devices can also aggregate data from the multiple sublinks into a single data stream for transmission over a single communication link.
Packet-by-packet striping is one method of transferring data from a single link to multiple sublinks. Packet-by-packet striping distributes one packet on a first sublink and subsequent packets on subsequent sublinks. This technique distributes multiple packets over multiple sublinks and transmits the data in parallel. Typically, the first sublink that becomes available carries the next packet awaiting transmission. This uses resources effectively but sends packets out of order and introduces additional processing overhead for reordering the packets at the receiver.
Existing systems have had difficulty making packet-by-packet striping operate in a work conserving manner. In a work conserving system, server and network resources do not remain idle while data packets are waiting in a queue to be transmitted or received. Unfortunately, systems that send packets in sequence leave some sublinks underutilized while they wait to transmit the next sequential packet. Conversely, systems that send packets out of order can cause a receiver to occasionally pause while reordering packets. This pause can delay transmission of data on sublinks downstream from the receiver unit and underutilize these sublinks.
Packets sent out-of-order often require additional resources and storage. Each packet transmitted out-of-order in a packet-by-packet striping scheme has sequencing information associated with it. As a result, packets may have to be enlarged to hold the sequencing information. This can lead to increased buffer sizes and may impact utilization of other network related resources.
One method of performing packet-by-packet striping over multiple parallel channels while maintaining packet ordering was suggested in “A Reliable and Scalable Striping Protocol,” by H. Adiseshu, G. Parulkar, and G. Varghese, ACM SIGCOMM, Volume 26, Number 4, pg. 131-141, October 1996. This packet-by-packet striping technique, known as strIPe, sends packets in sequence without placing explicit sequence numbers in each packet. Like other conventional systems, this technique is also not work conserving and can leave network bandwidth underutilized.
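The reordering effect of first-available striping can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions: every packet is ready at time zero, each sublink sends one packet at a time, and propagation delay is ignored. The function name `stripe_first_available` and the example packet sizes and rates are hypothetical.

```python
def stripe_first_available(packet_sizes, rates):
    """Assign each packet, in send order, to the sublink that becomes free first.

    packet_sizes: size of each packet in bytes, in send order.
    rates: capacity of each sublink in bytes per time unit.
    Returns a list of (finish_time, packet_index, sublink_index) tuples.
    """
    free_at = [0.0] * len(rates)  # time at which each sublink is next idle
    deliveries = []
    for idx, size in enumerate(packet_sizes):
        # Pick the sublink that is idle earliest (ties go to the lowest index).
        link = min(range(len(rates)), key=lambda i: free_at[i])
        finish = free_at[link] + size / rates[link]
        free_at[link] = finish
        deliveries.append((finish, idx, link))
    return deliveries


# One large packet followed by two small ones and another large one.
sizes = [1500, 64, 64, 1500]
deliveries = stripe_first_available(sizes, rates=[100.0, 100.0])
arrival_order = [idx for _, idx, _ in sorted(deliveries)]
print(arrival_order)  # [1, 2, 0, 3]
```

In this example the two small packets complete on the second sublink before the first large packet finishes on the first sublink, so the receiver sees packets 1 and 2 ahead of packet 0 and would have to reorder them.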
In strIPe, a byte counter associated with each sublink represents the number of bytes a sublink can transmit in a time interval. Sublink initialization sets each byte counter to a positive value corresponding to the sublink's transmission bandwidth. If each of the parallel sublinks has the same bandwidth, the sublink initialization sets each byte counter to the same value. Sublinks with different bandwidth are initialized to different values.
A transmission device sends a packet on a first sublink in the parallel sublink set and subtracts the packet size in bytes from the byte counter associated with the first sublink. The transmission device continues to send packets on that sublink until the decremented byte counter becomes negative, at which point it selects a subsequent sublink to transmit packets. Meanwhile, the byte counter associated with the sublink just used is reinitialized to its original starting value. This process is repeated until all the remaining packets are transmitted.
A receiver reverses this process to read packets from the multiple parallel sublinks. Initially, the receiver reads packets off the first sublink. The number of packets the receiver will read off the first sublink depends on the bandwidth of the first sublink and the value used to initialize the first sublink's byte counter. Once the initial group of packets has been read from the first sublink, the receiver reads additional packets from subsequent sublinks in a similar manner.
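The transmit-side bookkeeping described above can be sketched as follows. This is an illustrative reading of the description in this section, not the published strIPe implementation; the function name `stripe_transmit` and the example values are hypothetical, and the counter is reset to its starting value when the sender moves on, as the text states.

```python
def stripe_transmit(packet_sizes, quanta):
    """Return the sublink index used for each packet, in send order.

    packet_sizes: size of each packet in bytes.
    quanta: initial byte-counter value for each sublink (assumed to be
            proportional to that sublink's bandwidth).
    """
    counters = list(quanta)  # one byte counter per sublink
    current = 0              # start on the first sublink
    assignment = []
    for size in packet_sizes:
        assignment.append(current)
        counters[current] -= size
        if counters[current] < 0:
            # Counter went negative: reinitialize it and move to the next sublink.
            counters[current] = quanta[current]
            current = (current + 1) % len(quanta)
    return assignment


# Six 400-byte packets over two sublinks, each with a 1000-byte counter.
print(stripe_transmit([400] * 6, quanta=[1000, 1000]))  # [0, 0, 0, 1, 1, 1]
```

The output shows three consecutive packets going to the first sublink before the second sublink carries anything, which is the behavior that leaves other sublinks idle while packets are queued.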
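The receive side can be sketched by mirroring the same counter logic. This is a minimal illustration under the assumption that the receiver knows the transmitter's initial counter values and that all packets are already waiting on their sublinks; the function name `stripe_receive`, the packet labels, and the example queues are hypothetical.

```python
from collections import deque


def stripe_receive(queues, quanta):
    """Rebuild the original packet order from per-sublink queues.

    queues: for each sublink, a list of (label, size_bytes) in arrival order.
    quanta: initial byte-counter value for each sublink, matching the sender.
    """
    queues = [deque(q) for q in queues]
    counters = list(quanta)
    current = 0
    ordered = []
    while any(queues):
        if not queues[current]:
            # A real receiver would wait here for the next packet to arrive.
            raise RuntimeError(f"receiver would block on sublink {current}")
        label, size = queues[current].popleft()
        ordered.append(label)
        counters[current] -= size
        if counters[current] < 0:
            # Same rule as the sender: reset the counter, move to the next sublink.
            counters[current] = quanta[current]
            current = (current + 1) % len(queues)
    return ordered


# Queues as a sender with counters [1000, 500] would have filled them.
link0 = [("p0", 600), ("p1", 600), ("p4", 700), ("p5", 200)]
link1 = [("p2", 300), ("p3", 300)]
restored = stripe_receive([link0, link1], quanta=[1000, 500])
print(restored)  # ['p0', 'p1', 'p2', 'p3', 'p4', 'p5']
```

Because the receiver applies the same counter rule as the sender, it recovers the send order without any sequence numbers in the packets.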
The strIPe technique maintains packet ordering but is not completely work conserving. A transmitter sends multiple packets over a single sublink until the sublink has reached a predetermined transmission capacity. Meanwhile, other parallel sublinks remain idle. This is not work conserving because some sublinks may remain idle while unsent packets are in the queue ready for transmission.
Packets sent by the strIPe process may also arrive out of order if the sublinks transmit data at different data rates. Packets transmitted over a high speed sublink may arrive at the receiver before packets transmitted over a slower sublink. Consequently, the receiver may have to wait for packets on the slower sublink even though packets later in the data sequence, transmitted on the faster sublink, have already arrived. This is inefficient and not work conserving because the receiving unit must wait for a packet on a slower sublink and delay processing data on the faster sublink.
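The waiting effect caused by unequal sublink rates can be quantified with a small model. This is a minimal sketch under stated assumptions: all sublinks begin transmitting at time zero, propagation delay is ignored, and the receiver releases packets strictly in send order. The function name `in_order_delivery` and the example rates are hypothetical.

```python
def in_order_delivery(packets, rates):
    """Compute arrival and in-order release times for striped packets.

    packets: list of (size_bytes, sublink_index) in send order.
    rates: capacity of each sublink in bytes per time unit.
    Returns (arrivals, deliveries), two lists of times indexed by send order.
    """
    free_at = [0.0] * len(rates)  # running finish time on each sublink
    arrivals, deliveries = [], []
    last_delivery = 0.0
    for size, link in packets:
        free_at[link] += size / rates[link]
        arrivals.append(free_at[link])
        # A packet cannot be released before every earlier packet has been.
        last_delivery = max(last_delivery, free_at[link])
        deliveries.append(last_delivery)
    return arrivals, deliveries


# Packet 0 uses a slow sublink; packet 1 uses a sublink ten times faster.
arrivals, deliveries = in_order_delivery([(1000, 0), (1000, 1)], rates=[10.0, 100.0])
print(arrivals)    # [100.0, 10.0]
print(deliveries)  # [100.0, 100.0]
```

Here the second packet arrives at time 10 but cannot be released until time 100, so the data on the faster sublink sits idle at the receiver for 90 time units.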
It is therefore desirable to develop a new technique to distribute data from a single link to multiple parallel sublinks on a network and to aggregate data from multiple parallel sublinks back onto the single link.