This invention relates generally to a method and apparatus for switching of data packets in a communications network in a timely manner while providing low switching complexity and performance guarantees.
Circuit-switching networks, which are still the main carrier for real-time traffic, are designed for telephony service and cannot be easily enhanced to support multiple services or carry multimedia traffic. Their almost synchronous byte switching enables circuit-switching networks to transport data streams at constant rates with little delay or jitter. However, since circuit-switching networks allocate resources exclusively for individual connections, they suffer from low utilization under bursty traffic. Moreover, it is difficult to dynamically allocate circuits of widely different capacities, which makes it a challenge to support multimedia traffic. Finally, the almost synchronous byte switching of SONET, which embodies the Synchronous Digital Hierarchy (SDH), requires increasingly precise clock synchronization as the line speed increases [John C. Bellamy, “Digital Network Synchronization”, IEEE Communications Magazine, April 1995, pages 70-83].
Packet switching networks like the IP (Internet Protocol)-based Internet and Intranets [see, for example, A. Tanenbaum, Computer Networks (3rd Ed.) Prentice Hall, 1996] handle bursty data more efficiently than circuit switching, due to their statistical multiplexing of the packet streams. However, current packet switches and routers operate asynchronously and provide “best effort” service only, in which end-to-end delay and jitter are neither guaranteed nor bounded. Furthermore, statistical variations of traffic intensity often lead to congestion that results in excessive delays and loss of packets, thereby significantly reducing the fidelity of real-time streams at their points of reception.
Efforts to define advanced services for both IP and ATM (Asynchronous Transfer Mode) networks have been conducted on two levels: (1) definition of service, and (2) specification of methods for providing different services to different packet streams. The former defines interfaces, data formats, and performance objectives. The latter specifies procedures for processing packets by hosts and switches/routers. The types of services defined for ATM include constant bit rate (CBR), variable bit rate (VBR) and available bit rate (ABR).
The methods for providing different services with packet switching fall under the general title of Quality of Service (QoS). The latest effort in QoS provision over the Internet is carried on by the Differentiated Services (DiffServ) Working Group of the Internet Engineering Task Force (IETF). DiffServ is working on providing QoS on a per-class basis, i.e., each switch provides a different service to packets belonging to different classes. The class to which a packet belongs is identified by a field in the IP packet's header. The DiffServ Working Group has re-defined the usage of the field originally called Type Of Service and has re-named the field the DS (Differentiated Services) byte [K. Nichols, S. Blake, F. Baker, D. Black, “Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers,” IETF Request for Comment RFC 2474, December 1998].
DiffServ relies on (i) a relatively small set of generic Per Hop Behaviors (PHBs), which define ways for individual switches to perform packet forwarding, and (ii) access control at the boundary of the network. A switch is configured to apply a specific PHB to each service class (i.e., switches are configured with a mapping between DS field value and corresponding PHB). A number of transport services can be built on those PHBs, including premium service, which is expected to deliver packets end-to-end within short delay and with low loss.
One approach to an optical network that uses synchronization was introduced in the synchronous optical hypergraph [Y. Ofek, “The Topology, Algorithms And Analysis Of A Synchronous Optical Hypergraph Architecture”, Ph.D. Dissertation, Electrical Engineering Department, University of Illinois at Urbana, Report No. UIUCDCS-R-87 1343, May 1987], which also relates to how to integrate packet telephony using synchronization [Y. Ofek, “Integration Of Voice Communication On A Synchronous Optical Hypergraph”, IEEE INFOCOM'88, 1988]. In the synchronous optical hypergraph, the forwarding is performed over hyper-edges, which are passive optical stars. In [Li et al., “Pseudo-Isochronous Cell Switching In ATM Networks”, IEEE INFOCOM'94, pp. 428-437, 1994; Li et al., “Time-Driven Priority: Flow Control For Real-Time Heterogeneous Internetworking”, IEEE INFOCOM'96, 1996] the synchronous optical hypergraph idea was applied to networks with an arbitrary topology and with point-to-point links.
The two papers [Li et al., “Pseudo-Isochronous Cell Switching In ATM Networks”, IEEE INFOCOM'94, pages 428-437, 1994; Li et al., “Time-Driven Priority: Flow Control For Real-Time Heterogeneous Internetworking”, IEEE INFOCOM'96, 1996] provide an abstract (high level) description, with little if any detail, of what is called “RISC-like forwarding”, in which a packet is forwarded one hop every time frame in a manner similar to the execution of instructions in a Reduced Instruction Set Computer (RISC) machine.
Q-STM (Quasi-Synchronous Transfer Mode) [N. Kamiyama, C. Ohta, H. Tode, M. Yamamoto, H. Okada, “Quasi-STM Transmission Method Based on ATM Network,” IEEE GLOBECOM'94, 1994, pages 1808-1814] uses a frame/subframe/slot structure to regulate the forwarding of ATM cells through the network. However, the authors do not suggest or mention the deployment of a common time reference, or the capability to transport variable-size data packets, or the ability to combine “best effort” and variable bit rate (VBR) traffic types.
In U.S. Pat. No. 5,418,779 Yemini et al. disclose a switched network architecture with a time reference. The time reference is used to determine the time at which a multiplicity of nodes can transmit simultaneously over one predefined routing tree to one destination. At every time instance the multiplicity of nodes transmit to a different single destination node. However, the patent does not teach or suggest the synchronization requirements among nodes, or the means by which synchronization can be provided, or the method in which it can be used.
In the context of the Highball Project [D. L. Mills, C. G. Boncelet, J. G. Elias, P. A. Schragger, A. W. Jackson, A. Thyagarajan, “Final Report on the Highball Project,” Technical Report 95-4-1, University of Delaware, April 1995] a network intended for a moderate number of users (10-100) was developed, deployed, and tested. Nodes are synchronized and transmission resources are reserved for flows so that packets always find output links available on every node traversed. No queuing is performed inside nodes; all queuing is done at the periphery of the network. This requires higher accuracy in the synchronization among nodes and affects the robustness of the system.
Architectures for data packet switching have been extensively studied and developed in the past three decades; see, for example, [A. G. Fraser, “Early Experiment with Asynchronous Time Division Networks”, IEEE Networks, pp. 12-26, January 1993]. Several surveys of packet switching fabric architectures can be found in: [R. Y. Awdeh, H. T. Mouftah, “Survey of ATM Switch Architectures,” Computer Networks and ISDN Systems, No. 27, 1995, pages 1567-1613; E. W. Zegura, “Architecture for ATM Switching Systems”, IEEE Communications Magazine, February 1993, pages 28-37; A. Pattavina, “Non-blocking Architecture for ATM Switching”, IEEE Communications Magazine, February 1993, pages 37-48; A. R. Jacob, “A Survey of Fast Packet Switches”, Computer Communications Review, January 1990, pages 54-64].
Circuit switches exclusively use time for routing. A time period is divided into smaller time slices, each possibly containing one byte. The absolute position of each time slice within each time period determines where that particular byte is routed.
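The position-based routing described above can be illustrated with a minimal sketch. It assumes one byte per time slice and a fixed table from slice position to output port; the names route_by_slot and slot_map are hypothetical and chosen only for illustration.

```python
def route_by_slot(period: bytes, slot_to_port: list[int]) -> dict[int, bytes]:
    """Route each byte of one time period using only its slice position."""
    if len(period) != len(slot_to_port):
        raise ValueError("this sketch assumes exactly one byte per time slice")
    outputs: dict[int, bytearray] = {}
    for slot, byte in enumerate(period):
        # No header is inspected: the slice position alone selects the port.
        outputs.setdefault(slot_to_port[slot], bytearray()).append(byte)
    return {port: bytes(data) for port, data in outputs.items()}


# Hypothetical mapping: slices 0 and 2 go to port 0, slice 1 to port 1, slice 3 to port 2.
slot_map = [0, 1, 0, 2]
routed = route_by_slot(b"ABCD", slot_map)
```

Because the mapping depends on absolute slice position, any slip of a slice relative to the period boundary would misroute data, which is why such switches need tight clocking.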
In accordance with one aspect of the present invention, time-based routing is supported with more complex periodicity in timing than circuit switching provides for. The time frames of the present invention delineate a vastly larger time period than the cycle time (i.e., the time slices) associated with circuit switching. The present invention also supports routing based on packet headers, which circuit switching cannot provide for.
Moreover, the present invention uses a Common Time Reference (CTR). The CTR concept is not used in circuit switching (e.g., T1, T3, and the SONET circuit switching: OC-3, OC-12, OC-48, OC-192, and OC-768). Using or not using CTR has far-reaching implications when comparing circuit switching and the current invention. For example, CTR deterministically ensures that there is no slip of time slots or time frames, while enabling deterministic pipeline forwarding of time frames. This is in contrast to circuit switching, where (1) there are time slot slips, and (2) deterministic pipeline forwarding is not possible.
Several surveys of switching fabric architectures and interconnection networks can be found in: [G. Broomell, J. R. Heath, “Classification Categories and Historical Development of Switching fabric Topologies,” Computing Surveys, Vol. 15, No. 2, June 1983; H. Ahmadi, W. E. Denzel, “A Survey of Modern High-Performance Switching Techniques,” IEEE Journal on Selected Areas in Communications, Vol. 7, No. 7, September 1989; T. G. Robertazzi Editor, “Performance Evaluation of High Speed Switching Fabrics and Networks,” IEEE Press, 1992; A. Pattavina, “Switching Theory”, John Wiley and Sons, 1998].
Optical data communications include single wavelength standards, wherein a single data stream is transduced into a series of pulses of light carried by an optical fiber from source to destination. These pulses of light are generally of a uniform wavelength. This single wavelength vastly under-utilizes the capacity of the optical fiber, which may reasonably carry a large number of signals each at a unique wavelength. Due to the nature of propagation of light signals, the optical fiber can carry multiple wavelengths simultaneously with no degradation of signal, no interference, and no crosstalk imposed by the optical fiber. The process of carrying multiple discrete signals via separate wavelengths of light on the same optical fiber is known in the art as wavelength division multiplexing (WDM). Logically, wavelength division multiplexing may be thought of as equivalent to multiple single wavelength communications conducted in parallel, but the physical implementation does not require multiple optical fibers and therefore realizes cost savings.
The present invention permits a novel combination of time-based routing, which is similar but not identical to circuit switching, with data packet forwarding as in packet switching. This combination provides for communication of data via a reserved time frame mechanism, where time frame periods permit communication of a very large number of bytes that are scheduled and switched in a time-based fashion within reserved and scheduled time frames, while simultaneously providing for non-scheduled data packet (NSDP) traffic to be switched and routed via the same WDM (wavelength division multiplexing) optical channels. The NSDP traffic can be transmitted during empty portions of an otherwise partially reserved and scheduled time frame period. The NSDP traffic can also be routed during fully reserved and scheduled time frame periods that have no scheduled traffic presently associated with them. Finally, NSDPs can be routed during unreserved time frames. The system can decode and be responsive to the control information in the non-scheduled data packet header.
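The way non-scheduled packets share a time frame with scheduled traffic can be sketched as follows. This is an illustrative model only: the byte capacity per time frame, the first-in-first-out NSDP queue, and the name fill_time_frame are assumptions, not details taken from the text.

```python
from collections import deque


def fill_time_frame(capacity: int, scheduled: list[int], nsdp_queue: deque) -> list[tuple[str, int]]:
    """Return the (kind, size) packets sent in one time frame.

    Scheduled packets go first; NSDPs then use whatever capacity is left.
    """
    sent = []
    used = 0
    for size in scheduled:
        if used + size > capacity:
            raise ValueError("scheduled traffic exceeds the reserved time frame")
        sent.append(("scheduled", size))
        used += size
    # Assumed policy: serve NSDPs in FIFO order and stop at the first that does not fit.
    while nsdp_queue and used + nsdp_queue[0] <= capacity:
        size = nsdp_queue.popleft()
        sent.append(("nsdp", size))
        used += size
    return sent


queue = deque([200, 150, 50])
# Partially used time frame: 700 of 1000 bytes carry scheduled traffic.
partial = fill_time_frame(1000, [400, 300], queue)
# Reserved time frame with no scheduled traffic present: NSDPs may use all of it.
idle = fill_time_frame(1000, [], queue)
```

In both calls the scheduled packets are never displaced, which mirrors the requirement that non-scheduled traffic only consumes capacity that would otherwise go unused.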
There is a growing disparity between the data transfer speeds and throughput associated with the backbone or core of large networks, which may be in the range of one to tens of gigabits per second, and the data transfer speeds and throughput associated with end-user or node connections, which may be in the range of tens to hundreds of kilobits per second. Switching systems that function efficiently at the slow speeds required by end-user or node connections do not scale linearly or in a cost-effective manner to high speed and high performance variants. Existing circuit switches have additional problems as discussed above, in that with increasing data speeds comes a corresponding requirement for more accurate clocking.
Unlike a circuit switch that might potentially require switching a different route for each byte, the time frame switching in the present invention provides a novel mode of operation where the connection between an input port and an output port is only changed infrequently, such as on a time frame by time frame basis. This mode of operation is an enabling technology to utilize purely optical switching apparatus, as it circumvents the problems typically associated with long switching cycle time.
Moreover, the present invention enables the utilization of very simple interconnection networks such as Banyan Networks [L. R. Goke, G. J. Lipovski, “Banyan Networks for Partitioning Multiprocessor Systems,” 1st Annual Symposium on Computer Architecture, December 1973, pages 21-28] whose utilization in other systems may not be advisable due to their blocking features.
The Dynamic Burst Transfer Time-Slot-Base Network (DBTN) [K. Shiomoto, N. Yamanaka, “Dynamic Burst Transfer Time-Slot-Base Network,” IEEE Communications Magazine, October 1999, pages 88-96] is based on circuit switching. A circuit is created on-the-fly when the first packet of a burst is presented to the network; the first and subsequent packets are transported through the network over that circuit.
Dynarc and Net Insight, two Sweden-based companies, commercialize switches for Metropolitan Area Networks (MANs) based on Dynamic synchronous Transfer Mode (DTM) [C. Bohm, P. Lindgren, L. Ramfelt, P. Sjödin, “The DTM Gigabit Network,” Journal of High Speed Networks, Vol. 3, No. 2, 1994; C. Bohm, M. Hidell, P. Lindgren, L. Ramfelt, P. Sjödin, “Fast Circuit Switching for the Next Generation of High Performance Networks,” IEEE Journal on Selected Areas in Communications, Vol. 14, No. 2, pages 298-305, February 1996]. DTM deploys a structure of frames and small slots (64 bits) to perform resource allocation and circuit switching. Slots are allocated to the end-systems according to a predefined distribution; a distributed algorithm based on the deployment of control slots is used to reallocate unused slots.
In accordance with the present invention, a fast switching method is disclosed and is tailored to operate responsive to a global common time such that the switching delay from input to output is known in advance and is minimized in a deterministic way. Consequently, such a switch can be employed in the construction of a backbone network using optical fibers with dense wavelength division multiplexing (DWDM). Such optical fiber links have a transmission rate, with multiple wavelengths, of a few terabits (10^12) per second.
The design method disclosed in this invention minimizes the time required for the routing decision and switching of every data packet. Consequently, for a given solid state technology, memory access time and memory word width, this method can support the highest speed optical DWDM links. Moreover, the above is independent of the number of switch ports.
The switching and data packet forwarding method combines the advantages of both circuit and packet switching. It provides for allocation and exclusive use of transmission capacity for predefined connections and for those connections it guarantees loss free transport with low delay and jitter. When predefined connections do not use their allocated resources, other non-reserved data packets can use them without affecting the performance of the predefined connections.
Under the aforementioned prior art methods for providing packet switching services, switches and routers operate asynchronously. The present invention provides real-time services by synchronous methods that utilize a time reference that is common to the switches and possibly end stations comprising a wide area network. The common time reference can be realized by using UTC (Coordinated Universal Time), which is globally available via, for example, GPS (Global Positioning System; see, for example, [Peter H. Dana, “Global Positioning System (GPS) Time Dissemination for Real-Time Applications”, Real-Time Systems, 12, pp. 9-40, 1997]). By international agreement, UTC is the same all over the world. UTC is the scientific name for what is commonly called GMT (Greenwich Mean Time), the time at the 0 (root) line of longitude at Greenwich, England. In 1967, an international agreement established the length of a second as the duration of 9,192,631,770 oscillations of the cesium atom. The adoption of the atomic second led to the coordination of clocks around the world and the establishment of UTC in 1972. The Time and Frequency Division of the National Institute of Standards and Technology (NIST) (see http://www.boulder.nist.gov/timefreq) is responsible for coordinating UTC with the International Bureau of Weights and Measures (BIPM) in Paris.
UTC timing is readily available to individual PCs through GPS cards. For example, TrueTime, Inc. (Santa Rosa, Calif.) offers a product under the trade name PCI-SG, which provides precise time, with zero latency, to computers that have PCI extension slots. Another way by which UTC can be provided over a network is by using the Network Time Protocol (NTP) [D. Mills, “Network Time Protocol” (version 3) IETF RFC 1305]. However, the clock accuracy of NTP is not adequate for inter-switch coordination, on which this invention is based.
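To illustrate how a shared UTC reading could serve as a common time reference, the sketch below derives a time frame position from a UTC timestamp. The 125 microsecond time frame, the 100 frames per cycle, and the name frame_position are assumptions made for this example only; this section does not specify those values.

```python
FRAME_NS = 125_000       # assumed time frame duration: 125 microseconds, in nanoseconds
FRAMES_PER_CYCLE = 100   # assumed number of time frames per cycle


def frame_position(utc_ns: int) -> tuple[int, int]:
    """Map a UTC timestamp in nanoseconds to (cycle number, frame within cycle)."""
    frame_index = utc_ns // FRAME_NS
    return frame_index // FRAMES_PER_CYCLE, frame_index % FRAMES_PER_CYCLE


# Two switches whose UTC readings differ by less than one time frame,
# and fall in the same frame, agree on the current position.
switch_a = frame_position(1_000_000_000)
switch_b = frame_position(1_000_000_000 + 40_000)
```

Integer nanoseconds are used rather than floating-point seconds so that frame boundaries are computed exactly. Note that the position depends only on the timestamp, not on the link rate.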
In accordance with the present invention, the synchronization requirements are independent of the physical link transmission speed, while in circuit switching the synchronization becomes more and more difficult as the link speed increases. In accordance with the present invention, routing is not performed only based on timing information: routing can be based also on information contained in the header of data packets. For example, Internet routing can be done using IP addresses or using an IP tag/label when MPLS is deployed.
One embodiment of the present invention utilizes an alignment feature within an input port for aligning incoming data packets to a time frame boundary prior to entry to a switching fabric. This embodiment has the additional benefit of providing for filtering non-reserved traffic from the data packet stream and routing said traffic to a separate routing controller for best effort transport. The system decodes and is responsive to control information in the non-reserved data packet header. The remainder of the traffic represents reserved traffic that is first aligned to a time frame boundary and then routed through the switch fabric on a subsequent time frame, thus preserving the synchronous operation of the system. The present invention also provides means to reintegrate the filtered non-scheduled traffic into idle portions as may coexist within the scheduled traffic streams.
One embodiment of the present invention utilizes a deferred alignment feature, which permits the alignment of incoming data packets to be deferred after preliminary routing and queuing has been performed. This embodiment trades additional storage required for a larger plurality of queues for reduced complexity required in the switch fabric. The switch fabric becomes simpler because it is logically divided into a first portion and a second portion, the first portion of which can be relocated upstream of (i.e., before) the alignment buffer queues. By relocating the first portion to a position before the alignment buffer queues, the first portion of the switch fabric may be implemented as a simple data path expander to fan out the data to a large plurality of queues. The complexity and throughput requirements of each queue are also reduced as the functionality is spread out over a wider number of queues.
A novel control mode is provided by the present invention where a packet header comprises new in-band signal information to establish, maintain, and dis-establish (or destroy) a reserved traffic channel. The system decodes and is responsive to the control information in the data packet header. In this control mode, a specially designated data packet works as a “trailblazer” by signaling to each switch in a plurality of connected switches that it is the first of an expected train of associated data packets. The switches of the present invention respond if able by establishing a reserved data channel, a reserved transfer bandwidth, or by reserving capacity for the traffic associated with and following the specially designated data packet. In an analogous fashion, a terminating data packet signals to each switch in a plurality of connected switches that it is the last of a group or train of associated data packets. The switches of the present invention respond by destroying, reallocating, or reclaiming the data transfer capacity or bandwidth that had been made available to the train of data packets. Interstitial data packets within a train of data packets are marked as such to permit the switches to quickly and easily identify the data packet as one belonging to a scheduled and reserved train of data packets and to the corresponding reserved bandwidth or capacity. Data packets not having the special designations indicated above are treated in the conventional way, where they are generally but not exclusively carried on a best effort basis. Note that the in-band scheduling and reservation of the present novel control mode is independent of but operates concurrently and in cooperation with any other reserved traffic mechanism implemented in the switching systems.
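The in-band control mode described above can be modeled as a small per-switch state machine. The following is a minimal sketch under stated assumptions: the mark names (HEAD, BODY, TAIL, NONE), the flow identifier, the abstract capacity units, and the Switch class are all hypothetical and are not taken from the text.

```python
from enum import Enum


class Mark(Enum):
    HEAD = "head"   # hypothetical mark for the trailblazer packet
    BODY = "body"   # hypothetical mark for an interstitial packet
    TAIL = "tail"   # hypothetical mark for the terminating packet
    NONE = "none"   # no special designation


class Switch:
    def __init__(self, capacity: int):
        self.free = capacity                # unreserved capacity, in abstract units
        self.reserved: dict[str, int] = {}  # flow id -> reserved units

    def handle(self, flow: str, mark: Mark, demand: int = 0) -> str:
        """Return how the packet is carried: 'reserved' or 'best-effort'."""
        if mark is Mark.NONE:
            return "best-effort"
        # The trailblazer reserves capacity only if the switch is able to.
        if mark is Mark.HEAD and flow not in self.reserved and demand <= self.free:
            self.free -= demand
            self.reserved[flow] = demand
        if flow not in self.reserved:
            return "best-effort"
        # The terminating packet still rides the reservation, then releases it.
        if mark is Mark.TAIL:
            self.free += self.reserved.pop(flow)
        return "reserved"


sw = Switch(capacity=10)
results = [
    sw.handle("a", Mark.HEAD, demand=6),  # reservation established
    sw.handle("b", Mark.HEAD, demand=6),  # not enough capacity left
    sw.handle("a", Mark.BODY),            # rides the existing reservation
    sw.handle("c", Mark.NONE),            # conventional packet
    sw.handle("a", Mark.TAIL),            # last packet, capacity reclaimed
    sw.handle("a", Mark.BODY),            # reservation no longer exists
]
```

Falling back to best effort when the reservation cannot be made is one possible reading of "respond if able"; a real switch might instead reject or signal the failure.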
A novel time frame switching fabric control is provided in accordance with an alternate embodiment of the present invention, which stores a predefined sequence of switch fabric configurations, responsive to a high level controller that coordinates multiple switching systems, and applies the stored predefined sequence of switch fabric configurations on a cyclical basis having complex periodicity. The application of the stored predefined switch fabric configurations permits the switches of the present invention to relay data over predefined, scheduled, and/or reserved data channels without the computational overhead of computing those schedules ad infinitum within each switch. This frees the switch computation unit to operate relatively autonomously to handle transient requests for local traffic reservation requests without changing the predefined switch fabric configurations at large, wherein the switch computation unit provides for finding routes for such transient requests by determining how to utilize underused switch bandwidth (i.e., “holes” in the predefined usage). The computational requirements of determining a small incremental change to a switch fabric are much less than having to re-compute the entire switch fabric configuration. Further, the bookkeeping operations associated with the incremental changes are significantly less time-consuming to track than tracking the entire state of the switch fabric as it changes over time.
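The stored, cyclically applied fabric configurations and the incremental handling of transient requests can be sketched as below. This is a simplified illustration: it assumes a single flat cycle rather than the complex periodicity mentioned above, models each configuration as a mapping from input port to output port, and uses a first-fit search; the class and method names are hypothetical.

```python
class FabricSchedule:
    def __init__(self, configs: list[dict[int, int]]):
        # configs[k] maps input port -> output port for time frame k of the cycle.
        self.configs = [dict(c) for c in configs]

    def config_at(self, frame_index: int) -> dict[int, int]:
        """Look up the stored configuration; nothing is recomputed per frame."""
        return self.configs[frame_index % len(self.configs)]

    def add_transient(self, in_port: int, out_port: int):
        """Place a transient request in the first 'hole' of the cycle.

        Only the one affected time frame is patched; the rest is untouched.
        Returns the time frame used, or None if no hole exists.
        """
        for k, cfg in enumerate(self.configs):
            if in_port not in cfg and out_port not in cfg.values():
                cfg[in_port] = out_port
                return k
        return None


# Hypothetical three-frame cycle handed down by a higher-level controller.
sched = FabricSchedule([{0: 1, 1: 0}, {0: 2}, {}])
before = dict(sched.config_at(4))      # frame 4 maps to stored configuration 1
slot = sched.add_transient(1, 2)       # input 1 or output 2 is busy in frames 0 and 1
after = sched.config_at(5)             # frame 5 maps to stored configuration 2
```

The search touches each stored configuration at most once and modifies a single entry, which reflects the point that an incremental change is far cheaper than recomputing the whole schedule.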