This invention relates to telecommunications networks. More particularly, this invention relates to an improved network architecture for more effectively and efficiently recovering from failures.
A telecommunications network transports information from a source to a destination. The source and destination may be in close proximity, such as in an office environment, or thousands of miles apart, such as in a long-distance telephone system. The information, which may be, for example, computer data, voice transmissions, or video programming, is known as traffic. Traffic usually enters and leaves a network at nodes and is transported through the network via links and nodes. The overall traffic comprises multiple data streams, which may be combined in various ways and sent on common links. Generally, a data stream is a flow of data or information and may comprise multiple component data streams.
Nodes, sometimes termed offices, are devices or structures that direct traffic into, out of, and through the network. They can be implemented electronically, mechanically, optically, or in combinations thereof, and are known in the art. Links connect nodes and transmit data between nodes. A path between any two nodes is a route allowing for data transmission between those two nodes; a path may consist of a single link, or of multiple links, nodes and other network elements.
Nodes range in complexity from simple switching or relay devices to entire buildings containing thousands of devices and controls. Nodes can be completely controlled by a central network controller or can be programmed with varying degrees of automated traffic-managing capabilities.
Links are typically either coaxial cable or fiber-optic cable, but can be any transmission medium capable of transporting traffic. Individual links can vary in length from a few feet to hundreds of miles. A link can become inoperative in a number of ways, but most often becomes inoperative as a result of being cut. This may occur, for example, when excavation severs an underground link, or when an automobile accident or storm damages a utility pole carrying a link.
The volume of traffic transported by a network can be significant. Transfer rates for a fiber-optic link may be 20 gigabits per second or more. A gigabit is a billion bits, and a bit is a binary digit (a logical 1 or 0), which is the basic unit of digitized data. Digitized data is a coded sequence of bits, and traffic is typically transported in that form. Data such as audio telephone conversations may be digitally encoded and then transmitted.
Traffic in networks carrying digital data is often circuit switched: for each transmission between two points, a circuit or channel following a path is set up for that traffic. Traffic on a particular circuit in such networks is often sent in one direction only. Thus traffic requiring information to be both sent and received at the same time (for example, a telephone conversation, in which each participant must be able to talk and thus send audio information at the same time) requires two circuits or channels to be established. The two circuits originate and end at the same two points, but may take different paths. Traffic flow through links may be bi-directional, that is, some traffic may flow upstream through a link while other traffic may flow downstream through the same link simultaneously.
Because of the significant volume of traffic typically transported by a network, any disruption in traffic flow can be devastating. Of particular concern are telephone networks, which can transport thousands of individual communications simultaneously. Thus the ability to quickly restore network service should a portion of the network become inoperative is of high priority. Moreover, to ensure that the network is implemented and managed in a cost-effective manner, proper allocation of resources such as link equipment, processing equipment, multiplexers and cross-connects is also of high priority.
Data is typically transmitted and routed at certain standard levels. For example, one two-way phone conversation requires 64 kbits/sec to be transmitted in each direction; this rate is termed DS0. A T1 link carrying a DS1 signal may transmit approximately 1.5 Mbits/sec, the data of 24 DS0 circuits. Thus 24 DS0 channels may be combined by a multiplexing device and transmitted as one DS1 channel. A T3 link may transmit the data of 28 T1 links, an OC1 link carries approximately the same amount of data as a T3 link, an OC3 link may transmit the data of 3 OC1 links, an OC12 link may transmit the data of 12 OC1 links, and an OC48 link may transmit the data of 48 OC1 links, or approximately 2.5 gigabits per second. Different types of multiplexers are used to add or remove different-sized bundles of traffic from larger bundles of traffic. For instance, a digital access cross-connect system ("DACS") may be used to add (multiplex) or drop (demultiplex) a DS1 channel to or from a DS3 channel.
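The multiplexing ratios above compound, so it can help to see how many DS0 circuits each level carries. The following is a minimal sketch that uses only the ratios stated in the text. It treats one OC1 as carrying one DS3, as the text approximates, and it ignores framing and overhead bits. The names `HIERARCHY` and `ds0_equivalents` are illustrative, not part of any standard.

```python
# Minimal sketch of the rate hierarchy described above. Ratios come from the
# text; treating one OC1 as carrying one DS3 is the text's approximation, and
# framing/overhead bits are ignored.
DS0_RATE_KBPS = 64  # one direction of one phone conversation

# (level, lower level it is built from, number of lower-level channels)
HIERARCHY = [
    ("DS1", "DS0", 24),
    ("DS3", "DS1", 28),
    ("OC1", "DS3", 1),
    ("OC3", "OC1", 3),
    ("OC12", "OC1", 12),
    ("OC48", "OC1", 48),
]


def ds0_equivalents() -> dict[str, int]:
    """Return the number of DS0 circuits each level can carry."""
    counts = {"DS0": 1}
    for level, lower, n in HIERARCHY:
        counts[level] = n * counts[lower]
    return counts


if __name__ == "__main__":
    for level, n in ds0_equivalents().items():
        payload_mbps = n * DS0_RATE_KBPS / 1000
        print(f"{level:>4}: {n:>6} DS0 circuits, ~{payload_mbps:,.1f} Mbit/s of DS0 payload")
```

With these ratios an OC48 carries 32,256 DS0 circuits, or about 2,064 Mbit/s of voice payload. That figure is below the roughly 2.5 Gbit/s quoted for OC48 because the nominal line rates include framing and overhead, which this sketch omits.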
When used herein, multiplexing is meant to include demultiplexing, and multiplexer is meant to include a device having demultiplexing capabilities. Equipment which adds or drops traffic to or from a link may be called termination equipment.
Fiber optic lines transmit data using light, and multiple wavelengths of light may be transmitted on one fiber optic line as separate channels. Typically, one wavelength of light carries one OC48 link in one direction, and a fiber optic line may carry 8 wavelengths. Thus one fiber optic line may carry roughly 250,000 one-way telephone conversations simultaneously.
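The 250,000 figure follows from the hierarchy of rates. The sketch below reproduces that arithmetic. It assumes, as the text does, one OC48 per wavelength per direction and 8 wavelengths per fiber. The function name is illustrative.

```python
# Sketch of the per-fiber arithmetic, assuming (as the text does) one OC48
# per wavelength per direction and 8 wavelengths per fiber.
DS0_PER_DS1 = 24
DS1_PER_OC1 = 28   # an OC1 carries roughly one DS3, i.e. 28 DS1s
OC1_PER_OC48 = 48


def one_way_calls_per_fiber(wavelengths: int = 8) -> int:
    """One-way 64 kbit/s conversations carried by `wavelengths` OC48 channels."""
    ds0_per_oc48 = OC1_PER_OC48 * DS1_PER_OC1 * DS0_PER_DS1  # 32,256
    return wavelengths * ds0_per_oc48


calls = one_way_calls_per_fiber()
print(f"{calls:,} one-way conversations")  # 258,048 one-way conversations
```

The exact count, 258,048, is what the text rounds to 250,000. Because each two-way conversation needs two one-way circuits, the same capacity corresponds to half as many two-way calls.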
Data is transmitted, and is added or removed ("dropped") from a data stream, in certain standard units. It is more efficient to transmit, route, add or drop data in larger rather than smaller units. Thus traffic is bundled into the largest unit possible. The size of a bundle, channel or data stream used to transmit data may be termed its granularity; channels of higher capacity have higher granularity.
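Bundling traffic into the largest unit possible can be shown with a greedy packing of a circuit demand into standard units. The sketch below is a hypothetical illustration and is not drawn from the source. It measures demand in DS0 circuits and uses the DS3/DS1/DS0 sizes from the hierarchy described earlier. Each unit size here is a whole multiple of the next smaller one, so filling the largest units first also minimizes the total number of channels.

```python
# Hypothetical illustration: pack a demand measured in DS0 circuits into the
# largest standard units first. Sizes are in DS0s (DS3 = 28 * 24 = 672).
UNITS = [("DS3", 672), ("DS1", 24), ("DS0", 1)]


def bundle(ds0_circuits: int) -> dict[str, int]:
    """Greedily express a circuit count as DS3s, then DS1s, then DS0s."""
    if ds0_circuits < 0:
        raise ValueError("circuit count must be non-negative")
    result = {}
    remaining = ds0_circuits
    for name, size in UNITS:
        result[name], remaining = divmod(remaining, size)
    return result


print(bundle(1500))  # {'DS3': 2, 'DS1': 6, 'DS0': 12}
```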
An add/drop multiplexer ("ADM") may be used to add or remove a wavelength of light from a link. At each node one ADM is required for add/drop capability for each of the multiple wavelengths that may be carried on a fiber optic cable. Multiplexers with the capability to perform add/drop operations on data flow sizes other than wavelengths may be used at nodes. Cross-connects may be used at nodes to switch traffic from one link to another link.
Network architecture (the manner in which nodes and links are configured and traffic is controlled) plays a significant role in both the cost-effective implementation and management of a network and the ability of a network to quickly recover from traffic flow disruptions.
Depending on the configuration of a network and its traffic routing, each node does not require an ADM for all wavelengths that may be carried on a link. If it is determined that a node does not have to access or route traffic on a certain wavelength or channel, or does not need to route traffic among multiple links, that node does not need extra multiplexers or cross-connects. Traffic which may be termed "express" traffic may pass through a node without being demultiplexed or routed by that node.
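The saving from express traffic can be counted directly. The sketch below is a hypothetical example: five nodes share a fiber carrying four wavelengths, and the node names and wavelength assignments are invented for illustration. Each wavelength lists the nodes that add or drop traffic on it. Under the one-ADM-per-wavelength-per-node rule stated above, only those nodes need an ADM for that wavelength, and all other nodes pass it through as express traffic.

```python
# Hypothetical fiber serving five nodes with four wavelengths. For each
# wavelength, list the nodes that add or drop traffic on it; every other
# node lets that wavelength pass through as express traffic, with no ADM.
NODES = {"A", "B", "C", "D", "E"}
ADD_DROP_SITES = {
    "w1": {"A", "E"},  # express through B, C and D
    "w2": {"A", "C"},
    "w3": {"C", "E"},
    "w4": {"B", "D"},
}


def adms_needed(sites: dict[str, set[str]]) -> int:
    """One ADM per (node, wavelength) pair where traffic is added or dropped."""
    return sum(len(nodes) for nodes in sites.values())


def adms_without_bypass(nodes: set[str], wavelengths: int) -> int:
    """Every node equipped for every wavelength, as if nothing were express."""
    return len(nodes) * wavelengths


print(adms_needed(ADD_DROP_SITES))                      # 8
print(adms_without_bypass(NODES, len(ADD_DROP_SITES)))  # 20
```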
In one known network, a central controller monitors and controls traffic flow throughout the network, which is organized as a mesh. Complex traffic routing and recovery algorithms are used to manage traffic flow. FIG. 1 is a diagram illustrating a simplified portion of a known mesh network. Mesh network 300 comprises nodes (e.g., nodes 304, 306, 308 and 310) connected by links (e.g., links 305, 307, 311, 312, 314 and 316). Each node in network 300 communicates with controller 302, sending status information and receiving instructions for properly routing traffic. Nodes may communicate with controller 302 via satellite (not shown), by a land link separate from links carrying traffic (not shown), by links carrying traffic, or by other methods. Each node is interconnected with other nodes by links. For example, nodes 304 and 306 are connected by link 305. Links such as links 316 and 314 connect the portion of network 300 shown in FIG. 1 to other portions of network 300. For clarity, not all nodes and links in FIG. 1 are identified with reference numerals.
When a link becomes inoperative, the nodes connected to the link notify controller 302. Controller 302 then determines whether an alternative traffic path can be configured and sends messages to certain nodes to route or reroute the traffic. When used herein, "route" and "reroute" refer to setting or altering the path traffic takes. Traffic may be routed on "working" links, which carry network traffic during normal operation and which are typically given excess (or "protection") capacity for use in response to system failures. Typically, a certain percentage of the capacity of each link, for example 50%, is set aside and is not used during normal operation, but is used to carry rerouted traffic during an error condition. Traffic may also be routed on protection links, dedicated links used only to handle rerouted traffic during an error condition or during an excess capacity condition. Since protection links duplicate working links, they may provide 100% excess capacity where they exist.
An error condition is any condition or occurrence that adversely affects the performance of the network or interrupts network flow. For example, an error condition may be the failure of a link or an overload condition.
For example, if in network 300 link 305 should fail, the status of this failure is transmitted to controller 302 by, for example, node 304, node 306, or both. Controller 302 directs that traffic sent between nodes 304 and 306 be sent along an alternate path, for example via nodes 308 and 310 and links 307, 309 and 311. To effect this change, controller 302 must communicate rerouting instructions to nodes 304, 306, 308 and 310; these nodes must have the capacity to communicate with controller 302 and to manipulate and route the traffic.
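The path computation in this example can be sketched as a fewest-hop search that avoids the failed link. The code below assumes link endpoints that are only inferred from the example: 305 joins 304 and 306, 307 joins 304 and 308, 309 joins 308 and 310, and 311 joins 310 and 306. Breadth-first search is used here as a simple stand-in. It is not a description of the routing and recovery algorithms any actual controller uses. After the search, the controller would still have to send instructions to every node on the returned path.

```python
from collections import deque

# Hypothetical reading of FIG. 1: link id -> the two nodes it joins.
# Endpoints for 307, 309 and 311 are inferred from the example, not given.
LINKS = {
    305: (304, 306),
    307: (304, 308),
    309: (308, 310),
    311: (310, 306),
}


def reroute(links, src, dst, failed):
    """Breadth-first search for a fewest-hop path that avoids failed links.

    Returns the list of link ids on the path, or None if no path survives.
    """
    adj = {}
    for link_id, (a, b) in links.items():
        if link_id in failed:
            continue
        adj.setdefault(a, []).append((b, link_id))
        adj.setdefault(b, []).append((a, link_id))
    prev = {src: None}  # node -> (previous node, link used to reach it)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while prev[node] is not None:
                node, link_id = prev[node]
                path.append(link_id)
            return path[::-1]
        for nxt, link_id in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, link_id)
                queue.append(nxt)
    return None


print(reroute(LINKS, 304, 306, failed={305}))  # [307, 309, 311]
```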
In a mesh network such as network 300, typical recovery time from a disruption is on the order of seconds or minutes; quicker recovery times are desirable. In addition, a large amount of extra routing equipment is required in mesh network 300: since each node may be called upon to participate in error recovery, each node must carry routing equipment for this task. Error recovery is typically carried out at a relatively low (i.e., inefficient) granularity or channel size. More protection capacity is required, as protection capacity may not be used efficiently.
In a mesh network, traffic is commonly sent between two nodes via other nodes and links. ADMs and other multiplexers are required at a node only if traffic is to be added or dropped from a link or if the destination of traffic is to be altered depending on changing circumstances. Typically, nodes may add, drop and route traffic which originates or terminates at that node (local traffic) or traffic which does not originate or terminate at that node (express traffic). Larger and more complex multiplexing and cross-connect devices, and more of such devices, are needed if a node is to be able to route express traffic and traffic rerouted as a result of an error. If a bundle of traffic (for example, a wavelength) is sent via a node without having traffic added to or dropped from the bundle, and without the node having the capability to change the destination of the traffic, extra or larger equipment (such as multiplexers or cross-connects) is not required at that node. A bundling or routing scheme which allows wavelengths to bypass intermediate nodes, and which does not require certain nodes to route express traffic, results in significant savings.
To improve recovery times, other known networks have decentralized node control. In these networks, individual nodes, in cooperation with adjacent nodes, routinely route traffic and respond to path failures without significant interaction with a central controller. By communicating locally among themselves, these nodes can, for example, recover from path failures by configuring alternative paths and rerouting traffic to those alternative paths. Existing decentralized node control schemes may improve recovery times to the millisecond range (thousandths of a second), but may result in significant costs. Existing decentralized node control may require a great deal of inter-nodal communication and coordination, which must be supported with increased link capacity and more complex nodes. Each node capable of rerouting must be able to communicate and analyze traffic management communications, and must support expensive routing hardware.
In addition to the extra equipment required for error recovery, existing mesh networks require a certain amount of excess routing equipment (e.g., multiplexers and cross-connects) and excess link capacity for normal operations. Routing and provisioning (re-routing in response to load changes) take place at all nodes. Thus each node requires excess multiplexing and cross-connect equipment even during normal operations.
Networks employing architectures other than mesh configurations are known. Ring networks, for example, interconnect nodes in a circular fashion to form rings. The rings are then interconnected to form a complete network. Each node is connected to its neighboring nodes by a working link and a protection link. In the event that a link between two nodes is severed, the nodes route traffic using the protection links. One known ring network has typical recovery times of less than 50 milliseconds.
FIG. 2 is a diagram illustrating a simplified portion of a known ring network. Network 600 includes nodes 610, 620, 630, 640 and 650. Nodes are connected by working links, indicated by solid lines (such as working link 660), and protection links, indicated by dashed lines (such as protection link 670). For clarity, the working and protection links existing between only one pair of nodes are identified with reference numerals in FIG. 2.
Network 600 recovers from link failure generally as follows: assume the working and protection links between nodes 610 and 620 are cut. Nodes 610 and 620 communicate with each other to transmit disrupted traffic via protection links and via nodes 630, 640 and 650. Recovery traffic is sent on protection links because the capacity of working links is used by normal traffic. Traffic flow is thus restored between nodes 610 and 620 by rerouting disrupted traffic back around the ring through protection links. A network may comprise numerous interconnected rings.
A disadvantage of this ring network is that the ring can recover from only one link failure; more than one link failure requires physical repair to the network to recover traffic flow. This disadvantage is not shared by mesh networks because of their high inter-connectivity. A further disadvantage is the high percentage (100%) of link capacity used for protection, which requires a large resource outlay.
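The ring recovery and its single-failure limit can both be shown with one small model. In the sketch below, the nodes of FIG. 2 are placed in ring order and adjacent nodes are joined by spans. The ordering 610, 620, 630, 640, 650 is an assumption. The function tries one direction around the ring and then the other. A single cut is recovered by going the long way around on protection links, while two cuts leave no surviving path between the affected nodes. The function and helper names are hypothetical.

```python
# Hypothetical model of FIG. 2: nodes in ring order, each adjacent pair
# joined by a span (a working link plus a protection link).
RING = [610, 620, 630, 640, 650]


def span(a, b):
    """Order-independent identifier for the span between two adjacent nodes."""
    return frozenset((a, b))


def ring_route(ring, src, dst, cut_spans):
    """Return the node sequence from src to dst, trying one direction around
    the ring and then the other; None if both directions are cut."""
    n = len(ring)
    i, j = ring.index(src), ring.index(dst)
    for step in (1, -1):  # forward in list order, then backward
        path = [src]
        k = i
        while k != j:
            nxt = (k + step) % n
            if span(ring[k], ring[nxt]) in cut_spans:
                break  # this direction is blocked; try the other one
            path.append(ring[nxt])
            k = nxt
        else:
            return path
    return None


print(ring_route(RING, 610, 620, {span(610, 620)}))
# [610, 650, 640, 630, 620]
print(ring_route(RING, 610, 620, {span(610, 620), span(630, 640)}))
# None
```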
In view of the foregoing, it would be desirable to provide a network architecture for a telecommunications network that provides high levels of restorative capability in a manner which is more cost-effective than existing systems. It would be desirable to provide such a network which requires a lower amount of redundant protection capacity and a smaller amount of routing equipment for error recovery and also during normal operations. It would also be desirable to provide a network architecture that provides fast decentralized restoration ability requiring less inter-nodal communication. It would be still further desirable to provide a network architecture that operates with less complex traffic routing and recovery algorithms.
The present invention is a telecommunications network having a hierarchical architecture which reduces the amount of equipment and processing required to recover from network failures and to route traffic during normal, non-error operations. A hierarchical architecture is one which divides the network into classes or categories of nodes.
In an exemplary embodiment, the nodes of the network are divided into two classes, high-level nodes (L2 nodes) and low-level nodes (L1 nodes). High granularity traffic is collected, routed and manipulated at L2 nodes but generally passes through L1 nodes, which generally lack the capability for routing such high granularity traffic. Each L1 node may be capable of multiplexing and routing low-level traffic originating from or terminating at the L1 node itself or neighboring L1 nodes. Equipment savings result from L1 nodes lacking the capability to manipulate traffic other than traffic relevant to those nodes or a small number of nearby nodes.
Each L2 node pair is connected by at least three node-disjoint paths of L1 nodes and links, where each node-disjoint path consists of a set of L1 nodes distinct from those of every other node-disjoint path. When a failure occurs on one of the node-disjoint paths, the L2 node pair bracketing the path routes some traffic formerly using that path onto the remaining two paths. Less protection capacity is needed, as each node-disjoint path is expected to handle only a portion of rerouted traffic in the event of a network equipment failure.
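The protection-capacity saving can be quantified under a simplifying assumption that goes beyond the text: traffic between an L2 node pair is split evenly over k node-disjoint paths, and a failed path's load is split evenly over the survivors. Each path then carries T/k normally and must be able to carry T/(k-1) after one failure. The spare capacity per path is therefore T/(k-1) - T/k, which is 1/(k-1) of its working load. For k = 3 that is 50%. For k = 2 it is 100%, which matches the ring figure cited earlier. The sketch below checks disjointness and computes these quantities. Path contents and loads are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def node_disjoint(paths):
    """True if no L1 node appears on more than one path (L2 endpoints excluded)."""
    return all(not (set(p) & set(q)) for p, q in combinations(paths, 2))


def spare_fraction(k):
    """Spare capacity, as a fraction of working load, that lets k equally
    loaded disjoint paths absorb the loss of any one path (assumed even split)."""
    if k < 2:
        raise ValueError("need at least two disjoint paths")
    return Fraction(1, k - 1)


def redistribute(loads, failed):
    """Split the failed path's load evenly over the surviving paths."""
    survivors = [i for i in range(len(loads)) if i != failed]
    share = Fraction(loads[failed], len(survivors))
    return {i: loads[i] + share for i in survivors}


# Hypothetical L1 nodes on three paths between one L2 node pair.
paths = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
print(node_disjoint(paths))             # True
print(redistribute([16, 16, 16], 1))    # {0: Fraction(24, 1), 2: Fraction(24, 1)}
print(spare_fraction(3), spare_fraction(2))  # 1/2 1
```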
Recovery is decentralized, as recovery decisions are made at the L2 nodes near the error condition, rather than at a central controller. Recovery is thus faster than with a network using centralized recovery processing, requires less equipment, and is less susceptible to the failure of a centralized controller.