As point-to-point Internet Protocol (IP) flows increase in bandwidth, core router connections are being driven to higher capacities. Today, core router interfaces are starting to move from 10 Gbps to 40 Gbps while 100 Gbps connections are already in the planning stage. With this increase in capacity comes a heightened responsibility to maintain high availability service by minimizing the time that these very expensive, high bandwidth connections are out of service due to failure events or scheduled maintenance activities.
Referring to FIGS. 1 and 2, a network 10 illustrates core routers 12a, 12b, 12c with mesh connections directly over a statically provisioned wavelength division multiplexed (WDM) transport layer. The core routers 12a,12b,12c can include IP routers with direct optical interfaces, such as 10 Gbps, 40 Gbps, etc. In this example, each core router 12a,12b,12c connects optically to a Wavelength Selective Switch (WSS) 14. The network 10 includes multiple WSSs 14 at various geographically-diverse locations 16 where regeneration of signals may take place or where other signals may be added to or dropped from the WDM line. The WSSs 14 are configured to receive a client signal, such as from the core routers 12a,12b,12c, and to provide a WDM line signal formed by multiplexing multiple client signals, such as with an optical multiplexer with optical filters (not shown). Each node can also include other components (not shown), such as optical amplifiers, dispersion compensation modules (DCM), and the like.
The WSSs 14 connect each of the various locations 16 in a mesh configuration through optical fibers. Conventionally, the core routers 12a,12b,12c are connected through the WSSs 14 statically, and the bandwidth on each wavelength connection is typically traffic engineered to a predetermined capacity below the maximum possible capacity, such as 50%, 40%, 20%, etc. This headroom accommodates a layer-three-initiated rollover of traffic from a failed link to a working link upon a network failure, such as a fiber cut, an equipment failure on the WSSs 14, a failure on a core router 12, and the like. For example, in FIG. 1, the core router 12a is provisioned over the WSSs 14 to connect to the core routers 12b and 12c with a maximum of 50% traffic over each link such that, in the event of a failure, sufficient capacity is available to accommodate both working and protected traffic.
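The 50% figure for FIG. 1 can be reproduced with a short calculation. The sketch below assumes, as a simplification not stated in the text, that a failed link's traffic is spread evenly over the router's remaining links. Each surviving link then carries its own fill f plus f/(n-1), so f must not exceed (n-1)/n. Lower targets such as 40% or 20% leave additional margin beyond this single-failure bound. The function name `max_fill` is hypothetical.

```python
def max_fill(num_links: int) -> float:
    """Largest per-link fill fraction that still lets a router absorb one failed
    link, assuming (hypothetically) the failed link's traffic is spread evenly
    over the router's remaining num_links - 1 links."""
    if num_links < 2:
        raise ValueError("need at least two links to reroute around a failure")
    # Each survivor carries f + f / (num_links - 1) <= 1 after the failure,
    # which rearranges to f <= (num_links - 1) / num_links.
    return (num_links - 1) / num_links

# Router 12a in FIG. 1 has two links (to 12b and 12c), giving the 50% limit.
for n in (2, 3, 5):
    print(n, max_fill(n))
```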
In FIG. 2, a failure 18 (e.g., a fiber cut, an optical transceiver failure, or a network maintenance event; note that a maintenance event has the same effect as a failure) is illustrated between the core routers 12a and 12b, interrupting the traffic on this link, i.e. 0% fill. Here, the core routers 12a,12b, i.e. the routers located at either end of the failed connection, roll the traffic at layer three onto the link between core routers 12a and 12c and the link between core routers 12b and 12c. While the link between core routers 12a and 12b is out of service, the network 10 is vulnerable to additional failures on the link between core routers 12a and 12c (isolating router 12a) and/or on the link between core routers 12b and 12c (isolating router 12b).
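The vulnerability described for FIG. 2 can be checked with a simple connectivity test. The following minimal sketch models the three routers as an undirected graph, removes the failed link 12a-12b, and then tries each possible second failure to see which router becomes isolated. The helper name `components` is hypothetical.

```python
def components(nodes, links):
    """Return the connected components of an undirected graph as sets of nodes."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search collecting every node reachable from `start`.
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

routers = ["12a", "12b", "12c"]
mesh = {("12a", "12b"), ("12a", "12c"), ("12b", "12c")}
after_first = mesh - {("12a", "12b")}  # failure 18 of FIG. 2
for second in sorted(after_first):
    isolated = [c for c in components(routers, after_first - {second}) if len(c) == 1]
    print("second failure on", second, "isolates", isolated)
```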
While the architecture of the network 10 is reasonably efficient from a capital expenditure (“CAPEX”) perspective, it raises some challenges that can impact the operating expenses (“OPEX”) required to operate and maintain the network 10. One challenge is associated with how the links between the routers 12a,12b,12c are protected. While the link between routers 12a and 12b is down, the core of the network 10 is operating in a dangerous condition whereby any second failure could isolate a region of the network 10. The Mean Time to Repair (MTTR) therefore becomes a critical parameter in the calculation of service availability and of the corresponding service level agreements (SLAs) that can be offered to end user clients. The ability to provision a new (third) path through the network 10 in such a condition could help to minimize the MTTR for the connection and maintain high connection availability.
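The role of MTTR can be illustrated with two textbook reliability expressions. The first is the steady-state availability of a single repairable link, MTBF / (MTBF + MTTR). The second is the probability that one of the remaining links fails while the first is still under repair. That second expression assumes independent links with exponentially distributed times to failure. The MTBF value and function names below are hypothetical and chosen only for illustration.

```python
import math

def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable link: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)

def second_failure_risk(mttr_h: float, mtbf_h: float, exposed_links: int) -> float:
    """Probability that at least one of `exposed_links` fails while the first
    failed link is still under repair, assuming independent links with
    exponentially distributed times to failure (rate 1 / MTBF each)."""
    return 1.0 - math.exp(-exposed_links * mttr_h / mtbf_h)

MTBF_H = 4000.0  # hypothetical per-link MTBF in hours, for illustration only
# In FIG. 2, two links (12a-12c and 12b-12c) are exposed during the repair.
for mttr_h in (4.0, 24.0, 168.0):
    print(f"MTTR {mttr_h:>5} h: A = {availability(MTBF_H, mttr_h):.5f}, "
          f"P(second failure during repair) = {second_failure_risk(mttr_h, MTBF_H, 2):.4f}")
```

Both quantities worsen as MTTR grows, which is why shortening the effective repair time, for example by provisioning a third path, improves availability.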
Another challenge is associated with the coordination of network maintenance activities between operations personnel who are responsible for the IP network, i.e. routers 12, and those responsible for the underlying transport connections, i.e. WSSs 14. Because core router 12 interfaces are directly associated with a statically defined WDM lightpath across the network 10, it is not possible to separate maintenance on the IP layer from maintenance on the transport layer. Without careful cooperation between operations personnel, simultaneous maintenance could occur on the link between routers 12a and 12b (by transport) and on the link between routers 12a and 12c (by IP), causing unnecessary network disruption. Clearly, a mechanism to reconfigure the IP or optical layers independently is advantageous.
The use of optical cross connects (OXCs) based on an electrical switch fabric is one possible way to provide optical layer reconfigurability in the face of network failure or planned maintenance. However, there is concern that the 40 G or 100 G interfaces required to support core router connections are not as cost effective as 10 G interfaces and therefore should be minimized throughout the transmission path, whereas OXCs would require additional 40 G or 100 G interfaces. Furthermore, dedicating 40 G or 100 G modules on an OXC to aggregate flows of data wastes precious backplane and switch capacity.
Referring to FIGS. 3 and 4, a network 20 illustrates core routers 12d,12e using optical 1+1 broadcast with tail-end protection on optical links between adjacent core routers 12d,12e. Here, the routers 12d,12e are connected through a single optical transceiver on each router 12d,12e. In this example, a unidirectional path is shown from the router 12d to the router 12e. At the router 12d, an optical splitter 22 is configured to split an output from a transceiver on the router 12d into two identical signals with each signal separately provided to a different WSS 14. At the router 12e, a tail end switch 24 is configured to receive outputs from two different WSSs 14. The switch 24 is configured to switch between WSSs 14 responsive to a condition, such as loss of signal.
To date, this kind of protection has been implemented on the short-reach link between core router interfaces and WDM transceivers. This protection scheme is not designed to protect against a failure of the router 12d,12e or of a router port; however, it does provide resistance to transport layer failures associated with optical layer components, i.e. the WSSs 14, and with the optical fiber itself.
In FIG. 4, a failure 26 is illustrated on one link between the routers 12e and 12d. Upon failure, protection decisions are made locally, providing rapid (e.g., <50 ms) restoration times and significantly reducing the amount of complex routing table reconfiguration at layer three. In this protection scheme, the router 12d broadcasts the same data signal to the router 12e over two separate paths through the WSSs 14. The splitter 22 is unaffected by the failure 26, while the switch 24 switches to the backup or protect path upon the failure 26. It is also now possible to separate the maintenance activities of the IP and optical operations teams, because the physical path of the optical signal can be moved from the ‘primary’ to the ‘backup’ route.
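The tail-end behavior of the switch 24 can be sketched as a small state machine: switch away from the active path on loss of signal (LOS) when the other path is clean, and allow an operator-forced switch so that one route can be vacated for maintenance. This is a minimal sketch; the class name, path labels and `force` command are hypothetical, and real implementations add further triggers (e.g., signal degrade), hold-off timers and revertive options that are omitted here.

```python
from dataclasses import dataclass

@dataclass
class TailEndSelector:
    """Hypothetical model of a 1+1 tail-end switch (switch 24). The head end
    (splitter 22) sends the same signal on both paths; the tail end picks one."""
    active: str = "primary"

    def update(self, los: dict) -> str:
        """Select a path given the loss-of-signal (LOS) status of each path."""
        other = "backup" if self.active == "primary" else "primary"
        # Switch only when the active path has LOS and the other path is clean;
        # if both paths are down, stay put rather than flapping.
        if los[self.active] and not los[other]:
            self.active = other
        return self.active

    def force(self, path: str, los: dict) -> str:
        """Operator-initiated switch, e.g. before maintenance on the other route."""
        if los[path]:
            raise RuntimeError(f"refusing to force onto failed path {path!r}")
        self.active = path
        return self.active

sel = TailEndSelector()
print(sel.update({"primary": False, "backup": False}))            # primary
print(sel.update({"primary": True, "backup": False}))             # backup (failure 26)
print(sel.force("primary", {"primary": False, "backup": False}))  # primary, after repair
```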
However, because of the static nature of the optical connectivity, it is not possible to reconfigure the optical layer so as to restore diverse links between the end nodes. Instead, after the link failure 26, the connection between the routers 12d and 12e is now unprotected for as long as the damaged link is under repair. For some carriers this is a significant issue. Depending on the physical route of an optical fiber, the MTTR for the damaged connection can be quite long (on the order of days to weeks). For example, some fibers are routed through inhospitable terrain such as over (or through) mountains or under lakes or seas where the maintenance activity can involve lengthy procedures. In this case, the carrier would like to re-establish a new ‘backup’ route quickly so as to restore diversity between end nodes and thus maintain the promise of high availability to end user clients.
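Restoring diversity amounts to finding a new route between the end nodes that shares no link with the working path and avoids the damaged link. The following minimal sketch does this with a breadth-first search for a minimum-hop path over a hypothetical mesh; the transit sites A to D, the helper names and the hop-count metric are assumptions, not part of the figures. In the static network described above there is no mechanism to act on such a result, which is precisely the limitation at issue.

```python
from collections import deque

def shortest_path(links, src, dst, excluded=frozenset()):
    """Breadth-first search for a minimum-hop path that avoids `excluded` links
    (each excluded link is given as a frozenset of its two end nodes)."""
    adj = {}
    for a, b in links:
        if frozenset((a, b)) in excluded:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {src: None}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no diverse route exists

def path_links(path):
    """Links traversed by a path, as frozensets so direction does not matter."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

# Hypothetical mesh: routers 12d and 12e plus transit sites A-D.
links = [("12d", "A"), ("A", "12e"), ("12d", "B"), ("B", "12e"),
         ("12d", "C"), ("C", "D"), ("D", "12e")]
working = ["12d", "A", "12e"]
failed = {frozenset(("12d", "B"))}  # the original backup route is cut
new_backup = shortest_path(links, "12d", "12e", excluded=path_links(working) | failed)
print(new_backup)  # ['12d', 'C', 'D', '12e']
```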
Referring to FIGS. 5 and 6, many carriers are now deploying Wavelength Selective Switch (WSS) 14 technology into their networks to form a Reconfigurable Optical Add-Drop Multiplexer (ROADM) 30 node. The WSS 14 (and ROADM 30) provides all-optical wavelength cross-connection functionality that supports lightpath reconfigurability at the photonic level. The ability to redirect lightpaths from one fiber direction to another makes the WSS 14 a promising component of a solution to re-establish backup connections.
In FIG. 5, the ROADM 30 illustrates lightpath reconfigurability for lightpaths 32 between different WSSs 14. Note, in this configuration, the WSSs 14 may not require connections to transceivers or regenerators, but rather may include amplifiers, DCMs, multiplexers, etc. for all-optical pass-through at the ROADM 30. Here, the lightpaths 32 can be redirected from one output to any of the WSSs 14 (provided there is no wavelength conflict).
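The "no wavelength conflict" condition can be sketched by having each output degree track the wavelengths already in use on its fiber. The sketch assumes no wavelength conversion in all-optical pass-through, so a redirected lightpath keeps its wavelength. The class name, degree names and channel numbers are hypothetical.

```python
class Roadm:
    """Toy pass-through model: each output degree (fiber direction) tracks the
    wavelengths already in use on its outgoing fiber."""

    def __init__(self, degrees):
        self.in_use = {d: set() for d in degrees}
        self.route = {}  # lightpath id -> (output degree, wavelength)

    def connect(self, lp, degree, wavelength):
        if wavelength in self.in_use[degree]:
            raise ValueError(f"wavelength conflict on {degree} at channel {wavelength}")
        self.in_use[degree].add(wavelength)
        self.route[lp] = (degree, wavelength)

    def redirect(self, lp, new_degree):
        """Move a lightpath to another output degree, keeping its wavelength."""
        old_degree, wl = self.route[lp]
        if new_degree == old_degree:
            return True
        if wl in self.in_use[new_degree]:
            return False  # blocked: the same wavelength is already used on that fiber
        self.in_use[old_degree].discard(wl)
        self.in_use[new_degree].add(wl)
        self.route[lp] = (new_degree, wl)
        return True

node = Roadm(["north", "east", "south"])
node.connect("lp1", "north", 17)
node.connect("lp2", "east", 17)
print(node.redirect("lp1", "east"))   # False: channel 17 already used eastbound
print(node.redirect("lp1", "south"))  # True
```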
In FIG. 6, an IP router 12 and an Optical Transport Network (OTN) platform 34 are connected to different WSSs 14. The OTN platform 34 is illustrated as an example and could also be a SONET/SDH platform, an Ethernet platform, etc. Here, the add/drop traffic from the lightpaths 32 is directly associated with a specific direction (denoted by arrows 36). A shortcoming of ROADM 30 solutions as designed and implemented today is that WDM add/drop traffic is typically tied to a single direction (or fiber degree) of the ROADM 30 node. The optical interface of the router 12, switch, platform 34, or the like wishing to communicate ‘north’ (i.e., in the direction of arrows 36) from a node must be hard-wired to the WSSs 14 associated with that direction. Therefore, while the pass-through traffic of today's ROADM 30 is highly flexible, the static connectivity of the add/drop traffic limits the flexibility of the end-to-end solution.
Referring to FIG. 7, ROADM 30 nodes are added at intermediate junction nodes of the network 20 of FIGS. 3 and 4, in which core routers 12d, 12e use optical 1+1 broadcast with tail-end protection on optical links between adjacent core routers 12d,12e. Adding the ROADM 30 to the core router 12d,12e interconnection example provides optical pass-through at intermediate ROADM 30 nodes without the need for manual patching or terminal regeneration. However, because the add/drop function is dedicated on a per-direction basis, the solution with ROADM 30 is no more capable of re-establishing a second backup connection than the static network described above.
Referring to FIG. 8, a conventional ROADM 40 illustrates the limitations associated with current directional architectures. The conventional ROADM 40 shows only drop-side connections for illustration purposes, and those of ordinary skill in the art will recognize that the ROADM 40 can also include add-side connections. Connections 42 from receivers 44 are hard-wired in the conventional ROADM 40 and in the intermediate ROADM 30 nodes of FIG. 7. The wavelength of each receiver 44 is fixed to a single value by a channel demultiplexer 46. Likewise, once plugged in, a transmitter cannot change wavelength. As the network grows, many ports may become stranded, and wavelengths cannot be dynamically re-optimized to reduce blocking probability. Further, because of the hard-wired connections 42, connections to the receivers 44 cannot be automatically altered, which prevents rerouting to restore route diversity during failures.
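The two constraints on a conventional drop port, a fixed direction (hard-wired connection 42) and a fixed channel (channel demultiplexer 46), can be expressed as a single predicate. This minimal sketch shows why a lightpath restored over a different direction or on a different channel can no longer reach the same receiver 44. The class, field names and channel numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DropPort:
    """Hypothetical model of a conventional drop port: hard-wired to one degree
    (connection 42) and fixed to one channel by the channel demultiplexer 46."""
    degree: str
    wavelength: int

    def can_receive(self, degree: str, wavelength: int) -> bool:
        # Both the arrival direction and the channel must match the hard wiring.
        return degree == self.degree and wavelength == self.wavelength

port = DropPort(degree="north", wavelength=17)
print(port.can_receive("north", 17))  # True: original route
print(port.can_receive("west", 17))   # False: port is not wired to the west degree
print(port.can_receive("north", 23))  # False: the demultiplexer fixes the channel
```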
Accordingly, the downtime associated with any network service (e.g., Ethernet, SONET/SDH, Fibre Channel, etc.) is directly associated with the quality of a network service and the associated Service Level Agreement (SLA) between a service provider (i.e., carrier) and a client. In addition to the conventional optical protection schemes described above, network elements (NEs) with protection schemes such as SONET Bi-directional Line Switched Ring/Uni-directional Path Switched Ring (BLSR/UPSR), SDH Multiplex Section-Shared Protection Ring/Sub-network Connection Protection (MSSPRing/SNCP), etc. have been developed as network ‘self-healing’ mechanisms.
More recently, mesh restoration has been implemented in networks to improve service availability by providing access to multiple backup paths. The ability to access more than one backup path through the network increases the probability that service will stay available to the end user, thus decreasing the average downtime a network experiences over the course of a year.
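The benefit of access to more than one backup path can be approximated by the probability that at least one path is up. The sketch below assumes independent path failures and instantaneous restoration. In practice, paths often share risk (common conduits or nodes) and restoration takes finite time, so this is an optimistic bound. The per-path availability value is hypothetical.

```python
def service_availability(path_availabilities):
    """Probability that at least one path is up, assuming independent failures."""
    p_all_down = 1.0
    for a in path_availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down

MINUTES_PER_YEAR = 365 * 24 * 60
A_PATH = 0.999  # hypothetical availability of each individual path
for k in (1, 2, 3):
    a = service_availability([A_PATH] * k)
    print(f"{k} path(s): A = {a:.9f}, "
          f"expected downtime ~ {(1 - a) * MINUTES_PER_YEAR:.3f} min/year")
```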
One of the major challenges facing a number of network operators today is a high incidence of fiber cuts occurring randomly in the network. Such failures are a common occurrence in developing nations such as India, where significant new infrastructure building is taking place and fiber cables are frequently dug up. Because most of these carriers currently use ring protection methods to protect service, their networks can accommodate only one fiber failure on a single ring at any one time. A high probability of fiber failure therefore leads to the isolation of network elements when two cuts occur simultaneously on the same ring, which degrades end-to-end service.
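Why a ring tolerates one cut but not two can be shown by computing which groups of nodes remain mutually reachable. In this minimal sketch, span i joins `nodes[i]` and `nodes[(i + 1) % n]`; the node names and span numbering are hypothetical. Two cuts always split the ring into two isolated segments, and adjacent cuts isolate a single node.

```python
def ring_segments(nodes, cut_spans):
    """Split a ring into groups of nodes that can still reach each other.
    Span i joins nodes[i] and nodes[(i + 1) % n]; cut_spans holds span indices."""
    n = len(nodes)
    cuts = sorted(set(cut_spans))
    if len(cuts) <= 1:
        return [list(nodes)]  # a ring survives any single cut
    segments = []
    for i, c in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        # Collect nodes from just after cut c up to the node just before cut nxt.
        seg, j = [], (c + 1) % n
        while True:
            seg.append(nodes[j])
            if j == nxt:
                break
            j = (j + 1) % n
        segments.append(seg)
    return segments

ring = ["A", "B", "C", "D", "E", "F"]
print(ring_segments(ring, [1]))     # one segment: the whole ring
print(ring_segments(ring, [1, 4]))  # [['C', 'D', 'E'], ['F', 'A', 'B']]
print(ring_segments(ring, [1, 2]))  # [['C'], ['D', 'E', 'F', 'A', 'B']]
```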
To overcome this challenge, some carriers can geographically partition their existing SONET/SDH ring networks into small cascaded rings such that the probability of two fiber failures occurring on the same ring is reduced. This helps increase service availability but, in many cases, does not allow the carriers to meet their target availability objective (particularly for high-value clients, e.g. banking clients, who demand ‘always on’ service). These carriers are also now investigating the use of mesh restoration to increase their service availability.
A challenge with both ring protection and mesh restoration is the fact that fiber failures are statistically dependent upon the distance between the switching nodes in the network. So, even if mesh restoration is used, if the distance between mesh restoration switch sites is too long, there still exists an increased probability that a service node can be isolated due to simultaneous failures on each of the (multiple) links connected to that node, thus losing service.
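The effect of distance, and the motivation for partitioning into smaller cascaded rings, can be illustrated with a simple model. The sketch assumes that fiber cuts form a Poisson process whose rate scales linearly with fiber length. Under that assumption, the probability of a second cut on the same ring during the repair of the first is 1 - exp(-λ·L·T), where λ is the cut rate per km, L is the ring length and T is the repair time. The cut rate and repair time below are hypothetical values, not measured data.

```python
import math

def p_second_cut(ring_km: float, cuts_per_km_year: float, repair_hours: float) -> float:
    """Probability of at least one more cut on the same ring while the first cut
    is being repaired, assuming cuts form a Poisson process whose rate scales
    linearly with fiber length (a simplifying assumption)."""
    rate_per_hour = cuts_per_km_year * ring_km / (365 * 24)
    return 1.0 - math.exp(-rate_per_hour * repair_hours)

CUT_RATE = 0.005  # hypothetical cuts per km per year
REPAIR_H = 12.0   # hypothetical repair time in hours
for ring_km in (1200, 400, 100):
    print(f"{ring_km:>5} km ring: P(second cut during repair) = "
          f"{p_second_cut(ring_km, CUT_RATE, REPAIR_H):.5f}")
```

Shorter rings, or shorter distances between switching sites, reduce this probability, but the model does not drive it to zero, consistent with the observation that partitioning alone may not meet availability targets.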
One approach used to increase resiliency performance and availability is to combine SDH or SONET ring protection with SONET or SDH mesh restoration. This capability can be provided through a Virtual Line Switched Ring (VLSR) or SNCP protection plus backup mesh restoration on an optical switch platform. This clearly provides the benefit of a deterministic 50 ms protection time plus the availability of mesh restoration.
However, this typically needs to be implemented by a single SDH or SONET vendor. Unfortunately, while perhaps possible, ring protection interworking between different vendors' SDH/SONET equipment is highly complicated and not advised. For a number of reasons, including Data Communication Channel (DCC) transparency and different use of (and response to) SONET/SDH overhead bytes, very little success has been achieved in the industry in the area of SONET/SDH interworking between different vendors. Consequently, it does not make engineering or operational sense to operate SDH/SONET rings between mesh restoration nodes belonging to one vendor and SDH/SONET rings belonging to a second vendor; such configurations have proven operationally challenging to engineer.
Also, in some network designs, it may not be cost effective (or prudent from a traffic management perspective) to put a large cross-connect with mesh restoration capabilities at every node in the network. The use of a limited number of cross-connects plus lower cost equipment in between may provide a more economic solution.