The present invention relates generally to storage area networks, and more particularly to long haul optical protection using mid-span switches in a Storage Area Network (SAN).
Businesses are becoming increasingly reliant on computer networks for mission critical applications. With the emergence of the Internet and the proliferation of global e-business applications, more and more organizations are implementing computing infrastructures specifically designed for reliably accessible data and system availability. Today, even applications such as e-mail have become critical for ongoing business operations.
Faced with increased customer and internal user expectations, organizations are currently striving to achieve the highest availability in their computing systems. Any downtime during mission critical applications can severely impact business operations and cost valuable time, money, and resources. To ensure the highest level of system uptime, organizations are implementing, for example, reliable storage area networks capable of boosting the availability of data for all the users and applications that need it. These organizations typically represent the industries that demand the highest levels of system and data availability, for example, the utilities and telecommunications sector, brokerages and financial service institutions, and a wide variety of service providers.
Developing highly available networks involves identifying specific availability requirements and predicting what potential failures might cause outages. In designing these networks, designers must first understand and define their availability objectives, which can vary widely from one organization to another and even within segments of the same organization. In some environments, no disruption can be tolerated, while other environments might be only minimally affected by short outages. As a result, availability is relative to the needs of an application and is a function of the frequency of outages (caused by unplanned failures or scheduled maintenance) and the time to recover from such outages.
One of the challenges of building an optical network is meeting these availability objectives, given the long spans of optical fiber used, for example, in long haul networks. Typically, multiple diversely routed spans of optical fiber are constructed. Despite these redundancy measures and the monitoring techniques used, there is no escaping the reality that the frequency of switch-to-protect events (i.e., the switching of data transmission paths due to a failure on one of the paths) increases with increasing transport distance.
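By way of illustration only, the dependence of availability on outage frequency and recovery time can be sketched as a simple calculation. The function name, the one-year averaging window, and the input values below are assumptions chosen for the example; they are not figures taken from any particular network.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; averaging window assumed for this sketch


def availability(outages_per_year: float, hours_to_recover: float) -> float:
    """Return the fraction of a year a service is up.

    Availability falls as outages become more frequent or as each
    outage takes longer to recover from.
    """
    downtime_hours = outages_per_year * hours_to_recover
    if downtime_hours > HOURS_PER_YEAR:
        raise ValueError("total downtime exceeds one year")
    return 1.0 - downtime_hours / HOURS_PER_YEAR


# Hypothetical environments: one tolerates short outages, one does not.
tolerant = availability(outages_per_year=4, hours_to_recover=0.5)
strict = availability(outages_per_year=1, hours_to_recover=0.01)

print(f"tolerant environment: {tolerant:.6f}")
print(f"strict environment:   {strict:.8f}")
```

The same outage count yields very different availability depending on recovery time, which is why the two quantities are considered together.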
Optical networks are mature, robust transport mechanisms for general data applications. With careful attention to network architecture, optical protection switching mechanisms enable the construction of a network with no single point of failure.
However, these protection switching events, though infrequent, involve a brief loss of data transmission continuity. In voice or general data applications, this has been generally acceptable. In more recent data applications, such as high speed optical networks used with mission-critical applications, these brief, infrequent protection switching events may bring about a halt in the application, and possibly require lengthy data resynchronization activity before the application is restarted.
Although connectionless packet transport networks are less sensitive to brief interruptions in transport continuity owing to their sophisticated routing mechanisms, they remain a source of network failure. Connectionless transport can potentially have large, unavoidable variations in latency. The same applications that are sensitive to data transport continuity are also sensitive to latency variations.
In implementing long haul high speed networks, network designers now consider network availability of primary importance over the costs associated with the implementation and operation of the network. For high volume networks, any downtime may mean the loss of millions of dollars.
Achieving these very high levels of performance in a high speed network requires a combination of a low failure rate and a very short recovery time whenever a failure occurs. For the most part, current protection and disaster recovery schemes make use of physical redundancy and an array of robust software-based recovery mechanisms. Physical redundancy has traditionally been achieved by provisioning redundant backup subsystems having substantially the same network elements as the primary network. In effect, the primary network is mirrored in the backup subsystem. In the event of a network failure, network elements such as switches and routers provide alternate and diverse routes on a real-time or predetermined basis. In tandem, software-based recovery schemes complement physical redundancy in minimizing the impact of interrupted customer traffic. Recovery software enhances network availability by automating the recovery process so as to ensure the fastest failover possible. At times, failovers may occur so quickly that they appear transparent to the customer.
There are several high availability strategies in use today. Among these strategies are protective and restoration schemes based on centralized or distributed execution mechanisms, the priority of data, the network layer in which a failure occurs, link or node failures, and real-time or pre-computed failure responses. In one protective strategy, backup resources are allocated on a one-for-one basis in advance of any network failure and regardless of the added expense or the inefficient use of available resources. In another protective strategy, available and previously unassigned resources are immediately allocated and used on a real-time or on a substantially real-time basis, at the expense of recovery speed.
Dedicated use of network resources is a protective scheme currently used in network management. In this strategy, certain network resources, such as switches, routers, servers, controllers, interfaces, drives, and links, are reserved as backups to the primary network elements for use upon the failure of the primary communications channel. In the early development of the networking industry, this strategy was referred to as a “hot standby” mode of operation. Upon the detection of a failure of a network element, its corresponding backup network element is immediately placed in operation, and data being transmitted on the primary pathway is rerouted through the backup pathway. In this protective approach to network availability, the backup pathway remains idle until a failure occurs, at which point it is immediately made available to data from the primary pathway. As is readily apparent, the provisioning of a fully redundant and diverse route adds considerable expense to the installation and operation of the high speed network. Moreover, the physical switching of pathways may result in a disruption long enough to bring down a system.
In the optical networking industry, storage area networks (SANs) have used these same protective strategies, with less than acceptable availability performance. A SAN is a network whose primary purpose is the transfer of data between and among computer systems and storage elements. A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and data is highly available. A major advantage of SANs is the ability to provide any-to-any connectivity between the storage devices and remote computers. This means that multiple computer systems can share a storage device so as to allow for the consolidation of storage devices into one or a few centrally managed platforms. SANs employ Fibre Channel technology to provide data transfer speeds of 100 MB/s or better, which is significantly faster than today's Small Computer System Interface (SCSI) (i.e., a parallel interface enabling computers to communicate with peripheral hardware such as printers). At these speeds, SANs are used to perform backup and recovery functions, such as data replication, clustering, and mirroring. However, these functions are quite sensitive to data disruption and may be susceptible to even the briefest of network failures.
The disruption frequency increases as the distance over which the data is transported increases. The time needed to resynchronize the two endpoints (i.e., the source and destination) after a failure occurs on a high volume data channel can be hours.
Also, the fiber needed for a long haul circuit can be extremely expensive in terms of both materials and the labor associated with laying the fiber. As a result, replication of one or more data channels (i.e., fibers) for use as backup is often an expensive undertaking.
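As a rough illustration of this scaling, the following sketch assumes that failures occur at a uniform rate per unit length of fiber, so that the expected number of switch-to-protect events grows linearly with span length. The linear model, the function names, the failure rate, and the resynchronization time are all hypothetical values chosen for the example, not measured data.

```python
def expected_switch_events(span_km: float, failures_per_1000_km_year: float) -> float:
    """Expected switch-to-protect events per year on a span.

    Assumes (for this sketch only) a uniform failure rate per km,
    so the expected count is proportional to span length.
    """
    return span_km / 1000.0 * failures_per_1000_km_year


def yearly_resync_hours(span_km: float,
                        failures_per_1000_km_year: float,
                        resync_hours_per_event: float) -> float:
    """Expected hours per year spent resynchronizing the two endpoints."""
    events = expected_switch_events(span_km, failures_per_1000_km_year)
    return events * resync_hours_per_event


# Hypothetical inputs: 3 failures per 1000 km per year, 2 hours to resynchronize.
RATE = 3.0
RESYNC_HOURS = 2.0

for km in (100, 1000, 4000):
    events = expected_switch_events(km, RATE)
    hours = yearly_resync_hours(km, RATE, RESYNC_HOURS)
    print(f"{km:>5} km: {events:.1f} events/year, {hours:.1f} resync hours/year")
```

Under these assumptions, lengthening the span increases both the number of protection switching events and the cumulative time spent resynchronizing in direct proportion.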
Thus, there remains a need to provide an optical network that minimizes data disruption and also reduces the amount of fiber needed for backup data channels when transporting data over long distances.