Use of computer networks and, in particular, the Internet as a medium for conducting information exchange and business transactions has exploded in the recent past, and this unprecedented growth is expected to continue, if not increase, in the coming years. Unfortunately, the rapid growth and development of the Internet have caused substantial problems with respect to the flow of Internet Protocol (IP) traffic between various nodes on the Internet. Consequently, network architects and administrators have struggled to support and manage the flow of traffic across the network.
In its current form, the Internet's reliability and resiliency to network and nodal failures are poor. Network and nodal failures are caused by a variety of factors, such as server failure, client failure, and node failure. To network participants, however, it matters little which component failed; the end result is that the current communications session between participants is aborted.
With regard to electronic commercial transactions specifically, any network failure is particularly frustrating in that the buyer is unsure of the state the transaction was in when it failed. Accordingly, in order to untangle the present state of the transaction, a substantial amount of manual intervention is typically required, generally in the form of correspondence with customer service representatives and the like.
In addition to the problems associated with the ever-increasing volume of network traffic, additional problems are presented by the format or content of the information being exchanged. For example, the increasing use of graphic and multimedia-rich content has inflated the load placed on the network. Also, commercial transactions often require a secure (i.e., encrypted) connection between participants. These transactions require additional processing and network power, forming yet another major factor in poor web site performance and/or network failures. Further, performance and scalability mechanisms such as load-balancing software, as well as security protocols such as IP Security protocol (IPSEC), Secure Sockets Layer (SSL), and firewalling mechanisms, require that more intelligence, at higher levels of the IP protocol stack, be maintained during an association. Accordingly, these higher-level mechanisms effectively act as a proxy between the two endpoints and therefore assume the existence of a robust and error-free communication layer. Because the higher-level mechanisms inform the sending endpoint that the data has been accurately received before this has actually occurred, any subsequent failure of the node hosting the mechanism goes unrecognized by the sending endpoint, and the transaction fails.
In order to combat the effects of failed lower-level network nodes and servers, as well as to increase the scalability of the network, load balancers (including both software and hardware-based systems) have been developed which intelligently distribute IP traffic among a group of clustered web servers that collectively host a web site and support its functions. Essentially, the load balancer analyzes the incoming traffic as well as the available servers and determines the most appropriate server to which the traffic should be routed. If one server goes down, the load balancer makes dynamic adjustments in traffic flow, redirecting the user requests to surviving servers, so that the web site stays up and service is maintained at the best possible levels. With regard to secure transactions, a new class of hardware products offloads the SSL processing from the web server and restores typical processing rates, resulting in a tremendous boost for e-commerce related sites. These products perform encryption, decryption, and transaction processing on their own dedicated hardware platforms without tapping web server resources.
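The load-balancing behaviour described above can be illustrated with a minimal sketch. It assumes a simple least-active-requests selection policy, which is only one of many possible policies and is not drawn from any particular product; the names `Server`, `LoadBalancer`, `route`, and `mark_down` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical clustered web server tracked by the balancer."""
    name: str
    healthy: bool = True
    active_requests: int = 0


@dataclass
class LoadBalancer:
    """Illustrative balancer: routes to the least-loaded healthy server."""
    servers: list = field(default_factory=list)

    def route(self) -> Server:
        # Consider only servers that are currently up.
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # Assumed policy: fewest active requests wins (ties go to the first listed).
        chosen = min(candidates, key=lambda s: s.active_requests)
        chosen.active_requests += 1
        return chosen

    def mark_down(self, name: str) -> None:
        # Subsequent requests are redirected to the surviving servers.
        for s in self.servers:
            if s.name == name:
                s.healthy = False


lb = LoadBalancer([Server("web-1"), Server("web-2"), Server("web-3")])
first_three = [lb.route().name for _ in range(3)]
lb.mark_down("web-2")
after_failure = [lb.route().name for _ in range(4)]
print(first_three)
print(after_failure)
```

In this sketch, requests arriving after `web-2` is marked down are spread across the two surviving servers. Nothing in it recovers the requests that were in flight on the failed server.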
Unfortunately, current state-of-the-industry systems only allow for transfer of relatively static information about a networking association to a secondary node. Some of this information takes the form of network-address information, protocols being used, and perhaps time-synchronization information. However, conventional backup node systems do not dynamically include the information describing a particular networking association, together with any outstanding traffic, collectively referred to as the “session context”. Failure of a node and the associated loss of its session context can result in both a loss of performance and a loss of the particular transaction. For example, intermediate nodes such as load balancers and firewalls store data on-node. Unfortunately, this data has already been acknowledged to the ultimate peers in the exchange (i.e., server or client systems) and is therefore not available for retransmission should that load balancer, firewall, or similar node fail. What this means is that conventional systems, while preventing multiple successive losses due to failed servers or network nodes, do not address or prevent the loss of service associated with the particular failed transaction. This is because the transaction on the relevant node is not replicated elsewhere. In fact, the very nature of the node itself informs the connected peers that the information thereon has been successfully received by the appropriate party when, in fact, such delivery has not yet taken place. Consequently, the failure of this node results in the permanent loss of the information contained thereon.
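The failure mode described above, in which an intermediate node acknowledges data before delivering it and then fails, can be made concrete with a small simulation. This is a sketch under simplifying assumptions: segments are plain strings, the acknowledgement is modelled as a boolean return value, and the classes `Proxy` and `Sender` are hypothetical stand-ins for a load balancer or firewall and a sending endpoint.

```python
class Proxy:
    """Hypothetical intermediate node that acknowledges before forwarding."""

    def __init__(self):
        self.buffer = []     # outstanding traffic held only on this node
        self.failed = False

    def receive(self, segment):
        if self.failed:
            return False
        self.buffer.append(segment)
        return True          # acknowledgement sent before final delivery

    def forward(self, receiver):
        # Deliver buffered segments to the ultimate peer.
        if not self.failed:
            receiver.extend(self.buffer)
            self.buffer.clear()

    def crash(self):
        # The on-node data is lost along with the node.
        self.failed = True
        self.buffer.clear()


class Sender:
    """Sending endpoint that discards a segment once it is acknowledged."""

    def __init__(self, segments):
        self.unacked = list(segments)

    def send(self, proxy, count):
        for segment in self.unacked[:count]:
            if proxy.receive(segment):
                self.unacked.remove(segment)  # no copy kept for retransmission


segments = ["order-1", "order-2", "order-3"]
sender, proxy, server_received = Sender(segments), Proxy(), []

sender.send(proxy, 1)
proxy.forward(server_received)  # order-1 reaches the server
sender.send(proxy, 2)           # order-2 and order-3 are acknowledged...
proxy.crash()                   # ...but the node fails before forwarding them

lost = [s for s in segments
        if s not in server_received and s not in sender.unacked]
print(lost)
```

After the crash, the sender holds no unacknowledged segments, so it has nothing to retransmit, while the server has received only the first segment. The remaining two existed only in the failed node's buffer.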
One example of the cost of session context loss relates specifically to IP backbone (“border”) routers, which connect independent network service providers such as Qwest, Global Crossing, and Dipex and typically route multiple gigabits of information across a plurality of discrete connections every second. Essentially, border routers are required to maintain routing tables which describe the topology of all or part of the network and are used in making routing decisions. If one of the border routers crashes, the service provider must renegotiate the router with each of its neighbouring routers, thereby re-creating its routing table, before it is allowed to carry network traffic again. Unfortunately, the renegotiation process is slow and can take an unacceptable length of time (e.g., tens of minutes) for big core routers such as those described above. While the failed router is down, the traffic which would have been delivered across it is re-routed to other nodes, causing a significant loss of performance.
One conventional model which does use replicated control and output information has been used in aviation control systems, telephony switches, and other settings. However, in each of these cases, a dedicated server/backup pair is configured and, in most cases, supported through hardware or other topological organizations dedicated to the configuration.
In view of the foregoing, it would be desirable to provide a technique for preventing information losses due to network node failures which overcomes the above-described inadequacies and shortcomings and results in more robust connections between clients and servers.