This invention relates generally to computer networks and, more particularly, to detecting and recovering from faults associated with a computer network used for purposes of process control.
As computing devices become faster and more reliable, they are increasingly being employed to control critical processes, improving productivity and reducing the risk of human error. However, computing devices are not error-free, and especially with the multitude of interconnections present in a computer network, the risk of loss of process control is ever present. The wires and connections that are located external to the network stations, devices, and computers have generally proven to be the most vulnerable to breakage, shorting, misconnection, disconnection, and so on. There is also a possibility of port-related faults, switch-related faults, or other faults which derive from the network itself.
For some processes that utilize a computer network to facilitate process control between computing devices, a loss of connectivity or signal integrity can be costly. Such an error will likely cause the process to run incorrectly, wasting materials and requiring extensive human intervention to restart and stabilize the process. For other more critical processes, such an error could expose process personnel to injury or even death.
In order to continue to reap the substantial benefits conferred by automated process control, while not suffering the detriments resulting from a loss of network integrity, others have sought to provide means for reducing the risk of network failure between any two nodes on a control network. Such means include, generally, improved network configurations, protocols, and integrity verification mechanisms.
One particular solution to this problem has been the use of fault tolerant connections to the network of interest. Thus, for example, a single computer or workstation may be connected to a single network via primary and alternate connections. In the case of a fault in the primary connection to the network, such a system would automatically shift routes such that routine communications to and from the affected computer travel by the alternate rather than the primary route. Traditionally, such systems have lacked the ability to recover from a fault quickly enough to prevent process disruption. This is due in part to the possibility, in some such systems, that the alternate connection may fail without notification to the connected computer. Thus, at the time that the primary connection fails and an attempt is made to route communications through the alternate connection, the latent failure associated with the alternate connection is belatedly detected, preventing timely recovery. An even greater disadvantage of systems which use only redundant connectivity to provide fault recovery is that they do not allow recovery from a fault that occurs on the network itself rather than on the connections between the computer and the network.
Systems that use simultaneous, rather than alternate, communications over redundant network connections, or that otherwise periodically verify the integrity of the alternate link, may more quickly discover, and hence recover from, any fault in one of the connections. However, such systems still suffer the latter deficiency, namely the inability to recover from network faults within the network itself.
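The benefit of periodically verifying the alternate link can be illustrated with a minimal sketch. It is not taken from any existing system; the names Link, verify_links, and select_link, and the idea of a caller-supplied probe function, are assumptions made purely for illustration. The point shown is that probing every connection, including the idle one, exposes a latent failure of the alternate before a failover is ever attempted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Link:
    """One connection from a computer to the network (hypothetical model)."""
    name: str
    probe: Callable[[], bool]  # assumed: returns True if a test message gets through
    healthy: bool = True


def verify_links(links: List[Link]) -> List[str]:
    """Probe every link, active or idle, and return the names that just failed."""
    newly_failed = []
    for link in links:
        ok = link.probe()
        if link.healthy and not ok:
            newly_failed.append(link.name)
        link.healthy = ok
    return newly_failed


def select_link(links: List[Link]) -> Optional[Link]:
    """Return the first healthy link in preference order, or None if all are down."""
    for link in links:
        if link.healthy:
            return link
    return None
```

Called on a timer, verify_links would report a failed alternate while the primary is still carrying traffic, so the fault can be repaired in advance. As the paragraph above notes, a scheme of this kind still cannot recover from a fault inside the network itself, since both links lead to the same network.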
Occasionally, existing dual-network fault recovery schemes are employed to overcome some of the aforementioned deficiencies. However, these specialized topologies, which utilize connection to two separate networks to effect fault recovery, are still deficient as currently implemented in that they often do not provide timely notice of network or connection faults, sometimes making recovery impossible or time-consuming. In addition, such existing systems require specialized proprietary network connection equipment, rather than off-the-shelf components. This may increase the system cost and complexity. Further, this may inhibit the ability of a single computer to access another ordinary network, such as a corporate LAN, in addition to the redundant networks using standard network interface hardware, such as a standard Ethernet card, Token Ring card, or other network card using a standard networking protocol. A redundant network fault recovery system is needed which provides timely network fault detection and recovery, while using non-proprietary hardware to connect computers to the network and allowing access to the redundant networks as well as other networks via standard network interfaces.
In accordance with these needs, there is provided in the present invention a redundant network architecture wherein faults related to a network and associated ports are dynamically detected via a heartbeat pinging mechanism and other mechanisms. Furthermore, when a fault is detected by any such mechanism, network traffic is automatically rerouted by manipulation of local and remote routing tables or port look-up tables to effect rapid fault recovery.
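By way of illustration only, heartbeat-based detection combined with routing table manipulation might be sketched as follows. This is a simplified model under stated assumptions, not the claimed implementation: the class HeartbeatRouter, the threshold MISS_LIMIT, and the representation of the routing table as a dictionary are all hypothetical, and only the local table is modeled (updating remote tables is omitted).

```python
from typing import Dict, Iterable, Tuple

MISS_LIMIT = 3  # assumed number of consecutive missed heartbeats that marks a path as faulty


class HeartbeatRouter:
    """Tracks heartbeat replies per (peer, network) and reroutes on a detected fault."""

    def __init__(self, networks: Iterable[str], peers: Iterable[str]) -> None:
        self.networks = list(networks)
        peers = list(peers)
        # Local routing table: peer -> network currently used to reach it.
        self.routes: Dict[str, str] = {p: self.networks[0] for p in peers}
        # Consecutive missed heartbeats for every path, active or standby.
        self.missed: Dict[Tuple[str, str], int] = {
            (p, n): 0 for p in peers for n in self.networks
        }

    def is_up(self, peer: str, net: str) -> bool:
        return self.missed[(peer, net)] < MISS_LIMIT

    def record(self, peer: str, net: str, replied: bool) -> None:
        """Record one heartbeat result and reroute if the active path has failed."""
        key = (peer, net)
        self.missed[key] = 0 if replied else self.missed[key] + 1
        if self.routes[peer] == net and not self.is_up(peer, net):
            self._reroute(peer)

    def _reroute(self, peer: str) -> None:
        """Point the routing entry at the first network that still answers."""
        for net in self.networks:
            if self.is_up(peer, net):
                self.routes[peer] = net
                return
        # No healthy path remains; the entry is left unchanged.
```

In this sketch, heartbeats are recorded for both networks continuously, so the standby path is known to be healthy before traffic is moved onto it.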
Additional features and advantages of the invention will be made apparent from the following detailed description of illustrative embodiments, which proceeds with reference to the accompanying figures.