Individuals, businesses and governments increasingly rely on computer systems and network computing operations for critical services and business operations. In such systems, network uptime can be critical to the smooth operation of the underlying service or operation, and a network failure must be promptly isolated and service restored. Thus, fault isolation and automatic recovery under network failure conditions are crucial requirements for higher bandwidth networks and task-critical networks. In addition, in a typical network failure and recovery scenario, a delay on the order of even a few hundred milliseconds can be critical.
In manufacturing or other automation systems, architectures may be decentralized or distributed while delivering performance comparable to centralized systems. For instance, the Advantys STB distributed I/O system is an open, modular input/output system that makes it possible to design islands of automation managed by a master controller via a bus or communication network. The Advantys STB distributed I/O system is a product of Schneider Automation Inc., One High Street, North Andover, Mass.
Another problem that can be encountered during a network failure scenario is the inability to access the physical links or devices at the location of the failure. Often, the island and its associated I/O modules are widely dispersed, are in isolated locations, or are enclosed in other machinery, so that getting physical access to a remote I/O module or network link during a failure can be difficult. Furthermore, in networks such as industrial automation systems, reliability is critical. In a factory, for instance, if a network connection goes down, operators could be physically harmed. Accordingly, fault recovery in these types of network operations must be automatic.
With the increased complexity of industrial automation applications, computer networks in an industrial setting often include numerous devices that are connected over a network such as an EtherNet/IP network. To ensure that the devices are able to communicate with each other in a reliable fashion, redundant cabling is often used to provide transmission media. If a transmission medium becomes non-operational (e.g., when a cable is inadvertently removed), a controller in a real-time control network typically requires a substantial amount of time to detect the non-operational transmission medium. Once the failure is detected, the controller can reconfigure the transmission path to utilize the redundant cabling. However, during reconfiguration of the transmission path, messages may be lost between devices on the network.
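By way of illustration only, the effect of detection and reconfiguration delay on message loss can be approximated with a simple deterministic model. The following Python sketch is not taken from any particular controller or protocol; the function name `simulate_failover`, its parameters, and the assumption that every message sent between the failure and the completion of reconfiguration is lost are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class FailoverResult:
    sent: int             # total messages sent during the run
    lost: int             # messages sent while no working path existed
    recovered_at_ms: int  # time at which the redundant path became active


def simulate_failover(send_interval_ms, fail_at_ms, detect_timeout_ms,
                      reconfig_ms, duration_ms):
    """Count messages lost while a failed medium is detected and replaced.

    Assumption (hypothetical model): a device sends one message every
    send_interval_ms, and every message sent from the moment of failure
    until detection plus reconfiguration completes is lost.
    """
    recovered_at_ms = fail_at_ms + detect_timeout_ms + reconfig_ms
    sent = 0
    lost = 0
    for t in range(0, duration_ms, send_interval_ms):
        sent += 1
        if fail_at_ms <= t < recovered_at_ms:
            lost += 1
    return FailoverResult(sent, lost, recovered_at_ms)


if __name__ == "__main__":
    # Illustrative values only: 10 ms cycle, failure at 100 ms,
    # 200 ms to detect, 50 ms to switch to the redundant cabling.
    print(simulate_failover(10, 100, 200, 50, 1000))
```

Under this simplified model, the number of lost messages grows in direct proportion to the sum of the detection time and the reconfiguration time, which is why shortening either interval matters in a real-time control network.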
In a typical fault recovery scenario, when a failure occurs, data traffic is rerouted or switched from the current faulty path to a backup path. Depending on the actual redundancy strategy, the standby or backup data path may be dedicated, may require a physical change in connections, or may be a virtual backup path to the active or primary path. Current software methods for providing redundancy in a network require that the devices on the network analyze or discover the entire network to determine a backup path. Rapid Spanning Tree Protocol (RSTP) and Hirschmann HIPER-Ring are two such methods. In both RSTP and Hirschmann HIPER-Ring, the entire network must be discovered before rerouting can be implemented, which increases both the recovery time and the computing resources consumed by fault recovery. In addition, in both RSTP and Hirschmann HIPER-Ring, the network devices implementing the fault recovery must communicate with other network devices on the network.
Thus, there is a real market need to provide a reliable and expeditious approach for providing redundant transmission media in a real-time control network without appreciably disrupting operation.