The present embodiments relate to operating a program-controlled highly available automation system for a technical process.
A system is considered highly available if an application remains available in the event of a fault and may continue to be used without immediate human intervention. A user perceives no interruption or only a minimal one. If one automation device of a highly available automation system fails, the system seamlessly switches to a second automation device in order to control the technical process. High availability accordingly signifies the ability of a system to provide unrestricted operation in the event of failure of one of its components.
Both automation devices use a communication ring both for data exchange with the peripheral units and for the exchange of synchronization information. One essential measure in a redundant automation system is the mutual monitoring of the subsystems (e.g., the automation devices) using a watchdog, which identifies, via a timeout, whether the respective other subsystem (e.g., automation device) has failed. A “failover” (e.g., the takeover of the process control by one of the two subsystems in the event of the failure of the other subsystem) may be carried out together with internal diagnostic measures.
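By way of illustration only, the timeout-based mutual monitoring described above may be sketched as follows. The class name `PartnerWatchdog`, its methods, and the injectable `clock` parameter are hypothetical and are not taken from any particular automation device. The sketch assumes that each automation device receives periodic sign-of-life messages from the respective other automation device.

```python
import time


class PartnerWatchdog:
    """Hypothetical sketch: detects failure of the partner device via a timeout."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # timeout_s: longest tolerated silence of the partner, in seconds.
        # clock: injectable time source (assumption, eases testing).
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def heartbeat_received(self):
        # Called whenever a sign-of-life message from the partner arrives.
        self._last_seen = self._clock()

    def partner_failed(self):
        # True once the partner has been silent for longer than the timeout.
        return self._clock() - self._last_seen > self.timeout_s
```

In such a sketch, each automation device would poll `partner_failed()` cyclically and may initiate the failover once the method returns `True`, possibly in combination with internal diagnostic measures.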
The shorter the timeout set for the watchdog, the more quickly a “failover” may be carried out. The minimum timeout selected for the watchdog is to take into account the conditions of the communication infrastructure. By way of example, the following configuration may be assumed for the communication ring: An MRP ring (e.g., a Media Redundancy Protocol ring) connects the two automation devices and the peripheral units. The MRP ring is configured according to IEC 62439-2. Communication provided via the MRP ring makes it possible for the two automation devices to still communicate with one another via a protocol independent of the MRP ring and, in the process, to use existing ring segments in parallel, if necessary.
In order to cope with the failure of a ring segment (e.g., the failure of a peripheral unit), the MRP ring responds with a ring reconfiguration. With such a ring reconfiguration, no communication between the two automation devices is possible under certain circumstances for time periods of differing lengths. The maximum time period determines the minimal value of the watchdog timeout for the mutual monitoring of the two automation devices. This time period depends inter alia on the devices used in the MRP ring. Since devices in the MRP ring may be replaced during the service life of the system, the length of the communication interruption to be expected may also change. This makes a dynamic adjustment of the timeout necessary or requires inspection of the timeout, in order to promptly identify a possible worsening of the failover times.
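By way of illustration only, the relationship between the maximum communication interruption and the minimal watchdog timeout may be sketched as follows. The function names, the optional safety margin `margin_ms`, and the manner in which the interruption durations are obtained are assumptions and are not specified above.

```python
def minimal_watchdog_timeout(interruptions_ms, margin_ms=0):
    """Smallest timeout that tolerates every given interruption (hypothetical helper)."""
    if not interruptions_ms:
        raise ValueError("at least one interruption duration is required")
    # The longest interruption during ring reconfiguration bounds the timeout from below.
    return max(interruptions_ms) + margin_ms


def timeout_needs_review(configured_timeout_ms, interruptions_ms, margin_ms=0):
    """True if the configured timeout is shorter than the required minimum."""
    return configured_timeout_ms < minimal_watchdog_timeout(interruptions_ms, margin_ms)
```

If, for example, a device in the MRP ring is replaced and longer interruptions are subsequently to be expected, `timeout_needs_review` would indicate that the configured timeout no longer covers the worst case and is to be adjusted or inspected.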