Known error-tolerant system architectures comprise at least three processor cores with separate or shared memory. In such architectures, the lockstep mode of the processors is continuously checked by monitoring bus signals. In the text which follows, the lockstep mode is also referred to as synchronous processing of a program or program parts by the processors.
If the active processor fails, ownership of the memory area and of the components which the active processor actuates via input and output channels passes to another processor. In the lockstep error state (synchronization error) which follows a lockstep error, data access and control processes are withdrawn from the active processor and taken over by another processor. FIGS. 7 and 8 show conventional safety architectures.
The classic minimum configuration of an error-tolerant system, which comprises triple redundancy (TMR: triple modular redundancy) of the processors and a jointly used memory, is still too expensive for many safety architectures whose safety concept is based on two redundant processors running in lockstep, i.e. synchronously. However, achieving error tolerance is a particular challenge for processors with only double redundancy.
Attempts have been made to support error tolerance in safety platforms with just two redundant processors. In U.S. Pat. No. 5,915,082, internal buses are provided with parity bits and compared. After a parity error has been detected on one side without the occurrence of a lockstep error, the associated processor is disconnected, so that it no longer has any influence on the system. However, the system is switched off after every lockstep error which occurs without a parity error. This parity-based procedure does not provide sufficient coverage in cases in which the redundant system should remain available after a lockstep error. The parity check can also lead to an incorrect decision, for example if different multi-bit errors occur.
US 2006/0107106 describes a method for improving availability in a system composed of a plurality of synchronously operating processor pairs, each of which combines two redundant processors. The outputs of the paired processors are continuously compared. If an error occurs in one processor pair, another processor pair takes over actuation of the system as the boot processor pair. Meanwhile, the faulty processor pair attempts to recover synchronization and make itself available as a standby processor pair. This ensures a high level of availability of the system. However, the method is too expensive for many embedded systems, which must achieve high availability with, where possible, a single processor pair. In addition, in safety-relevant systems any recovery of the synchronization of a processor must be subjected to strict safety-related checks.
Against this background there is a need for a safety architecture which has just two redundant processors and which permits a high level of availability of the system.
This object is achieved by means of a two-processor control device as described and claimed herein. Furthermore, the present invention relates to a control method.
Further embodiments, modifications and advantages are described in the following description, drawings and in the claims.
According to one or more embodiments of the present invention, a redundant two-processor control device comprises a first processor and a second processor for the synchronous execution of a control program, at least a first multiplexer for selectively connecting at least a first peripheral unit to be actuated to one of the two processors, and at least a first comparison unit for monitoring the synchronization state of the two processors and for detecting a synchronization error. Furthermore, the control device comprises a restoration control unit which is designed to monitor the execution of at least one test program by the two processors after the occurrence of a synchronization error and to evaluate the test results, and which is also designed to configure at least the first multiplexer.
The comparison unit monitors the synchronous operation, i.e. the lockstep, of the processors. This can be done by comparing the processing of the control program “line by line”: the same results must occur at the same times. If this is not the case, a lockstep error has occurred, i.e. the processors are no longer operating synchronously.
The synchronous processing of the control program is an important feature of redundant systems, since it makes it possible to check whether the currently active processor is operating error-free; this relies on the assumption that both processors simultaneously exhibiting the same error is statistically very improbable. However, if a synchronization error occurs, it is initially unclear whether the error has occurred at the active processor or at the passive processor. The active processor is understood here to be the processor which actually actuates the peripheral unit. The passive processor is the one which merely runs along synchronously, i.e. it receives the same data and processes the same program steps as the active processor.
When a synchronization error occurs, correct control is no longer ensured, which poses a risk in particular in safety-relevant systems such as those used in automobiles, but also in other fields. Conventional control systems, for example those shown in FIGS. 7 and 8, must therefore usually be switched off completely.
In the solution proposed here, a restoration control unit is provided which, when a synchronization error occurs, subjects the two processors to a test in order to determine which of the two processors has an error. After the test and the evaluation of the test results, the restoration control unit decides on the further procedure.
If both processors have passed the test, it is assumed that both processors are error-free. In this case, the synchronous execution of the control program is continued.
This solution has the decisive advantage that actuation of the peripheral unit can be continued while the high safety level is maintained, since both processors have been tested for freedom from errors. This is a decisive advantage over other solutions in which the occurrence of a synchronization error (lockstep error) basically leads to a complete shutdown and the system can only be reset externally. It should be borne in mind that merely resetting a system is frequently not a satisfactory solution for safety-relevant applications, since no error evaluation is performed, i.e. the cause of the synchronization error remains unknown. The solution described here therefore offers a way of dealing with synchronization errors and permits the synchronization of two redundant systems to be recovered after a lockstep error.
On the other hand, if a processor has been evaluated as having an error, the restoration control unit reconfigures the control device in such a way that the outputs of the faulty processor are ignored from then on and the peripheral unit can only be actuated by the error-free processor, not by the faulty one. This is typically done by reconfiguring the first multiplexer so that data can then flow only between the peripheral unit and the error-free processor. In addition, after the reconfiguration the comparison unit no longer carries out any monitoring.
This solution has the decisive advantage that actuation of the peripheral unit can be continued, even though it now takes place without processor-side redundancy. This is a considerable advantage over known solutions, in which the control was switched off completely when a synchronization error (lockstep error) occurred. The proposed solution thus increases the availability of the system, which is particularly important in critical applications, so that control over the system can be maintained. The control device can, however, output an error signal indicating that it is now in “single-processor operation”, so that maintenance can be carried out.
The redundant control device proposed here, with its means for dealing with a synchronization error, can be used in any safety-relevant system. One example is braking applications in automobiles. The control device, which is based on only two redundant processors, is configured in such a way that it retains the existing safety level and permits a high level of availability of the system.
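The reconfiguration can be pictured with the following Python sketch, which assumes that each multiplexer simply forwards the outputs of whichever processor it currently selects and drops the other processor's outputs. The classes `Proc`, `Multiplexer` and `ControlDevice`, the method `reconfigure_after_fault`, and the choice of the first processor as the initially active one are all hypothetical modelling assumptions, not part of the described hardware.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Proc(Enum):
    FIRST = 1
    SECOND = 2


@dataclass
class Multiplexer:
    """Connects one peripheral unit to exactly one processor (hypothetical model)."""
    selected: Proc = Proc.FIRST  # assumed initially active processor

    def forward(self, source: Proc, data: int) -> Optional[int]:
        # Only the selected processor's output reaches the peripheral unit;
        # outputs of the other processor are ignored.
        return data if source is self.selected else None


@dataclass
class ControlDevice:
    multiplexers: List[Multiplexer] = field(default_factory=lambda: [Multiplexer()])
    comparison_enabled: bool = True  # comparison unit monitors lockstep

    def reconfigure_after_fault(self, faulty: Proc) -> None:
        """Route every multiplexer to the error-free processor and stop
        lockstep monitoring, as the restoration control unit would."""
        healthy = Proc.SECOND if faulty is Proc.FIRST else Proc.FIRST
        for mux in self.multiplexers:
            mux.selected = healthy
        self.comparison_enabled = False
```

For example, after `ControlDevice([Multiplexer(), Multiplexer()]).reconfigure_after_fault(Proc.FIRST)`, both multiplexers forward only the second processor's data, which also covers embodiments with a second multiplexer.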
The peripheral unit to be actuated can in principle be understood to mean any unit which is accessed by the respective processor. Examples are memories, actuators, input/output units and sensors.
According to one or more embodiments of the present invention, the restoration control unit is designed to assign the synchronization error to an error type and to select a test program on the basis of the error type. The error which has occurred is analyzed in order to find out where it may have arisen or which component caused it. On this basis, a suitable test program is selected; the test programs and the expected test results are stored in advance, for example in the restoration control unit. If, for example, the error, i.e. the difference between the two processor outputs, consists of differing memory addresses, a test program with which memory errors can be detected can be selected. This procedure improves error localization.
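One possible way to express the assignment of an error type and the selection of a test program is a lookup table keyed by error type, sketched below in Python. It assumes that the comparison unit reports the two diverging bus accesses; the three error types, the classification rules, and the test program names and expected results in `TEST_TABLE` are purely illustrative placeholders.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Tuple


class ErrorType(Enum):
    ADDRESS_MISMATCH = auto()  # processors accessed different addresses
    DATA_MISMATCH = auto()     # same address, different data
    TIMING_MISMATCH = auto()   # same access, different cycle


@dataclass(frozen=True)
class BusAccess:
    cycle: int
    address: int
    data: int


def classify(a: BusAccess, b: BusAccess) -> ErrorType:
    """Assign a detected synchronization error to an error type
    (hypothetical classification rules)."""
    if a == b:
        raise ValueError("accesses are identical: no synchronization error")
    if a.address != b.address:
        return ErrorType.ADDRESS_MISMATCH
    if a.data != b.data:
        return ErrorType.DATA_MISMATCH
    return ErrorType.TIMING_MISMATCH


# Test programs and expected results, stored in advance.
# Program names and expected values are placeholders, not real routines.
TEST_TABLE: Dict[ErrorType, List[Tuple[str, int]]] = {
    ErrorType.ADDRESS_MISMATCH: [("memory_test", 0xA5A5)],
    ErrorType.DATA_MISMATCH: [("alu_test", 0x1234), ("register_test", 0x00FF)],
    ErrorType.TIMING_MISMATCH: [("bus_timing_test", 0x0001)],
}


def select_tests(a: BusAccess, b: BusAccess) -> List[Tuple[str, int]]:
    """Return the (program, expected result) pairs for this error type."""
    return TEST_TABLE[classify(a, b)]
```

A table-driven design keeps the stored test programs and their expected results in one place, so further error-specific tests can be added without changing the selection logic.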
According to one or more embodiments of the present invention, the restoration control unit is designed to configure the first multiplexer on the basis of the test result. The multiplexer, and generally the control device, is therefore configured as a function of the test result. It is possible for the function of the multiplexer to be performed by a bus matrix.
According to one or more embodiments of the present invention, the control device also has at least a second multiplexer for selectively connecting at least one second peripheral unit to be actuated to one of the two processors, wherein the second multiplexer can be configured by the restoration control unit. The control device therefore also permits the selective actuation of a plurality of peripheral units while taking safety aspects into account.
According to one or more embodiments of the present invention, the control device also has at least a second comparison unit for monitoring the synchronization state of the two processors and for detecting a synchronization error. This permits reciprocal monitoring and therefore increases the reliability of the system.
According to one or more embodiments of the present invention, the control device has a first bus matrix which connects the first processor to the first multiplexer, and a second bus matrix which connects the second processor to the second multiplexer.
According to one or more embodiments of the present invention, the first peripheral unit is a common unit which can be actuated selectively by one of the two processors. Furthermore, the control device has at least two further peripheral units, one of which is assigned only to the first processor and the other only to the second processor, each as a private peripheral unit which can be accessed only by its assigned processor. A common peripheral unit or component is understood here to be a unit which is actuated redundantly, i.e. it is actuated by one of the two processors at a time, while the other processor serves for comparison. A private unit, by contrast, is actuated by just one of the two processors. The respective other processor has no access to this unit, nor to the multiplexer or multiplexers. The solution presented here permits the synchronization between two redundant processors to be restored even when non-redundant components are present, as is typical in many embedded systems for reasons of cost.
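The access rule for common and private peripheral units can be summarized in a small Python sketch. It assumes that a private unit is identified by its owning processor and that a common unit is reachable only by the processor its multiplexer currently selects; the class `Peripheral`, its `may_access` method and the example unit names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Proc(Enum):
    FIRST = 1
    SECOND = 2


class Peripheral:
    """A peripheral unit that is either private to one processor or common,
    i.e. reachable through a multiplexer (hypothetical model)."""

    def __init__(self, name: str, owner: Optional[Proc] = None) -> None:
        self.name = name
        self.owner = owner  # None marks a common unit

    def may_access(self, proc: Proc, mux_selected: Proc) -> bool:
        if self.owner is not None:
            # Private unit: only the assigned processor, independent of the multiplexer.
            return proc is self.owner
        # Common unit: only the processor currently selected by the multiplexer.
        return proc is mux_selected


if __name__ == "__main__":
    private_a = Peripheral("sensor_a", owner=Proc.FIRST)
    common = Peripheral("actuator")
    print(private_a.may_access(Proc.SECOND, mux_selected=Proc.SECOND))  # False
    print(common.may_access(Proc.SECOND, mux_selected=Proc.SECOND))     # True
```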
According to one or more embodiments of the present invention, the two further peripheral units are redundant units, i.e. they are physically identical and serve to carry out the same function.
According to one or more embodiments of the present invention, the first and/or the second comparison unit is designed to generate a synchronization error signal when a synchronization error occurs. The synchronization error signal may be, for example, an interrupt.
According to one or more embodiments of the present invention, a control method is provided. The control method comprises the synchronous processing of a control program by a first and a second processor, which are connected via a multiplexer to at least one peripheral unit to be actuated, wherein only one of the two processors actuates the peripheral unit at any given time. The synchronous processing of the control program is monitored by a comparison unit, and a synchronization error signal is output if the two processors become desynchronized. After a synchronization error signal has been output, both processors first interrupt processing of the control program. A test is then carried out to check whether one of the two processors has an error. If both processors are error-free, the synchronous processing of the control program by the two processors is continued. If, on the other hand, one of the two processors has been detected as having an error, the multiplexer and the comparison unit are configured in such a way that no further communication takes place with the faulty processor, the comparison unit performs no further monitoring, and the error-free processor actuates the peripheral unit. The processing of the control program is then continued by the error-free processor. If both processors have errors, the controller is switched off.
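The decision step of the method can be sketched as a small state machine in Python, assuming the test outcome is reported as one pass/fail flag per processor. The `Mode` values, `next_mode`, `on_sync_error` and the `run_test` callback are hypothetical names used only for illustration.

```python
from enum import Enum, auto
from typing import Callable, Tuple


class Mode(Enum):
    LOCKSTEP = auto()       # both processors process the control program synchronously
    SINGLE_FIRST = auto()   # only the first processor actuates the peripheral unit
    SINGLE_SECOND = auto()  # only the second processor actuates the peripheral unit
    OFF = auto()            # controller switched off


def next_mode(first_ok: bool, second_ok: bool) -> Mode:
    """Decide how to continue after the test (True = processor passed)."""
    if first_ok and second_ok:
        return Mode.LOCKSTEP       # resume synchronous processing
    if first_ok:
        return Mode.SINGLE_FIRST   # multiplexer to first processor, comparison off
    if second_ok:
        return Mode.SINGLE_SECOND  # multiplexer to second processor, comparison off
    return Mode.OFF                # both processors faulty


def on_sync_error(mode: Mode, run_test: Callable[[], Tuple[bool, bool]]) -> Mode:
    """Handle a synchronization error signal, e.g. an interrupt."""
    if mode is not Mode.LOCKSTEP:
        # Outside lockstep the comparison unit no longer monitors,
        # so the signal is ignored in this model.
        return mode
    # Both processors have interrupted the control program; run the test.
    first_ok, second_ok = run_test()
    return next_mode(first_ok, second_ok)
```

For example, `on_sync_error(Mode.LOCKSTEP, lambda: (False, True))` yields `Mode.SINGLE_SECOND`, i.e. single-processor operation on the second processor.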
According to one or more embodiments of the present invention, the test comprises the simultaneous execution of at least one test program by both processors, wherein a processor is considered to have an error if at least one of the following conditions is met:
(i) the processor has not processed the test program within a first time period T1;
(ii) the processor has not successfully processed the test program;
(iii) the processor has not gone into the state of rest for a second time period T2 after expiry of the first time period T1.
This ensures that not only correct or incorrect processing is taken into account, but also whether the processors have completed the test within a predefined time. Checking the state of rest serves to determine whether a processor nevertheless outputs data even though it is not processing any instructions; this also indicates a faulty processor.
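The three error conditions can be checked as in the following Python sketch. It assumes that, for each processor, the restoration control unit records when the test program finished, which result it produced, and the times of any output activity. It interprets the state of rest as the absence of output activity in the window after T1 of length T2. `RunObservation` and `has_error` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunObservation:
    """What was observed while one processor ran the test program (hypothetical)."""
    finish_time: Optional[float]  # completion time; None if it never finished
    result: Optional[int]         # result produced by the test program
    output_times: List[float] = field(default_factory=list)  # times of output activity


def has_error(obs: RunObservation, expected: int, t1: float, t2: float) -> bool:
    """Return True if at least one of conditions (i) to (iii) is met."""
    # (i) Test program not processed within the first time period T1.
    if obs.finish_time is None or obs.finish_time > t1:
        return True
    # (ii) Test program not processed successfully (wrong result).
    if obs.result != expected:
        return True
    # (iii) Not at rest: output activity in the window (T1, T1 + T2].
    return any(t1 < t <= t1 + t2 for t in obs.output_times)


if __name__ == "__main__":
    good = RunObservation(4.0, 0xBEEF, [1.0, 3.5])
    chatty = RunObservation(4.0, 0xBEEF, [5.5])  # outputs data after T1
    print(has_error(good, 0xBEEF, t1=5.0, t2=2.0))    # False
    print(has_error(chatty, 0xBEEF, t1=5.0, t2=2.0))  # True
```

Applying the same check to both processors gives the pair of pass/fail results on which the further procedure is based.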
According to one or more embodiments of the present invention, the synchronization error is evaluated and is assigned to an error type, wherein, for the checking of the processors, at least one test program is selected as a function of the error type. This permits one or, if appropriate, more error-specific test programs to be selected.
The invention will now be described with reference to specific exemplary embodiments illustrated in the figures. However, said embodiments should not be considered to be restrictive. For a person skilled in the art, the following description provides further modifications which are also to be included in the scope of protection.