Contemporary communication systems have a plurality of processors which interact with one another to process particular tasks or subtasks. Such a plurality of processors is also called a processor platform. The platform is administratively defined before the communication system is put into operation.
During operation of the communication system, one of the processors in the processor platform accepts the task to be processed, together with the data required for this purpose, and carries out a first processing operation. Depending on the result, a further processor is then driven and supplied with the result of the first processing operation. For its part, this processor then carries out further processing operations and, where applicable, transfers the determined result to a further processor. The processing steps of a subsequent processor thus depend directly on the result of the predecessor. This forms a logical chain generally including a plurality of processors in the processor platform. These processors form a subset of all the processors in the processor platform.
The problem with such an arrangement is that, if even one of the processors in this logical chain fails, the task can no longer be processed. In this case, under some circumstances, processing of the task cannot even be terminated, because the task is no longer recognized as being the particular task if data essential for this purpose has been lost during the failure. A further consequence is that this logical chain of processors remains blocked for the processing of further tasks.
In the prior art, these failures are handled in a cyclical time frame by starting monitoring programs or audits which examine the processors in a processor platform for hardware and software errors. As a rule, these monitoring and checking operations are carried out at a time when there is little traffic. The interval between two such checks can therefore sometimes be very long. The incorrect response thus remains unnoticed for the duration of this time interval.
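The dependency described above can be illustrated with a minimal sketch. It assumes that each processor can be modeled as a plain function and that a failure surfaces as an exception; the names run_chain, ProcessorFailure and failed_processor are hypothetical and serve only the illustration.

```python
from typing import Callable, List


class ProcessorFailure(Exception):
    """Signals that a (hypothetical) processor in the chain is unavailable."""


def run_chain(stages: List[Callable[[int], int]], task_data: int) -> int:
    """Pass the task through each processor in order.

    Each stage receives the result of its predecessor, so an exception in
    any single stage aborts the whole task.
    """
    result = task_data
    for stage in stages:
        result = stage(result)
    return result


def failed_processor(_: int) -> int:
    # Assumption: a failed processor is modeled as a stage that raises.
    raise ProcessorFailure("processor is down")


healthy = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
broken = [lambda x: x + 1, failed_processor, lambda x: x - 3]

print(run_chain(healthy, 5))  # (5 + 1) * 2 - 3 = 9

try:
    run_chain(broken, 5)
except ProcessorFailure as exc:
    # The intermediate result of the first stage is lost with the failure.
    print("task aborted:", exc)
```

In the sketch, the third stage of the broken chain is never reached, which mirrors the statement that the failure of a single processor prevents the task from being processed.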
The publication Krishna Kumar R. et al.: “A Fault Tolerant Multi-Transputer Architecture”, Microprocessors and Microsystems, vol. 17, No. 2, Jan. 1, 1993, pages 75-81, XP000355542, describes a method for improving system availability. The configuration mentioned therein has a central control device. This central control device checks and controls a chain formed by a plurality of processors. If one of the processors fails, the central control device takes the failed processor out of operation using a switching network. The processor adjacent to the failed processor then takes on the tasks of the failed processor. This can be done to this extent because the applications being discussed here contain processor-neutral data which can be processed by each of the processors. To this extent, what is involved here is a rigid configuration that cannot be changed at any time to suit the requirements of the tasks to be computed.
The present invention is based on an object of indicating a way in which the failure of one or more processors in a processor platform can be handled efficiently in order to increase the dynamics of the system.
According to an aspect of the present invention, a method for improving system availability after failure of processors in a processor platform includes the steps of processing a prescribed task with one or more of the processors, including splitting the prescribed task into one or more subtasks which are each processed on the one or more of the plurality of processors. A first logical chain is formed for a duration of the processing of the prescribed task. Additionally, a second logical chain is formed including all of the plurality of processors in the processor platform. Physical and logical processor data and data describing the current processing state of the prescribed task are transferred from one of the plurality of processors arranged in the second logical chain to a next one of the plurality of processors arranged in the second logical chain. The method further includes loading back at least one of the physical and logical processor data and data describing the current processing state of the prescribed task from the next one of the plurality of processors arranged in the second logical chain to the one of the plurality of processors arranged in the second logical chain when the one of the plurality of processors fails and is restarted.
The particular advantage of the invention is the formation of a further logical chain of processors superimposed on the first logical chain. In this arrangement, significant data from a processor arranged in this further chain is transferred to the next processor in this chain. This occurs irrespective of which of the processors in the first logical chain receives the result of the processing. This has the associated advantage that, when restarted, a failed processor can load this significant data back directly from the next processor in this chain and thus has an image of the data as it was before the failure.
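The transfer and load-back along the second logical chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class and method names (Processor, Platform, mirror, fail, restart) are hypothetical, the processor data is modeled as a dictionary, and the last processor is assumed to use the first one as its "next" processor so that every processor has a successor.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Processor:
    name: str
    # Physical/logical processor data and current task state (modeled as a dict).
    state: Dict[str, Any] = field(default_factory=dict)
    # Copy of the predecessor's data, held on behalf of that predecessor.
    backup_of_previous: Optional[Dict[str, Any]] = None


class Platform:
    """Second logical chain over all processors in the platform."""

    def __init__(self, names: List[str]) -> None:
        self.processors = [Processor(n) for n in names]

    def next_index(self, i: int) -> int:
        # Assumption: the chain wraps around, so the last processor's
        # successor is the first processor.
        return (i + 1) % len(self.processors)

    def mirror(self, i: int) -> None:
        """Transfer processor i's data to the next processor in the chain."""
        successor = self.processors[self.next_index(i)]
        successor.backup_of_previous = copy.deepcopy(self.processors[i].state)

    def fail(self, i: int) -> None:
        """Model a failure: the processor loses its local data."""
        self.processors[i].state = {}

    def restart(self, i: int) -> None:
        """On restart, load the data back from the next processor."""
        successor = self.processors[self.next_index(i)]
        if successor.backup_of_previous is not None:
            self.processors[i].state = copy.deepcopy(successor.backup_of_previous)


platform = Platform(["P0", "P1", "P2"])
platform.processors[1].state = {"task": "T1", "step": 2}
platform.mirror(1)   # P1's data is now also held by P2
platform.fail(1)     # P1 loses its local data
platform.restart(1)  # P1 loads its data back from P2
print(platform.processors[1].state)
```

The mirroring in this sketch is independent of the first logical chain: it always targets the successor in the second chain, regardless of which processor receives the processing result.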
Additional advantages and novel features of the invention will be set forth, in part, in the description that follows and, in part, will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.