1. Field of the Invention
The present invention relates to a processing device, a failure recovery method therefor, and a failure restoration method and, more particularly, to a technique for handling a failure of a processing device in which a processor is connected to a plurality of devices through a plurality of slots.
2. Description of the Related Art
In recent years, improvement in processor clock frequency has reached its limit because of power consumption and the limits of micropatterning. Nevertheless, as new techniques such as on-chip multiprocessors and multithreaded processors are researched and developed, the processing capability of a single-chip processor has improved dramatically. Accordingly, the operating environment of software (SW) has changed. For example, the number of multiprocess workloads and the number of multithreaded processes are increasing. With this increase in the number of processes, processing that could not previously be realized becomes possible, and research and development in this area are now being actively pursued.
Accordingly, a countermeasure against a failure of a computer system that uses a processor with improved processing capability becomes more important. For example, as a conventional countermeasure, SLOTs are multiplexed in preparation for a failure in a card loaded in a SLOT or in a path under control of the card. The following documents are known as conventional technique documents about countermeasures against failures.
For example, a multithread processor which reinputs a thread when one of the memory elements arranged in units of threads fails (for example, see JP-A-2002-123402 and JP-A-2002-108630) is known. In addition, a print automatic restoring apparatus (for example, see JP-A-2004-178124), a fault-tolerant architecture for in-circuit programming (for example, see JP-A-2004-164671), a computer automatic switching method (for example, see JP-B2-2773424), and an inter-task communication failure processing method (for example, see JP-A-04-057135) are also known. An active/spare switching control method for an information processing system (for example, see JP-A-62-106564); a communication apparatus which performs automatic failure restoration and an automatic failure restoration method (for example, see JP-A-2002-330131); and an inter-apparatus connection method and a multiplexing apparatus, and a switching apparatus for a multiplexing apparatus (for example, see JP-A-2002-077186) are also known. A domain region dynamic arrangement method for a virtual computer (for example, see JP-A-01-017128); a dynamic LAN control apparatus switching method (for example, see JP-A-09-167128); and a high-speed network address takeover method, and a network apparatus and program (for example, see JP-A-2005-136690) are also known. Furthermore, a method of keeping an optimum system availability by resource restoration (for example, see JP-A-2002-132697) and a broken processor replacing method, medium, and system (for example, see JP-A-2004-318885) are also known.
However, in order to realize multiplexing in the conventional technique, additional parts such as SLOTs and cards must be prepared, which disadvantageously increases costs. In addition, when a failure occurs, a maintenance person must identify the broken part and replace it with a new part within a limited time. For this reason, the recovery operation disadvantageously takes a long time.