When a large-scale disaster occurs, failures may occur at the same time in many components of an information system (hereinafter, a failure that occurs in a component may be referred to as a component failure). In system design for restoration after this kind of disaster, a system designer designs an operation procedure (failure restoration procedure) for restoring the system from failures that occur at the same time in components, in such a way that a requirement for restoration time is satisfied. The following two points need to be taken into consideration when designing this kind of failure restoration procedure.
Firstly, the number of combinations of failures that may occur at the same time in a large number of components is extremely large. Therefore, evaluating all combinations of component failures by means of tests in an actual environment is not realistic. In order to cope with this problem, a model-based approach may be used, in which the design of the failure restoration procedure is evaluated on a model, using only values of basic parameters measured in the target system.
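As a rough illustration of why exhaustive testing is not realistic, the following sketch counts the non-empty combinations of simultaneously failed components, which is 2^n - 1 for n components when each component is assumed to be simply failed or not failed. The function names and the component names are hypothetical and are used only for illustration.

```python
from itertools import combinations


def count_failure_combinations(n_components: int) -> int:
    """Number of non-empty sets of components that may fail at the same time.

    Assumes each component is either failed or not failed, so there are
    2**n subsets in total, minus the empty set (no failure at all).
    """
    return 2 ** n_components - 1


def enumerate_failure_combinations(components):
    """Yield every non-empty combination of failed components, smallest first."""
    for size in range(1, len(components) + 1):
        yield from combinations(components, size)


# Hypothetical three-component system: 7 combinations can still be listed.
small_system = ["web", "app", "db"]
all_combinations = list(enumerate_failure_combinations(small_system))

# The count grows exponentially with the number of components.
count_10 = count_failure_combinations(10)   # 1023
count_30 = count_failure_combinations(30)   # 1073741823
```

Even at 30 components, more than one billion combinations would have to be tested, which is consistent with the need for a model-based evaluation.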
Secondly, a customer requirement regarding restoration time needs to be fulfilled within a limited budget. For example, in relation to failure restoration of a system, there is a case in which a prescribed restoration time is guaranteed based on a contract agreed with a customer in advance. On the other hand, a countermeasure for shortening the failure restoration time incurs cost. For example, from the viewpoint of the system configuration, the cost of equipment increases when component redundancy by means of hot standby or the like is implemented. As another example, from the viewpoint of human resources, personnel cost increases when a skilled system administrator is assigned. For these reasons, the cost becomes excessive when a system is designed to satisfy a requirement for restoration time for all combinations of component failures. However, a method for cost-effectively selecting the combinations of component failures for which the requirement for restoration time should be satisfied is not obvious. From the viewpoint of cost-effectiveness, it is desirable to select, as a target for improvement of the system design, a combination of component failures that consists of a minimum number of component failures and that does not satisfy a requirement for restoration time or necessary cost.
In order to identify a weak point in system design, there is, for example, a known method for specifying minimum cut sets (MCSs) in a fault tree that represents failures (faults) of a system. An MCS is a minimum combination of basic events (for example, component failures) that may cause an undesirable top event (for example, a system failure).
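The notion of an MCS can be illustrated with a minimal sketch, under the assumption that the fault tree contains only AND gates and OR gates. The tree representation (a basic event as a string, a gate as a pair of a gate type and a list of children), the function names, and the example events are all hypothetical. The sketch expands the tree into cut sets and then discards every cut set that strictly contains another one; it is not the technique of PTL 1 and does not handle majority decision gates.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets (not necessarily minimum) below a fault tree node.

    A node is either a basic event (str) or a tuple (gate, children),
    where gate is "AND" or "OR". This representation is an assumption.
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers an OR gate.
        return set().union(*child_sets)
    if gate == "AND":
        # An AND gate needs one cut set from every child at the same time.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unsupported gate type: {gate}")


def minimum_cut_sets(tree):
    """Keep only cut sets that have no other cut set as a strict subset."""
    candidates = cut_sets(tree)
    return {s for s in candidates if not any(other < s for other in candidates)}


# Hypothetical top event: the system fails if both web servers fail,
# or the database fails, or web1 and the database fail together.
fault_tree = ("OR", [
    ("AND", ["web1", "web2"]),
    "db",
    ("AND", ["web1", "db"]),
])

mcs = minimum_cut_sets(fault_tree)
# {web1, db} is dropped because {db} alone already causes the top event.
```

Exhaustive expansion of this kind grows quickly with the size of the tree, so it is suitable only as an illustration of the definition.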
An example of a method for effectively evaluating MCSs of a fault tree is disclosed in PTL 1. According to the technique disclosed in PTL 1, it is possible to reduce the amount of computation and to improve readability in reliability analysis of a fault tree that includes a majority decision gate.
Note that, in PTL 2, the applicant of the present application discloses a technique for generating an availability model that estimates availability of an information system, when a plurality of operations are executed in accordance with a specific operation procedure.