Service composition has become a common practice in business enterprises. A service is a computerized process that mimics an actual real-world physical or business process. A composite service is a service constructed from a number of service components that are arranged and invoked so as to perform the desired functionality of the composite service. The service components, and thus the composite service itself, are implemented, or effectuated, using underlying physical resources, such as servers and other types of computing hardware.
Because service composition has become a common practice, the reliability of composite, or composed, services has become an issue. Reliability analysis has been studied for decades for safety-critical systems, but composite services pose a new challenge. For most safety-critical systems, the hardware and software modules are rigidly integrated and remain unchanged during operation. By contrast, service components of a composite service are often updated and replaced, and their mappings to underlying physical system resources, such as servers, are subject to reconfiguration. Due to this flexibility, carefully constructing a single tailor-made model for a composite service to determine its reliability is not a viable option.
There currently exist two major technologies for reliability analysis of composite services: those based on (stochastic) state-space models, and those based on combinatorial models of services. State-space models, such as Markov chains and stochastic Petri nets, represent service components and resources as probabilistic state transition systems, the states of which may reflect their reliability. The component and resource models can be combined into a larger model representing the composite service, and this larger model accurately captures the impact of particular failures on the reliability of the composite service as a whole. However, this state-based approach often incurs high computational complexity due to state-space explosion, because the number of states of the combined model grows exponentially with the number of components and resources.
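By way of illustration only, the state-space explosion noted above can be sketched as follows. The sketch assumes, for simplicity, that every component or resource has just two local states ("up" and "down") and that the combined model is the full cross product of the local states; the function names are hypothetical and are not taken from any particular prior-art tool.

```python
from itertools import product

# Assumption: each component or resource has two local states.
LOCAL_STATES = ("up", "down")


def joint_states(num_components):
    """Enumerate every joint state of the combined model (small inputs only)."""
    return list(product(LOCAL_STATES, repeat=num_components))


def joint_state_count(num_components):
    """Count joint states without enumerating them."""
    return len(LOCAL_STATES) ** num_components


if __name__ == "__main__":
    # Three components already give eight joint states.
    for state in joint_states(3):
        print(state)
    # The count grows exponentially with the number of components.
    for n in (10, 20, 30):
        print(n, joint_state_count(n))
```

Under these assumptions, thirty two-state components already yield more than a billion joint states, which is why analyses on the combined model become expensive.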
By comparison, combinatorial models, which include reliability block diagrams (RBD's) and fault trees (FT's), focus on the causal relations (i.e., reliability-related dependencies) between components and resources. By ruling out possible time-dependent changes of reliability, analyses using these models achieve high computational efficiency at the expense of a potential loss of accuracy. As such, current reliability analyses are plagued by a tradeoff between analysis accuracy and computational complexity.
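By way of illustration only, a combinatorial analysis of the reliability block diagram kind can be sketched as follows. The sketch assumes statistically independent blocks, each with a fixed (time-independent) reliability; the function names and the numeric values are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """A series arrangement works only if every block works."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """A parallel (redundant) arrangement fails only if every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


if __name__ == "__main__":
    # Hypothetical composite service: two replicated servers (0.99 each)
    # in parallel, in series with a single database (0.999).
    servers = parallel_reliability([0.99, 0.99])
    service = series_reliability([servers, 0.999])
    print(f"replicated servers: {servers:.6f}")
    print(f"composite service:  {service:.6f}")
```

The computation is a handful of multiplications regardless of how the blocks behave over time, which reflects both the efficiency of such models and their inability to capture time-dependent changes in reliability.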
It is noted that modeling system resources, such as servers, as continuous-time Markov chains (CTMC's) is common. By defining normal and failure states along with transition rates between them, several key metrics can be computed, including resource availability and the mean time to failure/repair (MTTF/MTTR). Recently, to take better account of user/software behavior that affects resource usage, several techniques for hierarchical modeling of software systems that integrate models of user/software behavior and underlying resources have been proposed.
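By way of illustration only, the simplest such CTMC has one normal state and one failure state, with a constant failure rate and a constant repair rate. The sketch below assumes exactly this two-state model, in which the failure rate is 1/MTTF and the repair rate is 1/MTTR; the function names and the numeric values are hypothetical.

```python
import math


def steady_state_availability(failure_rate, repair_rate):
    """Long-run probability that the resource is in the normal state."""
    return repair_rate / (failure_rate + repair_rate)


def transient_availability(failure_rate, repair_rate, t):
    """Probability of being in the normal state at time t, starting in it at t = 0."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)


if __name__ == "__main__":
    # Hypothetical server: MTTF of 1000 hours, MTTR of 10 hours.
    mttf, mttr = 1000.0, 10.0
    failure_rate, repair_rate = 1.0 / mttf, 1.0 / mttr
    print(steady_state_availability(failure_rate, repair_rate))
    print(transient_availability(failure_rate, repair_rate, 24.0))
```

For this two-state model the steady-state availability equals MTTF / (MTTF + MTTR), and the transient availability decays from 1 toward that value as time increases.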
Markov reward models (MRM's) have been considered as a unified basis on which to conduct system dependability analysis. For high-level representations of MRM's, stochastic reward nets, based on the Petri net foundation, have been proposed and employed. Correlation between failures has also been addressed, with a focus on failure correlation between successive runs of software, these runs being formulated as a Markov renewal process.
Other prior art has focused on the derivation of stochastic models from high-level service definitions. Although it may be useful to construct stochastic models in such an automated manner, the resulting models may nevertheless still suffer from the accuracy-complexity tradeoff that has been discussed. For all of these reasons, as well as other reasons, there is a need for the present invention.