IT systems often deal with resources and their availability. When an event occurs on a resource, it is important to predict whether other resources will be impacted by this event and, if so, when. Knowing when a resource will be impacted helps to organize the actions needed to avoid an impact on business-critical resources.
IT systems are no longer monolithic or siloed. With technologies such as Service Oriented Architecture, an end-user application uses services provided by other systems, and these services can in turn use other services, and so on. Connections therefore exist between end-user applications and services. These services also use other systems, such as web servers, application servers, and databases, which adds another level of connections. These intermediate systems can be impacted by events that occur on other components. This is true not only for application-level components but also at the infrastructure level. The systems mentioned above (end-user applications, services, databases, and so on) need an infrastructure to run. The infrastructure is composed of servers, networks, routers, coolers, power supplies, and so on. Each of these components is in fact a resource that must be available at a given moment to meet the business need, and these resources are linked to each other because some resources serve others to make them fully operational.
A resource may also provide a service to another resource asynchronously. In that case, the fact that a resource is impacted by an event does not mean that the connected resources are immediately impacted. For example, a connected resource may need the impacted resource only every two hours; if the impact is cleared within two hours, the connected resource will never be impacted by this event. There is therefore a need to know accurately when a resource will be impacted in order to avoid impact propagation. This means that an event propagation time exists on connections.
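The notion of an event propagation time on connections can be illustrated with a minimal sketch. It is not taken from any of the cited documents. It assumes that each connection carries a fixed delay in hours, that the earliest impact time of a resource is the shortest cumulative delay from the failed resource, and that clearing the event before that time prevents the impact. The function name `impact_times`, the parameter `cleared_after`, and the resource names are hypothetical.

```python
import heapq


def impact_times(graph, source, cleared_after=None):
    """Return the earliest impact time (hours after the event) per resource.

    graph maps a resource to a list of (dependent_resource, delay_hours)
    pairs. If cleared_after is given, a resource whose impact would arrive
    at or after that time is treated as never impacted (an assumption).
    """
    times = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, resource = heapq.heappop(heap)
        if t > times.get(resource, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for dependent, delay in graph.get(resource, []):
            arrival = t + delay
            if cleared_after is not None and arrival >= cleared_after:
                continue  # event cleared before the impact propagates
            if arrival < times.get(dependent, float("inf")):
                times[dependent] = arrival
                heapq.heappush(heap, (arrival, dependent))
    return times


# Hypothetical dependency graph: the application server needs the
# database only every two hours.
graph = {
    "database": [("app-server", 2.0)],
    "app-server": [("web-app", 0.5)],
}
print(impact_times(graph, "database"))
print(impact_times(graph, "database", cleared_after=1.5))
```

In this sketch, the first call reports that the application server is impacted two hours after the database event, while the second call, in which the event is cleared after one and a half hours, reports no impact beyond the database itself.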
U.S. 20090177927A1 from Bailey et al. discloses a method and system for determining the impact of a failure of a component on one or more services that the component supports. A system status of the component identifies whether the component has failed or is active. The one or more services are mapped into a calendar function. After determining that the component supports the one or more services, a lookup in the calendar function is performed to identify a temporal activity and a level of criticality of each service of the one or more services. An impact of the system status of the component on the one or more services is determined from analysis of the identified temporal activity and the identified level of criticality of the one or more services.
Highly available systems are built with redundant components so that an event does not impact the functionality provided by the system. In such systems, if one of the redundant components is impacted, it would be useful to know when the functionality supported by the redundant components will be impacted, and thus how much time is left to react to avoid this impact.
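As an illustration only, the time left to react in a redundant group can be sketched as follows. The sketch assumes that a predicted failure time is available for each redundant component (or `None` when no failure is expected) and that the functionality is lost once fewer than a minimum number of components remain. The function name `functionality_impact_time` and the parameter `min_required` are hypothetical.

```python
def functionality_impact_time(failure_times, min_required=1):
    """Return when a redundant group stops providing its functionality.

    failure_times holds one predicted failure time (in hours) per
    redundant component, or None if the component is not expected to fail.
    The functionality is assumed lost once fewer than min_required
    components remain. Returns None if that point is never reached.
    """
    total = len(failure_times)
    if not 1 <= min_required <= total:
        raise ValueError("min_required must be between 1 and the group size")
    known = sorted(t for t in failure_times if t is not None)
    # Number of component failures that brings the group below min_required.
    failures_needed = total - min_required + 1
    if len(known) < failures_needed:
        return None
    return known[failures_needed - 1]


# Two redundant servers: one has already failed (t = 0) and the other is
# predicted to fail six hours later, which leaves six hours to react.
print(functionality_impact_time([0.0, 6.0]))
# Three components, of which two must remain available.
print(functionality_impact_time([0.0, 3.0, 8.0], min_required=2))
```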
Several approaches exist to determine a time-line of impacts of a failed resource on one or more dependent resources.
In U.S. Pat. No. 7,092,707 from Lau et al., a system for the prioritization of quality of service (QoS) alerts and the analysis of the impact of such alerts on service uses a service model in which services are broken into one or more service components and sub-components. Creation of a service dependency model, which is driven by different phases of a service, makes it possible to understand how alerts at the lowest level of the network components affect the overall service of which they are but a component.
In U.S. 20090281845A1 issued to the Assignee, a method and system for constructing and exploring key performance indicators (KPI) networks is described to identify KPIs associated with a performance target. Correlated or dependent KPIs are determined and correlations or dependencies are weighed to provide the degree of relevance in the KPI network. Influential chains in the correlation are determined. KPIs and associated correlations may be mined using historical data.
Although the prior art provides operational solutions to the cited needs, a solution is still lacking for analysing the impact of multiple resource failures on other resources.
Additionally, a solution is still lacking that provides a list of resource failure dates showing, over time, which resource or resources are next in the failure chain, and that determines the time left until the next resource or resources fail.
The present invention offers a solution to these needs.