Many computing infrastructures, such as computing infrastructures implementing on demand business functions, require a high resiliency. In general, resiliency is obtained by detecting a component of a computing infrastructure that has failed or is expected to fail, and restarting another instance of the component. The component can be restarted in the same location in the computing infrastructure or in a different location. However, in a distributed system, there are dependencies among the different components. As a result, merely restarting a single failed component is not sufficient to obtain full fail-over. In particular, other components that are dependent on the failed component also need to be restarted. The collection of all components that need to be restarted together is called a recovery segment.
In order to provide resiliency in operations, recovery segments within the computing infrastructure need to be identified. Subsequently, when recovery is required, the recovery segment can be used to ensure that the recovery is done effectively. For example, a web application may require access to a database in order to generate one or more web pages. If the system needs to be migrated to a different location, the application and the database need to be moved together as a unit. Similarly, in case the database fails, restarting the database server alone will not enable the application to resume since the web application server also needs to be reconfigured to connect to the new instance of the database server. In this case, the application server and the database server together form a recovery segment.
Currently, recovery segments are manually identified. For example, a graph can be generated that includes each managed resource within the computer infrastructure as a node, and a relationship between two resources as a link between the two nodes. An individual may review the graph to manually identify a set of recovery segments for the computer infrastructure. Such manual identification is laborious and error prone.
In view of the foregoing, a need exists to overcome one or more of the deficiencies in the related art.