The present disclosure relates to the information technology field and, more specifically, to the management of computing machines.
The management of computing machines plays a key role in several contexts, especially in large organizations with a high number of computing machines (for example, up to some hundreds of thousands). For this purpose, various resource management tools are available to facilitate the management of the computing machines; endpoint management programs are commercial examples of such tools.
In practical situations, the application of policies to the computing machines may fail on some of them. In such an event, troubleshooting of the failures should be performed in an attempt to solve the corresponding problems. For example, the implementation of security in a cloud computing environment includes commands sent from a grid to an agent executive, executed in a virtual machine, to check the security, the compliance, and the integrity of the virtual machine's processes and data structures. Based on the results of these checks, additional commands are sent by the grid to the agent executive to correct security, compliance, or integrity problems and thereby prevent security compromises.
However, troubleshooting is a complex process. Indeed, it first requires identifying the (alleged) cause of each failure (e.g., by a process of elimination). A solution that is likely to remedy the failure must then be determined, executed, and verified for correctness. This process may be very time-consuming and expensive, especially in instances where a high number of failures has occurred.
It may therefore be useful to prioritize the failures for troubleshooting. For example, when prioritizing problems in IT services, an incident cost, a workaround cost, an expected resolution cost, and a total cost can be determined for each problem. A priority may be assigned to each problem such that each priority has an expected resolution time. The priorities are assigned such that the total cost of fixing all the problems is lower than for any other selection of priorities.
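The cost-based prioritization described above can be illustrated with a minimal sketch. It is not taken from any particular tool: the `Problem` fields, the additive cost model (incident cost accruing per hour until resolution, plus one-off workaround and resolution costs), and the rule that each priority level is given to exactly one problem are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Problem:
    # All fields are hypothetical; the source does not define a cost model.
    name: str
    incident_cost_per_hour: float  # cost accrued while the problem is unresolved
    workaround_cost: float         # one-off cost of a temporary workaround
    resolution_cost: float         # expected cost of the fix itself


def total_cost(problem, resolution_hours):
    """Assumed total cost of one problem resolved after `resolution_hours`."""
    return (problem.incident_cost_per_hour * resolution_hours
            + problem.workaround_cost
            + problem.resolution_cost)


def assign_priorities(problems, resolution_hours_by_priority):
    """Try every one-to-one assignment of problems to priority levels and
    return (cost, ordering) for the cheapest one.

    `resolution_hours_by_priority[i]` is the expected resolution time of
    priority level i. Exhaustive search is only practical for a handful
    of problems, since the number of orderings grows factorially.
    """
    if len(problems) != len(resolution_hours_by_priority):
        raise ValueError("this sketch assumes one problem per priority level")
    best_cost, best_order = None, None
    for order in permutations(problems):
        cost = sum(total_cost(p, h)
                   for p, h in zip(order, resolution_hours_by_priority))
        if best_cost is None or cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, list(best_order)


# Illustrative, made-up figures.
problems = [
    Problem("A", incident_cost_per_hour=100, workaround_cost=50, resolution_cost=200),
    Problem("B", incident_cost_per_hour=10, workaround_cost=20, resolution_cost=100),
    Problem("C", incident_cost_per_hour=500, workaround_cost=0, resolution_cost=300),
]
hours = [4, 24, 72]  # expected resolution time of priorities 1, 2, 3

cost, order = assign_priorities(problems, hours)
print(cost, [p.name for p in order])
```

Under this assumed model, the problem that accrues incident cost fastest ends up with the shortest expected resolution time, which is the intuition behind choosing the selection of priorities with the lowest total cost.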
However, it is quite difficult (and at times impossible) to identify, among the computing machines on which the application of each policy failed, those that should be investigated first because their compliance with the policy is most relevant (for example, because an important security patch should be applied thereon). Thus, the solution of the corresponding problems may be significantly delayed by the investigation of other computing machines whose compliance with the policy may be less relevant (and possibly not relevant at all). As a consequence, some of the computing machines may be left in a critical condition for a relatively long time, with corresponding risks to the computing machines' integrity (e.g., security exposures).