1. Field of the Invention
The present invention generally relates to detecting, diagnosing, and/or repairing a failing or failed operating system. More specifically, in an exemplary embodiment, for an OS (Operating System) running in a Logically PARtitioned (LPAR) system, a Service (ambulance) LPAR is created which can gain access to the resources of a failing OS-instance in another LPAR in the same SMP (Symmetric Multi-Processor) server, and which can diagnose and fix the errors in the failing OS-instance without affecting the functioning of other LPARs in the SMP server.
2. Description of the Related Art
Currently, when an OS-instance fails (e.g., hangs or crashes), the customer has to collect the OS system dump and send it to the OS technical support team, who then diagnose the problem using the dump. This approach is time-consuming, and the OS support team may not have access to all of the information about the OS-instance, in which case multiple iterations of system dump collection and analysis are required. It would be beneficial to both customers and the OS-provider (e.g., IBM® for the AIX® OS) if the failing OS-instance could be analyzed online, and even better if that analysis could be done automatically.
Moreover, from an information technology (IT) infrastructure management point of view, it is desirable to have server management and maintenance done as automatically as possible. The requirements of a Lights-out data center environment, explained in more detail below, dictate that the amount of human intervention required to maintain a server should be minimal. The above-described conventional procedure for diagnosing a failed OS-instance is labor-intensive and hence is not conducive to the operation of a highly efficient, effective, and productive data center.
Thus, a need exists to provide a method of automatically detecting a failing OS-instance and, preferably, performing an automatic diagnosis and, even better, performing an automatic repair of the failing OS-instance.