1. Field of the Invention
The present invention generally relates to computer system platforms and, more particularly, to the preservation of error data on a platform that does not have direct access to non-volatile storage.
2. Description of the Related Art
In a computing environment, parallel processing generally refers to performing multiple computing tasks in parallel. Traditionally, parallel processing required multiple computer systems, with the resources of each computer system dedicated to a specific task, or allocated to perform a portion of a common task. However, recent advances in computer hardware and software technologies have resulted in single computer systems capable of highly complex parallel processing, through the use of multiple processors.
In some cases, a multi-processor system is logically partitioned, with one or more of the processors dedicated to, or shared among, each of several logical partitions. In a logically partitioned computer system, available system resources, such as the processors, volatile memory (i.e., memory not maintained in the absence of power), and various I/O devices, are allocated among multiple logical partitions, each designed to appear to operate independently of the others. Management of the allocation of resources among logical partitions is typically accomplished via a layer of system firmware, commonly referred to as a partition manager.
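By way of illustration only, the allocation of resources among logical partitions may be sketched as follows. The sketch is a simplified model and not an implementation of any particular partition manager: the class names, the method names, and the restriction to dedicated (rather than shared) processors and to a single pool of memory are assumptions introduced here for clarity.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalPartition:
    """Hypothetical record of the resources assigned to one partition."""
    name: str
    processors: int = 0
    memory_mb: int = 0


@dataclass
class PartitionManager:
    """Hypothetical bookkeeping layer that divides a fixed pool of
    processors and volatile memory among logical partitions."""
    total_processors: int
    total_memory_mb: int
    partitions: dict = field(default_factory=dict)

    def free_processors(self) -> int:
        used = sum(p.processors for p in self.partitions.values())
        return self.total_processors - used

    def free_memory_mb(self) -> int:
        used = sum(p.memory_mb for p in self.partitions.values())
        return self.total_memory_mb - used

    def allocate(self, name: str, processors: int, memory_mb: int) -> LogicalPartition:
        # Refuse any request that would exceed the remaining pool, so that
        # no partition can take resources already assigned to another.
        if processors > self.free_processors() or memory_mb > self.free_memory_mb():
            raise ValueError(f"insufficient free resources for partition {name!r}")
        part = self.partitions.setdefault(name, LogicalPartition(name))
        part.processors += processors
        part.memory_mb += memory_mb
        return part

    def release(self, name: str) -> None:
        # Return all of a partition's resources to the free pool.
        self.partitions.pop(name, None)
```

In this model, resources are re-allocated purely by updating the manager's records, without any change to the underlying hardware.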
An objective of the partition manager is to allow each logical partition to independently run software (e.g., operating systems and operating system-specific applications), typically developed to run on a dedicated computer system, with little or no modification. For example, one logical partition may be running a first operating system, such as IBM's OS/400, a second logical partition may be running a second operating system, such as IBM's AIX, while a third logical partition may be running a third operating system, such as Linux. By providing the ability to run multiple operating systems on the same computer system, a logically partitioned system may provide a user with a greater degree of freedom in choosing application programs best suited to the user's needs with little or no regard to the operating system for which an application program was written.
Logical partitioning of a large computer system has several potential advantages. For example, a logically partitioned computer system is flexible in that reconfiguration and re-allocation of resources may be easily accomplished without changing hardware. Logical partitioning also isolates tasks or groups of tasks, which may help prevent any one task or group of tasks from monopolizing system resources. Logical partitioning may also facilitate the regulation of resources provided to particular users, which may be integral to a business model in which the computer system is owned by a service provider who provides computer services to different users on a fee-per-resource-used or “capacity-on-demand” basis. Further, as described above, logical partitioning makes it possible for a single computer system to concurrently support multiple operating systems, since each logical partition can be executing a different operating system.
Additional background information regarding logical partitioning can be found in the following commonly owned patents and patent applications, which are herein incorporated by reference: Ser. No. 09/672,043, filed Sep. 29, 2000, entitled “Technique for Configuring Processors in System With Logical Partitions”; Ser. No. 09/346,206, filed Jul. 1, 1999, entitled “Apparatus for Supporting a Logically Partitioned Computer System”; U.S. Pat. No. 6,467,007, entitled “Processor Reset Generated Via Memory Access Interrupt”; U.S. Pat. No. 5,659,786, entitled “System And Method For Dynamically Performing Resource Reconfiguration In A Logically Partitioned Data Processing System”; and U.S. Pat. No. 4,843,541, entitled “Logical Resource Partitioning Of A Data Processing.”
Many computer systems, such as logically partitioned networked server computers, are designed and manufactured to operate on a nearly continuous basis. In the event of a system failure, the system is typically required to reboot and resume operation as quickly as possible in an effort to minimize the amount of unproductive down time. Therefore, such systems are commonly designed on computing platforms (i.e., the underlying hardware and/or software for the systems) with self-diagnosis capabilities, such as capturing error data (e.g., hardware and software states) in the event of a system failure, a process commonly referred to as a “system dump.” The error data may be used for self-diagnosis, to determine a cause of the failure at the platform level, or may be analyzed at a later time by support staff (e.g., a network engineer), for example, in the event the self-diagnosing system is unable to determine a cause.
Accordingly, it is a general requirement that the error data be preserved at least long enough to be analyzed. Conventionally, error data has been preserved at the platform level through storage to non-volatile memory (i.e., memory maintained in the absence of power), such as a load source disk, so that the error data is maintained even if power is lost. However, some newer computing platforms utilize a simplified “diskless” platform model that owns only processing resources. The diskless platform relies on an operating system for access to non-volatile storage, thus eliminating the need to maintain disk drivers at the platform level.
Because the diskless computing platform does not have direct access to non-volatile storage, the conventional approach for preserving error data at the platform level is unavailable. Accordingly, a new approach for preserving error data is needed for use in a diskless computing platform.