1. Technical Field
The present invention relates generally to an improved data processing system, and in particular, to a method and apparatus for error analysis. Still more particularly, the present invention provides a method and apparatus for retrieving logs for a partition in a logical partitioned data processing system.
2. Description of Related Art
Logical partitioning (LPAR) functionality within a data processing system (platform) allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to run simultaneously on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping subset of the platform's resources. These platform allocable resources include one or more architecturally distinct processors with their interrupt management areas, regions of system memory, and input/output (I/O) adapter bus slots. The platform's firmware represents the partition's resources to the OS image.
Each distinct OS or image of an OS running within the platform is protected from the others such that software errors in one logical partition cannot affect the correct operation of any other partition. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each OS image and by providing mechanisms that ensure the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the OS (or each different OS) directly controls a distinct set of allocable resources within the platform.
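The disjoint-allocation rule described above can be illustrated with a minimal sketch. The partition and resource names (`lpar1`, `cpu0`, `slot1`, and so on) and the helpers `validate_disjoint` and `may_access` are hypothetical and stand in for whatever the platform firmware actually uses; the sketch only checks that no resource belongs to two partitions and that a partition may touch only what was allocated to it.

```python
from typing import Dict, FrozenSet

# Hypothetical model: partition name -> set of resource names allocated to it.
Allocation = Dict[str, FrozenSet[str]]


def validate_disjoint(alloc: Allocation) -> bool:
    """Return True if no resource is assigned to more than one partition."""
    seen: set = set()
    for resources in alloc.values():
        if seen & resources:  # overlap with a resource already assigned elsewhere
            return False
        seen |= resources
    return True


def may_access(alloc: Allocation, partition: str, resource: str) -> bool:
    """A partition may control only resources that were allocated to it."""
    return resource in alloc.get(partition, frozenset())


alloc: Allocation = {
    "lpar1": frozenset({"cpu0", "mem0", "slot1"}),
    "lpar2": frozenset({"cpu1", "mem1", "slot2"}),
}
```

With this allocation, `validate_disjoint(alloc)` holds and `may_access(alloc, "lpar1", "slot2")` is refused, mirroring the requirement that one image cannot control resources allocated to another.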
An operating system within an LPAR data processing system may periodically call a routine to check states and report errors that are found. This routine is part of a run-time abstraction services (RTAS) component and is called an event scan. RTAS is designed to insulate an operating system from having to know about and manipulate platform functions that would require platform-specific code. RTAS is called as an interface to hardware, such as hardware registers. Each partition has a copy of RTAS in memory. RTAS is found in IBM eServer pSeries products, which are available from International Business Machines Corporation. The event scan function checks for error logs that may have been reported by various subsystems of the data processing system. These subsystems include, for example, the service processor, open firmware, and non-maskable machine interrupt code. Each of these subsystems places reported error logs for an operating system in a specific location. One location used by service processors to place reportable logs for a partition is a non-volatile random access memory (NVRAM). The event scan function searches the various locations used by these components to find new, unreported error logs. When a new, unreported log is identified, this function reports the log to the operating system to which the log is to be reported and marks the log so that it is no longer considered new and unreported. By marking the log in this manner, the event scan function will not report the log again at a later time. Additionally, this marking allows the space occupied by the log to be overlaid with a new, unreported log.
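The report-and-mark behavior described above can be sketched as follows. This is an illustrative model only, not RTAS code: the classes `ErrorLog` and `NvramLogArea`, the `reported` flag, and the `event_scan` function are hypothetical names, and a real NVRAM log area has its own layout. The sketch shows the two consequences the text names: a marked log is not reported again, and its slot may be overlaid by a new log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorLog:
    log_id: int
    text: str
    reported: bool = False  # hypothetical "old and reported" marker


class NvramLogArea:
    """Hypothetical fixed-size NVRAM region holding error logs."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[ErrorLog]] = [None] * capacity

    def place(self, log: ErrorLog) -> bool:
        """Store a log in an empty or already-reported slot; False if full."""
        for i, slot in enumerate(self.slots):
            if slot is None or slot.reported:
                self.slots[i] = log  # reported slots may be overlaid
                return True
        return False


def event_scan(area: NvramLogArea) -> List[ErrorLog]:
    """Report every new, unreported log and mark it so it is not reported again."""
    found = []
    for slot in area.slots:
        if slot is not None and not slot.reported:
            found.append(slot)
            slot.reported = True
    return found


area = NvramLogArea(capacity=2)
area.place(ErrorLog(1, "fan failure"))
area.place(ErrorLog(2, "memory error"))
first = event_scan(area)   # reports logs 1 and 2
second = event_scan(area)  # nothing new to report
```

Because only one operating system owns the log area in this model, marking a log on first report is safe, and a later `place` call can reuse the slot of log 1.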
In a symmetric multiprocessor mode configuration, the entire data processing system is owned by one operating system. As a result, only one instance of the event scan function is called. With only one instance of the event scan function, any error log reported to the operating system can be marked as old and reported. In an LPAR environment, several problems become apparent. For example, an instance of the event scan function may be called in each LPAR partition. Each event scan function is required to report the same error logs to its respective operating system. It is important that the NVRAM locations in which the subsystems place new error logs do not become clogged; otherwise, errors may be missed. At the same time, a partition within an LPAR system that is booted days or months after the data processing system has been started has no reason to receive outdated error logs, even if those errors would be considered new to the partition. In an LPAR system, the event scan function called by one partition is unable to mark an error log as old and reported, because that error log may not yet have been reported to another partition. Without this ability to mark error logs, logs cannot be removed, which prevents new error logs from being added once the memory space is used up.
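The dilemma described above can be reproduced with a small model in which two partitions share one NVRAM log area and one `reported` flag per log. The names `SharedNvram`, `ErrorLog`, and `event_scan`, the one-slot capacity, and the `mark` switch are hypothetical simplifications for illustration; the sketch illustrates only the problem, not any solution to it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorLog:
    log_id: int
    reported: bool = False  # one marker shared by every partition (hypothetical)


class SharedNvram:
    """Hypothetical NVRAM log area shared by all partitions."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[ErrorLog]] = [None] * capacity

    def place(self, log: ErrorLog) -> bool:
        for i, slot in enumerate(self.slots):
            if slot is None or slot.reported:
                self.slots[i] = log
                return True
        return False  # area is clogged: the new log is lost


def event_scan(nvram: SharedNvram, mark: bool) -> List[int]:
    """One partition's scan; optionally marks logs as old and reported."""
    found = []
    for slot in nvram.slots:
        if slot is not None and not slot.reported:
            found.append(slot.log_id)
            if mark:
                slot.reported = True
    return found


# Option 1: the first partition marks the log, so the second never sees it.
marked = SharedNvram(capacity=1)
marked.place(ErrorLog(1))
seen_by_a = event_scan(marked, mark=True)
seen_by_b = event_scan(marked, mark=True)

# Option 2: no partition marks the log, so the slot is never freed.
unmarked = SharedNvram(capacity=1)
unmarked.place(ErrorLog(1))
event_scan(unmarked, mark=False)
event_scan(unmarked, mark=False)
accepted = unmarked.place(ErrorLog(2))
```

In the first option, partition B misses log 1; in the second, both partitions see log 1 but the follow-on log 2 cannot be stored, which is the clogging problem the text describes.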
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for reporting error logs in an LPAR data processing system.