1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for managing data in a data processing system. Still more particularly, the present invention relates to a method and apparatus for managing error logs in a logical partitioned data processing system.
2. Description of Related Art
Increasingly large symmetric multi-processor data processing systems, such as the IBM eServer P690, available from International Business Machines Corporation, the HP 9000 Superdome Enterprise Server, available from Hewlett-Packard Company, and the Sun Fire 15K server, available from Sun Microsystems, Inc., are not being used as single large data processing systems. Instead, these types of data processing systems are being partitioned and used as smaller systems. These systems are also referred to as logical partitioned (LPAR) data processing systems. Logical partitioning functionality within a data processing system allows multiple copies of a single operating system, or multiple heterogeneous operating systems, to run simultaneously on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping subset of the platform's resources. These allocatable platform resources include one or more architecturally distinct processors with their interrupt management areas, regions of system memory, and input/output (I/O) adapter bus slots. The platform's firmware represents the partition's resources to the operating system image.
Each distinct operating system, or image of an operating system, running within a platform is protected from the others so that software errors in one logical partition cannot affect the correct operation of any other partition. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each operating system image and by providing mechanisms for ensuring that the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the operating system, or each different operating system, directly controls a distinct set of allocatable resources within the platform.
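The requirement that each partition receive a disjoint set of resources can be illustrated with a small consistency check. The following is a minimal sketch under stated assumptions: the `Partition` class, the `find_overlaps` and `is_disjoint_allocation` helpers, and the resource identifiers (such as `"slot3"`) are hypothetical and are not taken from any platform firmware interface.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model of a logical partition and its allocated resources."""
    name: str
    resources: set[str] = field(default_factory=set)


def find_overlaps(partitions: list[Partition]) -> dict[str, list[str]]:
    """Map every resource claimed by more than one partition to the
    names of the partitions that claim it."""
    owners: dict[str, list[str]] = {}
    for partition in partitions:
        for resource in partition.resources:
            owners.setdefault(resource, []).append(partition.name)
    return {r: names for r, names in owners.items() if len(names) > 1}


def is_disjoint_allocation(partitions: list[Partition]) -> bool:
    """True when no resource is allocated to two partitions at once."""
    return not find_overlaps(partitions)


lpar1 = Partition("LPAR1", {"cpu0", "mem0", "slot3"})
lpar2 = Partition("LPAR2", {"cpu1", "mem1", "slot4"})
print(is_disjoint_allocation([lpar1, lpar2]))  # True

# Allocating slot3 to LPAR2 as well violates the disjointness requirement.
lpar2.resources.add("slot3")
print(find_overlaps([lpar1, lpar2]))  # {'slot3': ['LPAR1', 'LPAR2']}
```

A dictionary keyed by resource makes the check linear in the total number of allocations and reports which partitions conflict, rather than only whether a conflict exists.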
With respect to hardware resources in a logical partitioned data processing system, these resources are disjointly shared among various partitions. These resources may include, for example, input/output (I/O) adapters, memory DIMMs, non-volatile random access memory (NVRAM), and hard disk drives. Each partition within an LPAR data processing system may be repeatedly booted and shut down without having to power-cycle the entire data processing system.
Each of these partitions contains an error log. An error log provides information on the status and behavior of devices over time. In a logical partitioned system, devices may sometimes be moved dynamically between partitions. Partitions normally do not share resources and are completely autonomous. When a resource is moved between partitions, any error log entries pertaining to the resource that exist in the former partition remain in that partition. New entries for the resource are made in the new partition.
Many diagnostic processes perform error log analysis based on thresholding certain types of errors over a given period of time. The old entries for the resource in the prior partition are not accessible to the diagnostic process running in the current partition. As a result, a diagnostic process that relies only on the entries found in the new partition may fail to mark the resource as failed, to warn of its imminent failure, or to provide informational messages about it.
The present invention recognizes that this behavior occurs because the device has not reached the failure threshold in the new partition, even though the errors logged for it across both partitions together might meet that threshold.
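The effect of splitting a resource's history across two partitions can be made concrete with a toy thresholding rule. The sketch below is illustrative only: the `LogEntry` class, the `exceeds_threshold` helper, the rule of three errors of one type within 24 hours, and the resource and error names are all hypothetical assumptions, not a description of any particular diagnostic tool or of the invention's mechanism. The combined count is shown solely to expose what the per-partition analysis misses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogEntry:
    resource: str      # hypothetical resource identifier
    error_type: str    # hypothetical error classification
    timestamp: datetime


# Hypothetical rule: fail a resource when 3 or more errors of the same
# type are logged within a 24-hour window.
THRESHOLD = 3
WINDOW = timedelta(hours=24)


def exceeds_threshold(entries: list[LogEntry], resource: str,
                      error_type: str, now: datetime) -> bool:
    """Count matching entries in the window ending at `now`."""
    recent = [
        e for e in entries
        if e.resource == resource
        and e.error_type == error_type
        and now - WINDOW <= e.timestamp <= now
    ]
    return len(recent) >= THRESHOLD


t0 = datetime(2024, 1, 1, 8, 0)
# Two errors logged while the adapter belonged to LPAR1...
lpar1_log = [LogEntry("adapter-A", "bus_parity", t0 + timedelta(hours=h)) for h in (0, 2)]
# ...and two more after it was moved to LPAR2.
lpar2_log = [LogEntry("adapter-A", "bus_parity", t0 + timedelta(hours=h)) for h in (5, 6)]
now = t0 + timedelta(hours=7)

# Diagnostics in LPAR2 see only LPAR2's log: 2 errors, below the threshold.
print(exceeds_threshold(lpar2_log, "adapter-A", "bus_parity", now))  # False
# All 4 errors fall inside the window, so the full history meets the threshold.
print(exceeds_threshold(lpar1_log + lpar2_log, "adapter-A", "bus_parity", now))  # True
```

Neither partition's log alone reaches the threshold, which is the situation described above: the device is misbehaving, but no single partition holds enough evidence to say so.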
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for managing error log entries in a logical partitioned data processing system.