1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for processing data. Still more particularly, the present invention relates to a method, apparatus, and computer instructions for managing trace data in a logical partitioned data processing system.
2. Description of Related Art
Increasingly, large symmetric multi-processor data processing systems, such as the IBM eServer P690, available from International Business Machines Corporation, the HP 9000 Superdome Enterprise Server, available from Hewlett-Packard Company, and the Sun Fire 15K server, available from Sun Microsystems, Inc., are not being used as single large data processing systems. Instead, these types of data processing systems are being partitioned and used as smaller systems. These systems are also referred to as logical partitioned (LPAR) data processing systems. Logical partitioned functionality within a data processing system allows multiple copies of a single operating system or multiple heterogeneous operating systems to be run simultaneously on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping subset of the platform's resources. These platform-allocatable resources include one or more architecturally distinct processors with their interrupt management area, regions of system memory, and input/output (I/O) adapter bus slots. The partition's resources are represented by the platform's firmware to the operating system image.
Each distinct operating system or image of an operating system running within a platform is protected from the others such that software errors on one logical partition cannot affect the correct operation of any of the other partitions. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each operating system image and by providing mechanisms for ensuring that the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image.
Thus, each image of the operating system or each different operating system directly controls a distinct set of allocatable resources within the platform. With respect to hardware resources in a logical partitioned data processing system, these resources are disjointly shared among various partitions. These resources may include, for example, input/output (I/O) adapters, memory DIMMs, non-volatile random access memory (NVRAM), and hard disk drives. Each partition within an LPAR data processing system may be booted and shut down repeatedly without having to power-cycle the entire data processing system.
When a logical partitioned data processing system experiences a failure, data relating to processes and system states is needed to help identify and analyze the failure. In current logical partitioned data processing systems, some of the data needed to diagnose a failure is not available because of the current design of the systems. For example, the platform firmware includes a trace facility to allow for tracing of code paths in the firmware. An example of platform firmware used in logical partitioned data processing systems is a hypervisor, which is available from International Business Machines Corporation.
With the currently used trace facilities, trace information showing the code path taken in the platform firmware and critical data values are written into a trace buffer as each partition makes platform firmware calls. This trace information is particularly critical when an error is encountered by a partition and the error path is traced along with critical data values.
Currently all logical partitioned mode data processing system platforms support a hypervisor trace facility used to write hypervisor code execution trace point data into a trace buffer located in hypervisor space during hypervisor execution. This hypervisor trace data is critical for effective failure analysis in the field in the event of system failures.
This arrangement creates a problem in large configurations, where processors are dedicated to multiple partitions and all of these partitions write to the same trace buffer. Such buffers are typically organized in a circular fashion. Thus, if a partition crashes, its trace data may be quickly overwritten by the other partitions in the logical partitioned data processing system. As a result, critical data needed to help diagnose the problem may be lost.
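The overwrite problem described above can be illustrated with a minimal sketch. It assumes a single fixed-capacity circular buffer shared by all partitions; the class name `SharedTraceBuffer`, the capacity of 8 entries, and the trace point strings are hypothetical and do not reflect any actual hypervisor data structure.

```python
from collections import deque


class SharedTraceBuffer:
    """Hypothetical circular trace buffer shared by every partition."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full,
        # which models the wrap-around of a circular buffer.
        self.entries = deque(maxlen=capacity)

    def write(self, partition_id, trace_point):
        self.entries.append((partition_id, trace_point))

    def entries_for(self, partition_id):
        return [tp for pid, tp in self.entries if pid == partition_id]


buf = SharedTraceBuffer(capacity=8)  # illustrative size only

# Partition 1 traces its error path, then crashes and stops writing.
for step in range(4):
    buf.write(1, f"error-path-{step}")
survived_before = len(buf.entries_for(1))

# Partitions 2 and 3 keep making firmware calls and wrap the buffer.
for call in range(8):
    buf.write(2 + call % 2, f"call-{call}")
survived_after = len(buf.entries_for(1))
```

In this sketch, the four entries written by the crashed partition are all displaced once the other partitions have written a full buffer's worth of entries, so none remain for failure analysis.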
One solution is to create a larger buffer. However, as the number of partitions increases, the size of this trace buffer must grow to accommodate the additional partitions. The buffer structure must be preallocated with the largest configuration in mind because each logical partitioned data processing system is configured individually and dynamic configuration is allowed. As a result, memory space is wasted in smaller configurations. Further, in systems in which system memory is at a premium, the wasted space increases the cost of the logical partitioned data processing system.
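The cost of preallocating for the largest configuration can be made concrete with a short arithmetic sketch. The figures used here (64 KiB of trace space per partition, a maximum of 16 partitions, 4 partitions actually configured) are assumptions chosen purely for illustration, as is the function name `preallocated_waste`.

```python
def preallocated_waste(bytes_per_partition, max_partitions, active_partitions):
    """Return the bytes left unused when the buffer is sized for the maximum."""
    allocated = bytes_per_partition * max_partitions
    needed = bytes_per_partition * active_partitions
    return allocated - needed


# Hypothetical configuration: sized for 16 partitions, only 4 in use.
waste = preallocated_waste(64 * 1024, max_partitions=16, active_partitions=4)
wasted_fraction = waste / (64 * 1024 * 16)
```

Under these assumed numbers, three quarters of the preallocated trace memory would go unused in the smaller configuration.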
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for preserving trace data.