1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for processing data. Still more particularly, the present invention provides a method, apparatus, and computer instructions for customizable surveillance of network interfaces.
2. Description of Related Art
Logical partitioned (LPAR) functionality within a data processing system (platform) allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to be run simultaneously on a single data processing system platform. A partition, within which an operating system image runs, is assigned resources of the platform, some of which do not overlap with those of other partitions while others may be shared. In particular, global resources, such as power supplies, fans, and system backplanes, are shared across all of the partitions, while local resources, such as I/O adapters and devices, are not shared between partitions. The platform allocable resources include one or more architecturally distinct processors with their interrupt management area, regions of system memory, and input/output (I/O) adapter bus slots. The partition's resources are represented by the platform's firmware to the OS image.
Each distinct OS or image of an OS running within the platform is protected from the others such that certain errors on one logical partition cannot affect the correct operation of any of the other partitions. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each OS image and by providing mechanisms for ensuring that the various images cannot control any resources that have not been allocated to them. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the OS (or each different OS) directly controls a distinct set of allocable resources within the platform in which some of these resources are shared and others are unshared.
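The disjoint-allocation rule described above can be illustrated with a minimal Python sketch. The `PartitionManager` class, its method names, and the resource identifiers are hypothetical and serve only as an illustration; they do not describe any particular platform firmware.

```python
class PartitionManager:
    """Hypothetical bookkeeping for platform allocable resources."""

    def __init__(self):
        # Maps a resource identifier to the partition that owns it.
        self._owner = {}

    def allocate(self, partition, resource):
        """Assign a resource to a partition; reject overlapping allocations."""
        current = self._owner.get(resource)
        if current is not None and current != partition:
            raise PermissionError(
                f"{resource} is already allocated to {current}"
            )
        self._owner[resource] = partition

    def can_control(self, partition, resource):
        """Return True only if the resource was allocated to this partition."""
        return self._owner.get(resource) == partition


# Assumed example: two partitions, each given its own I/O adapter slot.
manager = PartitionManager()
manager.allocate("LPAR1", "io_slot_3")
manager.allocate("LPAR2", "io_slot_4")
```

In this sketch, a request by one partition to control or claim a resource owned by another partition is refused, which mirrors the requirement that an OS image cannot control resources not allocated to it.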
With respect to hardware resources in an LPAR system, these resources are disjointly shared among various partitions, themselves disjoint, each one seeming to be a stand-alone computer. These resources may include, for example, input/output (I/O) adapters, processors, and hard disk drives. Each partition within the LPAR system may be booted and shut down over and over without having to power-cycle the whole system.
With respect to reporting of errors that occur in logical partitioned data processing systems or even in non-partitioned data processing systems, recoverable errors are reported through an “in-band” reporting system. The error reports are sent to another data processing system, such as a hardware management console, through a communications link, also referred to as a “connection”. The reporting of these errors allows for service calls to be made, if needed, for the data processing system reporting the error. These connections are typically made over a network, such as a local area network (LAN), a wide area network, an intranet, or even the Internet. Since the recoverable errors are reported through a network interface, knowing about failures in the error reporting path is extremely important. Presently available monitoring systems may report outages in a LAN before the LAN becomes operationally stable, and may also report momentary glitches in the LAN as outages. As a result, undesirable false reporting may occur. Such false reports may cause a customer to turn off the monitoring system, leaving the customer exposed to a real outage in the reporting path that goes undetected.
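The false-reporting problem described above can be seen in a minimal Python sketch of a naive monitor that reports every sample at which the link appears down. The function name and the sample data are assumptions made for illustration and do not represent any specific monitoring product.

```python
def naive_outage_reports(link_samples):
    """Return the index of every sample at which the link was observed down."""
    return [i for i, link_up in enumerate(link_samples) if not link_up]


# Assumed sample history (True = link up, False = link down):
#   samples 0-1: LAN not yet operationally stable after startup
#   sample 4:    momentary glitch
#   samples 7-9: sustained outage in the reporting path
samples = [False, False, True, True, False, True, True, False, False, False]
reports = naive_outage_reports(samples)
# reports == [0, 1, 4, 7, 8, 9]
```

Only the reports for samples 7 through 9 correspond to a real outage in this assumed history; the reports for samples 0, 1, and 4 are the kind of false reports that may lead a customer to turn off monitoring.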
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for monitoring for outages in error reporting paths.