1. Field of the Invention
The disclosed embodiments relate generally to providing increased data integrity in computer systems and, more particularly, to using a system management processor to overcome a computer system failure because of corrupted programming.
2. Background of the Related Art
This section is intended to introduce the reader to various aspects of art which may be related to various aspects of the present invention which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Since the introduction of the first personal computer (“PC”) over 20 years ago, technological advances to make PCs more useful have continued at an amazing rate. Microprocessors that control PCs have become faster and faster, with clock speeds eclipsing one gigahertz (one billion cycles per second) and continuing well beyond.
Productivity has also increased tremendously because of the explosion in development of software applications. In the early days of the PC, people who could write their own programs were practically the only ones who could make productive use of their computers. Today, there are thousands and thousands of software applications ranging from games to word processors and from voice recognition to web browsers.
a. The Evolution of Networked Computing and System Management Tools
In addition to improvements in PC hardware and software generally, the technology for making computers more useful by allowing users to connect PCs together and share resources between them has also seen rapid growth in recent years. This technology is generally referred to as “networking.” In a networked computing environment, PCs belonging to many users are connected together so that they may communicate with each other. In this way, users can share access to each other's files and other resources, such as printers. Networked computing also allows users to share internet connections, resulting in significant cost savings. Networked computing has revolutionized the way in which business is conducted across the world.
Not surprisingly, the evolution of networked computing has presented technologists with some challenging obstacles along the way. One obstacle is connecting computers that use different operating systems (“OSes”) and making them communicate efficiently with each other. Each different OS (or even variations of the same OS from the same company) has its own idiosyncrasies of operation and configuration. The interconnection of computers running different OSes presents significant ongoing issues that make day-to-day management of a computer network challenging.
Another significant challenge presented by the evolution of computer networking is the sheer scope of modern computer networks. At one end of the spectrum, a small business or home network may include a few client computers connected to a common server, which may provide a shared printer and/or a shared internet connection. On the other end of the spectrum, a global company's network environment may require interconnection of hundreds or even thousands of computers across large buildings, a campus environment, or even between groups of computers in different cities and countries. Such a configuration would typically include a large number of servers, each connected to numerous client computers.
Further, the arrangements of servers and clients in a larger network environment could be connected in any of a large number of topologies that may include local area networks (“LANs”), wide area networks (“WANs”) and metropolitan area networks (“MANs”). In these larger networks, a problem with any one server computer (for example, a failed hard drive, corrupted system software, a failed network interface card or an OS lock-up, to name just a few) has the potential to interrupt the work of a large number of workers who depend on network resources to get their jobs done efficiently. Needless to say, companies devote a lot of time and effort to keeping their networks operating trouble-free to maximize productivity.
An important aspect of efficiently managing a large computer network is to maximize the amount of analysis and repair that can be performed remotely (for example, from a centralized administration site). Tools that facilitate remotely analyzing and servicing server problems help to control network management costs by reducing the number of network management personnel required to maintain a network in good working order. Remote system management also makes network management more efficient by reducing the delay and expense of analyzing and repairing network problems. Using remote management tools, a member of the network management team may identify problems and, in some cases, solve those problems without the delay and expense that accompanies an on-site service call to a distant location.
In one system management strategy, a system management processor, which is completely separate from the system microprocessor(s), operates independently to provide system management functionality and remote communication capability. These system management processors have the capability of monitoring and controlling a wide range of system information. Some system management processors may be powered up even when the main computer system that they support is not powered up.
b. The Need for Robustness when System Data Becomes Corrupted
Modern users of computer systems typically expect very high levels of availability from their systems. To satisfy this desire, manufacturers of computer systems strive to make systems as robust as possible. One source of potential system problems is corruption of data stored in non-volatile random access memory (NVRAM) or electrically erasable programmable read-only memory (EEPROM). The corruption of data can occur for any number of reasons, such as a system power failure during operation of the computer system, errors while flashing the EEPROM of the computer system to upgrade the BIOS or other firmware, or the like.
The system BIOS or other system firmware of most computer systems is typically stored in EEPROM. One example of system firmware that may be stored in EEPROM is Processor Abstraction Layer (PAL) code that is typically used in computer systems that implement the IA-64 architecture promulgated by Intel Corporation of Santa Clara, Calif. In many systems, an additional back-up copy of the BIOS or system firmware is also stored in EEPROM. If the system BIOS becomes corrupt, the computer system will probably not be able to boot or otherwise operate correctly. When a system is down because of BIOS corruption, users are not able to access system resources, making it more difficult for them to perform their assigned tasks. Repair of systems with corrupted BIOS or other firmware can be costly and time-consuming. Even in cases where the system BIOS or other firmware is backed up, an on-site service call may be required to cause the failed machine to load the backed-up copy of the BIOS or other firmware. A method and apparatus that reduces undesirable system downtime because of the corruption of the system BIOS or other firmware is desirable.
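By way of illustration only, the situation described above, in which a primary firmware image may be corrupted while a back-up copy remains intact, can be sketched as follows. The sketch assumes that a CRC-32 checksum is recorded for each image when it is written; that assumption, and the names FirmwareImage, is_intact and select_boot_image, are hypothetical and are not taken from this disclosure.

```python
import zlib
from typing import NamedTuple, Optional


class FirmwareImage(NamedTuple):
    """A firmware image plus the checksum recorded when it was written (assumed)."""
    name: str
    data: bytes
    expected_crc: int


def is_intact(image: FirmwareImage) -> bool:
    # Recompute the CRC-32 of the stored bytes and compare it to the recorded value.
    return zlib.crc32(image.data) == image.expected_crc


def select_boot_image(primary: FirmwareImage,
                      backup: FirmwareImage) -> Optional[FirmwareImage]:
    # Prefer the primary image; fall back to the back-up copy if the primary is corrupt.
    for image in (primary, backup):
        if is_intact(image):
            return image
    # Both copies failed the check; some other recovery path would be needed.
    return None


if __name__ == "__main__":
    good = bytes(range(64))
    crc = zlib.crc32(good)
    corrupted = b"\xff" + good[1:]  # simulate a single corrupted byte
    primary = FirmwareImage("primary", corrupted, crc)
    backup = FirmwareImage("backup", good, crc)
    chosen = select_boot_image(primary, backup)
    print(chosen.name if chosen else "no valid image")
```

In this sketch the selection is made by software that must itself be able to run, which is precisely what a corrupted BIOS may prevent on the main system; the sketch shows only the check-and-fall-back logic, not where that logic executes.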