1. Field of the Invention
The invention relates generally to information processing systems, such as system servers and personal computers (PCs). More particularly, this invention relates to the management and maintenance of information system failures.
2. Description of the Related Art
Information processing systems, such as computer system servers, have become a virtually inseparable part of information processing networks. These systems communicate and process an enormous amount of information in a relatively short time. To perform these sophisticated tasks, a computer system server typically includes various subsystems and components such as a plurality of microprocessors, memory modules, various system and bus control units, and a wide variety of data input/output (I/O) devices. These computer components communicate information using various data rates and protocols over multiple system buses. The demand for faster processing speeds, and the revolutionary fast-track development of computer systems, have necessitated the use of interconnecting devices. The wide variety of these devices, coupled with various data transfer protocols, has added special complexity to the management and maintenance of faults occurring in such information systems.
To facilitate the understanding of the invention, a brief description of the I²C bus protocol is first provided. FIG. 1 is a functional block diagram of an exemplary I²C bus application. As shown in FIG. 1, an I²C Bus 100 is provided to support data transfer among a variety of I²C devices. The I²C Bus 100 is a serial interface bus that allows multiple I²C devices to communicate via a bidirectional, two-wire serial bus. The I²C Bus 100 comprises two wires: a serial data line (SDA) 102 and a serial clock line (SCL) 104. The SDA 102 carries data transmissions among I²C devices, and the SCL 104 carries the clock timing information that synchronizes the data transmission. A complete system usually consists of at least one microcontroller and other peripheral devices such as memory units and input/output (I/O) expanders for transferring data on the I²C Bus 100. These peripheral devices may include liquid crystal display (LCD) and light emitting diode (LED) drivers, random access memory (RAM) and read only memory (ROM) devices, clock/calendars, I/O expanders, and analog-to-digital (A/D) and digital-to-analog (D/A) converters.
As shown in FIG. 1, a micro-controller A 106 and a micro-controller B 108 are coupled to the I²C Bus 100 for exchanging information on the I²C Bus 100. Additionally, an I²C-ISA Interface 110 is connected to the I²C Bus 100 to provide an access interface between industry standard architecture (ISA) devices and I²C devices. An LCD driver 112 is coupled to the I²C Bus 100 for displaying information accessed from other I²C devices located on the I²C Bus 100. An I/O Expander 114 is also coupled to the I²C Bus 100 to enable I/O devices (not shown in this figure) to obtain direct access to the I²C Bus 100. Moreover, a memory device 116 such as a RAM or an electrically erasable programmable read only memory (EEPROM) is also coupled to the I²C Bus 100 to provide storage of data transmitted by other I²C devices.
Each device connected to the I²C bus is software addressable by a unique address, and simple master/slave relationships exist at all times. The term "master" refers to an I²C device which initiates a transfer command to another I²C device, generates clock signals, and terminates the transfer on the I²C bus. The term "slave" refers to the I²C device which receives the transfer command from the master device on the I²C bus. The I²C bus is a true multi-master bus which includes collision detection and arbitration to prevent data corruption if two or more masters simultaneously initiate data transfer. Moreover, I²C devices act as transmitters and receivers. A "transmitter" is the I²C device which sends the data to the I²C Bus 100. A "receiver" is the I²C device which receives the data from the I²C Bus 100. Arbitration refers to a procedure whereby, if more than one master simultaneously attempts to control the I²C Bus 100, only one is allowed to do so and the transmitted message is not corrupted.
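The arbitration procedure described above can be illustrated with a minimal simulation. The sketch assumes the usual wired-AND behavior of the data line, in which the line reads low whenever any master drives a 0, and a master that sent a 1 but reads back a 0 withdraws. The function name `arbitrate` and the master labels are hypothetical and are used only for illustration.

```python
def arbitrate(messages):
    """Simulate bitwise arbitration on a wired-AND serial data line.

    messages maps a (hypothetical) master name to the list of bits it
    wants to transmit; all lists are assumed to have the same length.
    Returns (names of masters still in control, bits seen on the bus).
    """
    contenders = set(messages)
    length = len(next(iter(messages.values())))
    bus_bits = []
    for i in range(length):
        # Wired-AND: the line is low if any contending master drives a 0.
        line = min(messages[name][i] for name in contenders)
        bus_bits.append(line)
        # A master whose transmitted bit differs from the line level
        # (it sent 1 but reads 0) has lost arbitration and backs off.
        contenders = {name for name in contenders if messages[name][i] == line}
    return sorted(contenders), bus_bits


if __name__ == "__main__":
    winners, bus = arbitrate({
        "A": [1, 0, 1, 0, 0, 0, 0, 0],
        "B": [1, 0, 1, 0, 0, 1, 0, 0],
    })
    print(winners, bus)
```

In this sketch the bits observed on the bus are identical to the bits sent by the winning master, which is why the surviving message is not corrupted by the contention.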
The I²C Bus 100 supports up to 40 I²C devices and may have a maximum length of 25 feet. The I²C Bus 100 supports a transfer data rate of up to 100 kilobits/second (kbps) in "standard mode," or up to 400 kbps in "fast mode." Data transfers over the I²C Bus 100 follow a well-defined protocol. A transfer always takes place between a master and a slave. All bus transfers are bounded by a "Start" and a "Stop" condition. In the standard mode, the first byte after the Start condition usually determines which slave will be selected by the master. In the fast mode, the first two bytes after the Start condition usually determine which slave will be selected by the master. Each peripheral device on the I²C Bus 100 has a unique 8-bit address in the standard mode, or a 10-bit address in the fast mode. The address is hard-coded for each type of I²C device, but some devices provide an input pin that allows a designer to specify one bit of the device's I²C address. This allows two identical I²C devices used on the same bus to be addressed individually.
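The framing and addressing described above can be sketched as follows. The sketch makes an assumption not stated in the text: that the first byte after the Start condition carries a 7-bit device address followed by a read/write bit, so that the byte as a whole is eight bits wide. The function names, the `START`/`STOP` tokens, and the base address `0x50` are hypothetical and serve only as illustration.

```python
START, STOP = "START", "STOP"  # symbolic markers for the bus conditions


def device_address(base_address, pin_level):
    """Return a 7-bit address whose lowest bit is set by an address pin.

    base_address is the hard-coded address for the device type;
    pin_level (0 or 1) is the level wired to the address-select pin.
    """
    if not 0 <= base_address <= 0x7F:
        raise ValueError("base_address must fit in 7 bits")
    return (base_address & 0x7E) | (pin_level & 1)


def address_byte(address7, read):
    """Build the first byte: 7-bit address followed by the R/W bit."""
    return ((address7 & 0x7F) << 1) | (1 if read else 0)


def frame_write(address7, payload):
    """Return the sequence a master places on the bus for a write."""
    return [START, address_byte(address7, read=False), *payload, STOP]


if __name__ == "__main__":
    # Two identical devices distinguished only by their address pin.
    first = device_address(0x50, 0)
    second = device_address(0x50, 1)
    print(frame_write(first, [0x00, 0xFF]))
    print(frame_write(second, [0x00, 0xFF]))
```

Because the two devices differ in one address bit, the first byte of each transfer differs, and only the intended slave responds.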
With the increased complexity of information processing systems, the frequency of system failures due to system- and component-level errors has increased. Some of the problems are found in the industry standard architecture (ISA) bus used in IBM PC-compatible computers. The enhanced ISA (EISA) provided some improvement over the ISA architecture of the IBM PC/AT, but more resistance to failure and higher performance are still required. Other problems may exist in interface devices, such as bus-to-bus bridges. Additionally, problems may exist in bus peripheral devices such as microcontrollers, central processors, power supplies, cooling fans, and other similar components.
With these added components and subsystems, occasional system failures have become inevitable. Existing information systems do not currently provide a tool for managing these failures. More importantly, present systems do not possess the means to efficiently diagnose such failures and restore the system after they occur. Therefore, when failures occur, there is a need to identify the events leading up to these failures. The ability to identify the events leading up to system failures minimizes downtime and ensures more efficient system maintenance and repair in the future.