1. Field of the Invention
The present invention relates to computer systems using information buses to interface a central processor(s), random access memory and input-output peripherals together, and more particularly, to utilizing in a computer system a fault-tolerant interconnection system for a plurality of peripheral component interconnect (PCI) devices.
2. Description of the Related Technology
Use of computers, especially personal computers, in business and at home is becoming more and more pervasive because the computer has become an integral tool of most information workers who work in the fields of accounting, law, engineering, insurance, services, sales and the like. Rapid technological improvements in the field of computers have opened up many new applications heretofore unavailable or too expensive for the use of older technology mainframe computers. These personal computers may be used as stand-alone workstations (high end individual personal computers) or linked together in a network by a "network server" which is also a personal computer which may have a few additional features specific to its purpose in the network. The network server may be used to store massive amounts of data, and may facilitate interaction of the individual workstations connected to the network for electronic mail ("E-mail"), document databases, video teleconferencing, whiteboarding, integrated enterprise calendar, virtual engineering design and the like. Multiple network servers may also be interconnected by local area networks ("LAN") and wide area networks ("WAN").
A significant part of the ever increasing popularity of the personal computer, besides its low cost relative to just a few years ago, is its ability to run sophisticated programs and perform many useful and new tasks. Personal computers today may be easily upgraded with new peripheral devices for added flexibility and enhanced performance. A major advance in the performance of personal computers (both workstation and network servers) has been the implementation of sophisticated peripheral devices such as video graphics adapters, local area network interfaces, SCSI bus adapters, full motion video, redundant error checking and correcting disk arrays, and the like. These sophisticated peripheral devices are capable of data transfer rates approaching the native speed of the computer system microprocessor central processing unit ("CPU"). The peripheral devices' data transfer speeds are achieved by connecting the peripheral devices to the microprocessor(s) and associated system random access memory through high speed expansion local buses. Most notably, a high speed expansion local bus standard has emerged that is microprocessor independent and has been embraced by a significant number of peripheral hardware manufacturers and software programmers. This high speed expansion bus standard is called the "Peripheral Component Interconnect" or "PCI." A more complete definition of the PCI local bus may be found in the PCI Local Bus Specification, revision 2.1; PCI/PCI Bridge Specification, revision 1.0; PCI System Design Guide, revision 1.0; PCI BIOS Specification, revision 2.1, and Engineering Change Notice ("ECN") entitled "Addition of `New Capabilities` Structure," dated May 20, 1996, the disclosures of which are hereby incorporated by reference. These PCI specifications and ECN are available from the PCI Special Interest Group, P.O. Box 14070, Portland, Oreg. 97214.
A computer system uses a plurality of information (data and address) buses such as a host bus, a memory bus, at least one high speed expansion local bus such as the PCI bus, and other peripheral buses such as the Small Computer System Interface (SCSI), Extended Industry Standard Architecture (EISA), and Industry Standard Architecture (ISA). The microprocessor(s) (CPU) of the computer system communicates with main memory and with the peripherals that make up the computer system over these various buses. The microprocessor(s) communicate(s) with the main memory through a host bus to memory bus bridge. The main memory generally communicates over a memory bus through a cache memory bridge to the CPU host bus. The peripherals, depending on their data transfer speed requirements, are connected to the various buses, which in turn are connected to the microprocessor host bus through bus bridges that detect required actions, arbitrate, and translate both data and addresses between the various buses.
The choices available for the various computer system bus structures and devices residing on these buses are relatively flexible and may be organized in a number of different ways. One of the more desirable features of present day personal computer systems is their flexibility and ease in implementing custom solutions for users having widely different requirements. Slower peripheral devices may be connected to the ISA or EISA bus(es); other peripheral devices, such as disk and tape drives, may be connected to a SCSI bus; and the fastest peripheral devices, such as network interface cards (NICs) and video graphics controllers, may require connection to the PCI bus. Information transactions on the PCI bus may operate at 33 MHz or 66 MHz clock rates and may be either 32 or 64 bit transactions.
The PCI 2.1 Specification supports a high 32 bit bus, referred to as the 64 bit extension to the standard low 32 bit bus. The 64 bit bus provides additional data bandwidth for PCI devices that require it. The high 32 bit extension for 64 bit devices requires an additional 39 signal pins: REQ64#, ACK64#, AD[63:32], C/BE[7:4]#, and PAR64. These signals are defined more fully in the PCI 2.1 Specification incorporated by reference hereinabove. 32 bit PCI devices work unmodified with 64 bit PCI devices. A 64 bit PCI device must default to 32 bit operation unless a 64 bit transaction is negotiated. 64 bit transactions on the PCI bus are dynamically negotiated (once per transaction) between the master and target PCI devices. This is accomplished by the master asserting REQ64# and the target responding to the asserted REQ64# by asserting ACK64#. Once a 64 bit transaction is negotiated, it holds until the end of the transaction. Signals REQ64# and ACK64# are externally pulled up by pull up resistors to ensure proper behavior when mixing 32 bit and 64 bit PCI devices on the PCI bus. A central resource controls the state of REQ64# to inform the 64 bit PCI device that it is connected to a 64 bit bus. If REQ64# is deasserted when RST# is deasserted, the PCI device is not connected to a 64 bit bus. If REQ64# is asserted when RST# is deasserted, the PCI device is connected to a 64 bit bus.
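The per-transaction negotiation and the reset-time sampling described above may be illustrated by the following minimal sketch. It is a simplified software model and not part of the PCI Specification; the function names `negotiated_width` and `on_64bit_bus` are hypothetical, and each active-low signal is represented by a boolean that is true when the signal is asserted.

```python
def negotiated_width(master_asserts_req64: bool, target_asserts_ack64: bool) -> int:
    """Return the data width, in bits, used for one PCI transaction.

    A 64 bit transfer takes place only when the master asserts REQ64#
    and the target answers by asserting ACK64#. In every other case the
    transaction falls back to the default 32 bit operation.
    """
    if master_asserts_req64 and target_asserts_ack64:
        return 64
    return 32


def on_64bit_bus(req64_asserted_when_rst_deasserted: bool) -> bool:
    """Model how a 64 bit device learns the width of the bus it sits on.

    The device looks at REQ64# at the moment RST# is deasserted:
    asserted means a 64 bit bus, deasserted means a 32 bit bus.
    """
    return req64_asserted_when_rst_deasserted
```

In this model a 32 bit target never asserts ACK64#, so a 64 bit master that requests a wide transfer is still served with a 32 bit transaction.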
Many components and connections are required for operation of the features inherent in today's computer systems. Miniaturization and automated assembly have decreased the cost of computers, but sometimes introduce latent defects that cause malfunctions later during operation of the computer system. Typically, printed circuit boards having conductive patterns are used to interconnect integrated circuit packages such as a ball grid array (BGA) using surface mount techniques. There may be hundreds of contacts (tiny solder balls) on a BGA package and each must be properly connected to respective connections of the conductive patterns on the printed circuit boards of the computer system. Some problems that may not be found during manufacture, or that may develop later during operation of the computer system, are shorted or open connections at the contacts of the BGA package. Unless pattern sensitive tests are run, an open connection may appear as the correct logic level, and shorted connections may not be noticed if the same logic level is on the shorted connections. Devices in the integrated circuit packages of the computer system also may either short or open, giving an erroneous signal. Generation and checking of parity is a way of detecting data transmission malfunctions in the computer system.
The PCI Specification requires generation of parity information for all PCI devices that drive address and/or data information onto the address/data (AD[31:0]) bus. The PCI AD[31:0] bus is a time-multiplexed address/data bus. During the address phase of a PCI transaction, the AD[31:0] bus carries the start address of the PCI transaction, and the Command or Byte Enable bus (C/BE#[3:0]) defines the type of transaction to be performed. A Parity signal (PAR) is driven by the initiator, either high or low, one clock after completion of the address phase so that the AD[31:0] bus, the C/BE#[3:0] bus and PAR together, a total of 37 bits, have even parity, i.e., the number of logic "1s" on the combined 37 bits is an even number.
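The even parity rule described above may be illustrated by the following minimal sketch. The function name `even_parity_bit` is hypothetical, and the arguments are assumed to hold the raw logic levels present on the AD[31:0] and C/BE#[3:0] lines as unsigned integers.

```python
def even_parity_bit(ad: int, cbe: int) -> int:
    """Return the PAR value for one phase of a 32 bit PCI transaction.

    ad  -- logic levels on AD[31:0], as an integer in the range 0..2**32-1
    cbe -- logic levels on C/BE#[3:0], as an integer in the range 0..15

    PAR is chosen so that the 37 bits formed by AD[31:0], C/BE#[3:0]
    and PAR itself contain an even number of logic "1s".
    """
    if not 0 <= ad < (1 << 32):
        raise ValueError("ad must fit in 32 bits")
    if not 0 <= cbe < (1 << 4):
        raise ValueError("cbe must fit in 4 bits")
    ones = bin(ad).count("1") + bin(cbe).count("1")
    # An odd count on the 36 driven lines needs PAR = 1 to become even.
    return ones & 1
```

For example, a value of 0x00000001 on AD[31:0] with all C/BE# lines low carries a single logic "1", so PAR is driven high to bring the total to two.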
During the data phase(s) of the PCI transaction, the AD[31:0] bus is driven by the initiator (during a write transaction) or the currently-addressed target (during a read transaction). The C/BE#[3:0] bus is driven by the initiator to indicate the bytes to be transferred within the currently-addressed doubleword ([31:0]) and the data paths to be used to transfer the data. PAR is driven by either the initiator (during a write transaction) or the currently-addressed target (during a read transaction) so that the combination of AD[31:0], C/BE#[3:0] and PAR (37 bits total) has an even parity (an even number of logic "1s"). For 64 bit data transfers the upper address/data bus AD[63:32] and C/BE#[7:4], in conjunction with the lower address/data bus AD[31:0] and C/BE#[3:0], are utilized to transfer a quadword (64 bits) of data. An upper Parity signal (PAR64) is used in combination with the AD[63:32] and C/BE#[7:4] buses to represent an even parity across these upper 37 bits.
The bus agent receiving the data (target on a write transaction or initiator on a read transaction) must calculate whether a parity bit should be a logic "1" or a logic "0" to produce an even parity based upon the number of logic "1s" received on the combination of the AD and C/BE# buses. If the calculated parity bit does not match the logic value of the received PAR (or PAR64 for the upper bus), then a parity error has occurred. The PCI Specification defines a parity error signal (PERR#) to indicate that a parity error has occurred on either or both of the upper AD[63:32] and C/BE#[7:4] buses, or lower AD[31:0] and C/BE#[3:0] buses during a data phase. PERR# does not indicate which of these buses has the data parity error.
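The check performed by the receiving agent may be illustrated by the following minimal sketch. It is a simplified model under stated assumptions: the names `BusHalf`, `half_has_even_parity` and `perr_asserted` are hypothetical, each bus half is represented by the logic levels received on its 32 address/data lines, its 4 C/BE# lines and its parity line, and timing is ignored.

```python
from typing import NamedTuple, Optional


class BusHalf(NamedTuple):
    """Logic levels received on one 32 bit half of the bus."""
    ad: int   # AD[31:0] or AD[63:32], as a 32 bit integer
    cbe: int  # C/BE#[3:0] or C/BE#[7:4], as a 4 bit integer
    par: int  # PAR or PAR64, 0 or 1


def half_has_even_parity(half: BusHalf) -> bool:
    """Return True when the 37 bits of this half hold an even number of 1s."""
    ones = (bin(half.ad & 0xFFFFFFFF).count("1")
            + bin(half.cbe & 0xF).count("1")
            + (half.par & 1))
    return ones % 2 == 0


def perr_asserted(lower: BusHalf, upper: Optional[BusHalf] = None) -> bool:
    """Return True when the receiving agent would report a data parity error.

    For a 32 bit data phase only the lower half is checked. For a 64 bit
    data phase both halves are checked. The result is a single flag, so
    it does not reveal which half failed.
    """
    if not half_has_even_parity(lower):
        return True
    return upper is not None and not half_has_even_parity(upper)
```

In this model a fault on the upper half and a fault on the lower half produce the same single result, which mirrors the limitation that PERR# alone does not identify the faulty portion of the bus.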
The PCI Specification permits recovery from data phase parity errors. Recovery from a data phase parity error may be attempted by the PCI master, the device driver or by the operating system. The PCI Specification recommends that the recovery be attempted at the lowest possible level (i.e., by the bus master). If the data phase parity error cannot be recovered from, the parity error must be reported to the operating system.
Operation of the 64 bit PCI bus and PCI devices is more fully described in commonly owned U.S. patent application Ser. No. 08/723,767, filed Sep. 30, 1996, entitled "A Fault-Tolerant Bus System" by Sompong P. Olarig, which is hereby incorporated by reference. The invention disclosed in that patent application recovers from an operating fault on the upper 32-bit portion of a 64-bit PCI bus by transferring data and control information only on the operating lower 32-bit portion of the 64-bit PCI bus. However, no recovery is possible when the fault is on the lower 32-bit portion thereof.
What is needed is an apparatus, method, and system for improving fault tolerance on a 64-bit data-width PCI bus when either the upper or lower 32-bit data-width portions of the 64-bit data-width PCI bus may have an operating fault.