1. Technical Field of the Invention
The present invention relates generally to the field of computer systems. More particularly, the present invention is drawn to bus protocol monitoring systems and methods which detect a violation relating to one or more bus interface control signals.
2. Description of Related Art
Networks serve the purpose of connecting many different personal computers (PCs), workstations, or terminals to each other, and to one or more host computers, printers, file servers, etc., so that expensive computing assets, programs, files and other data may be shared among many users.
In a network utilizing a client/server architecture, the client (personal computer or workstation) is the requesting machine and the server is the supplying machine, both of which may preferably be connected via the network, such as a local area network (LAN), wide area network (WAN) or metropolitan area network (MAN). This is in contrast to early network systems that utilized a mainframe with dedicated terminals.
In a client/server network, the client typically contains a user interface and may perform some or all of the application processing and, as mentioned above, can include a personal computer or workstation. The server in a client/server network can be a high-speed microcomputer or minicomputer, and in the case of a high-end server, can include multiple processors and mass data storage devices such as multiple CD-ROM drives and multiple hard drives, preferably with Redundant Array of Inexpensive Disks (RAID) protection. An exemplary server such as a database server maintains the databases and processes requests from the client to extract data from or update the database. An application server provides additional business processing for the clients. The network operating system (NOS), the database management system (DBMS) and the transaction monitor (TP monitor) are together responsible for the integrity and security of the server.
Client/server networks are widely used throughout many different industries and business organizations, especially where mission-critical applications requiring high performance are routinely launched. The mass storage and multi-processing capabilities provided by current client/server network systems (for example, the high-end servers) that run such applications permit a wide range of essential services and functions to be provided through their use.
As can be appreciated, many businesses are highly dependent upon the availability of their client/server network systems to permit essential network services and functions to be carried out. As client/server network systems become increasingly essential to the everyday operations of such businesses, additional steps need to be taken in the design and construction of the server in the client/server network system to ensure its continuous availability to the clients. That is to say, in the design and construction of a server, steps need to be taken to ensure that the server can be operated with little or no downtime.
It can be appreciated by those skilled in the art that high availability, reliability and serviceability are valuable design aspects in ensuring that a server is a "zero downtime" system that will operate with little or no downtime. The modularity of components within a server has been recognized as an important design consideration in ensuring that the downtime of a server will be minimized. Modules can be removed and examined for operability or other purposes much more easily than permanently mounted fixtures within a server chassis. When various components of a server can be provided in a modular form, they can also be readily replaced to maintain the operational status of the server with minimal downtime.
Removable modular components may include disc drives and power supplies. As described above, the removability of modular components allows for better overall serviceability of the computer system which is a distinct advantage. For example, a defective power supply in the server generally requires prompt replacement in order to limit downtime. Modular components and connectors facilitate prompt replacement and are thus popular in many computer designs.
Originally, a rule of practice in the maintenance of modular components or printed circuit boards of a server was that of turning the power to the server off before any modular components or printed circuit boards were removed from or added to the chassis or support frame of the server. Recent innovations have centered around a highly desirable design goal of "hot-pluggability" which addresses the benefits derived from inserting and removing modular components and printed cards from the chassis of the server when the server is electrically connected and operational. It can be readily appreciated that modularization and hot-pluggability can have a significant bearing on the high availability aspect of a high-end server.
Hot-pluggable components may include storage or disc drives, drive cages, fans, power supplies, system I/O boards, control boards, processor boards, and other sub-assemblies. The ability to remove these constituent components without having to power down the server allows for better overall serviceability of the system, which is a distinct advantage to both the user and the maintenance technician.
Component redundancy has also been recognized as an important design consideration in ensuring that a server will operate with little or no downtime. Essentially, component redundancy is typically provided in a system to better ensure that at least one of the redundant components is operable, thereby minimizing the system downtime. With component redundancy, at least two components are provided that can perform the same function, such that if one of the components becomes faulty for some reason, the operation fails over to the redundant component. When at least one of the redundant components is operable, continued operation of the computer system is possible even if others of the redundant components fail. To further enhance reliability and serviceability, redundant components have been made hot-pluggable.
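By way of illustration only, the fail-over behavior described above can be sketched in a few lines of Python. The names Component and select_active are hypothetical and are introduced solely for this sketch; no particular fail-over mechanism is implied.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Component:
    """Hypothetical model of one redundant component (e.g., a power supply)."""
    name: str
    operable: bool


def select_active(redundant: Sequence[Component]) -> Optional[Component]:
    """Return the first operable component, or None if all have failed.

    Operation continues as long as at least one redundant component
    remains operable.
    """
    for component in redundant:
        if component.operable:
            return component
    return None
```

In this sketch, a set containing one faulty and one operable power supply still yields an active component, whereas a set in which every component has failed yields none.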
Dynamic reconfiguration of a server system can also be accomplished by providing upgradable modular components therein. As can be readily appreciated, this objective can be accomplished by the addition or substitution of components having different circuits, preferably updated or upgraded, disposed therewithin. When components are redundant and hot pluggable, reconfiguration of the server is often possible without taking the server offline.
Another important design aspect with respect to providing redundant and hot pluggable components in a server system is to ensure and maintain a safe working environment while the server is operating and being repaired or upgraded. Accordingly, when the system components are swapped or upgraded, the exposure of hot connectors and contacts must be kept to a minimum. It can be appreciated by those skilled in the art that further developments in this area would significantly enhance the reliability and serviceability aspects of a high-end server system.
To further enhance the serviceability of server systems, additional innovations may be required in the design and construction of diagnostic sub-systems thereof. In existing client/server network systems it is often difficult to obtain in a timely manner important diagnostic data and information corresponding to a component failure in order to facilitate the quick serviceability of the server. Therefore, it can be appreciated that the more information that can be readily provided to locate a defective component or problem with the server, the more the amount of time the server is up and running can be increased.
It should be readily understood that the aspects of high availability, reliability and serviceability of computer systems are, at least in part, inter-related to the performance of such systems. For example, a poorly performing system is less likely to be highly available or reliable because such poor performance may typically result in persistent malfunctioning. As is known in the art, a significant parameter of system performance is the health of a conductive pathway, i.e., a bus provided in a system.
As is well-known in the art, computer system buses, having a plurality of conductive transmission lines, provide the means for interconnecting a plurality of electronic devices such that the devices may communicate with one another. These buses carry information including address information, control information, and data, in a logical manner as dictated by the design thereof. This logical manner is commonly referred to as the bus protocol. It is convenient to visualize the bus protocol as a combination of a "data protocol" portion and a "control protocol" portion. The data protocol portion relates to the rules concerning the actual data transfer itself and the signal conditions necessary therefor. The control protocol portion, on the other hand, may be visualized as the suite of interface control signals responsible for the operation of the bus itself.
It is known that data transfer on computer buses may sometimes be afflicted with errors. Accordingly, many high-performance buses, for example, the Peripheral Component Interconnect (PCI) bus, typically include in their bus protocol a set of signals for reporting any data transfer errors. While such features are useful in detecting and isolating data transfer errors (for example, data/address parity errors and the like), current technologies do not address or consider various anomalous conditions that might occur from time to time in the interface control signals themselves.
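As a simplified illustration of the kind of data transfer error detection referred to above, the following Python sketch models an even-parity check of the type used on the PCI bus, in which a parity bit is chosen so that the total number of ones across the 32 address/data lines, the 4 command/byte-enable lines and the parity bit itself is even. The function names are assumptions made for this sketch, and timing details (such as the clock on which the parity bit is valid) are deliberately omitted.

```python
def expected_par(ad: int, cbe: int) -> int:
    """Return the parity bit that makes the total count of ones even.

    ad  : value on the 32 address/data lines
    cbe : value on the 4 command/byte-enable lines
    """
    ones = bin(ad & 0xFFFFFFFF).count("1") + bin(cbe & 0xF).count("1")
    return ones & 1


def has_parity_error(ad: int, cbe: int, observed_par: int) -> bool:
    """Return True when the observed parity bit disagrees with the expected one."""
    return expected_par(ad, cbe) != (observed_par & 1)
```

It may be noted that a check of this kind covers only the address and data content of a transfer; it does not examine the behavior of the interface control signals.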
It may be appreciated that because the interface control signals (i.e., the control protocol portions) are ultimately responsible for the trouble-free operation of a bus, any violations associated therewith may cripple a bus system, thereby adversely impacting a computer system in which it is disposed. For example, it is known that some of the control protocol violations may lead to various bus lock-up or hang conditions that may result in significant system downtime.
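By way of example only, one conceivable control protocol check is a watchdog that flags a handshake which never completes. The Python sketch below assumes a hypothetical pair of control signals, "request" and "ready", sampled once per clock, and a hypothetical limit max_wait; none of these names or rules is taken from any particular bus specification.

```python
from typing import Iterable, Optional, Tuple


def first_hang_cycle(
    samples: Iterable[Tuple[bool, bool]], max_wait: int
) -> Optional[int]:
    """Return the clock index at which a hang is first detected, else None.

    samples  : per-clock (request, ready) pairs; both signals are hypothetical
    max_wait : number of consecutive clocks a request may wait without ready
    """
    waiting = 0
    for cycle, (request, ready) in enumerate(samples):
        if request and not ready:
            waiting += 1
            if waiting > max_wait:
                return cycle  # request has waited longer than allowed
        else:
            waiting = 0  # handshake completed or bus idle
    return None
```

With max_wait set to 3, a request answered on the fourth clock produces no violation in this sketch, whereas a request left unanswered for five clocks is flagged at clock index 3.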
Although various systems for monitoring data protocol errors have been known for some time, there are at present no known solutions that address control protocol violations. Accordingly, there has arisen a significant need for systems and methods for monitoring bus protocol violations.