This invention relates generally to computer systems, and more particularly to identifying, capturing, isolating and diagnosing errors in computer system operation.
As is known in the art, bus adapters and other devices are connected to a bus through bus interfaces and occupy physical bus locations for device interconnection called “bus slots”. A bus interface is typically designed for a particular type of bus, and is responsible for complying with the signaling requirements of the bus, sometimes called its “bus protocol”. The bus protocol includes the bus' electrical, physical and logical characteristics for reliable bus transfers. The bus interface generally includes bus drivers and bus receivers to send and receive, respectively, signals over the bus in accordance with the bus protocol. Essentially, each device connected to the bus has a separate instance of a bus interface for each line of the bus, each including a driver for driving that line and a receiver for sensing voltages on that line and resolving them into logic states. Bus protocols are typically specified by manufacturers and often by standards-making organizations. Bus adapters include bus interfaces for each of the buses to which they are connected.
For operation of the bus, certain of the devices can initiate requests to gain control of the bus to perform, for example, a memory access such as a “read” or “write” operation. Such operations require the requesting device (e.g., a central processing unit (CPU)) and the responding device (e.g., a memory) to exchange a number of bus signals. Initially, the requesting device needs to acquire control of the bus. This can be effected, e.g., through arbitration, which generally requires the exchange of arbitration and other handshaking signals over the bus with other bus devices such as a bus arbiter and/or other potential requesting devices. Then, after eventually gaining control, the requesting device needs to assert the appropriate command line, e.g., the read line or the write line, to designate the type of operation. Additionally, for memory operations, the requesting device needs to place address information on address lines of the bus to identify the memory addresses to be accessed. Finally, the responding device needs to respond to the command, e.g., the memory needs to place data onto the bus from the addressed locations, or receive data from the bus and store them at the addressed locations.
For purposes hereof, a “bus transaction” can be defined as the set of all bus signals (e.g., handshaking, command, address and data) asserted after the requesting device has gained control of the bus, which are used to complete a logical task, such as a “read” or “write” operation, performed over the bus. Devices connected to the bus transmit the signals of a bus transaction in synchrony with the bus' clock. A “bus cycle” refers to the number of bus clock cycles required to perform a bus transaction. During a bus cycle, the requesting device asserts certain bus lines in accordance with the bus protocol, and the responding device scans certain bus lines to ascertain the information contained in handshaking, command, address, and data signals, also in accordance with the bus protocol.
To assert a bus line, a bus device drives the bus line to a high voltage level or a low voltage level during each of one or more clock ticks during a bus cycle. The high voltage and low voltage levels correspond to digital LOGIC HIGH and LOGIC LOW states. To scan a bus line, a bus device typically detects the voltage on the line at a particular time, e.g., at a rising or falling edge of the bus clock, and determines whether the detected voltage is at a high or low level. The voltage level on certain lines determines, for example, whether the transaction is a read or a write, and, on other lines, whether the data includes a LOGIC HIGH or LOGIC LOW during the corresponding tick of the bus clock. Many bus lines are only driven for a portion (often only a small portion) of the bus cycle of a bus transaction.
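The scanning step described above can be illustrated with a minimal sketch. It assumes a receiver that compares each voltage sampled at a clock edge against two input thresholds; the threshold values and the names `V_IL_MAX`, `V_IH_MIN`, `resolve_logic_state` and `scan_lines` are hypothetical and are not taken from any particular bus protocol.

```python
from typing import List, Optional

# Hypothetical receiver thresholds in volts; real values are set by the bus protocol.
V_IL_MAX = 0.8  # at or below this level the receiver resolves LOGIC LOW
V_IH_MIN = 2.0  # at or above this level the receiver resolves LOGIC HIGH


def resolve_logic_state(voltage: float) -> Optional[int]:
    """Resolve one sampled voltage into 1 (HIGH), 0 (LOW), or None (indeterminate)."""
    if voltage >= V_IH_MIN:
        return 1
    if voltage <= V_IL_MAX:
        return 0
    # Between the thresholds the receiver cannot reliably tell the states apart.
    return None


def scan_lines(sampled_voltages: List[float]) -> List[Optional[int]]:
    """Resolve the voltages sampled on several bus lines at one clock edge."""
    return [resolve_logic_state(v) for v in sampled_voltages]
```

In this sketch, a voltage that falls between the two thresholds is reported as indeterminate, which is the region where noise and distortion can lead a real receiver to resolve the wrong logic state.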
Computer system architecture has advanced dramatically in performance and complexity. In terms of performance, computer systems can achieve higher clock speeds with increased bus widths and lower bus operating voltages. Increased bus clock speeds, usually measured in megahertz (MHz), can allow data to be transferred faster over the computer system's bus, thereby allowing computer applications to run faster. The size of a bus, known as its width, corresponds generally to the number of data lines in the bus and determines how much data can be transmitted in parallel; thus, wider buses typically transfer data faster. Lower bus operating voltages can advantageously also reduce power consumption, which is important, for example, in miniaturization of integrated circuits and in mobile computing for extending battery operating times. Unfortunately, lower operating voltages can make bus signals more susceptible to signaling errors due to lower signal-to-noise ratios and to signal distortion. Such noise and signal distortion can make it difficult for bus receivers to differentiate correctly, e.g., between data logic states, thus potentially yielding erroneous data.
Transient and other non-predictable errors in the received bus signals can arise from other causes as well, and often have deleterious impacts on computer system performance. Such errors can arise, for example, from degradation over time of bus drivers and receivers in bus interfaces. Bus errors can also arise due to incompatibility of add-on components such as adapter cards that are integrated into the computer system after installation at a customer site, and connected to one of the computer's buses, e.g., through “plug and play” operation. Where such adapter cards malfunction or simply exhibit operating parameters unanticipated by the original computer manufacturer, transfer errors can arise on the bus to which they are connected. Such bus errors can result in lost or corrupted data or hanging of the bus protocol so as to prevent completion of a bus transaction. In extreme cases, bus errors can cause system crashes.
For diagnosing bus error conditions, it is often necessary to reproduce the errors. For example, when an error occurs during normal transfers over a system bus of a computer system, it may be necessary to drive the system bus with the same set of stimuli under the same conditions as when the error occurred in order to determine its causes. It may prove difficult to apply such stimuli and reproduce the error conditions under control of the computer's operating system due to the complexities involved.
It would be desirable to provide a technique for testing bus operation to determine whether the bus is likely to perform adequately during actual operating conditions, and to assess the likelihood of bus transfer errors. Such testing should preferably lend itself to use in design verification and quality assurance prior to shipment from a system manufacturer, as well as in field servicing to assure bus operation has not degraded after installation at a customer's facility. It would also be desirable to be able to run such testing in electronic devices using designed-in testing features rather than external testing apparatus that may affect testing results and is cumbersome, time-consuming and costly to use.
The invention resides in a computer system or other electronic apparatus in which bus testing logic is built into some of the devices connected to the bus to enable these devices to perform diagnostic testing of the bus. Under control of the test logic, the devices drive the bus with output voltages corresponding to a predetermined test bit pattern. Preferably the test bit pattern is selected to cause the bus to reach a target bus utilization or saturation level. The test bit pattern can include a plurality of digital values corresponding to drive voltages for the bus for testing that target bus utilization level over a bus cycle of a bus transaction. The bus signals produced by the devices propagate along the bus and are received by other devices. The received bus signals are resolved into a received bit pattern. The received bit pattern can be compared with the test bit pattern used to generate the bus signals in order to detect discrepancies, or a first failure resulting from the test can be captured, as described in the above-referenced patent application entitled “Method and Apparatus for Extracting First Failure and Attendant Operating Information From Computer System Devices”.
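The comparison of a received bit pattern against the test bit pattern can be sketched in software as follows. This is only an illustration under stated assumptions: each pattern is modeled as a list of integers, one bus word per clock tick, and the function name `find_discrepancies` and its parameters are hypothetical. The text above describes the comparison as performed by test logic, not by this code.

```python
from typing import List, Tuple


def find_discrepancies(
    test_pattern: List[int], received_pattern: List[int], bus_width: int
) -> List[Tuple[int, List[int]]]:
    """Return (tick, mismatched_bit_positions) for every tick that differs."""
    if len(test_pattern) != len(received_pattern):
        raise ValueError("patterns must cover the same number of clock ticks")
    mask = (1 << bus_width) - 1  # ignore bits beyond the modeled bus width
    discrepancies = []
    for tick, (sent, got) in enumerate(zip(test_pattern, received_pattern)):
        diff = (sent ^ got) & mask  # XOR leaves a 1 wherever the words differ
        if diff:
            bits = [b for b in range(bus_width) if (diff >> b) & 1]
            discrepancies.append((tick, bits))
    return discrepancies
```

For example, with a driven pattern of `[0xAA, 0x55, 0xFF, 0x00]` and a received pattern of `[0xAA, 0x51, 0xFF, 0x80]` on an 8-bit bus, the sketch reports bit 2 as wrong at tick 1 and bit 7 as wrong at tick 3.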
In one embodiment, the devices can operate in a first mode by driving the bus while performing normal operating functions of the device or in a second mode while performing diagnostic testing on the bus by driving the bus in accordance with the test bit patterns. Test patterns can be interleaved with normal bus signals. Alternatively, the test logic in the devices can arbitrate with the normal circuitry to assume control of the bus for testing purposes. Preferably, the same bus drivers and receivers which are used for normal device operation are used for bus testing. Alternatively, dedicated bus drivers and receivers can be used.
In accordance with another embodiment, the received bit pattern is stored in the devices and JTAG technology is used to provide the test bit pattern to the devices and to scan out the received bit pattern. Internal diagnostic logic, or an external test console or service processor, can then perform the analysis of the bit patterns.
In still another embodiment, the bus driver which generates the testing bus signals is located in a different device from the bus receiver which detects the testing bus signals. In yet another embodiment, a first device includes both a bus driver and a bus receiver and a second device includes logic for looping the bus signals back to the device that generated the signals. In this manner, a comparison of the bus signals can be performed in a single device.
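The loopback arrangement can be modeled with a small sketch in which the first device drives each test word, the second device returns whatever it received, and the first device compares the result with what it sent. The callables `outbound` and `inbound`, which stand in for the two directions of the bus, and the toy fault model `stuck_low_bit3` are hypothetical and serve only to exercise the idea.

```python
from typing import Callable, List, Tuple


def loopback_test(
    test_pattern: List[int],
    outbound: Callable[[int], int],
    inbound: Callable[[int], int],
) -> List[Tuple[int, int, int]]:
    """Return (tick, sent, returned) for each word that came back altered."""
    failures = []
    for tick, sent in enumerate(test_pattern):
        at_second_device = outbound(sent)     # word as resolved by the second device
        returned = inbound(at_second_device)  # second device loops the word back
        if returned != sent:
            failures.append((tick, sent, returned))
    return failures


def ideal_path(word: int) -> int:
    """Fault-free bus direction: the word arrives unchanged."""
    return word


def stuck_low_bit3(word: int) -> int:
    """Toy fault model: bit 3 is always received as LOGIC LOW."""
    return word & ~0x08
```

Because the loopback covers both directions of travel, a mismatch detected this way does not by itself indicate which direction introduced the error.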
The invention permits system stress testing without the need for instruction-stream-generated bus cycles. The testing can be performed, for example, for purposes of design verification, diagnostic testing after an error has been encountered, or on a regular basis, e.g., as part of power-on self-test (POST) procedures. The invention permits deterministic saturations of the bus when and where desired, e.g., for inducing various forms of error conditions, such as system-level bottlenecks and latencies, in a reproducible manner. The invention can also be used to associate a “victim” bit on a bus with its “aggressor” bit, and thus trace causes of bus error conditions.
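The text does not state how a victim bit is associated with its aggressor. One simple possibility, offered here purely as an assumption, is a walking-one test pattern: each line is driven high in turn while all others are held low, and any other line that is received incorrectly is recorded as a victim of the driven line. The names `find_aggressors`, `transfer` and `coupled_bus` are hypothetical.

```python
from typing import Callable, Dict, Set


def find_aggressors(
    transfer: Callable[[int], int], bus_width: int
) -> Dict[int, Set[int]]:
    """Map each victim bit to the set of aggressor bits that corrupt it.

    `transfer` models one bus transfer: it takes the driven word and
    returns the word resolved by the receiver.
    """
    pairs: Dict[int, Set[int]] = {}
    for aggressor in range(bus_width):
        sent = 1 << aggressor  # walking one: only the candidate aggressor is high
        diff = sent ^ transfer(sent)
        for victim in range(bus_width):
            if victim != aggressor and (diff >> victim) & 1:
                pairs.setdefault(victim, set()).add(aggressor)
    return pairs


def coupled_bus(word: int) -> int:
    """Toy fault model: driving bit 5 high falsely raises bit 4."""
    if word & (1 << 5):
        word |= 1 << 4
    return word
```

With the toy model, `find_aggressors(coupled_bus, 8)` identifies bit 4 as a victim of bit 5. A walking-one pattern exposes only single-aggressor coupling; interactions that require several lines switching together would need richer test patterns.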