This invention relates generally to computer systems, and more particularly to identifying, capturing, isolating and diagnosing errors in computer system operation.
As is known in the art, bus adapters and other devices are connected to a bus through bus interfaces and occupy physical bus locations for device interconnection called “bus slots”. A bus interface is typically designed for a particular type of bus, and is responsible for complying with the signaling requirements of the bus, sometimes called its “bus protocol”. The bus protocol includes the bus's electrical, physical and logical characteristics for reliable bus transfers. The bus interface generally includes bus drivers and bus receivers to send and receive, respectively, signals over the bus in accordance with the bus protocol. Essentially, each device connected to the bus has a separate instance of a bus interface for each line of the bus, each including a driver for driving that line and a receiver for sensing voltages on that line and resolving them into logic states. Bus protocols are typically specified by manufacturers and often by standards-making organizations. Bus adapters include bus interfaces for each of the busses to which they are connected.
For operation of the bus, certain of the devices can initiate requests to gain control of the bus to perform, for example, a memory access such as a “read” or “write” operation. Such operations require the requesting device (e.g., a central processing unit (CPU)) and the responding device (e.g., a memory) to exchange a number of bus signals. Initially, the requesting device needs to acquire control of the bus. This can be effected, e.g., through arbitration, which generally requires the exchange of arbitration and other handshaking signals over the bus with other bus devices such as a bus arbiter and/or other potential requesting devices. Then, after eventually gaining control, the requesting device needs to assert the appropriate command line, e.g., the read line or the write line, to designate the type of operation. Additionally, for memory operations, the requesting device needs to place address information on address lines of the bus to identify the memory addresses to be accessed. Finally, the responding device needs to respond to the command, e.g., the memory needs to place data onto the bus from the addressed locations, or receive data from the bus and store them at the addressed locations.
For purposes hereof, a “bus transaction” can be defined as the set of all bus signals (e.g., handshaking, command, address and data) asserted after the requesting device has gained control of the bus, which are used to complete a logical task, such as a “read” or “write” operation, performed over the bus. Devices connected to the bus transmit the signals of a bus transaction in synchrony with the bus's clock. A “bus cycle” refers to the number of bus clock cycles required to perform a bus transaction. During a bus cycle, the requesting device asserts certain bus lines in accordance with the bus protocol, and the responding device scans certain bus lines to ascertain the information contained in handshaking, command, address, and data signals also in accordance with the bus protocol.
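The sequence described above (arbitration, then command, address and data) can be illustrated with a small software model. This is a minimal sketch, not a description of any particular bus: the fixed-priority arbitration policy and the names `arbitrate`, `Transaction` and `memory_access` are hypothetical. Consistent with the definition of a bus transaction, arbitration happens before the transaction and is not logged as one of its phases.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Phase(Enum):
    COMMAND = "command"
    ADDRESS = "address"
    DATA = "data"


def arbitrate(requests: List[bool]) -> Optional[int]:
    """Hypothetical fixed-priority arbiter: the lowest-numbered requester wins."""
    for device_id, requesting in enumerate(requests):
        if requesting:
            return device_id
    return None


@dataclass
class Transaction:
    requester: int
    command: str  # "read" or "write"
    address: int
    data: List[int] = field(default_factory=list)
    # Arbitration precedes the transaction, so only later phases are recorded.
    phases: List[Phase] = field(default_factory=list)


def memory_access(requests: List[bool], command: str, address: int,
                  memory: Dict[int, int], write_data: int = 0) -> Optional[Transaction]:
    if command not in ("read", "write"):
        raise ValueError(f"unknown command: {command}")
    owner = arbitrate(requests)
    if owner is None:
        return None  # no device requested the bus
    txn = Transaction(owner, command, address)
    txn.phases.append(Phase.COMMAND)  # requester asserts the read or write line
    txn.phases.append(Phase.ADDRESS)  # requester places the address on the bus
    if command == "read":
        txn.data.append(memory.get(address, 0))  # responder drives stored data
    else:
        memory[address] = write_data  # responder stores data taken from the bus
        txn.data.append(write_data)
    txn.phases.append(Phase.DATA)
    return txn
```

For example, `memory_access([False, True], "write", 0x10, mem, 0xAB)` grants the bus to device 1 and stores `0xAB` at address `0x10` of the dictionary `mem`.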
To assert a bus line, a bus device drives the bus line to a high voltage level or a low voltage level during each of one or more clock ticks during a bus cycle. The high voltage and low voltage levels correspond to digital LOGIC HIGH and LOGIC LOW states. To scan a bus line, a bus device typically detects the voltage on the line at a particular time, e.g., at a rising or falling edge of the bus clock, and determines whether the detected voltage is at a high or low level. The voltage level on certain lines determines, for example, whether the transaction is a read or a write, and, on other lines, whether the data includes a LOGIC HIGH or LOGIC LOW during the corresponding tick of the bus clock. Many bus lines are only driven for a portion (often only a small portion) of the bus cycle of a bus transaction.
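The scanning step, sampling a line at a clock edge and resolving the voltage into a logic state, can be sketched as follows. The threshold values are assumptions borrowed from LVTTL-style input levels purely for illustration. Voltages between the two thresholds are reported as indeterminate, which is the region that noise and reduced operating voltages make harder to avoid.

```python
from typing import List, Optional

# Assumed receiver input thresholds (illustrative LVTTL-style values).
V_IL_MAX = 0.8  # volts: at or below this, the receiver resolves LOGIC LOW
V_IH_MIN = 2.0  # volts: at or above this, the receiver resolves LOGIC HIGH


def resolve(voltage: float) -> Optional[int]:
    """Resolve one sampled voltage to 1 (HIGH), 0 (LOW) or None (indeterminate)."""
    if voltage >= V_IH_MIN:
        return 1
    if voltage <= V_IL_MAX:
        return 0
    return None


def sample_on_rising_edges(clock: List[int], line: List[float]) -> List[Optional[int]]:
    """Sample `line` wherever `clock` goes from 0 to 1; both lists share one time base."""
    bits = []
    for t in range(1, len(clock)):
        if clock[t - 1] == 0 and clock[t] == 1:
            bits.append(resolve(line[t]))
    return bits
```

With `clock = [0, 1, 0, 1, 0, 1]` and `line = [0.0, 3.1, 3.0, 0.2, 1.4, 1.5]`, the rising edges fall at indices 1, 3 and 5, and the resolved bits are `[1, 0, None]`.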
Computer system architecture has advanced dramatically in performance and complexity. In terms of performance, computer systems can achieve higher clock speeds with increased bus widths and lower bus operating voltages. Increased bus clock speeds, usually measured in megahertz (MHz), can allow data to be transferred faster over the computer system's bus, thereby allowing computer applications to run faster. The size of a bus, known as its width, corresponds generally to the number of data lines in the bus and determines how much data can be transmitted in parallel; thus, wider busses typically transfer data faster. Lower bus operating voltages can also advantageously reduce power consumption, which is important, for example, in miniaturization of integrated circuits and in mobile computing for extending battery operating times. Unfortunately, lower operating voltages can make bus signals more susceptible to signaling errors due to lower signal-to-noise ratios and to signal distortion. Such noise and signal distortion can make it difficult for bus receivers to differentiate correctly, e.g., between data logic states, thus potentially yielding erroneous data.
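The combined effect of bus width and clock speed on transfer rate can be made concrete with the usual peak-bandwidth arithmetic. This sketch computes only a theoretical ceiling; it assumes one transfer per clock unless told otherwise and ignores arbitration, address phases and wait states, so sustained throughput on a real bus is lower.

```python
def peak_bandwidth_mb_per_s(width_bits: int, clock_mhz: float,
                            transfers_per_clock: int = 1) -> float:
    """Theoretical peak transfer rate in megabytes per second (1 MB = 10**6 bytes).

    Ignores arbitration, address phases and wait states.
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock
```

Doubling both width and clock quadruples the ceiling: a 32-bit bus at 33 MHz peaks at 132 MB/s, while a 64-bit bus at 66 MHz peaks at 528 MB/s.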
Transient and other non-predictable errors in the received bus signals can also arise from other causes, and often have deleterious impacts on computer system performance. Such errors can arise, for example, from degradation over time of bus drivers and receivers in bus interfaces. Bus errors can also arise due to non-compatibility of add-on components such as adapter cards that are integrated into the computer system after installation at a customer site, and connected to one of the computer's busses, e.g., through “plug and play” operation. Where such adapter cards malfunction or simply exhibit operating parameters unanticipated by the original computer manufacturer, transfer errors can arise on the bus to which they are connected. Such bus errors can result in lost or corrupted data or hanging of the bus protocol so as to prevent completion of a bus transaction. In extreme cases, bus errors can cause system crashes.
For diagnosing bus error conditions, it is often necessary to reproduce the errors. For example, when an error occurs during normal transfers over a system bus of a computer system, it may be necessary to drive the system bus with the same set of stimuli under the same conditions as when the error occurred in order to determine its causes. It may prove difficult to apply such stimuli and reproduce the error conditions under control of the computer's operating system due to the complexities involved.
It would be desirable to provide a technique for testing bus operation to determine whether the bus is likely to perform adequately during actual operating conditions, and to assess the likelihood of bus transfer errors. Such testing should preferably lend itself to use in design verification and quality assurance prior to shipment from a system manufacturer, as well as in field servicing to assure bus operation has not degraded after installation at a customer's facility. It would also be desirable to be able to run such testing in electronic devices using designed-in (i.e., embedded) testing features rather than external testing apparatus, which may affect testing results and is cumbersome, time-consuming and costly to use.
The invention resides in a computer system or other electronic apparatus in which bus testing logic is built into (i.e., embedded in) at least some of the devices connected to the bus to enable these devices to perform diagnostic testing of the bus and their bus interfaces. Under control of the test logic, the devices drive the bus with output voltages corresponding to a set of predetermined test bit patterns. For example, each test bit pattern is selected to cause the bus to reach a target bus utilization or saturation level. Each test bit pattern can include a plurality of digital values corresponding to drive voltages for the bus for testing that target bus utilization level over a bus cycle of a bus transaction. The bus signals produced by the devices propagate along the bus and are received by other devices. The received bus signals are resolved into a received bit pattern. The received bit pattern can be compared with the test bit pattern used to generate the bus signals in order to detect discrepancies, or a first failure resulting from the test can be captured, as described in the above-referenced patent application entitled “Method and Apparatus for Extracting First Failure and Attendant Operating Information From Computer System Devices”.
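The pattern-driven test described above can be sketched in software: build a test pattern that occupies a chosen fraction of clock ticks, then compare the received words with the expected ones and report the first failing tick. The utilization metric used here (driven ticks divided by total ticks) and the helper names are assumptions made for illustration; the text does not define how utilization is measured.

```python
from typing import List, Optional, Tuple

IDLE = None  # marks a clock tick on which the test logic does not drive the bus


def make_test_pattern(words: List[int], utilization: float, ticks: int) -> List[Optional[int]]:
    """Spread `words` over `ticks` clock ticks so about `utilization` of them are driven."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    driven = round(utilization * ticks)
    pattern: List[Optional[int]] = [IDLE] * ticks
    for i in range(driven):
        # Evenly distribute the driven ticks across the window.
        pattern[(i * ticks) // driven] = words[i % len(words)]
    return pattern


def measured_utilization(pattern: List[Optional[int]]) -> float:
    """Fraction of ticks on which the pattern drives the bus (assumed metric)."""
    return sum(w is not IDLE for w in pattern) / len(pattern)


def first_failure(expected: List[Optional[int]], received: List[int],
                  width: int) -> Optional[Tuple[int, int]]:
    """Return (tick, mask of miscompared bits) for the first mismatch, or None."""
    mask = (1 << width) - 1
    for tick, (exp, got) in enumerate(zip(expected, received)):
        if exp is IDLE:
            continue  # nothing was driven, so there is nothing to check
        diff = (exp ^ got) & mask
        if diff:
            return tick, diff
    return None
```

For instance, `make_test_pattern([0xAA, 0x55], 0.5, 8)` drives the alternating words on every other tick; if the word received on tick 4 is `0xAB` instead of `0xAA`, `first_failure` reports `(4, 0x01)`, identifying both when the first failure occurred and which bit was corrupted.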
The invention permits system stress testing without the need for cycles generated by an instruction stream. The testing can be performed, for example, for purposes of design verification, diagnostic testing after an error has been encountered, or on a regular basis, e.g., as part of power on self-test (POST) procedures. The invention permits deterministic saturations of the bus when and where desired, e.g., for inducing various forms of error conditions, such as system-level bottlenecks and latencies, in a reproducible manner. The invention can also be used to associate a “victim” bit on a bus with its “aggressor” bit, and thus trace causes of bus error conditions.
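One common way to associate victim bits with aggressor bits is a walking-ones pattern: activate one line at a time while the others stay quiet, and record which quiet lines are received incorrectly. The sketch below assumes that technique; the `transmit` callable is a hypothetical stand-in for driving the bus and resolving the received word. On real hardware, crosstalk is caused by transitions, so the aggressor would be toggled over several ticks; the sketch abstracts that to a single transfer.

```python
from typing import Callable, Dict, List


def map_victims_to_aggressors(width: int,
                              transmit: Callable[[int], int]) -> Dict[int, List[int]]:
    """Drive a walking-ones pattern and record which quiet bits are corrupted.

    `transmit` is a hypothetical hook: it takes the driven word and returns the
    word resolved by the receiver.
    """
    mask = (1 << width) - 1
    victims: Dict[int, List[int]] = {}
    for aggressor in range(width):
        driven = 1 << aggressor
        errors = (driven ^ transmit(driven)) & mask
        for bit in range(width):
            # A corrupted bit other than the active one is treated as a victim.
            if bit != aggressor and (errors >> bit) & 1:
                victims.setdefault(bit, []).append(aggressor)
    return victims


def noisy_bus(word: int) -> int:
    """Simulated channel in which activity on bit 3 flips bit 4 (illustration only)."""
    return word ^ (1 << 4) if word & (1 << 3) else word
```

Running `map_victims_to_aggressors(8, noisy_bus)` yields `{4: [3]}`, tracing the error on bit 4 back to bit 3.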
More specifically, the invention can be embodied in a test system for testing communications over a bus connecting a number of electronic devices, e.g., components of a computer system. The test system is preferably embedded in the devices themselves rather than in apparatus external to them, and is responsive to digital control signals, e.g., conforming to JTAG, for scanning test data into and out of the devices. The test system has a stress injection module for injecting a set of stimulus patterns on the bus; an error identification module for identifying an error resulting from the set of stimulus patterns; a bus tuning module for adjusting one or more bus operating and signaling parameters by varying one or more electronic characteristics of the bus interface in response to a set of digital control signals so that testing can be performed at one or more of a number of different sets of operating and signaling parameters; a programmable control module for providing the digital control signals to the bus tuning module; and a presentation module for presenting a plurality of results of the testing. The test system can be implemented, for example, for performing highly accelerated life testing (HALT), in which the presentation module provides test results specifying a failure envelope. The test system can also be implemented, for example, for performing highly accelerated stress screening (HASS), in which a bus system is tuned so as (a) to establish the normal operating envelope and recommended specifications for the device or system; (b) to maintain substantially “like new” operation of the bus interface of an electronic device after a period of use, and correct for parameter drift and other time-dependent and use-dependent variations in signaling and operating parameters; and (c) to optimize operating and signaling parameters, e.g., for communication over a bus of an individual computer system as it is configured at a customer's facility, and/or for particular customer applications.
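How the modules might cooperate to produce a failure envelope can be sketched as a parameter sweep: the control step selects each combination of settings, the tuning step applies it, the stress and error-identification steps run a pattern test that passes or fails, and the presentation step summarizes the passing region. The two swept parameters (a supply voltage and a receiver strobe delay) and the `run_test` callable are hypothetical; the text does not name which electronic characteristics are varied.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Setting = Tuple[float, int]  # (supply voltage in volts, strobe delay step), both assumed


def sweep(voltages: Iterable[float], delays: Iterable[int],
          run_test: Callable[[float, int], bool]) -> Dict[Setting, bool]:
    """Apply every setting (tuning), run the stress test at it and record pass/fail."""
    delays = list(delays)  # reused for every voltage
    return {(v, d): run_test(v, d) for v in voltages for d in delays}


def passing_voltage_range(results: Dict[Setting, bool],
                          delay: int) -> Optional[Tuple[float, float]]:
    """Presentation: lowest and highest passing voltage at one delay step."""
    passing = [v for (v, d), ok in results.items() if d == delay and ok]
    return (min(passing), max(passing)) if passing else None
```

If a simulated device passes only between 1.0 V and 1.3 V and only at delay steps below 2, sweeping voltages `[0.9, 1.0, 1.1, 1.2, 1.3, 1.4]` over delays `[0, 1, 2]` gives a passing range of `(1.0, 1.3)` at delay 0 and `None` at delay 2; the boundary between passing and failing settings is the failure envelope.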
In yet another aspect of the invention, a tuning system can be provided for use in tuning an electronic device such as a computer system to take into account the loading, noise and other effects of configuration changes made after shipment by a manufacturer, e.g., by downstream parties such as system integrators and end-users. The tuning system can include a probe mechanism or configuration tables for determining types of devices in the system; a parameter look-up table for providing operating and signaling parameter values for the devices in the system; and a tuner and analyzer for tuning the bus to obtain the values of the operating and signaling parameters or optimized values thereof.
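A look-up-table tuner of this kind can be sketched as follows: the device types reported by a probe or configuration table index a parameter table, and the bus is set to values every installed device can meet. The device types, parameter names and numbers below are hypothetical placeholders, and taking the most conservative combination is only one simple policy; an analyzer could refine these starting values further, for example with a sweep like the one used for HALT.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SignalParams:
    max_clock_mhz: float  # highest clock the device type is assumed to tolerate
    min_setup_ns: float   # smallest setup time the device type is assumed to need


# Hypothetical parameter look-up table; real values would come from the manufacturer.
PARAM_TABLE: Dict[str, SignalParams] = {
    "cpu": SignalParams(max_clock_mhz=100.0, min_setup_ns=2.0),
    "memory": SignalParams(max_clock_mhz=100.0, min_setup_ns=3.0),
    "adapter_card": SignalParams(max_clock_mhz=66.0, min_setup_ns=5.0),
}


def tune_bus(installed: List[str]) -> SignalParams:
    """Choose settings all installed devices can meet: slowest clock, longest setup."""
    unknown = [d for d in installed if d not in PARAM_TABLE]
    if unknown:
        raise KeyError(f"no parameters for device types: {unknown}")
    params = [PARAM_TABLE[d] for d in installed]
    return SignalParams(
        max_clock_mhz=min(p.max_clock_mhz for p in params),
        min_setup_ns=max(p.min_setup_ns for p in params),
    )
```

Adding an `adapter_card` after shipment lowers the chosen clock from 100 MHz to 66 MHz and raises the setup time from 3 ns to 5 ns, which is the kind of post-shipment configuration change the tuning system is meant to absorb.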