It has become very difficult to diagnose failures in and to measure the performance of state-of-the-art microprocessors. This is because modern microprocessors not only run at very high clock speeds, but many of them also execute instructions in parallel, out of program order and speculatively. Moreover, visibility of the microprocessor's inner state has become increasingly limited due to the complexity of the microprocessors and to practical constraints on the number of external pads that can be provided on the chip package.
In the past, the traditional failure diagnosis and performance measurement tools have been external logic analyzers and in-circuit emulators. Logic analyzers are capable of monitoring signals on the chip pads and other externally-accessible system signals, capturing the state of these signals and generating triggers based on their states. Unfortunately, logic analyzers must rely solely on externally-accessible signals to accomplish this, not on signals that are internal to the chip itself. In-circuit emulators, on the other hand, are used to mimic the functional characteristics of a new microprocessor in a system environment and to add visibility to certain data values within the microprocessor. But such devices only emulate the functionality of the microprocessor. By their very nature, they cannot give an accurate representation of the performance characteristics of an actual silicon device. Therefore, they are primarily useful only for developing and debugging system software.
By way of background, U.S. Pat. No. 5,488,688, issued Jan. 30, 1996, to David R. Gonzales, et al., discloses a digital signal processor with a FIFO buffer configured on-chip to monitor a fixed set of internal bus signals. The FIFO buffer is coupled to a debug controller that is capable of operating in first and second modes. In the first mode, the CPU may be halted on the occurrence of one of four specifically-enumerated event conditions: after an external request; after a hardware breakpoint (occurrence of specific data or address values); after a software breakpoint (execution of a specific CPU instruction); or after a specified number of instructions have been executed. In the second mode, only the FIFO buffer is halted on the occurrence of an event condition. In either mode, the user may examine the contents of the FIFO buffer after a halt to determine what flow of software instructions were executed just prior to the event occurrence. An off-chip serial interface is used to communicate with the debug controller and to examine the contents of the FIFO buffer. The serial interface complies with the well-known Institute of Electrical and Electronics Engineers (IEEE) Standard 1149.1, "Test Access Port and Boundary Scan Architecture," also known as the Joint Test Action Group (JTAG) standard. A serial port conforming to this standard will hereinafter be referred to as a test access port or "TAP."
By way of further background, U.S. Pat. No. 5,418,452, issued May 23, 1995, to Norman C. Pyle, discloses an apparatus for testing integrated circuits using time division multiplexing. In order to reduce the number of pins necessary to communicate the signals from on-chip test nodes to an off-chip logic analyzer, Pyle employs a multiplexer on the chip under test and a demultiplexer in the logic analyzer. Each input of the multiplexer is coupled to an on-chip test node, and the multiplexer select lines are driven by counter outputs. By applying an identical set of counter outputs to the select lines of the demultiplexer, Pyle implements a time-division-multiplexed serial communication line between the chip under test and the logic analyzer. Signals from the numerous test nodes in the chip under test are coupled to the communication line in different time slices. The signals are then reconstructed by the demultiplexer in the logic analyzer.
By way of still further background, U.S. Pat. No. 5,473,754, issued Dec. 5, 1995 to Dale E. Folwell, et al., discloses a scheme for enabling an off-chip device to monitor the state of an on-chip 24-bit program counter in real time using an 8-bit port on the chip under test. Folwell assumes that discontinuities in the program counter will occur only in a limited number of situations. He then captures the contents of the program address bus only when one of these conditions occurs, and then sends those contents off chip via the 8-bit port. Because the contents of the program address bus are not captured with every increment of the counter, the volume of data that must be output via the 8-bit port is reduced.
By way of still further background, U.S. Pat. No. 5,317,711, issued May 31, 1994 to Philip A. Bourekas, et al., discloses a scheme for providing off-chip test access to the signals of an on-chip bus that connects an on-chip cache to an on-chip CPU. The signals of the bus are brought out to the chip's external address/data bus when the external address/data bus is not being used for transactions with main memory or peripherals. To accomplish this, reserved pins on the microprocessor are used to control a multiplexer. Depending on the state of the multiplexer's select lines, either the microprocessor's main memory read/write and data lines, or the address that is being provided to the internal cache memory, is coupled to the chip's external address/data bus.
By way of still further background, U.S. Pat. No. 4,910,417, issued Mar. 20, 1990 to Abbas El Gamal, et al., discloses an improved user-programmable interconnect architecture for logic arrays. Specifically, Gamal uses existing row-column selecting logic in combination with an output multiplexer for coupling user-selectable internal circuit nodes to a particular external chip pad for testing. Additionally, latches are provided for each chip input pin so that, with the assertion of an external signal, all chip inputs may be frozen. Then, the row-column select circuitry and output multiplexer may be used to probe nodes within the chip using the latched inputs as stimulus.
While the above structures are useful for the particular purposes for which they are proposed, they fall far short of teaching or suggesting a comprehensive structure for debugging and monitoring the performance of a state-of-the-art microprocessor or microprocessor system.
What is needed is a comprehensive system and method for enabling microprocessor and system designers to debug state-of-the-art microprocessors and systems more easily, and to do so in a highly flexible and sophisticated manner. Such a system and method should enable tests to be performed using the actual hardware of the device being evaluated, under actual system environment conditions, and while running the device at full speed. Such a system and method should enable programmers to define a wide variety of possible kinds of events that may occur within the microprocessor or system, and to generate a variety of triggers based on those user-definable events. Moreover, the programmer should be able to define a variety of actions that might automatically be taken within the microprocessor or system upon the generation of one of the triggers.
One particularly troublesome problem that has stood in the way of developing such a system has been the complexity required of any state machine used to control it. For example, it would be desirable for the programmer to be able to configure the state machine to look for AND terms, OR terms, or a combination of AND and OR terms among the state machine input signals, and to change states and/or take other actions based on a detection of those terms. Unfortunately, providing this capability in the past has required the use of highly complex state machine input comparison circuitry. If a bit-wise comparator were used, providing AND and OR term capability required both multi-bit OR circuitry and multi-bit AND circuitry at the output of the comparison circuitry. Moreover, the outputs of the multi-bit OR and AND circuitries had to be multiplexed depending on whether the programmer desired OR or AND terms. Providing masking capability for state machine inputs in such an implementation was even more cumbersome because the mask bits had to be ORed individually with the components of the AND term and ANDed individually with the components of the OR term. The present invention eliminates these difficulties and requires fewer components to implement, thus enabling a more sophisticated state machine to be built using a relatively small chip area.