This invention relates generally to microprocessor testing, and more particularly to a system and method for on-chip debug support and performance monitoring for microprocessors and microprocessor systems.
It has become very difficult to diagnose failures in and to measure the performance of state-of-the-art microprocessors. This is because modern microprocessors not only run at very high clock speeds, but many of them also execute instructions in parallel, out of program order and speculatively. Moreover, visibility of the microprocessor's inner state has become increasingly limited due to the complexity of the microprocessors and to practical constraints on the number of external pads that can be provided on the chip package.
In the past, the traditional failure diagnosis and performance measurement tools have been external logic analyzers and in-circuit emulators. Logic analyzers are capable of monitoring signals on the chip pads and other externally-accessible system signals, capturing the state of these signals and generating triggers based on their states. Unfortunately, logic analyzers must rely solely on externally-accessible signals to accomplish this, not on signals that are internal to the chip itself. In-circuit emulators, on the other hand, are used to mimic the functional characteristics of a new microprocessor in a system environment and to add visibility to certain data values within the microprocessor. But such devices only emulate the functionality of the microprocessor. By their very nature, they cannot give an accurate representation of the performance characteristics of an actual silicon device. Therefore, they are primarily useful only for developing and debugging system software.
By way of background, U.S. Pat. No. 5,488,688, issued Jan. 30, 1996, to David R. Gonzales, et al., discloses a digital signal processor with a FIFO buffer configured on-chip to monitor a fixed set of internal bus signals. The FIFO buffer is coupled to a debug controller that is capable of operating in first and second modes. In the first mode, the CPU may be halted on the occurrence of one of four specifically-enumerated event conditions: after an external request; after a hardware breakpoint (occurrence of specific data or address values); after a software breakpoint (execution of a specific CPU instruction); or after a specified number of instructions have been executed. In the second mode, only the FIFO buffer is halted on the occurrence of an event condition. In either mode, the user may examine the contents of the FIFO buffer after a halt to determine what flow of software instructions was executed just prior to the event occurrence. An off-chip serial interface is used to communicate with the debug controller and to examine the contents of the FIFO buffer. The serial interface complies with the well-known Institute of Electrical and Electronics Engineers (IEEE) Standard 1149.1, "Test Access Port and Boundary Scan Architecture," also known as the Joint Test Action Group (JTAG) standard. A serial port conforming to this standard will hereinafter be referred to as a test access port or "TAP."
By way of further background, U.S. Pat. No. 5,418,452, issued May 23, 1995, to Norman C. Pyle, discloses an apparatus for testing integrated circuits using time division multiplexing. In order to reduce the number of pins necessary to communicate the signals from on-chip test nodes to an off-chip logic analyzer, Pyle employs a multiplexer on the chip under test and a demultiplexer in the logic analyzer. Each input of the multiplexer is coupled to an on-chip test node, and the multiplexer select lines are driven by counter outputs. By applying an identical set of counter outputs to the select lines of the demultiplexer, Pyle implements a time-division-multiplexed serial communication line between the chip under test and the logic analyzer. Signals from the numerous test nodes in the chip under test are coupled to the communication line in different time slices. The signals are then reconstructed by the demultiplexer in the logic analyzer.
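The time-division multiplexing arrangement described above may be illustrated by the following minimal behavioral sketch. It is not taken from the Pyle patent; the function names `mux_stream` and `demux_stream`, and the representation of test nodes as lists of per-cycle samples, are assumptions made only for illustration. A shared counter value selects which test node drives the single communication line in each cycle, and the same counter value steers each received sample back to its channel.

```python
def mux_stream(samples_per_cycle):
    """On-chip side: place one test node per cycle onto the shared line.

    samples_per_cycle[c][k] is the value of test node k during cycle c
    (a hypothetical representation chosen for this sketch).
    """
    num_nodes = len(samples_per_cycle[0])
    line = []
    for cycle, node_values in enumerate(samples_per_cycle):
        select = cycle % num_nodes      # counter output drives the select lines
        line.append(node_values[select])
    return line


def demux_stream(line, num_nodes):
    """Analyzer side: an identical counter steers each sample to its channel."""
    channels = [[] for _ in range(num_nodes)]
    for cycle, value in enumerate(line):
        channels[cycle % num_nodes].append(value)
    return channels


# Three test nodes observed for six cycles; each node holds a constant value.
samples = [[1, 0, 1] for _ in range(6)]
line = mux_stream(samples)
channels = demux_stream(line, num_nodes=3)
```

In this model each node is observed only once every `num_nodes` cycles, which reflects the trade of pin count against sampling rate that is inherent in such a scheme.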
By way of still further background, U.S. Pat. No. 5,473,754, issued Dec. 5, 1995 to Dale E. Folwell, et al., discloses a scheme for enabling an off-chip device to monitor the state of an on-chip 24-bit program counter in real time using an 8-bit port on the chip under test. Folwell assumes that discontinuities in the program counter will occur only in a limited number of situations. He then captures the contents of the program address bus only when one of these conditions occurs, and then sends those contents off chip via the 8-bit port. Because the contents of the program address bus are not captured with every increment of the counter, the volume of data that must be output via the 8-bit port is reduced.
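The data reduction obtained by capturing the program address only at discontinuities may be illustrated by the following sketch. It is a simplified model and not the Folwell implementation: it assumes that a discontinuity is any address that is not the previous address plus a fixed step, that the first address of a trace is always captured, and that a 24-bit address is sent over the 8-bit port most significant byte first. The names `capture_discontinuities` and `to_bytes_24` are hypothetical.

```python
def capture_discontinuities(pc_trace, step=1):
    """Return only the program counter values that break sequential flow."""
    captured = []
    previous = None
    for pc in pc_trace:
        # Assumption: anything other than previous + step is a discontinuity.
        if previous is None or pc != previous + step:
            captured.append(pc)
        previous = pc
    return captured


def to_bytes_24(address):
    """Split a 24-bit address into three bytes for an 8-bit port.

    The byte order (most significant first) is an assumption of this sketch.
    """
    return [(address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF]


trace = [0x000100, 0x000101, 0x000102, 0x000200, 0x000201, 0x000050]
captured = capture_discontinuities(trace)
port_bytes = [byte for address in captured for byte in to_bytes_24(address)]
```

For the six-entry trace shown, only three addresses are captured, so nine bytes cross the port instead of eighteen.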
By way of still further background, U.S. Pat. No. 5,317,711, issued May 31, 1994 to Philip A. Bourekas, et al., discloses a scheme for providing off-chip test access to the signals of an on-chip bus that connects an on-chip cache to an on-chip CPU. The signals of the bus are brought out to the chip's external address/data bus when the external address/data bus is not being used for transactions with main memory or peripherals. To accomplish this, reserved pins on the microprocessor are used to control a multiplexer. Depending on the state of the multiplexer's select lines, either the microprocessor's main memory read/write and data lines, or the address that is being provided to the internal cache memory, is coupled to the chip's external address/data bus.
By way of still further background, U.S. Pat. No. 4,910,417, issued Mar. 20, 1990 to Abbas El Gamal, et al., discloses an improved user-programmable interconnect architecture for logic arrays. Specifically, Gamal uses existing row-column selecting logic in combination with an output multiplexer for coupling user-selectable internal circuit nodes to a particular external chip pad for testing. Additionally, latches are provided for each chip input pin so that, with the assertion of an external signal, all chip inputs may be frozen. Then, the row-column select circuitry and output multiplexer may be used to probe nodes within the chip using the latched inputs as stimulus.
While the above structures are useful for the particular purposes for which they are proposed, they fall far short of teaching or suggesting a comprehensive structure for debugging and monitoring the performance of a state-of-the-art microprocessor or microprocessor system.
The difficulty of adequately debugging and monitoring a microprocessor or microprocessor system is further exacerbated by the recent trend to place memory devices of the microprocessor system on-chip with the microprocessor and other chip circuitry. As IC fabrication technology has evolved to the sub-micron level, as evidenced by devices fabricated using a 0.25-micron or even smaller fabrication process, it has become possible to place large memory arrays, such as random access memories (RAMs), static random access memories (SRAMs), and cache RAMs, entirely on-chip with the microprocessor and other circuitry. On-chip memory arrays provide the advantage of direct communication with the CPU without the need for I/Os to external pins.
In spite of the advantages of placing memory arrays on-chip, there are concerns with how to accomplish testing of on-chip memory arrays. On-chip memory arrays, which may account for a large portion, even a majority, of the total die area of a chip, are much harder to control and observe than their discrete predecessors, making it difficult to use traditional external tester equipment and hardware to test, screen, characterize, and monitor on-chip arrays. Visibility into how on-chip memory arrays function is severely limited because the array-chip interface, such as the interface between a memory array and a CPU core of a microprocessor chip, is itself located on-chip.
Prior methodologies for testing on-chip memory arrays include both Built-In-Self-Test (BIST) and Direct Access Testing (DAT). DAT involves porting the memory array I/Os off the chip in order to engage in direct testing of the array, in a manner similar to testing a discrete memory array device. An example of a prior art DAT implementation 10 is shown in FIG. 1. In this figure, the chip is shown as a microprocessor 20 having on-chip memory array 22, multiplexers (mux) 24 and 28, and central processing unit (CPU) core 26. Data is provided to memory array 22 from either high-performance tester hardware that is external to the microprocessor and capable of providing address and data pattern sequences 56 at high speed and large bandwidth for at-speed testing or directly from the CPU core 26. Datapath control of the memory array 22 is therefore provided by multiplexer 24 that provides information 36 to memory array 22 upon selecting information 38 from CPU core 26 or information 42 from the bus interface 30, 50 or 32, 52. Multiplexers 24 and 28 and bus interface 34, and portions of 40, 42 represent special DAT hardware and signals in the memory array datapath. As shown in FIG. 1, DAT I/O interface is provided through bus interface 32 and shared DAT/CPU high-speed chip I/O 52 or, optionally, as indicated by the dashed lines, through DAT I/O interface 34 comprised of bus interface 30 and dedicated DAT high-speed chip I/O 50. Multiplexer 28 chooses information from either bus 40 or bus 46 to present to bus interface 32 via bus 48, as shown. Shared DAT/CPU I/O bus 52 is a microprocessor system bus, such as a cache system bus, that is already available. Data from memory array 22 is provided to CPU core 26 and to either bus interface 30 or 32 via cache address and data busses 40, as shown.
The DAT solution provides the power and flexibility of today's testing equipment but requires more expensive and complex external test support, high-speed I/O for at-speed testing, and additional circuitry and busses than would otherwise be available on the chip in order to properly test and characterize the arrays. For instance, a large memory array that resides on a microprocessor chip, such as a large double- or quad-word accessible cache, would require a large number of external I/O pins or pads of the chip. Additionally, DAT methodologies typically rely upon additional core VLSI datapaths and are thus more dependent on the non-array VLSI.
DAT is also severely challenged by today's high-speed on-chip memory arrays, with frequencies of up to 1 GHz, which typically are much faster than currently available tester technology. A large amount of data must often be presented to the cache of a microprocessor at high speeds, for instance, in order to achieve acceptable fault coverage of the memory. Due to this growing speed discrepancy between on-chip memory arrays and currently available external tester equipment used to test them, the DAT methodology is often no longer capable of testing on-chip memory arrays at speed; it is often necessary to test each array on the chip sequentially or with common test vectors, such as array address and data pattern sequences. Moreover, even as external test equipment can be expected to become faster, memory arrays will themselves also become faster so that this speed discrepancy will continue to be a problem in the future.
BIST differs from DAT in that it essentially integrates the test vector generation provided by the external tester equipment of DAT on-chip. Referring to FIG. 2, a BIST implementation is illustrated. BIST moves the test vector generation onto the chip of microprocessor 20, inside BIST block 64, so that less hardware is required of the BIST implementation than of a DAT implementation. Multiplexer 62, BIST block 64, portions of bus 40, and associated address/data bus 68 represent special BIST hardware in the memory datapath. Previous BIST solutions predominantly hard-wired the test vector generation within BIST block 64 to render only limited, fixed test functionality. In order to provide independent, although restricted, access to just the memory array(s) 22, as opposed to accessing the entire chip 20, BIST operation and extraction of test results are typically accomplished through an IEEE Standard 1149.1 Joint Test Action Group (JTAG) boundary scan Test Access Port (TAP).
What is needed is a comprehensive system and method for enabling microprocessor and system designers to debug state-of-the-art microprocessors and systems more easily, and to do so in a highly flexible and sophisticated manner. Such a system and method should enable tests to be performed using the actual hardware of the device being evaluated, under actual system environment conditions, and while running the device at full speed. Such a system and method should enable programmers to define a wide variety of possible kinds of events that may occur within the microprocessor or system, and to generate a variety of triggers based on those user-definable events. Moreover, the programmer should be able to define a variety of actions that might automatically be taken within the microprocessor or system upon the generation of one of the triggers. In addition, such a system and method should provide the programmer with enhanced access to signals and states that are internal to the microprocessor chip, and should provide this access in a flexible, user-configurable manner.
Additionally, the prior art lacks the ability to directly access, test, and monitor on-chip memory arrays of microprocessor systems in a flexible, thorough manner. Flexibility in test vector generation is particularly essential for testing large, on-chip arrays because it is often impossible to accurately predict critical sensitivities of such arrays. Whether an array passes or fails a given test is dependent upon many interrelated factors, including the voltage to which the array is subjected, the testing temperature, the fabrication process of the array, and the frequency or frequencies at which the array is tested. Large, high-density memory arrays are also notoriously susceptible to various electrical and coupling effects, such as cell-to-cell coupling, bitline coupling, and ground bounce, that may cause logic and timing failures of the array. Moreover, the large number of sub-micron transistors of large, high-density arrays have known possible manufacturing defects, such as particle contamination, missing p-wells, and open/short conditions, for which the arrays must be tested.
Therefore, according to the present invention, a method and structure facilitates the debugging and test coverage capabilities of a microprocessor. A microprocessor having memory arrays, a debug block, and one or more built-in self-test (BIST) engines is disclosed. The debug block is capable of driving control information out onto a state machine output bus in response to an event, and the control information can be selectively used to control signature analysis and/or recording elements of the microprocessor, such as multiple-input-shift-registers and first-in-first-out devices, that facilitate the monitoring and debugging of the microprocessor. The signature and recording elements may or may not be contained within the one or more BIST engines. The control information interface between the BIST engine(s) and the debug block can greatly facilitate debugging and test coverage of the microprocessor. Alternately, the signature analysis elements and/or recording elements need not necessarily be used in conjunction with the memory arrays and the BIST engine(s) described above. These elements may be used to monitor and test any set of signals of interest occurring within the microprocessor as will be described.
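By way of illustration only, the signature analysis performed by a multiple-input-shift-register (MISR) may be modeled as shown below. This is a generic behavioral sketch of a MISR and not the circuit of the present disclosure; the 4-bit width, the feedback tap positions, the zero seed, and the names `misr_step` and `misr_signature` are arbitrary assumptions. On each clock the register is shifted with linear feedback and then combined by exclusive-OR with a parallel input word, so that a long sequence of monitored values is compacted into a single short signature.

```python
def misr_step(state, data_in, taps, width):
    """Advance the MISR by one clock with parallel input word data_in."""
    mask = (1 << width) - 1
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1          # XOR of the tapped state bits
    shifted = ((state << 1) | feedback) & mask  # shift left, feed back into bit 0
    return shifted ^ (data_in & mask)           # fold in the parallel inputs


def misr_signature(words, taps=(3, 2), width=4, seed=0):
    """Compact a sequence of input words into one signature.

    The taps, width, and seed are illustrative assumptions of this sketch.
    """
    state = seed
    for word in words:
        state = misr_step(state, word, taps, width)
    return state


good_signature = misr_signature([0x1, 0x2, 0x3])
faulty_signature = misr_signature([0x1, 0x6, 0x3])  # one bit differs in word 2
```

Comparing the final signature against a value obtained from a known-good run indicates whether any monitored word differed, without storing the entire sequence.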
The debug block features user-configurable diagnostic hardware contained on-chip with the microprocessor for the purpose of debugging and monitoring the performance of the microprocessor. A programmable state machine is coupled to on-chip and off-chip input sources. The state machine may be programmed to look for signal patterns presented by the input sources, and to respond to the occurrence of a defined pattern or sequence of defined patterns by driving certain control information onto the state machine output bus. On-chip devices coupled to the output bus take user-definable actions as dictated by the bus. The input sources include user-configurable comparators located within the functional blocks of the microprocessor. The comparators are coupled to storage elements within the microprocessor, and are configured to monitor nodes to determine whether the state of the nodes matches the data contained in the storage elements. By changing data in the storage elements, the programmer may change the information against which the state of the nodes is compared and also the method by which the comparison is made. The output devices include counters. Counter outputs may be used as state machine inputs, so one event may be defined as a function of a different event having occurred a certain number of times or an event may be specified as occurring a specified number of cycles subsequent to another event. The output devices also include circuitry for generating internal and external triggers. User-configurable multiplexer circuitry may be used to route user-selectable signals from within the microprocessor to the chip's output pads, and to select various internal signals to be used as state machine inputs.
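The interaction of the comparators, the programmable state machine, and a counter may be illustrated by the following behavioral sketch. It is a simplified software model under stated assumptions, not the disclosed hardware: the class names, the control words `"INC_COUNTER"` and `"TRIGGER"`, the transition-table encoding, and the one-cycle delay between the counter reaching its threshold and the trigger are all hypothetical. The example defines an event as a monitored address having matched a programmed value a given number of times.

```python
class Comparator:
    """Compares a monitored node against programmable storage elements."""

    def __init__(self, match_value, mask=0xFFFFFFFF):
        self.match_value = match_value  # programmable compare value
        self.mask = mask                # programmable: which bits participate

    def matches(self, node_value):
        return (node_value & self.mask) == (self.match_value & self.mask)


class Counter:
    """Output device whose 'reached' flag is fed back as a state machine input."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def increment(self):
        self.count += 1

    @property
    def reached(self):
        return self.count >= self.threshold


class ProgrammableStateMachine:
    """Maps (state, inputs) to (next state, control word) via a loadable table."""

    def __init__(self, transitions, initial_state=0):
        self.transitions = dict(transitions)
        self.state = initial_state

    def clock(self, inputs):
        # Unprogrammed input patterns hold the current state and drive nothing.
        next_state, control = self.transitions.get(
            (self.state, tuple(inputs)), (self.state, None))
        self.state = next_state
        return control


def run_trace(addresses, match_address, occurrences):
    """Return the cycle at which the trigger fires, or None if it never does."""
    comparator = Comparator(match_address)
    counter = Counter(occurrences)
    # Inputs are (comparator match, counter reached).
    fsm = ProgrammableStateMachine({
        (0, (True, False)): (0, "INC_COUNTER"),
        (0, (False, True)): (1, "TRIGGER"),
        (0, (True, True)): (1, "TRIGGER"),
    })
    for cycle, address in enumerate(addresses):
        control = fsm.clock((comparator.matches(address), counter.reached))
        if control == "INC_COUNTER":
            counter.increment()
        elif control == "TRIGGER":
            return cycle  # fires the cycle after the count is reached
    return None


trigger_cycle = run_trace(
    [0x10, 0x40, 0x40, 0x20, 0x40, 0x30, 0x40], match_address=0x40, occurrences=3)
```

Reprogramming the comparator value, the mask, the counter threshold, or the transition table changes the event definition without altering the structure of the model, which is the flexibility the paragraph above describes.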
Each BIST engine is coupled to the one or more memory arrays and has a main control block, one or more address generation blocks, and one or more data generation blocks. The main control block controls operation of the address generation blocks and the data generation blocks through its main control register. The address generation blocks operate to selectively provide address information to the on-chip memory arrays and include an address local control block having an address control register and one or more address-data blocks. The address-data blocks have address-data registers that are controlled by the address control register to provide address information to the on-chip memory arrays from either the one or more address generation blocks, or from other on-chip non-BIST engine, non-memory array circuitry of the integrated circuit device such as a CPU, in accordance with instructions programmed into the address control register. The address control register may also be programmed to control the address-data registers to monitor address information provided to the on-chip memory arrays from either the one or more address generation blocks or from other on-chip non-BIST engine circuitry like the CPU.
Similarly, the data generation blocks operate to selectively provide and receive data information to and from the one or more on-chip memory arrays and include a data local control block having a data control register and one or more data-data blocks. The data-data blocks have data-data registers controlled by the data control register to provide or monitor data information from either the one or more data generation blocks or on-chip non-BIST engine circuitry of the integrated circuit device, such as the CPU, to the on-chip memory arrays in accordance with instructions programmed into the data control register and to receive information from the memory arrays. The main control register of the main control block coordinates when the address generation blocks and the data generation blocks execute their programming and can also ensure that the BIST engine operates synchronously with the non-BIST engine circuitry of the integrated circuit chip.
The address generation blocks and the data generation blocks of the BIST engine are programmed to provide address and data information to the on-chip memory arrays and to receive data information from the memory arrays in order to facilitate monitoring of the memory arrays. Programming the address and data generation blocks is accomplished by programming the appropriate control registers of the local address and data local control blocks to control the address and data generation blocks in the manner desired. The main control block is then programmed to coordinate execution by the address and data generation blocks of their programming; the main control block also ensures that the BIST engine operates synchronously with the CPU of the chip.
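The programming sequence just described, in which the address and data generation blocks are programmed first and the main control block then coordinates their execution, may be illustrated by the following behavioral sketch. It is a software model under stated assumptions and does not represent the register-level programming of the disclosed BIST engine: the class names, the alternating pattern and complement data scheme, the single write pass followed by a single read pass, and the stuck-at-zero fault model are all hypothetical choices made for illustration.

```python
class MemoryArray:
    """Memory model with optional stuck-at-zero bits (address -> bit index)."""

    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        self.stuck_at_zero = stuck_at_zero or {}

    def write(self, address, word):
        if address in self.stuck_at_zero:
            word &= ~(1 << self.stuck_at_zero[address])  # faulty bit stays 0
        self.cells[address] = word

    def read(self, address):
        return self.cells[address]


class AddressGenerator:
    """Programmed with a start address, a count, and a step."""

    def __init__(self, start, count, step=1):
        self.start, self.count, self.step = start, count, step

    def addresses(self):
        return [self.start + i * self.step for i in range(self.count)]


class DataGenerator:
    """Programmed with a base pattern; odd addresses receive its complement."""

    def __init__(self, pattern, width=8):
        self.pattern = pattern
        self.mask = (1 << width) - 1

    def expected(self, address):
        word = self.pattern if address % 2 == 0 else ~self.pattern
        return word & self.mask


class MainControl:
    """Coordinates the generators: one write pass, then one read-compare pass."""

    def __init__(self, address_generator, data_generator):
        self.address_generator = address_generator
        self.data_generator = data_generator

    def run(self, memory):
        addresses = self.address_generator.addresses()
        for address in addresses:
            memory.write(address, self.data_generator.expected(address))
        # Return the addresses whose read data differs from the expected data.
        return [a for a in addresses
                if memory.read(a) != self.data_generator.expected(a)]


control = MainControl(AddressGenerator(start=0, count=8), DataGenerator(0x55))
good_failures = control.run(MemoryArray(8))
bad_failures = control.run(MemoryArray(8, stuck_at_zero={3: 1}))
```

Because the address sequence and the data pattern are parameters rather than fixed logic, a different test may be run by reprogramming the generators. A single pass with one pattern does not expose every fault, which is one reason flexibility in test vector generation is emphasized above.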