1. Field of the Invention
This invention relates generally to the testing of digital signal processing units and, more particularly, to streaming trace test and debug techniques for processing units having multiple chips, each with multiple cores.
2. Background of the Invention
Data processing systems, especially digital signal processing units, are becoming more complex. Specifically, some cards used in the telecommunication industry have 100 devices. A card can include multiple chips, each chip potentially having multiple cores or processors. Already, 1000 cores or processors on a single card are being contemplated by some users. The task of testing and debugging such a processing array is formidable.
In the past, one approach to the testing of a processing array has been known as streaming. Referring to FIG. 1, an example of a card 10 is shown. The card 10 includes N chips 11. Each of the chips 11 has M cores fabricated thereon. In this example, each of the cores 15 is designated by the same numeral. (As will be clear, the cores 15 can be alike or can be different. Similarly, the fact that each chip has M cores 15 fabricated thereon permits easy computations for comparison.) Each core 15 includes an instruction register 151. Each instruction register 151 is 38 bits in length and is fabricated as a shift register. All of the instruction registers 151 are coupled in series. Each instruction register 151 provides commands to the core 15 of which the instruction register 151 is a unit. The instruction registers also have test and diagnostic signals stored therein. The commands are shifted into the instruction registers 151 of the cores 15, and the test and diagnostic signals are shifted out of the instruction registers 151 of the cores 15. In order to shift a complete set of commands into, or a set of data out of, the chain of shift registers, N×M×38 clock cycles are required. The chain of instruction registers is coupled to the test and debug unit 5. The test and debug unit executes the instructions to perform meaningful test and debug operations.
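The scan length stated above can be illustrated with a short calculation. The following is only a sketch: the function name `prior_art_scan_clocks` is hypothetical, and the split of 1000 cores into 100 chips of 10 cores each is an assumed example, not a configuration given in the text.

```python
IR_BITS = 38  # instruction register length stated in the text


def prior_art_scan_clocks(n_chips: int, m_cores: int, ir_bits: int = IR_BITS) -> int:
    """Clock cycles to shift one full scan through N x M series-coupled registers."""
    return n_chips * m_cores * ir_bits


if __name__ == "__main__":
    # Assumed example: 100 chips with 10 cores each (1000 cores on the card).
    print(prior_art_scan_clocks(100, 10))  # 38000
```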
Referring to FIG. 2A, the instruction register 20 of the prior art is illustrated. In the preferred embodiment, the instruction register is a 38-bit shift register. The instruction register 20 is coupled in series to the other instruction registers in the scan chain. The instruction register 20 provides commands to a core 15 and receives test data from the core 15. The commands are shifted by the test and debug unit 5 into the instruction register 20 through the intervening instruction registers 20 in the scan chain, and the test data is shifted out of the instruction register 20 to the test and debug unit 5 through the intervening instruction registers, i.e., as shown in FIG. 1.
Referring to FIG. 2B, an instruction register unit 25 according to the present invention is illustrated. The instruction register unit 25 replaces the instruction register 20 in each core 15. The instruction register unit 25 includes a 37-bit shift register 26 having an input terminal to which scan chain signals are applied. The output terminal of shift register 26 is coupled to a first input terminal of multiplexer 27. The input terminal of shift register 26 is coupled to a second input terminal of multiplexer 27. The output terminal of the multiplexer 27 is coupled to an input terminal of a 1-bit shift register 28. The output terminal of shift register 28 is coupled to the next instruction register unit 25 in the scan chain. A control signal applied to the multiplexer 27 determines which input terminal of the multiplexer 27 is coupled to the input terminal of the shift register 28. In the first state, the multiplexer 27 couples the shift register 26 to the shift register 28, thereby creating a 38-bit shift register, formed by shift registers 26 and 28, in the scan chain. In the second state, the 37-bit shift register 26 is bypassed and only the 1-bit shift register 28 is in the scan chain.
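The behavior of the instruction register unit 25 can be sketched as a small behavioral model. This is an illustrative sketch only, not the circuit of FIG. 2B: the class and method names are hypothetical, and the model assumes that the 37-bit shift register 26 holds its contents while bypassed, which the text does not specify.

```python
class InstructionRegisterUnit:
    """Behavioral sketch of unit 25: register 26, multiplexer 27, register 28."""

    def __init__(self) -> None:
        self.reg26 = [0] * 37   # 37-bit shift register 26
        self.reg28 = 0          # 1-bit shift register 28
        self.bypass = False     # control signal applied to multiplexer 27

    def clock(self, scan_in: int) -> int:
        """Apply one clock; return the value on the unit's output afterwards."""
        if self.bypass:
            # Second state: the multiplexer selects the scan input directly.
            mux_out = scan_in
            # Assumption: register 26 holds its contents while bypassed.
        else:
            # First state: the multiplexer selects the output of register 26.
            mux_out = self.reg26[-1]
            self.reg26 = [scan_in] + self.reg26[:-1]
        self.reg28 = mux_out
        return self.reg28


def latency(unit: InstructionRegisterUnit) -> int:
    """Count the clocks until a single 1 applied at the input reaches the output."""
    clocks, bit = 0, 1
    while True:
        clocks += 1
        out = unit.clock(bit)
        bit = 0
        if out == 1:
            return clocks


if __name__ == "__main__":
    full = InstructionRegisterUnit()
    bypassed = InstructionRegisterUnit()
    bypassed.bypass = True
    print(latency(full), latency(bypassed))  # 38 1
```

In this model the unit presents 38 bits of delay to the scan chain in the first state and a single bit of delay in the second state, which matches the two states described above.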
Referring to FIG. 3, the block diagram of FIG. 1 is replicated with the exception that the instruction register units 25 of all the devices not being tested have been placed in bypass mode. Instruction register unit 25′ is in data streaming mode and acts as a 38-bit shift register. This configuration permits the extraction of test data from one core 15 without the delay of the non-selected instruction register units 25. Each of the N×M−1 bypassed units contributes one bit to the scan chain and the selected unit contributes 38 bits, so the time to extract the test data from the scan chain is (N×M)+37 clock cycles. This time is to be compared to N×M×38 clock cycles when the scan chain is implemented with prior art instruction registers.
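The two cycle counts can be checked with a small simulation of a series chain of shift registers. This is a sketch under stated assumptions: the function name `chain_latency` is hypothetical, and the values N=3 and M=4 are chosen only to keep the simulation small. The count is the number of clocks needed for a bit entering the chain to reach the chain output, which equals the total chain length.

```python
from collections import deque


def chain_latency(register_lengths):
    """Clocks for one bit entering a series chain to reach the chain output."""
    stages = [deque([0] * n, maxlen=n) for n in register_lengths]
    clocks, bit = 0, 1
    while True:
        clocks += 1
        carry = bit
        for reg in stages:
            leaving = reg[-1]      # bit shifted out of this register
            reg.appendleft(carry)  # maxlen discards the old last bit
            carry = leaving
        bit = 0
        if stages[-1][-1] == 1:    # marker bit is now at the chain output
            return clocks


if __name__ == "__main__":
    n_chips, m_cores = 3, 4        # assumed small example values
    cores = n_chips * m_cores
    prior_art = chain_latency([38] * cores)
    streaming = chain_latency([38] + [1] * (cores - 1))  # one selected unit
    print(prior_art, streaming)    # 456 49
```

For these example values the prior art chain requires 3×4×38 = 456 clock cycles, while the chain with a single selected unit requires (3×4)+37 = 49 clock cycles.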
Referring to FIG. 4, a block diagram of the components interacting with the instruction register unit 25 in the core is shown. Commands are entered by the test and debug unit 5 on the scan chain and are stored in the 37-bit shift register 26 and the 1-bit shift register 28. The combination of the two shift registers 26 and 28 is, for purposes of entering commands in the core 15, equivalent to the instruction register 20 of the prior art and receives data signals. The logic signals in the two shift registers are entered in a core test control unit 151. The core test control unit 151 processes the logic signals according to the state of the core test control unit 151. The state of the core test control unit 151 is determined by TEST MODE SELECT (TMS) signals and TEST CLOCK (TCK) signals. Typically, the command requires a response in the form of test data, i.e., data to be analyzed by the test and debug unit 5.
FIG. 5 shows a normal scan, which loads a single instruction into the shift register and then changes the scan state to load the instruction into the core test control unit. The scan state is then changed again to execute the instruction. The top waveform, labeled scan state, illustrates the states of the core test control unit 151. These states are the result of a sequence of test mode select (TMS) signals, each of which results in a state transition. The middle waveform, labeled shift register, describes the contents of the shift registers of the instruction register unit 25. The bottom waveform, labeled instruction register, describes the contents of the instruction register. Since all devices in the scan chain respond to instruction scans, every scan takes N×M×38 clock cycles. The states are referred to by the JTAG labels, and the apparatus is implemented to be compatible with the JTAG standards.
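The kind of state sequence described for FIG. 5 can be traced with a small model of the standard JTAG (IEEE 1149.1) TAP controller state diagram. This is a sketch only: it assumes that the core test control unit follows the standard sixteen-state diagram and that the scan starts in Run-Test/Idle, neither of which is spelled out in the text, and the names `walk` and `instruction_scan_tms` are hypothetical.

```python
# Standard TAP controller transitions: state -> (next if TMS=0, next if TMS=1)
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(start, tms_bits):
    """Return the state reached after each TCK cycle for the given TMS values."""
    path, state = [], start
    for tms in tms_bits:
        state = TAP[state][tms]
        path.append(state)
    return path


def instruction_scan_tms(n_bits):
    """TMS values for one instruction scan that starts and ends in Run-Test/Idle."""
    enter = [1, 1, 0, 0]            # Select-DR, Select-IR, Capture-IR, Shift-IR
    stay = [0] * (n_bits - 1)       # remain in Shift-IR
    leave = [1, 1, 0]               # Exit1-IR, Update-IR, Run-Test/Idle
    return enter + stay + leave


if __name__ == "__main__":
    path = walk("Run-Test/Idle", instruction_scan_tms(38))
    print(path.count("Shift-IR"), path[-2], path[-1])
```

In this sketch the controller spends one cycle in Shift-IR per bit shifted, passes through Update-IR to load the instruction, and then returns to Run-Test/Idle.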
A need has therefore been felt for an apparatus and an associated method having the feature of increasing the efficiency of the transfer of instructions and test data to and from multiple processors. It would be a further feature of the apparatus and associated method to increase the efficiency of instruction and data transfer in the data streaming test environment. It is yet another feature of the apparatus and associated method to provide an improved technique for deselecting a first processor and selecting a second processor for implementing a streaming data mode of operation. It is a more particular feature of the apparatus and associated method to provide for an improved boundary scan. It is another particular feature of the present invention to provide a boundary scan that is compatible with JTAG (Joint Test Action Group) protocols.