Modern computer systems typically involve many components interacting with one another in a highly complex fashion. For example, a server installation may have multiple processors, configured either within individual (uniprocessor) machines, or combined into one or more multiprocessor machines. Such computer systems operate in conjunction with associated memory and disk drives for storage, video terminals and keyboards for input/output, plus interface facilities for data communications over one or more networks. The skilled person will appreciate that many additional components may also be present.
The pervasive use of such computer systems in modern-day society places stringent requirements on their reliability. For example, it is especially important that the storage, manipulation and transmission of commercially significant information can all be performed correctly, without the introduction of errors. It is therefore of great importance that computer systems are correctly designed and built, and also that they continue to perform properly during their operational lifetime.
This in turn generates the need to be able to reliably test computer systems, especially during initial system design and construction. In addition it is also highly desirable to be able to test machines or components of machines in situ at customer premises, in a production environment. For example, the situation may arise where a processing error is suspected or detected in a customer system, but the source of the error is obscure. Since a typical server installation may be formed from multiple hardware and software components (each of which may potentially be supplied by a different vendor), tracing the error to its original source can be a difficult task. In such circumstances, the ability to show that at least certain components are properly functional can help to isolate the location of the fault. Indeed, even in situations where there is no particular evidence that a fault is present, it can still be desirable to be able to positively demonstrate at a customer location that a particular component (such as a newly installed device) is working properly.
One standard way of checking that hardware components are operating properly is by performing a functional test. In order to achieve this, a particular hardware unit is given some input data to process. It is then confirmed that the output from the unit represents the expected result for the given input data.
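As a rough illustration of the functional-test idea, the sketch below treats a hardware unit as an ordinary Python function and compares its output against expected results. The unit under test (`incrementer`, an 8-bit wrap-around incrementer), the helper name `functional_test` and the test cases are all hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def functional_test(unit: Callable[[int], int],
                    cases: Sequence[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Feed each input to the unit and collect any mismatches.

    Returns (input, expected, actual) for every failing case; an empty list
    means every output matched its expected result.
    """
    failures = []
    for given, expected in cases:
        actual = unit(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures

def incrementer(x: int) -> int:
    """Hypothetical unit under test: an 8-bit incrementer that wraps to 0."""
    return (x + 1) & 0xFF

print(functional_test(incrementer, [(0, 1), (41, 42), (255, 0)]))  # []
```

Such a test observes only inputs and outputs; it says nothing about which internal resources (for example, which registers) were actually used to produce the result.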
Unfortunately, the complexity of modern systems means that such functional testing can suffer from certain limitations. Thus the use of multiple processing cores within a single CPU, along with techniques such as register renaming, asynchronous I/O, and out-of-order execution, makes it difficult to determine in advance the exact processing sequence within a given hardware device. For example, minute timing variations from one program run to another may affect the precise execution strategy, such as which operations are performed in which particular processing core, and using which particular registers. (Of course, this will normally be transparent to programs running on the system.)
As a result, it is difficult to be completely confident that all components in the system have been properly exercised when performing a functional test. For example, imagine that a particular register is potentially faulty, but that only some executions of a functional test program will actually utilise this register. In this situation, it is difficult to be certain that a positive result for a functional test is due to the fact that the register is indeed properly operational, rather than the register in question simply not being used for that particular execution.
One known alternative (or complement) to functional testing is scan testing, which can help address the above problem. Scan testing is typically applied to semiconductor devices, and is described, for example, in “Fault Diagnosis of Digital Circuits” by V Yarmolik, Wiley, 1990 (ISBN 0 471 92680 9).
FIG. 1 illustrates in simplified schematic form one stage of a generalised semiconductor processing device, in which combinational logic 15 is interposed between flip-flop 12 and flip-flop 14 (the flip-flops may also be replaced by registers and the like). At each clock signal (CLK), the contents of flip-flop 12 are output to combinational logic 15, and the contents of flip-flop 14 are output to the next stage (indicated by arrow C in FIG. 1). Flip-flop 12 then receives new contents from a preceding stage in the device (indicated by arrow A in FIG. 1), while the output of combinational logic 15 is loaded into flip-flop 14.
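The clocking behaviour of FIG. 1 can be sketched as a small simulation. This model rests on several assumptions: values are single bits, combinational logic 15 is represented by an arbitrary Python function of flip-flop 12's contents only (any second input, such as arrow B, is ignored), and the class `Fig1Stage` and the inverter used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fig1Stage:
    """Flip-flop 12 -> combinational logic 15 -> flip-flop 14 (FIG. 1)."""
    logic: Callable[[int], int]  # stand-in for combinational logic 15
    ff12: int = 0
    ff14: int = 0

    def clock(self, a: int) -> int:
        """Apply one clock edge and return the value passed on along arrow C."""
        out_c = self.ff14                 # old contents of flip-flop 14 go to the next stage
        new_ff14 = self.logic(self.ff12)  # logic 15 works on the old contents of flip-flop 12
        self.ff12 = a                     # flip-flop 12 loads from the preceding stage (arrow A)
        self.ff14 = new_ff14              # flip-flop 14 loads the output of logic 15
        return out_c

def invert(x: int) -> int:
    return 1 - x

stage = Fig1Stage(logic=invert)
print([stage.clock(a) for a in (1, 0, 1)])  # [0, 1, 0]
```

Because every flip-flop samples its input on the same edge, the new value for flip-flop 14 must be computed from the old contents of flip-flop 12 before flip-flop 12 is overwritten.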
A complicated semiconductor device can then be regarded as formed from a large number of stages such as shown in FIG. 1. (Note that each stage comprises combinational logic plus a single flip-flop, so that, strictly speaking, FIG. 1 depicts one and a half stages, with flip-flop 12 and logic 15 forming a first stage, and flip-flop 14 then representing the input side of the next stage.)
The various stages in a semiconductor device can be connected together in a highly complex manner (rather than just simply having a linear chain of one stage after another). For example, as shown in FIG. 1, combinational logic 15 may receive input for processing from more than one preceding stage (indicated schematically in FIG. 1 by arrows A and B). Similarly, the output from one stage may be split and directed to multiple other stages, including feedback loops and so on.
FIG. 2 illustrates a modification to the circuit of FIG. 1 in order to support scan testing. The components from FIG. 1 are supplemented in FIG. 2 by the addition of two multiplexers, MX 18 and MX 20, which are located in front of flip-flop 12 and flip-flop 14 respectively. Each multiplexer has two inputs, with the selection of the output from the multiplexers being controlled by a SCAN signal.
Considering the operation of multiplexer 20, when the SCAN signal is not set (i.e., has a value 0), then multiplexer 20 outputs to flip-flop 14 the signal that it receives from combinational logic 15. Thus in this mode, the presence of multiplexer 20 is, in effect, transparent, and the circuit operates in the same manner as described in relation to FIG. 1. However, when the SCAN signal is asserted (i.e., has a value 1), multiplexer 20 now outputs its second input, which is received from bypass line 25, which in turn is linked to the output of flip-flop 12. The consequence of this is that in scan mode (i.e., with the SCAN signal asserted), for each clock signal the contents of flip-flop 12 are simply shifted to flip-flop 14, as if combinational logic 15 were not present.
Multiplexer 18 operates in analogous fashion to multiplexer 20. Thus in normal mode, without the SCAN signal being asserted, its output corresponds to input A. However, in scan mode, its output now corresponds to input A′, which represents a direct connection to the output of a preceding flip-flop (not shown in FIG. 2), similar to bypass line 25.
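The multiplexer behaviour of FIG. 2 reduces to a simple selection on the SCAN signal. In the sketch below, the function names are hypothetical and combinational logic 15 is again modelled, purely for illustration, as an inverter.

```python
from typing import Callable

def scan_mux(normal_in: int, bypass_in: int, scan: int) -> int:
    """Multiplexer in front of a flip-flop, such as MX 20 in FIG. 2.

    SCAN = 0 selects the normal input (from combinational logic 15);
    SCAN = 1 selects the bypass input (line 25 from flip-flop 12).
    """
    return bypass_in if scan else normal_in

def next_ff14(ff12: int, logic: Callable[[int], int], scan: int) -> int:
    """Value loaded into flip-flop 14 at the next clock edge."""
    return scan_mux(logic(ff12), ff12, scan)

def invert(x: int) -> int:
    return 1 - x

print(next_ff14(1, invert, scan=0))  # 0: normal mode, the output of logic 15 is captured
print(next_ff14(1, invert, scan=1))  # 1: scan mode, flip-flop 12 is copied straight across
```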
To support full scan testing of a semiconductor device, the configuration of FIG. 2 is repeated for all relevant stages in the device. Thus a sequence of flip-flops is defined, with each flip-flop being preceded by a multiplexer. A first input to each of the multiplexers represents the normal operational input to the flip-flop, while the second input is connected by a bypass line to the preceding flip-flop in the sequence. In scan mode, therefore, when the second input of each multiplexer is selected, the sequence of flip-flops from all the different stages operates in effect as a long shift register, in which the contents of each flip-flop progress to the next flip-flop in the sequence at every clock signal.
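In scan mode the whole sequence behaves as a shift register. A minimal sketch, assuming one bit per flip-flop and a Python list ordered from the first to the last flip-flop of the scan sequence (the function name `scan_shift` is hypothetical):

```python
from typing import List

def scan_shift(chain: List[int], scan_in: int) -> int:
    """Apply one scan-mode clock to a chain of flip-flops, in place.

    chain[0] is the first flip-flop in the scan sequence. Each flip-flop takes
    the previous contents of its predecessor (via its multiplexer's bypass
    input), the first takes scan_in, and the old contents of the last
    flip-flop are returned as they leave the chain.
    """
    scan_out = chain[-1]
    chain[1:] = chain[:-1]  # the right-hand slice is a copy, so nothing is overwritten early
    chain[0] = scan_in
    return scan_out

chain = [1, 0, 1, 1]
out = [scan_shift(chain, 0) for _ in range(4)]
print(out, chain)  # [1, 1, 0, 1] [0, 0, 0, 0]
```

After as many clocks as there are flip-flops, the previous contents have all been read out (last flip-flop first) and replaced by the bits shifted in.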
Support for scan mode provides a mechanism both to load data into the flip-flops of a semiconductor device and to read data out of them. One use of this is to verify that the device properly processes a predefined input sequence. The granularity of this testing can be as fine as one processing operation (i.e. one clock cycle).
This is illustrated schematically in FIGS. 3A–D, each of which depicts a sequence of flip-flops F1, F2, F3, and F4 interlinked by combinational logic CL1, CL2, and CL3. Arrows A and C represent an external input and output facility respectively for the scan sequence, for example through appropriate pins on the semiconductor device. In FIG. 3A there is a binary data sequence of 110 to be input to the device (this is referred to as the input scan vector). With the system held in scan mode, the 0 value is read into F1 after the first clock cycle. After the second clock cycle, the 0 value is shifted into F2, while the 1 is read into F1. Next, after the third clock cycle, the 0 and 1 are shifted from F2 to F3 and from F1 to F2 respectively, while the remaining 1 of the input is read into F1. This leads to the position shown in FIG. 3B, in which the device has in effect been primed to a predetermined starting state, as specified by the input scan vector.
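The loading sequence of FIGS. 3A and 3B can be reproduced in a few lines. This sketch assumes, as in the figures, that the right-most bit of the written vector is clocked in first, and it represents flip-flops whose contents are not set by the vector (here F4) as `None`; the function name `load_scan_vector` is hypothetical.

```python
from typing import List, Optional

def load_scan_vector(vector: str, length: int) -> List[Optional[int]]:
    """Shift an input scan vector into flip-flops F1..F<length> in scan mode.

    chain[0] is F1. The right-most character of `vector` enters first, so after
    len(vector) clocks it has travelled furthest along the chain.
    """
    chain: List[Optional[int]] = [None] * length
    for bit in reversed(vector):
        chain = [int(bit)] + chain[:-1]  # each flip-flop takes its predecessor's value
    return chain

print(load_scan_vector("110", 4))  # [1, 1, 0, None]: F1=1, F2=1, F3=0, F4 not set
```

With this ordering, the vector as written ends up laid out across F1, F2 and F3 once it has been fully shifted in.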
We now suspend scan mode for a single processing cycle, which leads to the situation in FIG. 3C. This processing operation results in new data values being stored in F2, F3, and F4. The values stored depend on the input scan vector and the particular configuration of logic CL1, CL2, and CL3, and in FIG. 3C are assumed (for illustration) to be 1, 0, and 1. Re-asserting scan mode then allows the data values generated by this processing to be read out in three clock cycles, as per FIG. 3D, to form an output scan vector.
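Continuing the illustration, the processing cycle and read-out of FIGS. 3C and 3D can be sketched as follows. The logic blocks are hypothetical choices (CL1 a plain buffer, CL2 and CL3 inverters) picked only so that the captured values match the 1, 0, 1 assumed in FIG. 3C; the prior contents of F4, the value of F1 after the processing cycle, and the 0 shifted in behind during read-out are likewise assumptions.

```python
from typing import Callable, List, Sequence

Logic = Callable[[int], int]

def capture(chain: List[int], logic: Sequence[Logic]) -> List[int]:
    """One normal-mode clock: F(i+1) takes CL(i) applied to F(i).

    chain[0] is F1 and logic[0] is CL1. F1's own functional input lies outside
    the figure, so this sketch simply leaves F1 unchanged.
    """
    return [chain[0]] + [cl(ff) for cl, ff in zip(logic, chain[:-1])]

def unload(chain: List[int], cycles: int) -> List[int]:
    """Scan-mode clocks: the last flip-flop (F4) reaches output C first."""
    out = []
    for _ in range(cycles):
        out.append(chain[-1])
        chain = [0] + chain[:-1]  # shift along; 0 is assumed at input A
    return out

def buf(x: int) -> int:
    return x

def inv(x: int) -> int:
    return 1 - x

state = [1, 1, 0, 0]                     # FIG. 3B: F1..F3 loaded from "110"; F4 assumed 0
state = capture(state, [buf, inv, inv])  # FIG. 3C
print(state)                             # [1, 1, 0, 1]
print(unload(state, 3))                  # [1, 0, 1]: F4, then F3, then F2
```

Comparing the unloaded values with those expected for the known logic is what turns this into a diagnostic check.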
It will be appreciated that analysis of the output scan vector for a given input scan vector provides a very powerful diagnostic tool for confirming that the various parts of a semiconductor device are properly operational. Consequently, scan testing is frequently employed in semiconductor fabrication plants, typically as part of the assembly line process.
Note that although in both FIG. 2 and FIGS. 3A–D the sequence in which the flip-flops are connected up for scan testing corresponds to the normal operational flow of data through the flip-flops, this will not generally be the case. Indeed, this would actually be impossible for most devices, since, as mentioned above, the normal data flow typically includes branches and the like, and so cannot be represented by a single chain or sequence of flip-flops.
As an example of this, although FIG. 2 shows bypass line 25 linking flip-flop 12 to flip-flop 14 in the same direction as data flow for standard processing, it would also be possible for the scan sequence to go in the opposite direction, i.e. a bypass line to go from the output of flip-flop 14 into multiplexer 18. In this case the bypass output of flip-flop 12 would go to some other multiplexer (not multiplexer 20), and likewise the bypass input of multiplexer 20 would come from some other flip-flop (not flip-flop 12).
In devices containing a large number of flip-flops, the bypass lines linking up the flip-flops into a single sequence for scan mode sometimes have to follow rather lengthy and indirect paths. This is primarily because priority in terms of layout is given to optimising the standard data flow of the device (i.e. through the combinational logic). Consequently, the clock rate for scan mode is usually significantly less than the normal clock rate of the device, to avoid any possible problems with signal timing on the relatively long bypass connections. Typically this reduced clock rate is achieved by providing a reduced (scan) rate clock signal to the clock line (CLK in FIG. 2).
As stated above, one of the advantages of scan testing is that it allows the flip-flops of the semiconductor device to be loaded with an arbitrary set of input data. Unfortunately, however, this power and flexibility can also represent something of a hazard. An example of where such a problem can arise is illustrated in FIG. 4. (Note that for simplicity, the clock lines and scan mode circuitry have been omitted from FIG. 4.)
As shown in FIG. 4, combinational logic 415 drives two flip-flops 412 and 414, each of which is connected to a respective AND gate 430, 432, whose output in turn serves as the control input to a respective tri-state buffer (TSB 90 and 92). The outputs of the two tri-state buffers 90 and 92 are then connected to bus 450.
A tri-state buffer (or driver), as its name suggests, has three possible output states. If the control input is 1, then the buffer is in an ON state, and simply outputs 0 or 1, mirroring its data input (shown schematically in FIG. 4 as A1 and A2 for TSB 90 and 92 respectively). However, if the control input is 0, then the buffer is in an OFF (high-impedance) state and is in effect disconnected from the circuit. Consequently, it can neither drive nor load another device.
Tri-state buffers are typically high-output devices and can be used to drive a large number of gates. They are frequently employed in computer systems, for example for connecting registers to a common bus. However, this high output drive can also cause problems. For example, in the circuit of FIG. 4, having both TSB 90 and TSB 92 simultaneously in an ON state may lead to excess current on bus 450, and potentially damage either the bus or devices attached to the bus. To avoid this, the combinational logic 415 of FIG. 4 can be designed to ensure that only one of the two TSBs 90 and 92 is ever in an ON state at any given time. As a further safeguard, an ENABLE signal is supplied to AND gates 430 and 432. When the ENABLE signal is absent, the outputs of the two AND gates are both held low, and so the two TSBs are both kept in a safe (OFF) state.
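The interaction between the flip-flops, the AND gates, the ENABLE signal and the tri-state buffers of FIG. 4 can be modelled in a few lines. In this sketch `None` stands for the high-impedance (OFF) output, the function names are hypothetical, and the default data inputs A1 = 0 and A2 = 1 are arbitrary assumptions; following the text, any case with both buffers ON is treated as unsafe.

```python
from typing import List, Optional

def tsb(control: int, data: int) -> Optional[int]:
    """Tri-state buffer: drives `data` when ON, returns None (high impedance) when OFF."""
    return data if control else None

def bus_drivers(ff412: int, ff414: int, enable: int, a1: int = 0, a2: int = 1) -> List[int]:
    """Values actively driven onto bus 450 by TSB 90 and TSB 92 (FIG. 4)."""
    outputs = [tsb(ff412 & enable, a1),  # AND gate 430 controls TSB 90
               tsb(ff414 & enable, a2)]  # AND gate 432 controls TSB 92
    return [v for v in outputs if v is not None]

def bus_conflict(ff412: int, ff414: int, enable: int) -> bool:
    """True if both buffers are ON at once, the unsafe case described above."""
    return len(bus_drivers(ff412, ff414, enable)) > 1

print(bus_conflict(1, 0, enable=1))  # False: only TSB 90 drives the bus
print(bus_conflict(1, 1, enable=0))  # False: ENABLE absent, both buffers held OFF
print(bus_conflict(1, 1, enable=1))  # True: both buffers ON together
```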
Considering now the application of a scan test to the circuitry of FIG. 4, the device can be maintained in safe mode, with the ENABLE signal absent, while initially loading the input scan vector (corresponding to the transition from FIG. 3A to FIG. 3B). The same also applies during the read-out of the output scan vector (corresponding to the transition from FIG. 3C to FIG. 3D). However, the device must be taken out of safe mode, and the ENABLE signal asserted, in order to allow one or more normal processing steps to occur (corresponding to the transition from FIG. 3B to FIG. 3C). Otherwise, it is simply not possible to test the operation of the device properly.
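The sequence of control signals described above can be summarised as a schedule. In this sketch the `Step` record and `scan_test_schedule` function are hypothetical, and a single processing cycle is assumed by default, as in FIGS. 3B and 3C.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    phase: str
    scan: int    # SCAN signal to the multiplexers
    enable: int  # ENABLE signal to AND gates 430 and 432
    clocks: int  # number of clock cycles spent in this phase

def scan_test_schedule(vector_length: int, processing_cycles: int = 1) -> List[Step]:
    """Signal settings for one scan test of the circuit of FIG. 4."""
    return [
        Step("load input scan vector", scan=1, enable=0, clocks=vector_length),        # FIG. 3A -> 3B
        Step("normal processing", scan=0, enable=1, clocks=processing_cycles),         # FIG. 3B -> 3C
        Step("read out output scan vector", scan=1, enable=0, clocks=vector_length),   # FIG. 3C -> 3D
    ]

for step in scan_test_schedule(vector_length=3):
    print(step)
```

The only window in which safe mode is lifted is the processing step, which is exactly when whatever the loaded scan vector has placed in flip-flops 412 and 414 reaches the tri-state buffers.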
Unfortunately, there is a danger that a particular scan vector may load a 1 into both flip-flop 412 and flip-flop 414, since the combinational logic 415 that would normally ensure that this cannot happen is bypassed during a scan test. When the ENABLE signal is then asserted for the processing step, both AND gates 430 and 432 output 1, so that TSB 90 and TSB 92 are switched ON simultaneously. This in turn may damage the chip, typically by causing excess current flow and the resultant overheating.
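The hazardous condition is simply a predicate on the state loaded by the scan vector. The sketch below assumes a hypothetical four-flip-flop chain in which flip-flops 412 and 414 sit at positions 1 and 3; real positions depend on how the scan chain happens to be routed. It uses the loading convention of FIGS. 3A and 3B (right-most bit clocked in first) and lists every 4-bit vector that would drive both buffers ON once ENABLE is asserted.

```python
from itertools import product
from typing import List

# Hypothetical positions of flip-flops 412 and 414 within a 4-flip-flop scan chain.
POS_412 = 1
POS_414 = 3

def loaded_state(vector: str) -> List[int]:
    """Chain contents after shifting in a full-length vector, right-most bit first."""
    chain = [0] * len(vector)
    for bit in reversed(vector):
        chain = [int(bit)] + chain[:-1]
    return chain

def is_hazardous(vector: str) -> bool:
    """True if the vector leaves a 1 in both flip-flop 412 and flip-flop 414."""
    state = loaded_state(vector)
    return state[POS_412] == 1 and state[POS_414] == 1

unsafe = [v for v in ("".join(bits) for bits in product("01", repeat=4)) if is_hazardous(v)]
print(unsafe)  # ['0101', '0111', '1101', '1111']
```

Even in this tiny example a quarter of all possible vectors are unsafe, and none of them would ever arise from the normal operation of combinational logic 415.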
In a normal semiconductor fabrication facility it is relatively easy to ensure that the above problem does not arise, since the input scan vector applied on a given line for a given chip can be carefully controlled. However, as explained above, it is sometimes desirable for a field engineer or service operative to be able to perform testing at a customer location. This may include the use of scan testing via some portable electronic equipment to supply the input scan vector, and then to read and analyse the output scan vector.
In such circumstances, the risk of an inappropriate scan vector being used for testing a device is much higher. Thus one danger is that the scan vector becomes corrupted while being stored on the portable electronic equipment, or while transmitted to or from such equipment. A further possibility is that a typical engineer may have to deal with many different components in the field, including different versions or refinements of the same underlying product. Each such component may potentially have its own input scan vector or vectors stored on the portable electronic equipment, and there may be confusion about which particular one to use in a given situation.
If this leads to the wrong input scan vector being used to test a particular component (or possibly the right input scan vector being applied to the wrong component), then as mentioned above, this may result in damage to the device being tested. If such damage is readily noticeable, then the component concerned can be removed or disconnected, although the engineer may well not have a replacement part immediately to hand. A more insidious possibility is that the component suffers less apparent damage, which may cause problems with (intermittent) errors or degraded reliability in the future. In any event, the supplier or service organisation is likely to suffer poor customer satisfaction, and increased expense.
In addition to the risk of accidental damage, there is also a danger that someone may deliberately try to disable or at least degrade a device. One potential way of mounting such a malicious attack is to input an inappropriate scan vector.