1. Field of the Invention
The present invention relates to the field of data processing, and in particular to data processing systems having error detection mechanisms.
2. Description of the Prior Art
Dynamic voltage and frequency scaling can be used to reduce the overall energy consumption of a computer system, particularly for systems with high variation in processing requirements. Dynamic voltage and frequency scaling can be used either to push the operating conditions of a circuit beyond the nominal operating conditions assumed at design time in order to achieve improved clock frequency, or to reduce energy consumption at times when the full capabilities of the hardware are not required. However, if the voltage or frequency is scaled too aggressively, then the likelihood of errors occurring may increase. A critical issue for dynamic voltage and frequency scaling enabled computer systems is determining the safe operating voltage or frequency at which maximum execution efficiency is achieved, whilst still guaranteeing correct operation of all components. It is therefore helpful to be able to detect any errors in operation that do occur.
One technique for detecting and recovering from errors is the use of “Razor” latches as described in WO-A-2004/084702. FIG. 1A of the accompanying drawings shows a functional element 10 of a data processing system in which the Razor technique has been used. It will be appreciated that the data processing system would typically include a number of such functional elements. The functional element 10 includes processing circuitry 12, an input latch 14, and an output latch 16. The processing circuitry 12 performs processing operations on input values from the input latch 14 and passes output values to the output latch 16. The operations performed by the processing circuitry 12 could include adding, shifting or logic operations, for example. In this example of the Razor technique, the functional element 10 also includes error detection/recovery circuitry 18 coupled to the output latch 16 and a rollback multiplexer 20 positioned between the processing circuitry 12 and the output latch 16. The error detection/recovery circuitry 18 detects changes in the output value of the functional element between a first sampling time and a second sampling time, the sampling window being shorter than one clock cycle. One way in which changes can be detected is by arranging for the output latch 16 to produce an error signal if a change occurs between the two sampling times, and this error signal is then detected and processed by the error detection/recovery circuitry 18. A change in the output value between the first and second sampling times can indicate that the processing circuitry 12 had not yet completed its operations by the first sampling time (an event that will become more likely if the voltage or frequency is scaled too aggressively), and so the output value sampled at the first sampling time could be incorrect.
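By way of illustration only, the double-sampling check described above can be modelled in a few lines of Python. This is a behavioural sketch, not a description of the circuit in WO-A-2004/084702. The function names, the time units and the idea of modelling the output as a value that switches once at a settle time are all assumptions made for this example.

```python
def make_output(old_value, new_value, settle_time):
    """Hypothetical model of the processing output: it holds old_value
    until settle_time, then holds new_value."""
    return lambda t: new_value if t >= settle_time else old_value


def razor_sample(output_at, t_first, t_second):
    """Sample the output at two times and flag an error if it changed.

    Returns (first_sample, second_sample, error_flag).
    """
    first = output_at(t_first)
    second = output_at(t_second)
    return first, second, first != second


# The logic settles before the first sampling time: no change is seen.
on_time = razor_sample(make_output(0, 7, settle_time=0.8),
                       t_first=1.0, t_second=1.3)

# The logic settles between the two sampling times: a change is flagged,
# and the first sample holds the stale value.
late = razor_sample(make_output(0, 7, settle_time=1.1),
                    t_first=1.0, t_second=1.3)
```

In this model `on_time` carries no error flag, whereas `late` does, because the value captured at the first sampling time differs from the value present at the second.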
If other functional elements in the system have used the output value at the first sampling time for further processing operations, then errors may arise in the operation of these functional elements due to using the incorrect value. Therefore, the error detection/recovery circuitry 18 can perform an error recovery operation so that the processing of at least some functional elements of the system is halted, and a prior state of the system is restored. This can be done by the error detection/recovery circuitry 18 passing a rollback value of the output value to the rollback multiplexer 20 and controlling the rollback multiplexer 20 to select the rollback value and pass this value to the output latch 16. The rollback value could be the value of the output value at the second sampling time, or could be a ‘safe’ value that is known to work correctly. Accordingly, errors caused by the processing circuitry 12 operating beyond its normal operating conditions can be detected and the system can recover from such errors. The operating voltage and/or the clock frequency can be adjusted depending on the number of errors detected in a given period.
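The adjustment of operating voltage according to the number of errors detected in a given period might be sketched as follows. The thresholds, step size, voltage limits and the function name `adjust_voltage_mv` are hypothetical values chosen for illustration. The text above does not specify any particular control policy.

```python
def adjust_voltage_mv(voltage_mv, errors_in_period,
                      low=1, high=10, step_mv=10,
                      v_min_mv=600, v_max_mv=1200):
    """Return a new supply voltage in millivolts (all parameters assumed).

    Many errors in the period: raise the voltage to restore timing margin.
    Fewer than `low` errors: lower the voltage to save energy.
    Otherwise: leave the voltage unchanged.
    """
    if errors_in_period > high:
        return min(voltage_mv + step_mv, v_max_mv)
    if errors_in_period < low:
        return max(voltage_mv - step_mv, v_min_mv)
    return voltage_mv


# Example: one controller step for three different error counts.
raised = adjust_voltage_mv(1000, errors_in_period=20)
lowered = adjust_voltage_mv(1000, errors_in_period=0)
held = adjust_voltage_mv(1000, errors_in_period=5)
```

A band between the two thresholds is used here so that the voltage does not oscillate on every period. That is a design choice of this sketch rather than a feature taken from the description.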
FIG. 1B of the accompanying drawings shows another functional element 20, in which the Razor technique has also been used. In this case, the additional circuitry 18, 20 has been applied to the input latch 14, such that changes in the input value over a given period of time are detected and errors flagged by the error detection/recovery circuitry 18. The operation of the Razor circuitry 18, 20 is as discussed above. A change in the input value between the first and second sampling times could indicate, for example, that a functional element preceding the functional element 20 had not completed its operation at the time that the functional element 20 had received its input value. In the following description, the examples discussed will generally have an output latch that is provided with error detection/recovery circuitry 18 and this latch will be referred to as a “Razor latch”. However, the skilled person will appreciate that the input latch could also be a Razor latch.
The use of Razor latches can cause a problem if the time taken for the processing circuitry 12 to produce its output value is particularly short. In the system of FIG. 1A, for example, in one clock cycle the processing circuitry 12 processes an input value from the input latch 14 and the resulting output value is latched in the output latch 16. The output value of the processing circuitry 12 is then sampled at the first sampling time. In a subsequent clock cycle, the processing circuitry 12 is provided with a second input value from the input latch 14 and produces a second output value. If the time taken to produce the second output value is shorter than the interval between the first sampling time and the second sampling time at the output latch 16, then the error detection/recovery circuitry 18 can detect a change in the output value of the processing circuitry 12, even if the value at the first sampling time was correct. This will result in a false positive detection of an error. A possible reason for the processing circuitry 12 being so quick to produce its output value could be that the particular path used to calculate the value passes through fewer gates than other paths through the processing circuitry 12. For this reason, this type of false detection of an error is known as a short path error. Short path errors can trigger an error recovery operation unnecessarily, which is costly in terms of the time and power required for the processing operations. Also, short path errors could cause further errors at later stages of processing if the value at the second sampling time (i.e. the value produced by the processing circuitry 12 one cycle later than the actual value) is used by subsequent functional elements. It is therefore desirable to reduce the number of short path errors, so as to decrease the number of detected changes in output or input values and increase the likelihood that a detected change represents an actual error in operation.
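The timing condition that gives rise to a short path error can be expressed as a simple comparison. The sketch below classifies a path by its delay relative to the clock period and the sampling window. The function name, the picosecond units and the example numbers are assumptions. The boundary case in which the delay equals the window is treated conservatively as a short path, and the final category (a late value arriving after the second sampling time) is a consequence of this simplified model rather than something stated in the description.

```python
def classify_path(delay_ps, period_ps, window_ps):
    """Classify a path delay under a simplified Razor timing model.

    The first sample is taken at the clock edge and the second sample
    window_ps later. All names and units are hypothetical.
    """
    if delay_ps <= window_ps:
        # The next cycle's result arrives inside the sampling window and
        # looks like a change, although the first sample was correct.
        return "short-path false positive"
    if delay_ps <= period_ps:
        return "ok"
    if delay_ps <= period_ps + window_ps:
        # The result arrives after the first sample but before the second.
        return "late value detected"
    return "late value missed"


# Example with an assumed 1000 ps clock period and 300 ps sampling window.
results = [classify_path(d, period_ps=1000, window_ps=300)
           for d in (100, 700, 1100, 1500)]
```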
FIG. 2 of the accompanying drawings shows one way of preventing short path errors from occurring. In this example, the operation performed by the functional element 30 is an ADD operation. The functional element 30 includes circuitry 32, 34, 36, 38 for performing the ADD operation. In this example, the Razor latch is the output latch 16, although it could also be the input latch 14. For simplicity, the rollback multiplexer 20 has been omitted in FIG. 2. FIG. 2 illustrates several paths linking point A, where the input value is fed into the functional element from the input latch 14, and point B, where the output value is latched. To address the short path problem, extra buffers 40 are inserted into the shorter paths through the functional element 30 so that the time taken for the shortest path in the design to complete is longer than the interval between the first and second sampling times of the Razor latch. However, adding extra buffers is costly in terms of area and energy consumption, thereby offsetting the gains achieved through dynamic voltage and frequency scaling.
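The cost of this buffer-padding approach can be estimated with a short calculation. The sketch below computes, for each path, the smallest number of buffers that makes the path delay strictly longer than the sampling window. The path names, the delays, the per-buffer delay and the function name are invented for illustration and do not correspond to the circuitry of FIG. 2.

```python
def buffers_needed(delay_ps, window_ps, buffer_delay_ps):
    """Smallest n such that delay_ps + n * buffer_delay_ps > window_ps."""
    if delay_ps > window_ps:
        return 0
    return (window_ps - delay_ps) // buffer_delay_ps + 1


# Hypothetical path delays through an adder, in picoseconds.
path_delays_ps = {"carry_chain": 900, "bit0_sum": 100, "bit1_sum": 290}

padding = {name: buffers_needed(d, window_ps=300, buffer_delay_ps=50)
           for name, d in path_delays_ps.items()}
total_buffers = sum(padding.values())
```

Under these assumed numbers the long carry path needs no padding, while the two short paths need five buffers and one buffer respectively. Every added buffer contributes area and switching energy, which is the overhead noted above.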
It is an object of the present invention to address the short path problem without unduly increasing the circuit area and/or energy consumption of the system.