1. Field of the Invention
The present invention relates to digital logic systems employing combinatorial and sequential logic, such as microcontroller systems. More particularly, the present invention relates to a clock circuitry architecture that employs tunable variable delays for use in such systems.
2. The Prior Art
To improve the effectiveness of micro-controller systems with respect to the volume of information to be processed, co-processor modules are connected to the central processing unit through the internal system bus. The co-processor modules can be accessed for configuration and for sending commands and data, and can be interrogated for status.
All of the modules of a micro-controller are often clocked by the same clock signal, causing a peak in the power consumption for each active edge of the clock signal due to the switching of sequential cells and to the combinational logic networks driven by the sequential cells.
Referring now to FIG. 1, a block diagram shows an example of a system comprising a crypto-processor connected in a micro-controller environment. The system includes a microprocessor 10 capable of executing a set of instructions that can be stored outside the integrated circuit in a memory device which is controlled by an external bus interface (EBI) 12 or located in a ROM or embedded flash acting as an on-chip memory 14. An address decoder module 16 is used to select one module from among all possible modules/peripherals connected in parallel on the same system bus. The system bus includes an address bus 18, a write data bus 20, a read data bus 22 and a read/write control signal 24. The data on the read data bus 22 is generated by a data multiplexer 26 collecting data from EBI interface 12, on-chip memory 14, crypto-processor 28, UART 30, and CAN controller 32. Clock terminal 34 supplies clock signals to all the components and reset terminal 36 may be used to initialize all the components as is known in the art. Similarly, power supply terminals VDD 38 and GND 40 supply power to all of the components, which are formed from CMOS logic elements.
The microprocessor 10 executes instructions that can be stored outside the chip by setting a value on address bus 18 corresponding to EBI interface 12. The address decoder 16 asserts the corresponding selection signal to EBI interface 12 on line 42. To fetch the instruction, the read/write control signal on control line 24 of the system bus is asserted for read operation mode. The value can be either logical 1 or 0 depending on the system bus protocol. The EBI interface 12 drives the external memory device accordingly to obtain the data required by the microprocessor 10. The instruction to execute is asserted on the EBI data bus 44 by the EBI interface 12, and the data multiplexer 26 places the value from EBI data bus 44 on the system read data bus 22. Thereafter, the microprocessor 10 is ready to execute the instruction.
If the instruction is a write instruction to one of the modules connected in parallel on the system bus, the microprocessor 10 performs another similar fetch to obtain the destination address where the data must be written. As soon as all the data are known by the microprocessor 10, it executes a write instruction to the selected peripheral by asserting the system address bus 18 with a value selecting (for example) the crypto-processor module 28. The address decoder 16 deselects the EBI interface 12 by clearing the associated selection signal on line 42 and asserts the selection signal 46 corresponding to the crypto-processor module 28.
Being selected for a write operation, the crypto-processor module 28 writes into its internal registers the value of write data bus 20. The other modules 12, 14, 30, and 32 also receive this value but do not take any action because they are not selected.
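By way of illustration only, the selection and write sequence described above may be sketched in Python as follows. The address map, base addresses, region sizes, and function names are hypothetical and are not taken from FIG. 1; the sketch merely shows that every module receives the write data while only the module selected by the address decoder stores it.

```python
# Hypothetical address map: (base address, size). Values are illustrative only.
ADDRESS_MAP = {
    "EBI": (0x0000_0000, 0x1000_0000),
    "ON_CHIP_MEMORY": (0x1000_0000, 0x0010_0000),
    "CRYPTO": (0x4000_0000, 0x0000_1000),
    "UART": (0x4000_1000, 0x0000_1000),
    "CAN": (0x4000_2000, 0x0000_1000),
}


def decode(address):
    """Return the name of the single module selected by `address`, or None."""
    for name, (base, size) in ADDRESS_MAP.items():
        if base <= address < base + size:
            return name
    return None


def bus_write(registers, address, data):
    """Model one write cycle on the system bus.

    Every module is presented with `data`, but only the module whose
    selection signal is asserted stores it in its internal registers.
    """
    selected = decode(address)
    for name, (base, _size) in ADDRESS_MAP.items():
        if name == selected:
            registers[name][address - base] = data
    return selected


registers = {name: {} for name in ADDRESS_MAP}
selected = bus_write(registers, 0x4000_0004, 0xDEADBEEF)
```

In this sketch the write lands at offset 4 of the hypothetical crypto-processor region, and the register files of the other modules remain empty.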
The instructions are sequentially executed and perform read or write operations on the system bus for any peripheral connected on the system bus. The microprocessor 10 can also be triggered by a peripheral with the interrupt line 48. This interrupt line is driven by the interrupt controller 50, which handles the priorities of the interrupt lines 52, 54, and 56 coming from peripheral modules 28, 30, and 32. For example, if the expected result from a peripheral is known to have a large clock cycle latency, it is better to trigger the interrupt line than to wait for the result with some kind of no-operation instruction, especially when several peripherals have large latency response times (e.g., UART, ETHERNET MAC, CAN, LIN). The software application code would be difficult to generate without interrupt handling in a micro-controller system having several modules with different latencies.
When one of the peripheral modules is accessed and/or is processing data provided by the CPU, the power consumption increases on the power supply lines 38 and 40. This current is due to sequential cell switching, combinational cell switching, and also PAD buffer switching (e.g., I/O pads 58 (RXD), 60 (TXD), or 62 (CAN bus)) when the CPU drives a communication peripheral such as UART 30 or CAN controller 32. When several peripherals are used, the total power consumption current is the sum of the power consumption currents of the individual peripherals. The average power consumption depends on how many peripherals are used by the user software application and the manner in which they are used.
A peripheral can often be configured to process data in different ways. For example, a UART may be configured to transfer characters of different lengths (6, 7, or 8 bits), with or without a parity bit, or may simply transfer different data on the RS232 line. The resulting waveform of the power consumption current will be different for each of these instances. A crypto-processor module 28 may be configured to perform a crypto algorithm using, for example, different key lengths (AES 128, 192, 256), resulting in a slightly different waveform of the power consumption current.
Referring now to FIG. 2, a block diagram shows a generic example of a digital peripheral device. As is well known to persons of ordinary skill in the art, a synchronous digital peripheral is formed from sequential and combinational cells. A digital peripheral may be seen as a series of combinatorial logic networks driven by primary inputs and/or other combinatorial logic networks and/or sequential cell outputs. In the example of FIG. 2, clock pad 62 drives the clock input of D-flip-flop (DFF) 64 via buffer 66. The output of DFF 64 is fed back to its data input through combinatorial logic 68. The output of DFF 64 is also presented to combinatorial logic 70. The output of combinatorial logic network 70 drives pad buffer 72.
The current consumption of digital peripheral device 60 can be divided into three main components. The first component is the current consumption from the clock tree at clock pad 62, clock nets at the input and output of buffer 66, and the clock inputs of sequential cells such as the DFFs/latches 64. Whatever the use of the peripheral, the waveform of the current consumption from the clock tree is constant as soon as the clock input terminal 62 begins switching.
The second component is the output switching of the DFF 64 producing a current consumption in combinatorial logic networks 68 and 70 whose peak value depends on the data processed by the peripheral. If there is no toggling at the clock input terminal 62 there is no current consumption in combinational logic networks 68 and 70.
The third component is the switching current due to the pad buffer 72. If there is no toggling at the clock input terminal 62 there is no current consumption in pad buffer 72. The peak current in pad buffer 72 is often higher than the peak current in combinatorial logic networks 68 and 70 because the transistors used in this kind of buffer are oversized to drive external lines that have large capacitance and may present heavy (low-resistance) loads. When providing a communication protocol (UART, LIN, CAN) to an external line, the current in the pad buffer 72 does not occur on every clock cycle on clock input pad 62, but rather depends on the protocol itself and/or the data value transferred.
A series of waveforms showing an example of current consumption of a digital communications peripheral is shown in FIG. 3. These waveforms are not extracted from an actual simulation but rather give an idea of the current shapes.
The current in combinatorial logic networks 68 and 70 may vary from cycle to cycle depending on the algorithm processed and/or the configuration used for a peripheral. For example, a UART may be configured to transfer 8 bits of data with or without a parity bit. The parity bit may be calculated in serial mode (for each bit time, a 1-bit counter toggles according to the transmitted bit value) or in parallel, using more combinatorial cells (XOR) to compute parity when the data to transmit is loaded into a register, resulting in a different power consumption current. When a parity bit is transmitted, certain architectures (parallel) may give a current peak higher than that of the serial case. This may be a source of difference in the shape of the current from clock cycle to clock cycle.
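By way of illustration only, the two parity architectures mentioned above may be sketched as follows. The function names are hypothetical. The serial version toggles a single bit once per bit time, whereas the parallel version reduces all bits through a tree of XOR operations in one step; both yield the same parity value, but in hardware they would switch a different number of cells per clock cycle.

```python
def parity_serial(bits):
    """Serial parity: a 1-bit counter toggles whenever a transmitted bit is 1."""
    parity = 0
    for bit in bits:  # one iteration per bit time
        if bit:
            parity ^= 1
    return parity


def parity_parallel(bits):
    """Parallel parity: XOR tree evaluated when the data register is loaded."""
    level = list(bits) or [0]  # an empty word has even parity
    while len(level) > 1:
        # XOR adjacent pairs; an unpaired last bit passes through unchanged.
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


data_bits = [1, 0, 1, 1, 0, 0, 1, 1]  # illustrative 8-bit character
serial_result = parity_serial(data_bits)
parallel_result = parity_parallel(data_bits)
```

For an 8-bit character the tree in this sketch uses seven XOR operations arranged in three levels, which corresponds to more combinatorial cells switching at once than in the serial case.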
Referring now to FIG. 4, a block diagram shows a multi-stage logic network 80. Logic network 80 is merely illustrative and has DFFs 84, 86, 88, and 90 as inputs. A first stage includes AND gate 90, and inverters 92 and 94. A second stage includes AND gate 96, and OR gates 98 and 100. A third stage includes OR gate 102 and AND gate 104. A fourth stage includes inverters 106 and 108. A fifth stage includes DFFs 110 and 112. FIG. 5 is a series of waveforms that show the current consumption of the circuit of FIG. 4 as a function of time.
From an examination of FIGS. 4 and 5, it may be seen that the maximum duration of the current pulse for combinatorial logic networks is defined by the number of cell stages in these networks. Each logic cell has an intrinsic propagation delay. Therefore, the overall power consumption current is the sum of all cell currents, each stage in the network generating a pulse delayed from the previous one by the intrinsic propagation delay. Because each level has different types of cells, the intrinsic delay differs from level to level, and the overall power consumption current of such a combinational network looks like a pulse. After the last stage switches, the overall current decreases.
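By way of illustration only, the summation of delayed stage currents may be sketched as follows. Each stage is modeled, as a simplifying assumption, by a rectangular current pulse; the delays, pulse widths, and amplitudes are hypothetical values in arbitrary units and are not taken from FIG. 5.

```python
def total_current(stages, n_steps):
    """Sum per-stage current pulses on a discrete time axis.

    stages: list of (delay, width, amplitude) tuples, one per logic stage.
    A stage starts switching once all preceding stages have propagated,
    i.e. after the accumulated delays of the earlier stages.
    """
    current = [0.0] * n_steps
    start = 0
    for delay, width, amplitude in stages:
        for t in range(start, min(start + width, n_steps)):
            current[t] += amplitude
        start += delay  # the next stage begins after this stage's delay
    return current


# Hypothetical four-stage network: (propagation delay, pulse width, amplitude).
stages = [(2, 3, 1.0), (3, 4, 2.0), (2, 3, 1.5), (1, 2, 0.5)]
waveform = total_current(stages, n_steps=10)
peak = max(waveform)
last_active = max(t for t, value in enumerate(waveform) if value > 0)
```

In this sketch the waveform rises where consecutive stage pulses overlap and falls to zero after the last stage has switched, so the pulse duration grows with the number of stages.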
In a synchronous module such as one in which DFFs are sampling the outputs of combinational networks, the active edge of the clock must be located after the switching of the last stage of the combinational network has completed. This must be calculated in the worst-case condition of the circuit (i.e., process, temperature, voltage, etc.). Therefore the maximum propagation delay of combinational networks is the main factor in calculating the maximum frequency at which the circuit may be clocked.
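By way of illustration only, the relationship between the worst-case combinational delay and the maximum clock frequency may be sketched as follows. The clock-to-output delay and setup time of the DFFs are included as additional assumptions of a conventional timing model; all numerical values are hypothetical.

```python
def max_clock_frequency_mhz(path_delays_ns, clk_to_q_ns, setup_ns):
    """Estimate the maximum clock frequency in MHz.

    The minimum clock period is the launching DFF's clock-to-output delay,
    plus the slowest combinational path, plus the capturing DFF's setup
    time, all taken at the worst-case process/voltage/temperature corner.
    """
    min_period_ns = clk_to_q_ns + max(path_delays_ns) + setup_ns
    return 1000.0 / min_period_ns


# Hypothetical worst-case delays of three combinational paths, in ns.
f_max = max_clock_frequency_mhz([12.0, 17.5, 9.0], clk_to_q_ns=1.0, setup_ns=1.5)
```

With these illustrative numbers the slowest path (17.5 ns) sets a minimum period of 20 ns, i.e. a maximum frequency of 50 MHz; the shorter paths do not affect the result.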
The current consumed by the operation of a peripheral (or any kind of logic) generates voltage drops in internal power supply lines of the integrated circuit. Part of the voltage drop is due to the resistance of the power supply lines: the larger the current peak, the larger the voltage drop.
Another source of noise on the lines is the current slew rate (also known as “di/dt”). The more the current changes in a given period of time, the more parasitic voltage is created on internal/external power supply lines. These parasitic voltages arise from the inductance of the power supply lines and also occur on any internal net able to toggle from logical 0 to 1 and vice versa. On power supply lines, the current is much higher than on an internal single control/command net driving several inputs of cells. The power supply lines are also capacitive, and, when logic is switching, the induced parasitic voltage can propagate on the power supply lines of the integrated circuit and may interfere with the other circuits powered from the same supply on the printed circuit board.
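By way of illustration only, the two contributions described above may be estimated with a lumped model, in which the resistive drop is R·I and the inductive parasitic voltage is L·di/dt. The lumped model and all component values below are assumptions chosen for the example and do not describe any particular circuit.

```python
def resistive_drop(resistance_ohm, peak_current_a):
    """Voltage drop across the supply line resistance: V = R * I."""
    return resistance_ohm * peak_current_a


def inductive_noise(inductance_h, delta_current_a, delta_time_s):
    """Parasitic voltage across the supply line inductance: V = L * di/dt."""
    return inductance_h * (delta_current_a / delta_time_s)


# Hypothetical supply line: 0.5 ohm, 5 nH; current step of 100 mA.
v_ir = resistive_drop(0.5, 0.1)
v_fast = inductive_noise(5e-9, 0.1, 1e-9)  # 100 mA switched in 1 ns
v_slow = inductive_noise(5e-9, 0.1, 2e-9)  # same current switched in 2 ns
```

With these illustrative values the resistive drop is about 50 mV, while the inductive term is about 0.5 V for the 1 ns edge and half of that when the same current is switched over twice the time.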
Yet another potential source of interference is the electromagnetic propagation that may occur due to different lengths of internal nets such as power supply lines, in combination with different parasitic capacitors and inductances. For some applications, especially in the automotive market, electromagnetic compatibility is a key factor. Therefore, to improve the electromagnetic compatibility, the current slope must be reduced.
To reduce the current slope, two factors may be adjusted: the current peak value or the time required to process the data. The second factor may not be easy to adjust because it partially depends on the architecture of the logic of the particular peripherals embedded in the micro-controller. Once the circuit is manufactured, it is no longer possible to modify it. The logic architecture can be designed so that there is less combinational logic between the DFFs, leading to less power consumption current in the logic. Such a solution, however, requires more DFFs to obtain the equivalent function, resulting in higher power consumption current and a larger number of clock cycles to perform the data processing. Such solutions may degrade some functions of the logic (maximum baud rate of a UART, minimum throughput of a crypto-processor, etc.).
The first factor, the peak value of the current (di), can be optimized. The peak value of the power consumption current, as described in FIG. 3, results from the addition of several currents. If a single peripheral is processing data in a micro-controller, it is difficult to improve di/dt (the current slope) because the architecture of the peripheral logic is pre-determined.
If the peripheral is a communication peripheral, it is possible to delay the current pulse due to PAD switching and therefore prevent the peak current of the core logic from occurring at the same time as the peak current of the pad buffer. Even if these currents are internally carried by different power supply lines (the pad ring power supply rails are independent of the core power supply lines, and separate terminals are defined for each), all power supply package pins of the circuit may be connected together on the printed circuit board, outside the integrated circuit. In such a case, the currents may add together and create a larger di/dt, with the described consequences in terms of electromagnetic compatibility (EMC) at the printed circuit board level.
One method of improving the EMC characteristics of digital systems is to introduce a fixed delay (formed from, e.g., cascaded buffer cells or inverter cells) between the output of the peripheral logic and the input of the PAD buffer. One drawback of this method is that the delay value may not be optimal for all cases of use. Depending on the frequency of the clock driving the communication peripheral, the delay needed to obtain optimal electromagnetic compatibility and/or minimum voltage drop in the lines may be different for each operating frequency. This is especially true when the micro-controller is able to use a wide range of clock frequencies. For example, micro-controllers for the automotive market may operate in a range of from 8 MHz to 50 MHz.
By unbalancing the terminal clock of each module of the micro-controller, the current peaks of the modules may add together in a less destructive manner for EMC, resulting in a limited current peak compared to a fully balanced clock circuit. This is true for the peak current due to sequential cells, but the shape of the current resulting from the combinational logic is more complex, and a fixed unbalancing may result in higher current peaks. Fixed unbalancing of internal clocks of the same frequency is used especially in some integrated circuits for the automotive market, where the electromagnetic compatibility must be improved, but the unbalancing delay is limited by the maximum acceptable clock frequency of the circuit. The lower the clock frequency, the longer the clock period; delay margins are therefore larger at low frequencies, and greater unbalancing may be performed.
Therefore, there is a need for an adjustable delay to guarantee optimal EMC whatever the clock frequency. A programmable level of adjustable delay to provide internal clock balancing that may be programmed by any means, including terminal inputs of the circuit or user-configurable registers, would be particularly useful.
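By way of illustration only, the effect of clock unbalancing on the summed current peak, and the dependence of the available unbalancing margin on the clock period, may be sketched as follows. The per-module current pulses, the offsets, the 18 ns worst-case path delay, and the simplified margin formula are all hypothetical assumptions made for the example.

```python
def summed_peak(pulses, offsets, n_steps):
    """Peak of the total supply current when each module's clock is delayed.

    pulses: per-module list of current samples following its clock edge.
    offsets: per-module clock delay, in time steps.
    """
    total = [0.0] * n_steps
    for pulse, offset in zip(pulses, offsets):
        for k, value in enumerate(pulse):
            if offset + k < n_steps:
                total[offset + k] += value
    return max(total)


def unbalancing_margin_ns(period_ns, worst_path_delay_ns):
    """Simplified margin: the part of the period not used by the slowest path."""
    return max(0.0, period_ns - worst_path_delay_ns)


# Three hypothetical modules drawing identical current pulses.
pulses = [[1.0, 3.0, 1.0]] * 3
balanced_peak = summed_peak(pulses, offsets=[0, 0, 0], n_steps=8)
unbalanced_peak = summed_peak(pulses, offsets=[0, 2, 4], n_steps=8)

margin_at_8_mhz = unbalancing_margin_ns(125.0, 18.0)  # 8 MHz -> 125 ns period
margin_at_50_mhz = unbalancing_margin_ns(20.0, 18.0)  # 50 MHz -> 20 ns period
```

In this sketch the staggered clocks reduce the summed peak from 9.0 to 3.0 units, while the margin available for such staggering shrinks from 107 ns at 8 MHz to 2 ns at 50 MHz, which suggests why a single fixed delay cannot suit the whole frequency range.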