1. Technical Field
The present application is generally related to the field of latches for storing logic states and, more specifically, to nonvolatile shadow latches that use two-terminal nanotube switches.
2. Discussion of Related Art
Volatile circuits have been and continue to be the norm in digital circuits. In the initial development phase, bipolar circuits were universally used for analog and digital circuits. Denser and more easily integrated, but slower, FET-based circuits soon followed and were introduced for low cost and low power applications such as calculators, while bipolar circuits were used for high speed applications. To eliminate the static power dissipation present in bipolar, NMOS-only, and PMOS-only chips, circuits based on complementary MOS (CMOS) devices (combined NMOS and PMOS) were introduced, and static power dissipation was virtually eliminated because power was dissipated only when circuits were switching. FET device scaling was introduced and used successfully to approximately double the number of circuits every two years while increasing device and circuit performance, all at lower on-chip voltages that kept power dissipation at acceptable levels.
As the number of circuits grew into the millions, bipolar power dissipation became so high that CMOS replaced bipolar circuits, and CMOS became the technology of choice for the semiconductor industry for logic, memory, and analog products. Because a common CMOS technology platform serves a wide variety of electronic functions (memory, digital, and analog circuits), system-on-chip (SoC) designs integrating hundreds of millions of circuits and billions of bits became possible. Migration to new, denser technology generations enables more function per chip and is done for economic as well as performance reasons. New generations of technology (new technology nodes) provide improved transistor density, increased current drive per unit of device width, and denser interconnect wiring. However, for sub-150 nm technologies, device threshold voltage scaling is increasingly difficult, resulting in high FET device OFF-state leakage currents and correspondingly high static power dissipation. Conventional dimensional and voltage scaling is no longer sufficient for fast, dense chips such as SoCs, so power dissipation is setting limits on the combination of speed and function per chip. At the 90 nm technology node, 25 to 50% of the total power (dynamic and static power) is due to leakage current-induced static power dissipation. Projections show that for products at the 65 nm technology node, static power dissipation will exceed dynamic (operating) power dissipation. New generations of technology are limited by power dissipation, especially static power dissipation caused by poor scaling and the associated high device OFF-state leakage currents. Because many applications such as PCs, cell phones, games, and others are portable and require battery operation, controlling power dissipation while enabling high speed operation is a requirement.
Since power dissipation is setting limits on the combination of logic circuit size and operating speed, new chip architecture and circuit design solutions are needed in order to enable continued increases in high performance function.
One approach to power reduction by architecture and design, described in U.S. Pat. No. 6,097,243, to Bertin et al., suggests an adjusting mechanism that reduces clock speeds when circuits have been inactive for a predetermined period of time, thereby reducing dynamic power. Static power is also reduced by adjusting the source-to-body voltage to increase threshold voltage and reduce the associated leakage current. While this approach can reduce power dissipation for some circuits, both dynamic and static power dissipation remain relatively high. Moreover, threshold voltage modulation to reduce power dissipation can be used only in bulk CMOS technologies, where body regions can be modulated. In SOI CMOS technology, individual device body regions are isolated and cannot be modulated as described in U.S. Pat. No. 6,097,243.
A related approach to power reduction by architecture and design is described in U.S. Pat. No. 6,097,241, to Bertin et al., in which activity detection circuits monitor input circuit activity at the first logic stage and increase the speed of circuits in subsequent stages to enable high speed operation. Modulating device threshold voltage is required as well, with the associated limitations described above with respect to U.S. Pat. No. 6,097,243.
Still another related approach to power reduction by architecture and design is described in U.S. Pat. No. 6,345,362, to Bertin et al., in which plural on-chip functional units operating at different power levels are matched to instructions requiring various speeds, using an on-chip control processor unit and an on-chip power management unit to optimize chip power/performance. The operating power and associated speed of each functional unit are adjusted by threshold voltage variation, with the associated limitations described above with respect to U.S. Pat. No. 6,097,243.
A different approach to power reduction by architecture and design is described in U.S. Pat. No. 6,625,740, to Datar et al., in which instructions are examined and code is rearranged such that circuits not required for a group of instructions are turned OFF. Circuit groups are turned ON as needed to process various instructions. In the example given, circuits are assumed to require 10 clock cycles to enter the OFF state and 10 cycles to be restored to the full power state. Both dynamic and static power are reduced in those circuits where power is turned off; however, data is not retained in registers during power OFF and will be lost unless transferred to memory at power-off and transferred back at power-on.
A still different approach to power reduction by architecture and design is described in U.S. Pat. No. 6,658,634, to Goodnow et al., where logic is designed to ensure critical logic nets contain associated registers, and logic synthesis software is used to ensure that the clock can be selectively stopped and last data retained in registers in logic stages that are not required for particular sequences of instructions. While this method reduces dynamic power dissipation, static power dissipation remains high due to leakage currents.
In U.S. Pat. No. 5,986,962, to Bertin et al., power reduction is achieved by architecture and design such that each register (latch) has a corresponding shadow register (latch) designed (optimized) for low power retention (low leakage current CMOS devices). The state of the system is transferred to the shadow latches upon a transition to a low power mode, and power is removed from logic circuits in portions of the chip, or the entire chip. The logic state is restored to each register when power is restored. While this method significantly reduces both dynamic and static power, and in fact eliminates all power dissipation except for the low power shadow registers if the entire chip is turned OFF, the shadow registers introduce significant problems of their own. First, low power dissipation registers (latches) are sensitive to alpha particles and data integrity is an issue. Radiation hardening techniques could be applied to the latches, but some technology changes may be required. Second, static power is still dissipated in the low power shadow latches. Also, adding a low power shadow latch for each high performance latch significantly increases chip area which impacts chip design and reduces the number of chips per wafer, which in turn increases chip cost.
Highly integrated products with a wide variety of circuit functions such as high logic and memory content, system-on-chip (SoC) architecture for example, are an important part of current semiconductor industry design practices. Highly integrated product designs using bulk or SOI CMOS technologies are especially important for portable battery-operated systems that require a high level of integration and the mixed data and signal processing that SoC devices offer. Product requirements, especially in consumer applications, are subject to change as the design progresses. As a result, designs often utilize a combination of disparate elements including embedded, programmable logic functions such as general purpose (usually RISC architecture) embedded microprocessor cores, embedded DSPs, embedded ASIC designs (eASIC), embedded FPGAs, embedded memory, and other functions. Time-to-market with the desired product functions is vital to product success, so that typically there is insufficient time to optimize function for maximum performance at minimum total power dissipation using a more customized approach such as an optimized ASIC design, for example. Instead, designs must include programmable logic functions that dissipate more power than optimized designs in order to allow for flexibility in modifying product function near the end of the design cycle, and servicing multiple applications for economic reasons.
FIG. 1 shows normalized power dissipation as a function of technology node (and corresponding year). The source of FIG. 1 is the IEEE Computer Society, December 2003. Technology nodes are presented in terms of minimum feature size and associated gate length. Static power grows exponentially as dimensions shrink, while dynamic (switching) power grows at a modest rate. At the 90 nm technology node, 25 to 50% of the total power (dynamic and static power) is due to leakage current-induced static power dissipation. Projections show that for products at the 65 nm technology node, static power dissipation may exceed dynamic (operating) power dissipation. New generations of technology are limited by power dissipation, especially static power dissipation due to poor scaling and the resulting high device OFF-state leakage currents. Conventional dimensional and voltage scaling is no longer sufficient for fast, dense chips such as SoCs, so power dissipation is setting limits on the combination of speed and function per chip. Because many applications such as PCs, cell phones, games, and others are portable and require battery operation, controlling power dissipation through chip architecture and circuit design is a requirement. Even in non-portable applications such as workstations and servers, however, power dissipation limitations caused by poor CMOS technology scaling are limiting operating speeds and requiring power management architectures.
In order to successfully incorporate power management in highly integrated product designs, it is important to understand circuit design efficiency with respect to power dissipation. FIG. 2 shows the energy (picojoules) per operation required to implement a 32-bit operation for various logic design approaches. Programmable logic, the most flexible and versatile approach, is the least power efficient, requiring 2,000 pJ for a PC/Workstation and 200 pJ for a RISC architecture microprocessor. By contrast, ASIC, the least flexible design approach, is the most power efficient, dissipating only 2 pJ for the same logic function. DSPs are also quite efficient at 60 pJ because they are typically used as accelerators that perform specific digital signal processing tasks. The source of FIG. 2 is a presentation by Bill Dally entitled “Low-Power Architecture.”
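The FIG. 2 values quoted above can be compared directly. The short Python sketch below uses only the pJ figures stated in the text and expresses each design approach's energy per 32-bit operation as a multiple of the ASIC value; the dictionary and the `energy_ratio` helper are illustrative names, not part of the source.

```python
# Energy per 32-bit operation in picojoules, as quoted in the text for FIG. 2.
ENERGY_PJ = {
    "PC/Workstation": 2000.0,
    "RISC microprocessor": 200.0,
    "DSP": 60.0,
    "ASIC": 2.0,
}


def energy_ratio(energy_pj: dict, baseline: str = "ASIC") -> dict:
    """Return each approach's energy per operation as a multiple of the baseline."""
    reference = energy_pj[baseline]
    return {name: value / reference for name, value in energy_pj.items()}


if __name__ == "__main__":
    for name, ratio in energy_ratio(ENERGY_PJ).items():
        print(f"{name:20s} {ratio:6.0f}x the ASIC energy")
```

Under these figures, programmable logic on a PC/Workstation uses about 1,000 times, a RISC microprocessor about 100 times, and a DSP about 30 times the energy of an ASIC for the same operation.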
The energy required for various operations is dominated by bandwidth. FIG. 3 illustrates the energy required for register, ALU, and OCD 32-bit operations as well as reading from memory and transferring 32 bits across a chip (100 pJ). The relatively high energy (100 pJ) associated with driving long distance (10 mm) on chip interconnections is a consequence of wiring non-scalability and increasing chip size. The source of FIG. 3 is Bill Dally, International Symposium on High-Performance Computer Architecture, 2002.
If present single-processor chip architectures and design methodologies were left unchanged, the power dissipation and latency associated with on-chip interconnection of logic and memory functions would become dominant factors, resulting in power-limited chip performance. In practice, chip architecture has responded: multiple simple processors, distributed register files, explicitly managed local memory, enhanced floor planning for better placement, and other innovations have prevented on-chip interconnections from becoming a dominant power/performance limiting factor.
With these new, evolving chip architectures and design methodologies, chip performance is limited primarily by embedded logic and memory functions, as has always been the case. However, these embedded circuits are increasingly difficult to scale, as described above, and static power dissipation is beginning to set performance limits on chip operation.
Static power in CMOS circuits is present even when no switching takes place. It is due to leakage current that flows because of poorly scaled device threshold voltages and operating voltages. Static power is reduced only by reducing voltages, preferably reducing the voltages in temporarily unused circuits to zero (selectively removing applied voltages from these circuits).
High speed chip design often uses logic design techniques referred to as concurrent operation. These techniques are pipelining and parallelism, in which logic function is divided into smaller pieces (sub blocks), called stages, such that the rate at which instructions are executed improves because many operations are executed at the same time. Concurrent logic design techniques are described in more detail in the following references: H. B. Bakoglu, “Circuits, Interconnections, and Packaging for VLSI”, Addison-Wesley Publishing Company, Inc, 1990, pp. 412-416; and David T. Wang, “Revisiting the FO4 Metric.”
An important aspect of concurrent logic operation is that the start of an instruction does not wait for previous instructions to complete. In this way, all portions of the hardware are utilized every cycle, making the best use of available logic and increasing machine throughput. Dependencies between instructions prevent the logic from achieving its maximum possible performance; however, instruction optimization is used to achieve faster performance using, for example, the pipelining technique.
The pipelining technique, for example, uses random logic blocks divided (separated) by registers (also referred to as register files, register banks, pipeline latches, or latches), resulting in a substantially higher speed of operation; that is, pipelining is used to improve the execution rate. Logic is divided into roughly equal smaller pieces, called stages, and a bank of registers (latches) is inserted to hold temporary values (logic states) at the interface between logic stages. The logic clock frequency may then be increased to a level that is proportional to the inverse of the sum of the longest logic stage delay and the latch delay overhead. Examples of logic stages, registers (single-latch and double-latch designs), and clocking are given in the H. B. Bakoglu reference book described above, pp. 338-349. Examples of register (latch) design are given in the same reference, pp. 349-355. Designs are increasing the number of registers and decreasing the logic stage delay. By way of example, the IBM 750 power PC chip uses about 10,000 registers (latches), while the next generation power PC design, the IBM 970, uses about 300,000 registers.
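The clock-frequency relationship stated above can be sketched numerically. The Python function below assumes that the cycle time must cover exactly the slowest logic stage plus a fixed latch overhead (ignoring clock skew and jitter); the function name and all delay values in the usage example are hypothetical and chosen only for illustration.

```python
def max_clock_ghz(stage_delays_ps, latch_overhead_ps):
    """Upper bound on pipeline clock frequency in GHz.

    The cycle time must cover the slowest logic stage plus the register
    (latch) delay overhead, so f = 1 / (max stage delay + overhead).
    """
    if not stage_delays_ps:
        raise ValueError("need at least one logic stage")
    cycle_ps = max(stage_delays_ps) + latch_overhead_ps
    return 1000.0 / cycle_ps  # 1 / (cycle in ps) equals 1000 / cycle_ps GHz


if __name__ == "__main__":
    # Hypothetical delays, for illustration only.
    unpipelined = max_clock_ghz([1200.0], 40.0)                    # one large block
    balanced = max_clock_ghz([300.0] * 4, 40.0)                    # four equal stages
    unbalanced = max_clock_ghz([250.0, 350.0, 300.0, 300.0], 40.0)
    print(f"{unpipelined:.2f} {balanced:.2f} {unbalanced:.2f} GHz")
```

With these assumed numbers, splitting a 1,200 ps block into four equal 300 ps stages raises the bound from about 0.81 GHz to about 2.94 GHz, while an unbalanced split is limited by its slowest stage (about 2.56 GHz), which illustrates why logic is divided into roughly equal stages.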
Design Using Volatile Registers (Latches)
Power dissipation is an important consideration because it often sets the maximum performance limit of the logic function, as discussed above with respect to FIGS. 1-3. Presently, logic states are temporarily stored in volatile register latches. However, introducing nonvolatile registers having a dedicated nanotube device per register enables logic states to be saved with no applied voltage, that is, with zero power dissipation, in portions (or all) of the integrated circuit. This reduces power dissipation, enables other logic blocks to consume more power and run faster as needed, and provides other advantages discussed further below.
In addition to the performance benefits of dividing random logic into smaller blocks, there is a testing benefit as well. Logic testing requires that each logic node be switched to both “ONE” and “ZERO” logic states. Chips with a large number of gates, for example tens or hundreds of millions, cannot be tested efficiently unless the logic is subdivided into smaller stages (blocks). Smaller logic stages separated by latches enable logic test coverage to reach 98 to 99%, for example. The registers described herein may also be interconnected serially for test purposes. Logic test patterns (test vectors) are applied, and the logic response is measured in order to identify and eliminate defective chips, as is well known in the industry. The following references discuss design for logic testability: H. Fujiwara, “Logic Testing and Design for Testability”, Cambridge, Mass., The MIT Press, 1985, pp. 238, 256-259; and P. H. Bardell, W. H. McAnney, and J. Savir, “Built-In Test for VLSI: Pseudorandom Techniques”, New York, N.Y., John Wiley & Sons, 1987, pp. 38-43.
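Registers interconnected serially for test are commonly called a scan chain. The generic Python sketch below assumes a simple shift-register model that the source does not describe in detail: a new test vector is shifted in one bit per clock while the previously captured logic response is shifted out for comparison with expected values. The function name and list representation are hypothetical.

```python
from collections import deque


def scan_shift(chain, scan_in):
    """Shift bits serially through registers connected as a scan chain.

    chain: current register contents, index 0 nearest the scan input.
    scan_in: bits applied to the scan input, one per shift clock.
    Returns (new register contents, bits observed at the scan output).
    """
    if not chain:
        raise ValueError("scan chain must contain at least one register")
    registers = deque(chain)
    scan_out = []
    for bit in scan_in:
        scan_out.append(registers.pop())  # last register drives the scan output
        registers.appendleft(bit)         # new bit enters the first register
    return list(registers), scan_out


if __name__ == "__main__":
    captured = [1, 1, 0]      # logic response captured in three registers
    test_vector = [1, 0, 0]   # next test vector, shifted in serially
    loaded, observed = scan_shift(captured, test_vector)
    print(loaded, observed)   # [0, 0, 1] [0, 1, 1]
```

Because the first bit shifted in travels furthest, a vector of length n ends up reversed across an n-register chain, and the captured response appears at the scan output last-register first.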
A number of different register file circuit designs are possible (see Bakoglu above). For example, a clocked synchronous register file stage circuit design may use a master latch stage circuit and a slave latch stage circuit with two non-overlapping clocks such as CLK1 and CLK2 illustrated in FIG. 4A. Alternatively, a clocked synchronous register file stage circuit design may use a master latch stage circuit and a slave latch stage circuit with a single clock such as CLK (and its complement CLKb) as illustrated in FIG. 4B and described further below.
FIG. 4A illustrates prior art pipelined synchronous logic function 5 using two nonoverlapping clocks CLK1 and CLK2, including logic stages 10 and 14 (and others not shown) separated by registers 7, 12, and 18 (and other registers not shown) designed for state-of-the-art high speed operation. Exemplary register 12 is composed of a master (L1) latch 20 and a slave (L2) latch 25. Master (L1) latch 20 is composed of register cells 1-n, and slave (L2) latch 25 is composed of cells 1′-n′. A register stage is composed of a corresponding pair of register cells, such as register stage 16 composed of corresponding register cells k and k′. It is important to note that logic stages 10 and 14 may be composed of random logic stages, for example, or may be an onboard cache such as a high speed Sync SRAM L1 cache, for example. A master (L1) latch such as master (L1) latch 20 accepts data from preceding logic stage 10 when activated by clock CLK1, and captures and holds the input data. A slave (L2) latch such as slave (L2) latch 25 accepts information from a corresponding master (L1) latch 20 when activated by clock CLK2, transmits the information to the next logic stage 14, and then latches the information near the end of the CLK2 clock cycle.
FIG. 4B illustrates prior art pipelined synchronous logic function 40 using a single clock CLK, including logic stages 50 and 60 (and others not shown) separated by registers 45, 55, and 65 (and other registers not shown) designed for state-of-the-art high speed operation. Exemplary register 55 is composed of a master (L1) latch 70 and a slave (L2) latch 75. Master (L1) latch 70 is composed of register cells 1-n, and slave (L2) latch 75 is composed of cells 1′-n′. A register stage is composed of a corresponding pair of register cells, such as register stage 80 composed of corresponding register cells k and k′. It is important to note that logic stages 50 and 60 may be composed of random logic stages, for example, or may be an onboard cache such as a high speed Sync SRAM L1 cache, for example. A master (L1) latch such as master (L1) latch 70 accepts data from preceding logic stage 50 during the first half of the clock CLK cycle time, captures and holds the input data, and also transfers the data to the slave (L2) latch at the beginning of the second half of the clock cycle. A slave (L2) latch such as slave (L2) latch 75 accepts information from a corresponding master (L1) latch 70 at the beginning of the second half of the clock CLK cycle time, transmits the data to the next logic stage 60, and then latches the data near the end of the second half of the clock CLK cycle time.
The electrical characteristics of state-of-the-art PC chips, e.g., the IBM 970 power PC chip used in Apple computers and SONY Playstations, illustrate the relationship between operating speed and dynamic and static power dissipation in high speed synchronized logic chips using a design with two nonoverlapping clocks. The IBM 970 chip operates at 1.3 volts, is designed at the 130 nm technology node using an SOI CMOS technology with copper wiring, and includes an on-board L1 Sync SRAM cache of 1 megabit, an on-board L2 Sync SRAM cache of 4 megabits, and a double-latch design with nonoverlapping clocks CLK1 and CLK2 (similar in approach to synchronous logic function 5 of FIG. 4A) operating at a clock frequency of approximately 3 GHz.
In operation, at a clock period of approximately 340 ps, a master latch has approximately 170 ps to accept data from a preceding logic stage, capture (latch) the data, and have the data ready for the slave latch. A slave latch has approximately 170 ps to accept data from a corresponding master latch, transmit the information to the next logic stage, and then latch the information.
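The timing budget above follows from the clock frequency: the period is the reciprocal of the frequency, and in a double-latch design each latch gets roughly half of it. The minimal Python check below assumes an ideal 50% duty cycle and ignores clock skew; the function name is illustrative. An exact 3 GHz clock gives a period of about 333 ps (about 167 ps per half-cycle), so the approximately 340 ps and 170 ps figures in the text correspond to a clock slightly below 3 GHz, about 2.94 GHz.

```python
def clock_timing_ps(freq_ghz: float) -> tuple:
    """Return (clock period, half-cycle) in picoseconds for a clock in GHz.

    Assumes an ideal 50% duty cycle with no skew or jitter.
    """
    if freq_ghz <= 0:
        raise ValueError("frequency must be positive")
    period_ps = 1000.0 / freq_ghz  # 1 GHz corresponds to a 1000 ps period
    return period_ps, period_ps / 2.0


if __name__ == "__main__":
    for f in (3.0, 2.94):
        period, half = clock_timing_ps(f)
        print(f"{f:.2f} GHz: period {period:.1f} ps, half-cycle {half:.1f} ps")
```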
The IBM 970 chip has a dynamic (active) power dissipation of approximately 90 watts and static (standby) power dissipation due to device leakage of 25 watts; static power is approximately 28% of the active power dissipation. FIG. 5 illustrates prior art IBM 970 power PC relative dynamic (active) and static (standby) power plotted at the 130 nm technology node point on prior art FIG. 1 which illustrates projected relative dynamic and static power based on CMOS device scaling that includes the increasing impact of device leakage current on static power due to less-than-ideal threshold voltage and corresponding power supply scaling. The state-of-the-art IBM 970 power PC chip relative power dissipation values indicate that the static power problem is at least as significant as indicated in FIGS. 1 and 5, and that as more advanced technology nodes are developed, the static power dissipation may become dominant unless architecture and circuit design means are used to prevent it.
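The percentages quoted above follow directly from the stated IBM 970 figures. The Python arithmetic below reproduces the roughly 28% static-to-dynamic ratio. It also computes two derived values that rest on assumptions not stated in the source: the static share of a 115 W total, which assumes the 90 W and 25 W figures are separate, additive components, and an approximate leakage current, which assumes all leakage is drawn from the single 1.3 V supply (P = V * I).

```python
# Figures quoted in the text for the IBM 970.
DYNAMIC_W = 90.0  # dynamic (active) power, watts
STATIC_W = 25.0   # static (standby) leakage power, watts
VDD_V = 1.3       # supply voltage, volts

# Ratio stated in the text: static power relative to active power.
static_vs_dynamic = STATIC_W / DYNAMIC_W

# Assumption: the two figures are additive components of total power.
static_share_of_total = STATIC_W / (DYNAMIC_W + STATIC_W)

# Assumption: all leakage flows from the 1.3 V supply, so I = P / V.
leakage_current_a = STATIC_W / VDD_V

if __name__ == "__main__":
    print(f"static/dynamic = {static_vs_dynamic:.1%}")
    print(f"static/total   = {static_share_of_total:.1%}")
    print(f"leakage current ~ {leakage_current_a:.1f} A")
```

Under these assumptions the static share of total power (about 22%) sits just below the 25 to 50% range cited for the 90 nm node, consistent with the 970 being a 130 nm design.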
FIG. 6 illustrates prior art register file stage circuit 500, which corresponds to register stage 80 illustrated in FIG. 4B. A description of register file design and operation may be found in the reference H. B. Bakoglu, “Circuits, Interconnections, and Packaging for VLSI”, Addison-Wesley Publishing Company, Inc, 1990, pp. 349-356. Prior art register file stage circuit 500 includes a master latch stage circuit 505 and a slave latch stage circuit 510, both of which operate in synchronous (clocked) mode and are volatile; that is, stored data is lost if power is lost or removed. Master latch stage circuit 505 has input node 515 and output node 520. Slave latch stage circuit 510 has input node 520, which is also the output node of master latch stage circuit 505, and output node 525. Node 520 is also a storage node of slave latch stage circuit 510.
Input node 515 of master latch stage circuit 505 receives input signal VIN and drives CMOS transfer gate 530, which in turn drives a first storage node 535 formed by cross-coupled CMOS inverters 545 and 550. Input signal VIN corresponds to VIN from logic 50 in FIG. 4B. CMOS transfer gate 530 uses both NMOS and PMOS devices, instead of an NMOS-only transfer gate, for example, to ensure that both logic “1” and logic “0” states transition between full power supply and ground voltage levels by eliminating device threshold voltage drops. Clock CLK 540 and complementary clock CLKb 540′ are used to enable or block input signal VIN on input node 515 from driving node 535 by turning CMOS transfer gate 530 ON and OFF, thereby determining the logic storage state of cross-coupled CMOS inverters 545 and 550. Note that all inverters are CMOS inverters unless otherwise specified. A CMOS inverter includes a PMOS pull-up device connected to a power supply and an NMOS pull-down device connected to ground, and operates as discussed in the reference by H. B. Bakoglu, “Circuits, Interconnections, and Packaging for VLSI”, Addison-Wesley Publishing Company, Inc, 1990, p. 152. Cross-coupled inverters 545 and 550 drive a storage node 555, which is connected to CMOS transfer gate 560. Clock CLK and complementary clock CLKb are used to enable or block stored logic state node 555 from driving master latch stage circuit 505 output node 520 by turning CMOS transfer gate 560 ON and OFF.
Input node 520 of slave latch stage circuit 510, which is also the output node of master latch stage circuit 505, drives inverter 570. Inverter 570 drives output voltage VOUT on output node 525 and also drives the input of inverter 575. Output signal VOUT corresponds to VOUT in FIG. 4B, which drives an input to logic 60. The output 580 of inverter 575 is connected to CMOS transfer gate 585. Clock CLK and complementary clock CLKb are used to enable or block a feedback loop that, when enabled, cross-couples inverters 570 and 575. When storing data, CMOS transfer gate 585 is ON, and inverters 570 and 575 form a cross-coupled storage device with node 520 acting as a storage node. When CMOS transfer gate 585 is OFF, inverters 570 and 575 are not cross-coupled and do not form a storage device.
In operation, a clocking scheme such as that illustrated in FIG. 4B is used to synchronize the operation of double-latch design 40 illustrated in FIG. 4B. Register stage 80 includes cell k, a subset of master (L1) latch 70, and cell k′, a subset of slave (L2) latch 75.
A master (L1) latch such as master (L1) latch 70 accepts data from a preceding logic stage 50 during the first half of the clock cycle time, captures and holds the data, and also transfers the information to a slave (L2) latch such as slave (L2) latch 75 at the beginning of the second half of the clock cycle time. A slave (L2) latch such as slave (L2) latch 75 accepts information from a corresponding master (L1) latch 70 at the beginning of the second half of the clock cycle time, transmits the information to the next logic stage 60, and latches the information before the end of the second half of the clock cycle time. If the clock is stopped during the first half of the clock cycle, master (L1) latch 70 holds (stores) a logic state or data. If the clock is stopped during the second half of the clock cycle, slave (L2) latch 75 holds (stores) a logic state or data. If power is removed or lost, the logic state or data is lost.
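The behavior described above, including which latch retains data when the clock stops and what happens when power is removed, can be summarized in a small behavioral model. The Python sketch below is an idealized, single-bit abstraction of register stage 80 (cells k and k′) that ignores circuit-level timing; the class, method, and phase names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    FIRST_HALF = 1   # master (L1) accepts data; slave feedback is open
    SECOND_HALF = 2  # master holds; slave (L2) accepts, drives, then latches


class RegisterStage:
    """Idealized behavioral model of one volatile master/slave register stage."""

    def __init__(self) -> None:
        self.powered = True
        self.phase = Phase.FIRST_HALF
        self.master: Optional[int] = None
        # Last value driven by the slave; per the text, the slave statically
        # stores data only during the second half of the cycle.
        self.slave: Optional[int] = None

    def first_half(self, vin: int) -> None:
        """Master latch captures VIN during the first half of the cycle."""
        self.phase = Phase.FIRST_HALF
        if self.powered:
            self.master = vin

    def second_half(self) -> Optional[int]:
        """Slave latch takes the master's state and drives VOUT."""
        self.phase = Phase.SECOND_HALF
        if self.powered:
            self.slave = self.master
        return self.slave

    def retained_state(self) -> Optional[int]:
        """State held if the clock were stopped in the current phase."""
        return self.master if self.phase is Phase.FIRST_HALF else self.slave

    def remove_power(self) -> None:
        """Volatile storage: removing power loses both latch states."""
        self.powered = False
        self.master = None
        self.slave = None


if __name__ == "__main__":
    stage = RegisterStage()
    stage.first_half(1)
    print(stage.retained_state())  # 1, held by the master latch
    print(stage.second_half())     # 1, driven to the next logic stage
    stage.remove_power()
    print(stage.retained_state())  # None, data lost
```

The final step illustrates the limitation that motivates shadow latches: in this volatile model, no stored state survives `remove_power`.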
FIG. 6 illustrates prior art master latch stage circuit 505 corresponding to cell k of register file stage 80 of master (L1) latch 70 illustrated in FIG. 4B, and slave latch stage circuit 510 corresponding to cell k′ of register file stage 80 of slave (L2) latch 75 illustrated in FIG. 4B.
In operation, at the beginning of a clock cycle, clock CLK 540 transitions from high to low voltage and remains at low voltage for the first half of the clock cycle, and complementary clock CLKb 540′ transitions from low to high voltage and remains at high voltage for the first half of the clock cycle. CMOS transfer device 530 turns ON, coupling input node 515 voltage VIN to storage node 535. CMOS transfer device 560 turns OFF and isolates the output of master latch stage circuit 505 from the input node 520 of slave latch stage circuit 510. CMOS transfer device 585 also turns OFF, breaking the feedback path between the output 580 of inverter 575 and the input 520 of inverter 570, such that node 520 does not act as a storage node. Voltage VIN may transition to a voltage value corresponding to the correct logic state any time prior to the end of the first half of the clock cycle, provided that sufficient time remains for cross-coupled inverters 545 and 550 to store the corresponding logic state prior to the clock transition at the beginning of the second half of the clock cycle.
At the beginning of the second half of the clock cycle, clock CLK 540 transitions from low to high voltage and remains at high voltage, and complementary clock CLKb 540′ transitions from high to low voltage and remains at low voltage for the second half of the clock cycle. CMOS transfer device 530 turns OFF, decoupling input node 515 voltage VIN from storage node 535, which remains in a state corresponding to input voltage VIN at the end of the first half of the clock cycle. CMOS transfer device 560 turns ON and transfers the state of storage node 555 to input 520 of inverter 570, which drives output node 525 to output voltage VOUT and also drives the input of inverter 575. CMOS transfer device 585 turns ON, which enables output 580 of inverter 575 to drive the input of inverter 570 and store the state of slave latch stage circuit 510 until the end of the second half of the clock cycle.
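The two half-cycles described above amount to a simple truth table for the three transfer gates of circuit 500. The Python sketch below encodes that table as stated in the text (CLK low: 530 ON, 560 and 585 OFF; CLK high: 530 OFF, 560 and 585 ON) and checks, for ideal non-skewed clocks, that input gate 530 and transfer gate 560 are never ON at the same time. The function name and dictionary keys are hypothetical.

```python
def transfer_gate_states(clk: int) -> dict:
    """ON (True) / OFF (False) state of transfer gates 530, 560, 585.

    CLK low  (first half of the cycle):  530 ON, 560 OFF, 585 OFF.
    CLK high (second half of the cycle): 530 OFF, 560 ON, 585 ON.
    CLKb is the complement of CLK and is implied.
    """
    if clk not in (0, 1):
        raise ValueError("clk must be 0 or 1")
    return {
        "530": clk == 0,  # couples VIN to storage node 535
        "560": clk == 1,  # couples master node 555 to slave input node 520
        "585": clk == 1,  # closes the slave feedback loop (580 to 520)
    }


if __name__ == "__main__":
    for clk in (0, 1):
        states = transfer_gate_states(clk)
        # With ideal clocks, VIN cannot race through master and slave
        # in the same half-cycle.
        assert not (states["530"] and states["560"])
        print(f"CLK={clk}: {states}")
```

In a real circuit, clock skew between CLK and CLKb can briefly violate this idealized invariant, which is one reason clock distribution matters in such latch designs.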
In U.S. Pat. No. 5,986,962, to Bertin et al., volatile low power shadow latches hold register file logic states or data so that volatile high performance register file power may be turned OFF to reduce static power dissipation as described above. However, volatile low power shadow latches must remain ON and therefore still dissipate power while storing logic states or data in backup mode because the storage is volatile, and information is lost if power is lost. Furthermore, volatile low power dissipating shadow latches use low bias current to minimize static power and are therefore very susceptible to disturb, in which stored logic states or data may be lost or corrupted. This may occur due to power supply noise, on-chip switching noise, alpha particle or other radiation disturb, for example. Also, shadow latches require additional chip area that can substantially increase chip size.
FIG. 7 illustrates prior art subsystem 700 with two modes of operation: a normal run mode and a low power logic state (or data) retention mode. In the normal run mode, volatile high performance logic operations, with correspondingly high active power, are executed using high performance system latches. In the low power logic state (or data) retention mode, the logic state or data is stored in low power shadow latches. Volatile means that logic state or data information is lost if power is lost or removed.
FIG. 7 illustrates a plurality of prior art volatile system latches 710, 710′, and 710″ coupled to related volatile shadow latch circuits 720, 720′, and 720″ by dedicated coupling circuits 730, 730′, and 730″. System latches may also be referred to as latch circuits or as a register file or register file circuit, for example. The system latch circuits are powered from VDD supplied by switch S1, which comes from power source P. The shadow latch circuits are powered from supply VMS supplied by switch S2, which also comes from power source P. However, switches S1 and S2 may instead get power from different sources. A detector D is used to detect a request for low power, which may come from a low power interrupt pin (not shown) or from monitoring an op code stream ST for a code calling for low power, as shown in FIG. 7. When detector D detects an op code (or interrupt pin) calling for low power or standby mode, detector D energizes its output, resulting in two effects. One effect is to enable switch S2 to provide power from voltage supply VMS to the shadow latch circuits. A second effect is to activate switch S1, after a time delay following the detector D transition, to disable the VDD power supply to the latch circuits. The time delay ensures that shadow latches 720, 720′, and 720″ are enabled by the time the latch circuits are de-powered. Volatile shadow latches 720, 720′, and 720″ remain powered at voltage VMS until the reduced power mode has ended, and may be de-powered only after the stored logic state or data is transferred to volatile system latches 710, 710′, and 710″.
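The FIG. 7 power-mode sequence can be sketched as an ordered series of switch actions. The Python model below follows the supply assignment stated in the text (S1 supplies VDD to the system latches, S2 supplies VMS to the shadow latches) and assumes, for illustration only, that the coupling circuits copy state instantaneously when the shadow latches are enabled; the class name, method names, and the default delay value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShadowLatchSubsystem:
    """Idealized model of the FIG. 7 power-mode sequence (volatile latches)."""

    system_latches: List[Optional[int]]
    shadow_latches: List[Optional[int]] = field(default_factory=list)
    vdd_on: bool = True   # switch S1: VDD to system latches
    vms_on: bool = False  # switch S2: VMS to shadow latches
    log: List[str] = field(default_factory=list)

    def enter_low_power(self, delay_ns: float = 10.0) -> None:
        """Detector D fires: enable shadow latches first, then remove VDD."""
        self.vms_on = True
        self.shadow_latches = list(self.system_latches)
        self.log.append("S2 on: shadow latches powered at VMS, state copied")
        # Hypothetical delay so shadow latches are enabled before VDD drops.
        self.log.append(f"delay {delay_ns} ns")
        self.vdd_on = False
        self.system_latches = [None] * len(self.system_latches)
        self.log.append("S1 off: VDD removed, system latch contents lost")

    def exit_low_power(self) -> None:
        """Restore VDD and state first; only then de-power shadow latches."""
        self.vdd_on = True
        self.system_latches = list(self.shadow_latches)
        self.log.append("S1 on: VDD restored, state copied back")
        self.vms_on = False
        self.shadow_latches = [None] * len(self.shadow_latches)
        self.log.append("S2 off: shadow latches de-powered, contents lost")


if __name__ == "__main__":
    subsystem = ShadowLatchSubsystem([1, 0, 1])
    subsystem.enter_low_power()
    subsystem.exit_low_power()
    print("\n".join(subsystem.log))
```

Note that the model keeps `vms_on` True for the whole retention period, reflecting the limitation discussed above: volatile shadow latches must stay powered, and therefore keep dissipating power, to retain the logic state.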