1. Field of the Invention
One embodiment of the invention relates to a multi-context memory cell having a first memory means and a plurality of second memory means, wherein the digital data information stored in the first memory means can be saved in any one of the second memory means. The multi-context memory cell may be used for storing one item of binary data information. Moreover, embodiments of the invention relate to a multibit multi-context memory cell comprising a plurality of the binary multi-context memory cells mentioned above. Furthermore, embodiments of the invention are directed to a memory block comprising a plurality of such multibit multi-context memory cells.
2. Description of the Related Art
A context describes the way in which a specific hardware circuit is utilized. In modern processor architectures, the efficient processing of different parallel contexts is becoming increasingly important. Two different types of contexts can be distinguished:
a) Configuration Context
The so-called configuration context specifies how a flexible hardware circuit is instantaneously configured. Examples of a configuration context are the programming of a processor or the bit sequence for the configuration of an FPGA (field programmable gate array).
b) Execution Context
The execution context, by contrast, is understood as the instantaneous state of some or all of the memory cells of a specific hardware circuit. Examples of an execution context include the values of counters, data registers or status registers.
In data processing concepts in which the same hardware circuit is used in different ways at different times, it is useful to replace the active context more or less frequently by a different context, the deactivated context being stored for later reactivation. This operation is also referred to as a context switch. Examples of data processing concepts based on parallel contexts are multitasking (concurrence of a plurality of processes), multithreading (concurrence of a plurality of processing threads within a process), hardware reprogramming and hardware reconfiguration. For maximum efficiency, the outlay for a context switch should be as small as possible. Here, outlay means the additional chip area required for storing and switching contexts, or the time delay associated with the context switch.
Thus, in so-called multi-context processors (multiple context processors), a single processor may be used for a plurality of quasi-parallel processing threads, each processing thread being assigned to a context. One advantage of such processors may be that a waiting time which occurs in the case of a specific context (for example for reading out data from an external memory) is bridged with data processing steps of a different context that is activated in the meantime.
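The latency-bridging effect can be illustrated with a small cycle-level sketch. It assumes, purely for illustration, that each processing thread is a list of operations, that a `"load"` (standing in for a read from external memory) stalls its own context for a fixed number of extra cycles, and that at most one operation is issued per cycle. The names `LOAD_LATENCY`, `run_single_context` and `run_multicontext` are hypothetical.

```python
# Hypothetical latency: a "load" stalls its own context for this many extra cycles.
LOAD_LATENCY = 3


def op_cycles(op):
    """Cycles an operation occupies its own context (assumed timing model)."""
    return 1 + LOAD_LATENCY if op == "load" else 1


def run_single_context(threads):
    """Threads run one after another; every stall is idle processor time."""
    return sum(op_cycles(op) for thread in threads for op in thread)


def run_multicontext(threads):
    """Each cycle, issue one operation from the first ready context, so the
    waiting time of a stalled context is bridged by work from another one.
    Returns the cycle at which all work has completed."""
    pcs = [0] * len(threads)       # next operation index per context
    ready_at = [0] * len(threads)  # first cycle at which each context may issue
    cycle = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        for ctx, thread in enumerate(threads):
            if pcs[ctx] < len(thread) and ready_at[ctx] <= cycle:
                ready_at[ctx] = cycle + op_cycles(thread[pcs[ctx]])
                pcs[ctx] += 1
                break  # at most one issue per cycle
        cycle += 1
    return max(ready_at, default=0)


threads = [["load", "add"], ["add", "add"]]
print(run_single_context(threads), run_multicontext(threads))  # 7 5
```

In this toy model the two additions of the second context fill the cycles in which the first context waits for its load.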
Two processor data paths are illustrated by way of example in FIG. 1. The left-hand data path involves an arithmetic unit with two data inputs in0 and in1 and a data output out. This data path comprises a plurality of D-type flip-flops FF and multiplexers MUX and also an arithmetic logic unit ALU/MULT for carrying out multiplications or additions. The right-hand data path is extended in comparison with the left-hand data path in such a way that it supports a plurality of contexts. For this purpose, three flip-flops FF are provided per context, said flip-flops being represented in the same plane in FIG. 1. The respectively active context is selected by means of the context switch signal sel, which in each case selects three flip-flops FF of a plane. The multiplexers typically used for selection in the data path at the inputs and outputs of the flip-flops FF are not illustrated in FIG. 1.
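The register organization of the right-hand data path can be modelled behaviourally. This sketch assumes three registers per context (standing in for the three flip-flops FF of one plane) and treats the context switch signal `sel` as an integer index; the class and method names are hypothetical and the multiplexers are abstracted away.

```python
class MultiContextRegisters:
    """One plane of flip-flops per context; the context switch signal `sel`
    chooses which plane the data path reads and writes."""

    def __init__(self, num_contexts, regs_per_plane=3):
        self.planes = [[0] * regs_per_plane for _ in range(num_contexts)]
        self.sel = 0  # index of the active context (plane)

    def switch(self, sel):
        # A context switch only changes which plane is addressed; nothing is
        # copied, so the deactivated context stays in its own flip-flops.
        if not 0 <= sel < len(self.planes):
            raise ValueError(f"context {sel} does not exist")
        self.sel = sel

    def write(self, reg, value):
        self.planes[self.sel][reg] = value

    def read(self, reg):
        return self.planes[self.sel][reg]


regs = MultiContextRegisters(num_contexts=4)
regs.write(0, 7)      # register 0 of context 0
regs.switch(1)
regs.write(0, 9)      # register 0 of context 1
regs.switch(0)
print(regs.read(0))   # 7: context 0 was preserved across the switch
```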
For processing a plurality of contexts, three different approaches are known in principle in the prior art:
1. Storing a Deactivated Context in a Buffer Memory and Reading It in Again from Said Buffer Memory
The first approach is to store a deactivated context in a buffer memory and to retrieve it from there when it is needed again. This approach is often used for processing execution contexts in computers. When a task switch takes place in such a system, for example, the data content either of all the context registers or only of those context registers that are relevant to the subsequent task may be saved in the main memory of the system. Saving the context information from the context registers and loading context information into them may take a plurality of clock cycles, since these steps are typically carried out sequentially for the affected context registers.
Similarly, in cache memories a cache line selected for replacement is saved in the main memory. Here, too, a plurality of clock cycles may be needed for saving the cache line and, where applicable, for restoring a saved cache line.
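The sequential save and restore of the first approach can be sketched as follows. The sketch assumes one register transfer per clock cycle (the text states only that the transfers are typically sequential) and uses a dictionary as a stand-in for main memory; `save_context` and `restore_context` are hypothetical names.

```python
def save_context(registers, main_memory, task_id, needed=None):
    """Copy context registers to main memory, one register per clock cycle
    (assumed timing). `needed` restricts the save to the registers that are
    relevant to the subsequent task. Returns the number of cycles used."""
    names = list(registers) if needed is None else list(needed)
    saved = {}
    cycles = 0
    for name in names:
        saved[name] = registers[name]
        cycles += 1
    main_memory[task_id] = saved
    return cycles


def restore_context(registers, main_memory, task_id):
    """Reload a previously saved context sequentially; returns the cycles used."""
    cycles = 0
    for name, value in main_memory[task_id].items():
        registers[name] = value
        cycles += 1
    return cycles


regs = {"r0": 1, "r1": 2, "pc": 100}
memory = {}
print(save_context(regs, memory, "task_a"))  # 3
regs.update(r0=0, r1=0, pc=0)                # another task uses the registers
restore_context(regs, memory, "task_a")
print(regs)
```

The cycle count grows with the number of registers transferred, which is the delay listed as the main drawback of this approach.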
A further example of this approach relates to the configuration memory of an FPGA. Such a configuration memory is loaded sequentially with a bit sequence from an additional external memory, typically a non-volatile memory, for example PROM (programmable read-only memory), ROM (read-only memory) or flash memory. Different configurations can be stored as configuration contexts in the external memory and read into the FPGA when needed. In a modern FPGA, it is also possible to overwrite only part of the configuration memory selectively with new configuration information. In contrast to execution contexts, the configuration context currently stored in the configuration memory does not have to be saved before a different configuration context is read in from the external memory.
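Loading configuration contexts from an external memory, including partial reconfiguration, might be sketched as below. Addressing the configuration memory by a plain bit position `start` is an assumption made for brevity; real FPGAs organize their configuration memory in device-specific ways. The names `load_configuration` and `external_memory` are hypothetical.

```python
def load_configuration(config_memory, bitstream, start=0):
    """Write a configuration context into the configuration memory bit by bit,
    starting at bit position `start`. Bits outside the written range keep their
    values, which models partial reconfiguration. The old content is simply
    overwritten, not saved. Returns the number of bits written."""
    if start < 0 or start + len(bitstream) > len(config_memory):
        raise ValueError("bitstream does not fit into the configuration memory")
    for offset, bit in enumerate(bitstream):  # sequential loading
        config_memory[start + offset] = bit
    return len(bitstream)


# Hypothetical external non-volatile memory holding configuration contexts.
external_memory = {"ctx_a": [1] * 12, "ctx_b_part": [0, 1, 0, 1]}

config = [0] * 12
load_configuration(config, external_memory["ctx_a"])          # full configuration
load_configuration(config, external_memory["ctx_b_part"], 4)  # partial update
print(config)  # [1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
```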
2. Switch Between Individual Context Memory Modules
In this approach, each context is allocated a dedicated context memory module, for example in the form of a set of register cells or SRAM (static random access memory). In this case, the respective memory module is selected by means of a suitable selection means, typically by means of a multiplexer which receives the outputs of the individual memory modules. A typical field of application for this solution is multithreading, each processing thread being allocated to a context.
This approach is distinguished by a high processing speed: ideally, a context switch can be carried out without any delay or with a delay of only one clock cycle. A disadvantage, by contrast, is that the memory modules are embodied identically and therefore each incur the same circuit outlay. The more complex even a single memory module has to be, for example because it requires a plurality of inputs and/or outputs, the greater the implementation outlay for the remaining memory modules as well. Moreover, the selection means lies in the access path, which may adversely affect the timing of memory accesses.
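The second approach can be sketched as a set of identical memory modules followed by a multiplexer. This is a behavioural model under the assumption that each module is a simple word array; `ContextMemoryModules` and `storage_words` are hypothetical names.

```python
class ContextMemoryModules:
    """One dedicated memory module per context; a multiplexer driven by `sel`
    chooses which module's output reaches the data path."""

    def __init__(self, num_contexts, words_per_module):
        self.modules = [[0] * words_per_module for _ in range(num_contexts)]

    def storage_words(self):
        # Every module is built identically, so the outlay grows linearly
        # with the number of contexts.
        return sum(len(module) for module in self.modules)

    def write(self, sel, addr, value):
        self.modules[sel][addr] = value

    def read(self, sel, addr):
        # All modules present their word; the multiplexer, which sits in the
        # access path, forwards only the one selected by `sel`.
        outputs = [module[addr] for module in self.modules]
        return outputs[sel]


mem = ContextMemoryModules(num_contexts=4, words_per_module=16)
mem.write(2, 5, 42)
print(mem.read(2, 5), mem.read(0, 5), mem.storage_words())  # 42 0 64
```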
If the number of contexts to be supported is greater than the number of available memory modules, the approach described under point 1. may additionally be employed.
3. Selection of Different Segments in a Single Context Memory Module
This third concept known from the prior art uses only a single large context memory module in which a plurality of contexts are stored. Specific bits of the address word applied to the memory module serve as a pointer to the memory segment allocated to a context. The segment address is generated by a modulo-n up/down counter, resulting in a ring memory structure. A disadvantage of this solution, similar to that described under point 2., is that the entire memory module has the same complexity throughout (for example with regard to the number of inputs and outputs). Moreover, compared with a plurality of smaller memory modules, a large memory module may be slower to access (for example because of the more complex addressing). If the number of contexts to be supported exceeds the storage capacity of the memory module, the approach described under point 1. may additionally be pursued.
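The segment addressing of the third approach can be sketched as follows, assuming that the pointer supplied by the modulo-n counter forms the upper address bits and the offset within the segment forms the lower `segment_bits` bits. The class and parameter names are hypothetical.

```python
class RingContextMemory:
    """Single memory module holding n context segments. The upper address bits
    come from a modulo-n up/down counter (the segment pointer), the lower bits
    are the offset within the segment, giving a ring memory structure."""

    def __init__(self, n_contexts, segment_bits):
        self.n = n_contexts
        self.segment_bits = segment_bits
        self.mem = [0] * (n_contexts << segment_bits)
        self.pointer = 0

    def count_up(self):
        self.pointer = (self.pointer + 1) % self.n

    def count_down(self):
        self.pointer = (self.pointer - 1) % self.n  # wraps from 0 to n - 1

    def address(self, offset):
        if not 0 <= offset < (1 << self.segment_bits):
            raise ValueError("offset outside the segment")
        return (self.pointer << self.segment_bits) | offset

    def write(self, offset, value):
        self.mem[self.address(offset)] = value

    def read(self, offset):
        return self.mem[self.address(offset)]


ring = RingContextMemory(n_contexts=4, segment_bits=3)  # 4 segments of 8 words
ring.write(5, 11)       # context 0, offset 5 -> address 5
ring.count_down()       # pointer wraps from 0 to 3
ring.write(5, 22)       # context 3, offset 5 -> address 29
print(ring.address(5))  # 29
ring.count_up()         # pointer wraps from 3 back to 0
print(ring.read(5))     # 11
```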
Further, highly specialized solutions for the processing of contexts are known in addition to the three basic approaches described above. Thus, the document “A time-multiplexed FPGA”, Trimberger, S. et al., 5th IEEE Symposium on FPGA-Based Custom Computing Machines (FCCM '97), page 22 et seq., and the document U.S. Pat. No. 6,480,954 B2 disclose providing a plurality of configuration contexts in an FPGA. Mutually corresponding bits of different configuration contexts are read into a corresponding number of SRAM cells in this case, an SRAM cell being allocated to each configuration context per configuration bit. By means of select lines, a specific SRAM cell is selected for each configuration bit depending on the configuration context chosen and is read out via a common bit line into a flip-flop. The current configuration bit can be retrieved via the flip-flop.
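The per-bit organization described in these documents can be modelled behaviourally as below: each configuration bit owns one SRAM cell per configuration context, a select line picks one cell, and its value is captured via the common bit line in a flip-flop. This is an abstraction for illustration, not a reproduction of the circuits in the cited documents; all names are hypothetical.

```python
class MultiContextConfigBit:
    """One configuration bit: one SRAM cell per configuration context, a select
    line per cell, and a flip-flop fed from the common bit line."""

    def __init__(self, num_contexts):
        self.sram = [0] * num_contexts
        self.flip_flop = 0  # current configuration bit seen by the logic

    def program(self, context, bit):
        self.sram[context] = bit

    def select(self, context):
        bit_line = self.sram[context]  # only the selected cell drives the bit line
        self.flip_flop = bit_line
        return self.flip_flop


def switch_configuration(bits, context):
    """Select the same context for every configuration bit."""
    return [bit.select(context) for bit in bits]


config_bits = [MultiContextConfigBit(num_contexts=2) for _ in range(3)]
for i, (a, b) in enumerate(zip([1, 0, 1], [0, 1, 1])):
    config_bits[i].program(0, a)  # configuration context 0
    config_bits[i].program(1, b)  # configuration context 1
print(switch_configuration(config_bits, 1))  # [0, 1, 1]
```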
As the number of power-sensitive applications of monolithic integrated circuits increases, for example in the area of mobile applications, and as power consumption rises with the increasing complexity of these circuits, it is expedient to operate temporarily unused circuit blocks with reduced power loss in a so-called power-down mode, or even to shut them down completely.
For some applications it is advantageous if the memory cells used for data storage, for example in sequential circuits, save their storage state upon transition from the normal operating mode to the power-down mode, so that, after the transition back to the normal operating mode and the restoring of the saved storage states, the respective circuit block is in the same state as before.
A disadvantage of the three basic approaches for context processing described above is that none of them inherently enables the storage state to be saved and retained during operation in the power-down mode. In the case of the first approach (see point 1), this function could be provided by using a non-volatile memory, for example flash memory, instead of a volatile DRAM (dynamic random access memory) for the buffer memory, but this entails additional outlay.
The typical cases of application and also the possible advantages and disadvantages of the three basic approaches described above are summarized again in the table below:
TABLE 1: Comparison of known context switch concepts

| Approach | Typical application | Advantages | Disadvantages |
|---|---|---|---|
| Use of a buffer memory (see point 1) | Multitasking; cache | Low implementation outlay; possibility of realizing the saving of the storage state in the power-down mode with a non-volatile buffer memory | A plurality of clock cycles may be needed for saving and restoring |
| Switch between individual memory modules (see point 2) | Multithreading | Fast context switch | Outlay rises linearly with the number of contexts; selection means in the access path; saving of the storage state in the power-down mode may not be supported |
| Use of a single memory module (see point 3) | Multitasking; ring memory | Fast context switch | Outlay rises linearly with the number of contexts; slow memory access; saving of the storage state in the power-down mode may not be supported |
For the purpose of saving the storage state of a circuit block in the power-down mode, it is known from the prior art to use so-called retention memory cells, each of which is provided not only with volatile memory means but also with an additional memory means for saving and retaining the storage state during operation in the power-down mode. If the memory cells are latches or flip-flops with bistable multivibrators for volatile data storage, they are referred to as retention latches or retention flip-flops.
In the implementation of such retention flip-flops or retention latches (the latter comprising only one bistable multivibrator), two types are known in principle in the prior art: firstly memory cells implemented purely in CMOS (complementary metal oxide semiconductor) technology, and secondly memory cells based on a combination of CMOS technology and non-volatile memory technology.
If the retention flip-flops or retention latches are realized purely using CMOS technology, a second supply voltage is typically provided in addition to the primary supply voltage, and, in contrast to the primary supply voltage, is not switched off during the power-down mode. A retention flip-flop of this type is disclosed in the documents U.S. Pat. No. 5,473,571 and “A 1-V High-speed MTCMOS Circuit Scheme for Power-down Application Circuits”, Shigematsu et al., Journal of Solid State Circuits, June 1997, pages 861 to 869. In this case, an additional latch fed by means of the second supply voltage is used to save the storage state of the flip-flop during the power-down mode. If the flip-flop is fed with the primary supply voltage again after the power-down mode, the saved storage state can be restored again into the circuit section of the flip-flop that is operated with the primary supply voltage. A retention flip-flop of this type is often referred to as a balloon flip-flop in the prior art, the latch operated with the second supply voltage being referred to as a balloon latch.
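The save, power-down and restore sequence of a balloon flip-flop can be sketched behaviourally. The sketch abstracts away all circuit details; representing the lost volatile state as `None` and the method names are assumptions for illustration.

```python
class BalloonFlipFlop:
    """Retention (balloon) flip-flop: the main flip-flop is fed by the primary
    supply; the balloon latch is fed by a second supply that stays on during
    the power-down mode."""

    def __init__(self):
        self.q = 0             # main flip-flop, primary supply
        self.balloon = None    # balloon latch, second (always-on) supply
        self.primary_on = True

    def clock(self, d):
        if not self.primary_on:
            raise RuntimeError("primary supply is switched off")
        self.q = d

    def save(self):
        self.balloon = self.q  # copy the storage state into the balloon latch

    def power_down(self):
        self.primary_on = False
        self.q = None          # volatile content of the main flip-flop is lost

    def power_up(self):
        self.primary_on = True  # main flip-flop state is still undefined here

    def restore(self):
        self.q = self.balloon  # write the saved state back


ff = BalloonFlipFlop()
ff.clock(1)
ff.save()
ff.power_down()
ff.power_up()
ff.restore()
print(ff.q)  # 1
```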
The prior art (thus, by way of example US 2003/0188241 and US 2004/051574) discloses a multiplicity of retention flip-flops which are based on the balloon flip-flop described in the document cited above and have a reduced area usage or a higher data rate.
Retention flip-flops based purely on CMOS technology can have a number of disadvantages. For example, pure CMOS technology is volatile: it can only be used for volatile data storage, and the data content is lost when the supply voltage is switched off. For this reason, a second supply voltage has to be provided that remains active in the power-down mode. This costs additional chip area, since the second supply voltage has to be distributed across the chip. Moreover, a balloon flip-flop of this type occupies a larger chip area than a customary flip-flop without data saving; in some cases, the additional area corresponds to more than 50% of the chip area of a customary flip-flop. Furthermore, a leakage current flows in the balloon latch during the power-down mode, causing additional power loss during that mode. In some cases, this leakage current may be reduced by using transistors whose threshold voltage has a large magnitude.
In some cases, the disadvantages mentioned above may be eliminated for the most part by using non-volatile memory technologies in the implementation of a retention flip-flop.
So-called PCM technology (phase change memory) may be suitable for this purpose; it is briefly described below. PCM technology is currently the focus of intensive research, for example in connection with matrix memories. PCM technology makes it possible to program the value of a resistance element, the programming being non-volatile and thus being retained when the supply voltage is switched off. PCM technology is based on changing the phase state of a chalcogenide glass thermally and reversibly between the amorphous state and the polycrystalline state. The resistivity of a resistance element comprising chalcogenide glass is greater in the amorphous state than in the polycrystalline state. The change in phase state is brought about by heat generated by a current pulse through the resistance element; the duration and the current intensity of the current pulse determine whether the resistance element subsequently has a high or a low resistance value.
FIG. 2 illustrates two exemplary current pulses 1 and 2 for the programming of a resistance element of this type. The resistance element is converted into the amorphous state with a resistance value of 1 MΩ by means of the current pulse 1 having a relatively high current intensity of 200 μA and a relatively short pulse duration of 20 ns, while the resistance element is transformed into the polycrystalline state with a resistance value of 10 kΩ by means of the current pulse 2 having a relatively low current intensity of 50 μA and a relatively long pulse duration of 50 ns.
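The two pulses of FIG. 2 can be turned into a toy behavioural model. Only the two operating points (200 μA for 20 ns giving 1 MΩ, 50 μA for 50 ns giving 10 kΩ) come from the text; the thresholds `MELT_CURRENT_UA`, `MIN_SET_DURATION_NS` and `READ_THRESHOLD_OHM`, and the mapping of the low-resistance state to logic 1, are hypothetical values chosen only so that the two pulses fall on the correct side.

```python
AMORPHOUS_OHM = 1_000_000     # high-resistance state after pulse 1 (FIG. 2)
CRYSTALLINE_OHM = 10_000      # low-resistance state after pulse 2 (FIG. 2)
MELT_CURRENT_UA = 150         # hypothetical: at or above this, melt and quench
MIN_SET_DURATION_NS = 40      # hypothetical: minimum time to crystallize
READ_THRESHOLD_OHM = 100_000  # hypothetical sense threshold between the states


def apply_pulse(resistance, current_ua, duration_ns):
    """Return the resistance after a programming pulse. Simplified: pulse
    shape and quench rate are ignored; only current and duration count."""
    if current_ua >= MELT_CURRENT_UA:
        return AMORPHOUS_OHM      # high current: amorphous state
    if duration_ns >= MIN_SET_DURATION_NS:
        return CRYSTALLINE_OHM    # lower current, long pulse: polycrystalline
    return resistance             # pulse too weak or too short: unchanged


def read_bit(resistance):
    """Hypothetical encoding: low resistance reads as 1, high as 0."""
    return 1 if resistance < READ_THRESHOLD_OHM else 0


r = apply_pulse(CRYSTALLINE_OHM, current_ua=200, duration_ns=20)  # pulse 1
print(r, read_bit(r))  # 1000000 0
r = apply_pulse(r, current_ua=50, duration_ns=50)                 # pulse 2
print(r, read_bit(r))  # 10000 1
```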
One advantage of PCM technology over other non-volatile memory technologies is that it benefits from a reduction in the dimensions of the storage elements: the smaller the structures used, the lower the current intensity required for the current pulse that initiates the phase change. What is more, PCM resistance elements may be realized in the upper layers of a CMOS semiconductor process, so that the resistance elements can be arranged above the transistors, for example in each case directly above the transistors assigned to a memory cell.
The document US 2004/0141363 A1 describes the application of PCM technology in connection with SRAM memory cells. Customary SRAM memory cells each comprise a bistable multivibrator in the form of two cross-coupled inverters for volatile information storage, the binary information corresponding to the potentials of the two output nodes of the two inverters. In the SRAM memory cell described in this document, a respective PCM resistance element is additionally connected to each of the two storage nodes via a switchable coupling NMOS transistor. In order to save the binary information stored in the bistable multivibrator, the resistance elements are programmed depending on the potentials of the storage nodes. The binary information can subsequently be restored from the resistance elements into the bistable multivibrator.
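The save and restore sequence of such a cell can be sketched behaviourally. The polarity used here (a node at high potential programs its element to the low-resistance state, and on restore the multivibrator settles toward the node whose element has the lower resistance) is an assumption for illustration; the cited document may use a different scheme. The class name is hypothetical, and the resistance values are taken from FIG. 2.

```python
LOW_OHM, HIGH_OHM = 10_000, 1_000_000  # polycrystalline / amorphous (FIG. 2)


class SRAMCellWithPCM:
    """Bistable multivibrator (nodes Q and QB) with one PCM resistance element
    per storage node, coupled in via switchable transistors."""

    def __init__(self):
        self.q = 0             # node Q; node QB is always its complement
        self.r_q = HIGH_OHM    # PCM element at node Q
        self.r_qb = HIGH_OHM   # PCM element at node QB

    def write(self, bit):
        self.q = bit

    def save(self):
        # Coupling transistors on: each element is programmed according to the
        # potential of its node (assumed: high node -> low resistance).
        self.r_q = LOW_OHM if self.q == 1 else HIGH_OHM
        self.r_qb = LOW_OHM if self.q == 0 else HIGH_OHM

    def power_down(self):
        self.q = None          # volatile state lost; resistances persist

    def restore(self):
        # Assumed: the multivibrator settles toward the node whose element
        # has the lower resistance.
        self.q = 1 if self.r_q < self.r_qb else 0


cell = SRAMCellWithPCM()
results = []
for bit in (1, 0):
    cell.write(bit)
    cell.save()
    cell.power_down()
    cell.restore()
    results.append(cell.q)
print(results)  # [1, 0]
```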