(1) Field of the Invention
The present invention relates to the field of static memories. More specifically, the present invention relates to an apparatus and method for discharging bit lines in a memory device.
(2) Description of the Related Art
Many microprocessors include set-associative level-1 (L1) cache devices to enhance performance. Often, L1 caches implement a "cache-banking" and "stop-clock" scheme to reduce power dissipation. The "stop-clock" mechanism involves halting the toggling of the clock signal that is routed to various units of the microprocessor, including the cache, if those units are not currently used. A set of elaborate conditions is used to detect when the clocks routed to a particular unit of the processor may be safely stopped without causing a malfunction; the details of those conditions are beyond the scope of this discussion.
To facilitate the understanding of the "cache-banking" scheme, which will be explained later in this section, the following discussion uses as a non-limiting example a cache device with the following characteristics: the address bus is 32 bits wide; the cache has 20 tag address bits (address bits 31:12); the cache has 8 set address bits (address bits 11:4), thus the cache has 256 sets; the cache has a line size of 16 bytes (128 bits), the 4 lowest address bits [3:0] being used to access any specific one of the 16 bytes within a cache line; the cache can have any number of ways (depending on the cache size).
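By way of non-limiting illustration, the address partitioning of the example cache described above may be sketched in software as follows. The function name split_address is hypothetical and is not part of the described device; only the bit positions are taken from the example.

```python
def split_address(addr):
    """Split a 32-bit address into (tag, set_index, byte_offset).

    Field boundaries follow the example cache: tag = bits [31:12],
    set index = bits [11:4], byte offset = bits [3:0].
    The function name is a hypothetical illustration.
    """
    if not 0 <= addr < 2**32:
        raise ValueError("address must fit in 32 bits")
    byte_offset = addr & 0xF        # 4 bits select one of 16 bytes in the line
    set_index = (addr >> 4) & 0xFF  # 8 bits select one of 256 sets
    tag = addr >> 12                # remaining 20 bits form the tag
    return tag, set_index, byte_offset


tag, set_index, byte_offset = split_address(0x12345678)
print(hex(tag), hex(set_index), hex(byte_offset))
```

For the address 0x12345678, the sketch yields a tag of 0x12345, a set index of 0x67, and a byte offset of 0x8.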
FIG. 1a shows a column of an SRAM cache device. In this cache device, 256 SRAM cells 2 are placed on the same Bitline 4 and Bitline# 6, each SRAM cell having its own Wordline 8. The Bitline 4 and Bitline# 6 are gated as inputs to a sense amplifier 10, which detects differences in voltage level between Bitline and Bitline#. A pre-charge circuit 12 pre-charges the Bitline and Bitline# to the same level (VCC-V.sub.tn) during a first half of every clock cycle. A write-control circuit 14 is used to drive the appropriate data values from a Drive Input line 15 along Bitline and Bitline# during a write cycle.
FIG. 1b shows a gate level representation of a typical SRAM cell. A pair of cross-coupled inverters stores a logic value and the logic complement of that value, respectively. The two NMOS transistors 17 and 19 serve as pass gates when a pulse is sent via the wordline 8. Data stored in the SRAM cell is thus read by driving bitline and bitline# with the data stored in the SRAM cell.
FIG. 2 shows the structure of a complete data array for an "unbanked" cache using Single Data Bit Array ("SDBA") elements. An SDBA represents one column of the cache device. Since each cache line has 16 bytes (128 bits), 128 SDBA elements are shown in this figure. A set address decoder 218 decodes the set address bits [11:4], outputting decoded wordlines WL.sub.DEC [255:0]. Only one of the 256 lines is active at any time. A Pulse Timer circuit 220 receives a CACHE_CYCLE signal from a processor (not shown). CACHE_CYCLE indicates that the cache is currently accessed by the processor. If the CACHE_CYCLE signal is active, the Pulse Timer generates, every clock cycle, a pulse of a certain length, after a certain delay, on each of the lines "wordline-pulse" and "sense-amp-pulse." The pulse on the "wordline-pulse" line causes the appropriate wordline decoded by the Set Address Decoder 218 to be pulsed.
An AND gate 222 is provided for each wordline of the cache. Each one of the 256 outputs of the Set Address Decoder is coupled to one input of a different gate 222. However, only one of these AND gates 222 will receive a WL.sub.DEC signal set to `1`. The "wordline-pulse" line 223 coupled to Pulse Timer 220 is also input to each of the gates 222. Gate 222 generates at its output, which is connected to a wordline 224 of the cache array, a wordline signal WL which replicates the wordline pulse issued by the Pulse Timer if the corresponding wordline decode signal WL.sub.DEC is set to logic 1. In other words, if the address of a specific line has been decoded by the address decoder 218 and a CACHE_CYCLE signal is issued by the processor, the selected line of the cache array will be pulsed. In so doing, an operation issued by the processor, such as a "read" or a "write", can be performed upon the cache array when additional signals required for these operations are provided by the pulse timer circuit 220 of FIG. 2. Upon reading from or writing to a particular SRAM cell, the respective bitline and bitline# coupled to the SRAM cell accessed by the microprocessor will be discharged. Typically, the bitlines of a static memory, such as the memory shown in FIGS. 1 and 2, act as capacitors and are precharged to a predetermined voltage prior to reading the SRAM cells. The state assumed by the SRAM memory cell is read by applying an input voltage to the selected wordline and sensing which bitline experiences a change in voltage. The state assumed by a selected SRAM memory cell determines which bitline, Bitline or Bitline#, will be discharged towards ground or VSS when the cell is read. A write operation is performed in a similar way by discharging a specific bitline.
Accesses to the cache device are sequenced by a clock generating clock cycles. Each clock cycle can be conceptually divided into two equal phases. The first phase, the first half of the clock cycle, is the precharge phase, wherein the precharge circuitry 12 of FIG. 1a precharges the bitline and bitline# to the same high voltage level (VCC-V.sub.tn), where V.sub.tn is the threshold voltage of the NMOS device used to precharge the two bitlines. The precharger is turned off in the second phase, when one of the following three events will occur. One possibility is that no bit is accessed during that clock cycle, the bitline and bitline# remaining precharged at their high levels. The bitlines will not be discharged and, thus, no power will be dissipated. Neither any of the wordlines nor the sense-amp-enable will be activated if no cache access occurs during the second phase. Another possibility is that a particular SRAM cell is written to. In this case the "Write Control Circuit" 14 of FIG. 1a will drive a "1" on bitline and a "0" on bitline# or vice-versa, depending on the value that is written into the SRAM cell. The appropriate wordline, for the correct set, will be pulsed and the correct value will be written in the SRAM cell for that particular set. The sense-amp-enable signal does not need to be activated for a write cycle. Yet another possibility is that a particular SRAM cell is read from. The wordline of that SRAM cell will be pulsed, causing that SRAM cell to drive the bitline and bitline# using the data stored therein. The sense-amp enable line will be pulsed and the sense-amplifier will sense the voltage difference between bitline and bitline#, thus driving the correct logic level at the sense-amp output.
In all the cases mentioned above, whenever a high value is driven on bitline and bitline#, that high value is always driven through NMOS devices via one of the following: the bitline precharger, the SRAM cell itself, or the write control circuit. Accordingly, the high level on either bitline or bitline# should not exceed (VCC-V.sub.tn). Henceforth, whenever the term "voltage" is mentioned in the context of the sense-amplifier gain, reference will be made to the "inactive" high-voltage level at which bitline and bitline# are found when there is no cache access and thus no discharge of the bitlines. This voltage level, theoretically, has to be the same as the precharge level. As will be further explained, this is not always the case.
As is apparent from FIG. 3, which illustrates, by way of non-limiting example, a plot of sense amplifier gain versus the above-mentioned "inactive" high voltage, the sense-amplifier has a high-gain region between 1.3 V and 2.2 V. Outside this region, the gain drops off very dramatically. In the particular case, VCC was 2.7 V and the (VCC-V.sub.tn) level was within the high-gain region of 1.3 V to 2.2 V.
In an "unbanked" cache design, all 128 SDBAs share the same wordlines [255:0] and the same sense-amp enable lines shown in FIG. 1a. Since the sense-amp enables of all 128 SDBAs are connected together, the sense-amplifiers in all SDBAs would be turned on as soon as a CACHE_CYCLE signal is asserted. Therefore, during any cache read operation wherein a pulse is sent through one of the wordlines, all the bitlines (and bitlines#) of all 128 SDBAs would be discharged, thereby causing power to be dissipated in all the bitlines of the cache device.
Part of the power dissipated in an "unbanked" cache, when a read/write operation is issued by the processor, can be saved by using a "cache-banking" scheme. FIG. 4a illustrates a banked cache. The "cache-banking" scheme is premised on the fact that in most cache accesses, the microprocessor data path (computational logic) does not use all the information that is present in a cache line at one time. For example, according to the cache-banking scheme shown in FIG. 4a, the 16-byte cache line can be divided into smaller "chunks", each of which can be accessed as an atomic unit. A cache line 308 can be divided into four 4-byte chunks or into eight 2-byte chunks and so on. The granularity of this division depends on factors like the data path width of the microprocessor and on how much power needs to be saved.
Each cache line (16 bytes) in FIG. 4a is thus divided into four 4-byte (32 bit) chunks. Each such chunk will be called hereinafter a "bank." Banks are accessed as follows:
Access Bank 0 -> (bytes 0,1,2,3 of cache line) -> bits [31:0]
Access Bank 1 -> (bytes 4,5,6,7 of cache line) -> bits [63:32]
Access Bank 2 -> (bytes 8,9,10,11 of cache line) -> bits [95:64]
Access Bank 3 -> (bytes 12,13,14,15 of cache line) -> bits [127:96]
The starting address and the length (amount of data to be read/written) of a cache cycle will determine which of the cache banks should be accessed during a cycle. In all cache accesses, all 4 bytes in each bank are accessed as a group. More than one bank, however, can be accessed simultaneously. For example, if the starting address of an 8-byte read access is at byte 4, Bank 1 and Bank 2 need to be accessed. In a banked cache, when a microprocessor's control unit sends out the starting address and the length (in bytes) for each cache access, this information is used to intelligently turn on only those banks that are needed for that cycle. Since all the tag and least recently used (LRU)/valid bits for any cache line need to be accessed for any cache cycle, the tag and LRU/valid arrays will not be banked.
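The selection of banks from the starting byte and the length of an access may be sketched as follows. This is a minimal illustration under the assumption that an access does not cross a cache-line boundary; the function name banks_for_access and the constants are hypothetical and do not describe the actual bank select circuit.

```python
BANK_BYTES = 4   # bytes per bank in the four-bank example
LINE_BYTES = 16  # bytes per cache line


def banks_for_access(start_byte, length):
    """Return the list of bank numbers touched by an access.

    start_byte is the offset of the first byte within the cache line
    (address bits [3:0]); length is the access size in bytes.
    Assumption: the access stays within one cache line.
    """
    if length < 1 or start_byte < 0 or start_byte + length > LINE_BYTES:
        raise ValueError("access must lie within one 16-byte cache line")
    first_bank = start_byte // BANK_BYTES
    last_bank = (start_byte + length - 1) // BANK_BYTES
    return list(range(first_bank, last_bank + 1))


# The 8-byte read starting at byte 4 from the text above.
print(banks_for_access(4, 8))
```

For the 8-byte read starting at byte 4, the sketch selects Bank 1 and Bank 2, in agreement with the example given above.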
As one can see from FIG. 4a, the data array 308 is divided into four banks as explained above. Each bank of the data array 308 has 256 AND gates 322. Each of these gates 322 receives the following inputs: a line WL.sub.DEC from decoder 318, which corresponds to the respective wordline of the data cache array to which gate 322 is coupled at an output thereof; a wordline pulse line coupled to pulse timer 320 for coupling a wordline pulse signal to the gate 322; and a bank select line Bank.sub.sel coupled to a bank select circuit (not shown) for enabling one or more selected banks of the cache data array 308. When a particular bank is selected, only the appropriate wordline of that particular bank will be pulsed upon the generation of a wordline pulse signal by the pulse timer 320. As one can see, the remaining three banks will have their Bank.sub.sel signal set to logic 0 and, thus, the outputs of the gates 322 coupled to the respective un-selected banks will be 0. According to this scheme, even in the instances where the CACHE_CYCLE signal, input to the pulse timer 320, is issued by the processor to initiate a cache access to data array 308, the wordlines of non-selected banks of memory will not be pulsed. Thus, in these banks power will not be dissipated upon the access of one of the SRAM cells of a selected bank.
As one can see from FIG. 4a, an AND gate 324 has an output coupled to the sense amplifiers of each bank. Each Bank.sub.sel signal is coupled to the gate 324 corresponding to the respective bank. Therefore, the sense amplifiers corresponding to each bank will only be turned on if both a "Sense Amp Pulse" signal is issued and the bank select signal Bank.sub.sel for that bank is issued.
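The gating performed by gates 322 and 324 may be modeled behaviorally as shown below. This is a sketch only: the function names are hypothetical, signals are represented as Python booleans, and timing (pulse delay and pulse length) is not modeled.

```python
NUM_SETS = 256
NUM_BANKS = 4


def pulsed_wordlines(set_index, wordline_pulse, bank_sel):
    """Return the set of (bank, set) pairs whose wordline WL is pulsed.

    Models one three-input AND gate 322 per wordline per bank:
    WL = WL_DEC AND wordline_pulse AND Bank_sel.
    """
    pulsed = set()
    for bank in range(NUM_BANKS):
        for s in range(NUM_SETS):
            wl_dec = (s == set_index)  # decoder output is one-hot
            if wl_dec and wordline_pulse and bank_sel[bank]:
                pulsed.add((bank, s))
    return pulsed


def enabled_sense_amps(sense_amp_pulse, bank_sel):
    """Model one AND gate 324 per bank: enable = pulse AND Bank_sel."""
    return [sense_amp_pulse and sel for sel in bank_sel]


# Access to set 0x67 with only Bank 1 and Bank 2 selected.
sel = [False, True, True, False]
print(sorted(pulsed_wordlines(0x67, True, sel)))
print(enabled_sense_amps(True, sel))
```

In this model, only the wordline of set 0x67 in Bank 1 and Bank 2 is pulsed, and only the sense amplifiers of those two banks are enabled; Bank 0 and Bank 3 see no activity.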
For example, for accesses which only require Bank 1 and Bank 2 to be turned on, the bitlines in Bank 0 and Bank 3 will not be discharged and, similarly, the respective sense amplifiers corresponding to Bank 0 and Bank 3 will not be turned on. In this example, the banked-cache scheme can save about half the power that otherwise would be dissipated in an "unbanked" cache data array. If access is needed to only one bank during a certain cycle, the power saved can be even greater. The power saving can increase as the granularity of banking in the cache line is increased, e.g., a cache with eight 2-byte banks would end up saving more power for certain cycles. Another related benefit of using cache-banking is that this scheme reduces the relative frequency of power surges on the chip, since not all banks need to be discharged for every cache cycle. This helps mitigate electron-migration problems in silicon. Power surges (high power dissipation) cause high current densities in the metal lines that route VCC, VSS, and other signals throughout the chip. If such high current densities occur very frequently, they cause those metal lines to become thinner due to migration of electrons ("electron migration"). The lines may ultimately break, causing functional failure. While the cache banking scheme can significantly reduce the power dissipated in a cache memory, static memories suffer from a problem inherent in the structure of these memories. This problem is called "voltage creep" and will be explained in conjunction with the following discussion.
The "bitline creep" is a behavior of a cache device which can be encountered when the voltage on the bitline and bitline# starts "creeping up" above (VCC-V.sub.tn), due to leakage through the transistors in the precharge circuit and write-enable circuit. For each individual clock cycle, the amount of leakage, or the creep in voltage, is minuscule. However, if this leakage is allowed to occur over many hundreds of clock cycles, the cumulative creep in bitline and bitline# voltages will take these lines out of the "high-gain" region of the sense-amplifier, as shown in FIG. 4b. If this occurs, then on the next read access to the cache, the sharply reduced gain of the sense-amp will cause a much longer delay in generating the sense-amp output. Thus, the sense-amp will output meaningless data for a longer duration before it is able to successfully detect the differential between bitline and bitline# and send out the correct output data. This delay in sense-amp output generation will cause many problems, such as incorrect hit/miss detection in the tag-comparison logic and setup-time violations in the logic that uses the cache data output.
However, any read/write access operation to the data memory array before its bitline inactive-voltage "creeps" out of the "high-gain" region of the sense-amp can cause the bitline and bitline# to descend to the normal "VCC-V.sub.tn" level, thereby discharging bitline and bitline#. As mentioned before, for a read access, the SRAM cell will actively drive a high value of (VCC-V.sub.tn) on one of the lines, through the corresponding NMOS transistor in the SRAM cell. For a write access, the write-control-logic will actively drive a high value of (VCC-V.sub.tn) on one of the lines, through the corresponding NMOS transistor in the Write Control Circuit. Accordingly, the voltage "creep" problem occurs only if there is a sufficiently long and continuous series of inactive cycles which will allow the bitline and bitline# voltage to creep beyond the high-gain region of the sense-amp.
FIG. 4b shows a graph of bitline-voltage vs. time during a long period of inactivity. This graph illustrates the "voltage creep" in an SRAM device that does not use a "bitline-discharge" mechanism. FIG. 4b also shows how this "voltage creep" problem is alleviated by using a bitline discharge mechanism.
Such a mechanism is disclosed in the patent application of Chang, et al., 08/437,090, assigned to Intel Corporation of Santa Clara, Calif., which discloses a memory with a bitline discharge mechanism for unbanked caches which do not have a "stop-clock" mechanism. FIG. 5 illustrates some of the salient features of Chang. The circuit illustrated in this figure is used for generating a wordline pulse to an unbanked cache memory (not shown) to avoid the bitline creep problem. The counter 530 is a simple digital up-counter, with a reset gate for receiving a reset signal. Whenever the Reset signal is asserted, the counter starts counting from 0 up to the "Maximum Count Value." As this figure shows, the counter will be automatically reset once the maximum count value is reached. When the counter reaches the Maximum Count Value, an additional bitline discharge is triggered by having the pulse timer 536 issue a wordline-pulse signal. This additional bitline discharge hereinafter will be denominated as a pseudooperation. If a "RESET" signal, due to assertion of "Cache_Cycle", is received before the counter reaches the "Maximum Count Value," the counter will be reset and will start counting again from 0. This counter-driven discharging mechanism for the bitlines causes additional power dissipation. Furthermore, since all bitlines (i.e., the full cache line in the data array) are discharged at the same time, it causes relatively frequent "power surges." This could result in serious electron migration problems in silicon.
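The behavior of the counter described above may be sketched as follows. This is an illustrative model only and not the circuit of Chang: the class name DischargeCounter and the method tick are hypothetical, and the exact clock edge on which the pseudooperation is triggered relative to the count is an assumption.

```python
class DischargeCounter:
    """Behavioral sketch of an up-counter that triggers a pseudooperation.

    Assumption: the counter advances once per clock with no cache cycle,
    and the discharge is triggered on the clock at which the count
    reaches max_count, after which the counter resets itself.
    """

    def __init__(self, max_count):
        self.max_count = max_count
        self.count = 0

    def tick(self, cache_cycle):
        """Advance one clock; return True if a pseudooperation is triggered."""
        if cache_cycle:
            self.count = 0  # a real access resets the counter
            return False
        self.count += 1
        if self.count >= self.max_count:
            self.count = 0  # automatic reset at the Maximum Count Value
            return True
        return False


counter = DischargeCounter(max_count=3)
pattern = [False, False, False, True, False, False, False]
print([counter.tick(c) for c in pattern])
```

With a Maximum Count Value of 3, the third consecutive idle clock triggers a discharge, the cache cycle on the fourth clock resets the count, and three further idle clocks trigger a second discharge.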
Banked cache memories may additionally use a "stop-clock" mechanism for saving power. In a banked cache, even though the cache may be accessed during a particular cycle, not all the banks and, thus, not all the bitlines in the data array are necessarily discharged if a bitline discharge mechanism such as the one illustrated in FIG. 5 is used. To understand this problem, let us assume that during 100 clock cycles, Bank 1, Bank 2, and Bank 3 of the banked cache device shown in FIG. 4a are frequently accessed, but Bank 0 is not accessed at all during this time. Let us further assume that the Maximum Count Value is 100 and that the bitlines will "creep up" beyond the "high-gain" region of the sense-amplifier after 100 clocks of inactivity. In this case, there will be at least one cache access within every 100 clocks and, thus, the counter of FIG. 5 will be reset each time, such that the counter will never reach a count of 100. Because the counter is reset by every cache access to any bank, no pseudooperation bitline discharge will occur in Bank 0, and the bitlines of Bank 0 will remain inactive for 100 clocks. Accordingly, because Bank 0 was not accessed during 100 consecutive clock cycles, the voltage of the bitlines in Bank 0 will creep up beyond the "high-gain" region of the sense-amp. Thus, the next read access to Bank 0 may encounter a large delay while the sense-amplifier tries to detect the differential between bitline and bitline#. A speed failure would therefore occur if the mechanism of the Chang reference were used in conjunction with a banked cache.
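The scenario described above may be illustrated with a small simulation. This is a sketch under stated assumptions: a single counter shared by all banks is reset by any cache access, a pseudooperation discharges all banks, and the function name simulate and its parameters are hypothetical.

```python
def simulate(accesses, num_banks=4, max_count=100):
    """Return (longest idle streak per bank, number of pseudooperations).

    accesses is a list with one entry per clock: the set of banks
    accessed in that clock (an empty set means no cache cycle).
    Assumption: one shared counter, reset by an access to any bank.
    """
    counter = 0
    idle = [0] * num_banks
    longest = [0] * num_banks
    pseudo_ops = 0
    for banks in accesses:
        if banks:
            counter = 0  # any access resets the shared counter
            discharged = set(banks)
        else:
            counter += 1
            discharged = set()
            if counter >= max_count:
                counter = 0
                pseudo_ops += 1
                discharged = set(range(num_banks))  # all bitlines discharged
        for b in range(num_banks):
            idle[b] = 0 if b in discharged else idle[b] + 1
            longest[b] = max(longest[b], idle[b])
    return longest, pseudo_ops


# Banks 1, 2 and 3 are accessed in turn for 100 clocks; Bank 0 never is.
pattern = [{1 + (t % 3)} for t in range(100)]
print(simulate(pattern))
```

Under these assumptions, Bank 0 remains undischarged for all 100 clocks while no pseudooperation is ever triggered, because the shared counter never advances past zero.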
Moreover, the "stop-clock" feature described above complicates the problem even more. Whenever the clock is stopped, the counter in FIG. 5, which runs on the clock, stops counting. Accordingly, if the counter has been stopped due to a stop-clock signal for a time longer than 100 clocks, enough time would elapse to cause the bitlines to creep up to a voltage which is beyond the "high-gain" region of the sense amplifier. It is, thus, possible that an actual cache read access would be issued to the cache memory before the counter reaches its Maximum Count Value and is able to trigger an additional bit line discharge. In this case, the sense amplifier may cause a substantial delay, thereby causing a failure in the operation of the cache memory.
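The effect of the stopped clock on the counter may be sketched as follows. The model is a simplification with hypothetical names: real time is measured in units of one clock period, no cache accesses occur, the counter advances only while the clock is running, and leakage on the bitlines is assumed to continue regardless of the clock.

```python
def idle_time_seen(clock_running, max_count=100):
    """Return (counter value, real time since the last bitline discharge).

    clock_running holds one boolean per clock period of real time:
    True if the clock toggles in that period, False if it is stopped.
    Assumption: no cache accesses occur during the modeled interval.
    """
    counter = 0
    elapsed = 0
    for running in clock_running:
        elapsed += 1  # the bitlines keep leaking either way
        if running:
            counter += 1  # the counter only sees running clocks
            if counter >= max_count:
                counter = 0
                elapsed = 0  # pseudooperation discharges the bitlines
    return counter, elapsed


# 50 running clocks followed by 150 clock periods with the clock stopped.
print(idle_time_seen([True] * 50 + [False] * 150))
```

In this example the counter has reached only 50 of its 100 counts, yet the bitlines have been left undischarged for 200 clock periods, well beyond the 100-clock interval assumed in the discussion above.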
Accordingly, it is desirable to provide a banked data storage device with a mechanism for providing bitline discharges such that the creep-up problem is avoided. Additionally, it is desirable to provide a banked data storage device having a stop-clock feature for saving power such that the voltage on the bitlines will not creep up in cases where the clock is stopped for a period of time longer than the time it generally takes for the bitlines to creep up.