The present invention is directed, in general, to data processing systems and, more specifically, to a cache system having high performance parity protection.
The demand for high performance computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. A number of different approaches have been taken to decrease instruction execution time, thereby increasing processor throughput. One way to increase processor throughput is to use a pipeline architecture in which the processor is divided into separate processing stages that form the pipeline. Instructions are broken down into elemental steps that are executed in different stages in an assembly line fashion.
Superpipelining refers to the simultaneous processing of multiple instructions in the pipeline. For example, if a processor executes each instruction in five stages and each stage requires a single clock cycle to perform its function, then five separate instructions can be processed simultaneously in the pipeline, with the processing of one instruction completed during each clock cycle. Hence, the instruction throughput of an N stage pipelined architecture is, in theory, N times greater than the throughput of a non-pipelined architecture that completes only one instruction every N clock cycles.
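The throughput claim above can be checked with a little cycle arithmetic. The following is a minimal sketch, assuming an ideal pipeline with one clock cycle per stage and no stalls or hazards; the function names are illustrative and do not come from the specification.

```python
def pipelined_cycles(num_instructions: int, stages: int) -> int:
    """Cycles for an ideal pipeline: fill it once, then retire one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, stages: int) -> int:
    """Cycles when each instruction must finish all stages before the next one starts."""
    return stages * num_instructions


def speedup(num_instructions: int, stages: int) -> float:
    """Ratio of non-pipelined to pipelined cycle counts for the same instruction stream."""
    return unpipelined_cycles(num_instructions, stages) / pipelined_cycles(num_instructions, stages)


if __name__ == "__main__":
    for count in (1, 10, 100, 1_000_000):
        print(count, pipelined_cycles(count, 5), unpipelined_cycles(count, 5), speedup(count, 5))
```

Under these assumptions the speedup stays below N for any finite instruction stream and approaches N only as the stream grows long, which is why the text qualifies the N-fold figure with "in theory."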
Instructions are fed into the instruction pipeline from a cache memory. A cache memory is a small but very fast memory, such as a static random access memory (SRAM), that holds a limited number of instructions and data for use by the processor. The lower the cache access time, the faster the processor can run. Also, the lower the cache miss rate, the less often the processor is stalled while the requested data is retrieved from main memory and the higher the processor throughput is.
It is common practice to provide parity protection for integrated SRAM caches in modern processor designs. Such processors typically contain two levels of cache (L1 and L2) integrated onto the same die as the core CPU logic. Presently, the size of integrated L2 caches is typically in the range of 64 KB to 256 KB. Unfortunately, the geometries of modern semiconductor technologies (0.25 micron and below), coupled with the relatively large amount of SRAM integrated on the chip, make integrated L2 caches subject to soft errors caused by spurious radiation (from cosmic-ray alpha particles and the like) and by statistical charge fluctuation in the SRAM cells. Soft errors cause one or more bits in a cache line to be randomly changed from a Logic 0 to a Logic 1 or vice versa. These soft errors can corrupt the data within the cache, which can in turn lead to permanent database corruption and catastrophic program failure.
Therefore, it is desirable to detect (and optionally to correct) soft errors in a given cache line in order to take corrective action before the soft errors can cause damaging program behavior. This is generally accomplished by associating one or more redundant SRAM cells (i.e., parity bits) for each group of data bits in the cache, the group size being chosen according to the degree of protection desired. For each possible value of the data bits in a group, the associated parity bit(s) must have one particular value that is calculated and written into memory at the same time as data is written into memory (on a CPU write transaction). If a soft error causes a change in value of either a parity bit or a data bit, then the value of the parity bits and the value of the data bits become inconsistent, which can be detected (and possibly corrected if the parity bits are used to hold an error-correcting code) when the cache line is read. A soft error detected in this way is often referred to as a “parity error.”
By way of example, one of the most common parity schemes generates the parity bit value for a given set of data bits by making the parity bit a Logic 1 if there are an odd number of data bits set to Logic 1 and a Logic 0 if there are an even number of data bits set to Logic 1. Both the data bits and the parity bit are written into the cache on a CPU write transaction. In this example, there will always be an even number of bits set to Logic 1 (including the parity bit), hence this scheme is known as “even parity.” If, at some later time, one of the data bits or the parity bit gets changed from a Logic 1 to a Logic 0, or a Logic 0 to a Logic 1, due to a soft error, there would be an odd number of bits set to Logic 1 (including the parity bit). This would be detected as a parity error when the data and parity bits are later read out of the cache.
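The even parity scheme just described can be illustrated in a few lines. This is a minimal sketch in software of what would be an XOR tree in hardware; the function names and the sample data value are assumptions chosen for illustration.

```python
def even_parity_bit(data: int) -> int:
    """Return 1 if data has an odd number of 1 bits, else 0 (even parity)."""
    return bin(data).count("1") & 1


def has_parity_error(data: int, parity_bit: int) -> bool:
    """True when data plus parity bit contain an odd number of 1 bits."""
    return (bin(data).count("1") + parity_bit) % 2 == 1


# Write transaction: store the data together with its parity bit.
stored_data = 0b10110010          # four bits set to Logic 1
stored_parity = even_parity_bit(stored_data)

# A single-bit soft error flips one data bit before the next read.
corrupted_data = stored_data ^ 0b00001000

if __name__ == "__main__":
    print(has_parity_error(stored_data, stored_parity))
    print(has_parity_error(corrupted_data, stored_parity))
```

Note that a single parity bit detects any odd number of flipped bits in its group; two flips within the same group cancel out and go undetected.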
Unfortunately, the parity bits may significantly increase the size of the cache, depending on the ratio of the number of parity bits to the number of data bits. This can be a major drawback when applied to large on-chip L2 caches, which are already pushing the limits of technology. For instance, if one (1) parity bit is added for each eight (8) data bits, then the die area of the cache is increased by 12.5%, which may significantly increase the likelihood that a soft error occurs (since there are now many more SRAM cells that may fail).
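The overhead figure follows directly from the parity-to-data ratio. The sketch below counts storage cells only, which is an assumption: it ignores decoders, sense amplifiers, and other peripheral circuitry that also contribute to die area. The function name is illustrative.

```python
from fractions import Fraction


def parity_overhead(data_bits_per_parity_bit: int) -> Fraction:
    """Extra SRAM cells, as a fraction of the data cells, for one parity bit per group."""
    return Fraction(1, data_bits_per_parity_bit)


if __name__ == "__main__":
    for group in (8, 16, 32):
        percent = float(parity_overhead(group)) * 100
        print(f"1 parity bit per {group} data bits: {percent}% extra cells")
```

Halving the overhead therefore requires doubling the number of data bits that each parity bit protects.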
For most non-critical applications of high speed microprocessors, a lesser level of protection is sufficient, such that a parity bit to data bit ratio smaller than 1:8 may be used. Unfortunately, this may conflict with the functional operations of certain caches that require an individual byte-write capability that matches the variable widths (typically 1 to 8 data bytes) of write transactions generated by the CPU. For instance, if one parity bit is used for every pair of data bytes in the data cache, then when the CPU modifies, for example, only the first byte in a byte-pair (on a one-byte write transaction), it would be impossible to calculate the correct new parity bit for the byte-pair without first reading the second byte in the pair from the cache. This would cause a performance penalty by slowing the write transaction. The same argument applies for any combination of parity bits and data bytes in which one parity bit protects more than one data byte.
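The byte-pair problem can be made concrete. The following is a minimal sketch, assuming one even parity bit per pair of data bytes; the function names are hypothetical. It shows that the same one-byte write produces different parity bits depending on the byte that was not written, which is why a conventional design must read that byte first.

```python
def parity(value: int) -> int:
    """Even parity bit for an integer: 1 if it has an odd number of 1 bits."""
    return bin(value).count("1") & 1


def pair_parity(byte0: int, byte1: int) -> int:
    """Single parity bit protecting a pair of data bytes."""
    return parity(byte0) ^ parity(byte1)


def write_first_byte_rmw(pair, new_byte0):
    """Conventional read-modify-write: read the untouched byte, then compute parity."""
    old_byte1 = pair[1]                  # the extra read that slows the write
    new_pair = (new_byte0, old_byte1)
    return new_pair, pair_parity(*new_pair)


if __name__ == "__main__":
    # Same incoming byte, different unwritten neighbor, different parity bit.
    print(write_first_byte_rmw((0xFF, 0x00), 0x01))
    print(write_first_byte_rmw((0xFF, 0x01), 0x01))
```

Because the parity bit cannot be derived from the incoming byte alone, the write stalls on the read unless the unwritten byte is made available some other way.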
Therefore, there is a need in the art for improved cache memories that maximize processor throughput. In particular, there is a need in the art for improved cache memories having parity protection in which one parity bit protects more than one data byte. More particularly, there is a need for a cache memory having a byte-write capability that uses a parity protection apparatus in which one parity bit protects more than one data byte without slowing down the operation of the cache memory.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide an improved cache memory for use in a data processor. According to an advantageous embodiment of the present invention, the cache memory comprises: 1) a first static random access memory (SRAM) capable of receiving on a plurality of inputs up to N incoming bytes of data and storing the up to N incoming bytes of data in a plurality of N-byte addressable locations, wherein M incoming bytes of data may be written in each of the plurality of N-byte addressable locations during a write operation, and wherein M written bytes of data and N−M unwritten bytes of data are output from each N-byte addressable location on a plurality of outputs of the first SRAM during the write operation; and 2) a parity generator coupled to the first SRAM capable of receiving during the write operation the M written bytes of data and the N−M unwritten bytes of data and generating therefrom at least one write parity bit associated with the M written bytes of data and the N−M unwritten bytes of data.
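The write path described in this embodiment can be modeled behaviorally. The following is a software sketch only, not the circuit itself: the class name, the per-byte enable mask, the 8-byte line width, the choice of a single even parity bit per line, and the plain list used to hold the parity bits are all assumptions made for illustration.

```python
class WriteThroughSram:
    """Toy model of an SRAM whose outputs show the full N-byte line during a write."""

    def __init__(self, num_locations: int, n_bytes: int):
        self.n_bytes = n_bytes
        self.lines = [[0] * n_bytes for _ in range(num_locations)]

    def write(self, address, incoming, byte_enables):
        """Write only the enabled bytes, then return the whole line as seen on the outputs."""
        line = self.lines[address]
        for i, enabled in enumerate(byte_enables):
            if enabled:
                line[i] = incoming[i] & 0xFF
        return list(line)  # M written bytes plus N-M unwritten bytes


def generate_parity(line_bytes) -> int:
    """Parity generator: one even parity bit over every byte on the SRAM outputs."""
    bit = 0
    for byte in line_bytes:
        bit ^= bin(byte).count("1") & 1
    return bit


data_sram = WriteThroughSram(num_locations=4, n_bytes=8)
parity_store = [0] * 4  # hypothetical storage for one parity bit per location


def cache_write(address, incoming, byte_enables):
    """One write operation: update data, compute parity from the outputs, store it."""
    outputs = data_sram.write(address, incoming, byte_enables)
    parity_store[address] = generate_parity(outputs)
    return outputs


# Fill location 0, then overwrite only its first byte (M = 1, N = 8).
cache_write(0, [0xFF] * 8, [True] * 8)
outputs = cache_write(0, [0x01] + [0] * 7, [True] + [False] * 7)
```

In this model the one-byte write needs no separate read step: the seven unwritten bytes appear on the outputs during the same operation, so the parity bit covers the whole line.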
According to one embodiment of the present invention, the cache memory further comprises a second SRAM coupled to the parity generator capable of storing the at least one write parity bit during the write operation.
According to another embodiment of the present invention, the first SRAM receives R write enable signals capable of selecting at least one of N bytes in each of the plurality of N-byte addressable locations in which incoming bytes of data are to be written.
According to still another embodiment of the present invention, R=N such that a single incoming byte of data may be written into an individual selectable one of the N bytes in each of the plurality of N-byte addressable locations.
According to yet another embodiment of the present invention, the at least one write parity bit comprises a single parity bit associated with the M written bytes of data and the N−M unwritten bytes of data.
According to a further embodiment of the present invention, the at least one write parity bit comprises a first parity bit associated with a first one of N bytes in each of the plurality of N-byte addressable locations and a second parity bit associated with a second one of the N bytes in each of the plurality of N-byte addressable locations.
According to a still further embodiment of the present invention, the first SRAM is capable of receiving during a read operation a read address selecting a first one of the plurality of N-byte addressable locations, wherein the first SRAM, in response to receipt of the read address, outputs on the plurality of outputs N bytes of data retrieved from the first N-byte addressable location, and wherein the parity generator generates at least one read parity bit associated with the retrieved N bytes of data.
According to a yet further embodiment of the present invention, the cache memory further comprises a parity error detector, wherein the second SRAM receives the read address and, in response to receipt of the read address, outputs the at least one write parity bit associated with the read address, and wherein the parity error detector compares the at least one read parity bit and the at least one write parity bit.
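The read-side check in the two preceding embodiments can be sketched as follows. This is a behavioral illustration under stated assumptions: plain Python lists stand in for the first and second SRAMs, a single even parity bit protects each line, and the function names and sample data are hypothetical.

```python
def generate_parity(line_bytes) -> int:
    """Even parity bit over all bytes of a line."""
    bit = 0
    for byte in line_bytes:
        bit ^= bin(byte).count("1") & 1
    return bit


def read_with_check(data_sram, parity_sram, address):
    """Read a line, regenerate its parity, and compare with the stored write parity."""
    line = data_sram[address]               # first SRAM outputs N bytes
    read_parity = generate_parity(line)     # parity generator, reused on reads
    write_parity = parity_sram[address]     # second SRAM outputs the stored bit
    parity_error = read_parity != write_parity
    return list(line), parity_error


# One 4-byte location and the parity bit stored when it was written.
data_sram = [[0x12, 0x34, 0x56, 0x78]]
parity_sram = [generate_parity(data_sram[0])]

_, error_before = read_with_check(data_sram, parity_sram, 0)

data_sram[0][2] ^= 0x10  # inject a single-bit soft error into the stored data
_, error_after = read_with_check(data_sram, parity_sram, 0)
```

A mismatch between the regenerated read parity bit and the stored write parity bit flags the line so that corrective action can be taken before the data is used.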
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior as well as future uses of such defined words and phrases.