1. Field of the Invention
This invention relates to error detection in storage systems.
2. Description of the Related Art
Many storage arrays provide protection against data loss by storing redundant data. Such redundant data may include parity information (e.g., in systems using striping) or additional copies of data (e.g., in systems providing mirroring). A storage system's ability to reconstruct lost data may depend on how many failures occur before the attempted reconstruction. For example, some RAID (Redundant Array of Independent/Inexpensive Disks) systems may only be able to tolerate a single disk failure or error. Once a single disk fails or loses data through an error, such systems are said to be operating in a degraded mode, because if additional disks fail before the lost data on the failed or erroneous disk has been reconstructed, it may no longer be possible to reconstruct the lost data. The longer a storage array operates in a degraded mode, the more likely it is that an additional failure will occur. As a result, it is desirable to detect and repair disk failures or other anomalies promptly so that a storage array does not remain in a degraded mode.
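By way of illustration only, the single-failure limit described above can be seen in a minimal sketch of XOR parity, assuming a simple scheme in which one parity block is the byte-wise XOR of the equal-length data blocks in a stripe. The function names are hypothetical and are not taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Hypothetical helper: parity block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(stripe, parity):
    """Rebuild the single missing block (marked None) from survivors plus parity.

    Raises ValueError when more than one block is missing, because a single
    parity block cannot recover two concurrent losses.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one block lost; single parity cannot reconstruct")
    if not missing:
        return list(stripe)
    survivors = [block for block in stripe if block is not None]
    result = list(stripe)
    # XOR of all surviving data blocks and the parity yields the lost block.
    result[missing[0]] = xor_blocks(survivors + [parity])
    return result


data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
parity = compute_parity(data)
# One lost block (degraded mode) is recoverable from the remaining blocks and parity.
rebuilt = reconstruct([data[0], None, data[2]], parity)
```

In this sketch, a second loss before `reconstruct` runs makes the stripe unrecoverable, which is why the time spent in degraded mode matters.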
Errors that may cause a storage system to operate in a degraded mode include transmission errors, total disk failures, and disk errors. Transmission errors and disk errors may result in less data vulnerability or data loss than total failures, but they may be more difficult to detect. For example, disk drives may occasionally corrupt data, and this corruption may not be detected by the storage system until the data is read from the disk. Such corruption may occur for various reasons. For example, bugs in a disk drive controller's firmware may cause bits in a sector to be modified or may cause blocks to be written to the wrong address. Such bugs may cause storage drives to write the wrong data, to write the correct data to the wrong place, or to not write any data at all. Another source of errors may be a drive's write cache. Many disk drives use write caches to quickly accept write requests so that the host or array controller can continue with other commands. The data is later copied from the write cache to the disk media. However, write cache errors may cause some acknowledged writes to never reach the disk media. The end result of such bugs or errors is that the data at a given block may be corrupted or stale. Errors such as drive errors and transmission errors may be "silent" in the sense that no error messages are generated when such errors occur.
In general, it is desirable to detect errors soon after they occur so that a storage system does not operate in a degraded mode for an extended time. However, error detection mechanisms are often expensive to implement (e.g., if they require a user to purchase additional or more expensive hardware and/or software) and/or have a detrimental impact on storage system performance. Thus, it is desirable to allow users to decide whether to purchase the error detection mechanism independently of the overall system and/or to allow users to enable and disable the error detection mechanism independently.
Various embodiments of a method and system for sharing a cache are disclosed. In one embodiment, a processing device includes a shared cache and a plurality of processors that are each coupled to the shared cache and each configured to store a result in the shared cache. The processors generate their results by performing the same data integrity operation (e.g., a parity calculation) on the same data. The shared cache may be included on the same semiconductor substrate as a first processor. Because the results are stored in the shared cache, the first processor may quickly access and operate on the results. In one embodiment, the first processor may perform a comparison operation or voting operation on the results stored in the shared cache.
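The voting operation can be pictured in software with a minimal sketch, assuming that each processor's result is an XOR parity value and that the shared cache is modeled as a Python list holding one result per processor. The names `parity` and `vote` are hypothetical, and the strict-majority rule is an assumption chosen for illustration; the embodiments above do not specify a particular voting policy.

```python
from collections import Counter
from functools import reduce


def parity(data_blocks):
    """Byte-wise XOR parity over equal-length blocks (the data integrity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks))


def vote(shared_cache):
    """Strict-majority vote over the results stored in the shared cache.

    Returns (winner, dissenters), where dissenters lists the indices of the
    processors whose result disagreed with the majority. Raises RuntimeError
    when no result is held by more than half of the processors.
    """
    counts = Counter(shared_cache)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(shared_cache):
        raise RuntimeError("no majority; results cannot be trusted")
    dissenters = [i for i, result in enumerate(shared_cache) if result != winner]
    return winner, dissenters


data = [b"\xaa\x00", b"\x0f\xf0"]
# Three processors perform the same operation on the same data.
shared_cache = [parity(data) for _ in range(3)]
shared_cache[1] = b"\x00\x00"  # simulate one processor producing a bad result
winner, dissenters = vote(shared_cache)
```

Here the first processor would accept `winner` and could flag processor 1 as suspect; a simple comparison (all results equal) is the degenerate case that detects, but does not outvote, a disagreement.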
In one embodiment, the shared cache may be multi-ported and each of the shared cache's ports may correspond to a respective one of the processors. Each processor may have a dedicated connection between itself and a respective one of the shared cache's ports. In other embodiments, the processors may be coupled to the shared cache by a bus.
In some embodiments, the shared cache may be the first processor's L1 (level 1) cache. The plurality of processors may be integrated onto the same semiconductor substrate as the first processor. In some embodiments, the first processor may not be included in the plurality of processors that are each storing a result in the shared cache.
In several embodiments, each of the plurality of processors may include its own cache, and each of the plurality of processors may be configured to operate on data and instructions stored in its own cache in order to generate the result. In an alternative embodiment, each of the plurality of processors may be configured to operate on data and instructions stored in the shared cache in order to generate the result. In one embodiment, each of the plurality of processors may be able to access the shared cache only when operating in a first mode (e.g., a data integrity mode).
In one embodiment, the processing device may be included in a data processing system that includes a host system, an interconnect, and a storage array.
In one embodiment, a method of sharing a cache between multiple processors involves a plurality of processors each performing the same data integrity operation on the same data to generate a result, the plurality of processors storing their results in the shared cache, and a first processor accessing the results in the shared cache.
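As a rough software analogue of the method steps, the following sketch assumes that threads stand in for the plurality of processors and that a preallocated list stands in for the shared cache, with one slot per thread loosely mirroring the per-processor ports. A CRC-32 checksum (`zlib.crc32`) is swapped in here as the data integrity operation; the text's own example is a parity calculation. The function name `run_redundant` is hypothetical, and the sketch does not model hardware timing or cache coherence.

```python
import threading
import zlib


def run_redundant(data: bytes, num_workers: int = 3):
    """Have several workers compute the same checksum over the same data.

    Returns (shared_cache, agree): the per-worker results and whether the
    "first processor" (the calling thread) found them all equal.
    """
    shared_cache = [None] * num_workers  # one slot per worker, like a dedicated port

    def worker(slot: int) -> None:
        # Each worker performs the same data integrity operation on the same data
        # and stores its result in its own slot of the shared cache.
        shared_cache[slot] = zlib.crc32(data)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # The calling thread accesses all results in the shared cache and compares them.
    agree = all(result == shared_cache[0] for result in shared_cache)
    return shared_cache, agree


results, agree = run_redundant(b"example stripe data")
```

Giving each worker its own slot avoids write conflicts between workers, so no lock is needed for the stores; `join` ensures every result is in place before the comparison runs.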
In one embodiment, a processing device may include a plurality of means for processing data (e.g., processors such as those shown in FIGS. 10-12) and means for storing data (e.g., a shared cache like those shown in FIGS. 10-12). The means for storing data may be integrated on the same semiconductor substrate as at least one of the means for processing data. Each of the means for processing data is coupled to the means for storing data and configured to store a result in the means for storing data. Each of the means for processing data may generate its result by performing the same data integrity operation on the same data as each of the other means for processing data.