1. Field of the Invention
The present invention is directed generally toward a method and apparatus for protection of data utilizing cyclical redundancy checking.
2. Discussion of Related Art
In a high-performance computer system consisting of multiple processors and mass storage devices, it is of critical importance that all information be stored and retrieved reliably with no errors. It is of equal importance that, if errors occur in the storage or retrieval of data, the errors be detected and reported. Typically, the mass storage of a high-performance computer system consists of a redundant array of independent disks (RAID). Within the RAID mass storage system, data is stored both in semiconductor memory in the RAID controller and on the magnetic media of the RAID disk drives. Though data written to semiconductor memory can be protected using error correction code (ECC) techniques, ECC does not protect against inadvertent writes to locations in the memory or reads from incorrect locations. Furthermore, data stored on the disk drives of a RAID system can be stored incorrectly or retrieved incorrectly due to errors in the drives. For example, the drives may have physical problems, data may be stored in the wrong location on the drive, or the data may become corrupted.
The method by which these errors are detected in the system should have minimal impact on the overall system performance. There are several approaches that may be used to protect data from the above-mentioned errors. One method involves the execution of software that checks the integrity of data as it is being stored or retrieved, using cyclical redundancy checking (CRC), a technique commonly used to ensure the accuracy of transmitted digital data. This operation executes concurrently with the transfer of the data. Because this method utilizes a portion of the computing resources for its execution, the overall performance of the system is reduced. This method also adds complexity to the software executing in the RAID system.
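By way of illustration only, a software integrity check of this kind might be sketched as follows. The sub-block size and the use of the CRC-32 polynomial provided by Python's `zlib.crc32` are assumptions made for the sketch; the text does not specify either, and the function names are hypothetical.

```python
import zlib

SUB_BLOCK_SIZE = 512  # assumed sub-block size in bytes; not given in the text


def store_with_crc(data):
    """Split data into sub-blocks and pair each sub-block with its CRC-32."""
    blocks = [data[i:i + SUB_BLOCK_SIZE] for i in range(0, len(data), SUB_BLOCK_SIZE)]
    return [(block, zlib.crc32(block)) for block in blocks]


def verify(stored):
    """Return the indices of sub-blocks whose recomputed CRC does not match."""
    return [i for i, (block, crc) in enumerate(stored) if zlib.crc32(block) != crc]
```

Every call to `verify` recomputes a CRC over each sub-block on the host processor, which is the computing cost that the paragraph above attributes to the software approach.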
Another method involves a hardware engine that checks the integrity of data after it has been transferred. Though this method utilizes a small amount of computing resources to initialize and start the hardware engine, system performance is reduced due to the time required to initialize the engine and execute the checking algorithm. If a separate hardware engine is used to perform the CRC function after a transfer to or from system memory is completed, then the next system operation or transfer would have to wait until this CRC operation is completed before executing. This reduces system performance.
The parent patent application provides the addition of a dedicated hardware CRC computation engine to assure the integrity of data transferred between the system memory and storage devices. The CRC computation engine provides CRC calculation "on-the-fly" for the protection of data transferred to and from the system memory without software overhead. The computation of CRC values and optional checking against previously calculated CRC values is selected through the use of an address-mapping scheme. The CRC protection scheme of the parent application requires a small amount of initial software overhead to allocate the data, CRC value, and CRC error regions of the system memory. After the CRC protection scheme is initialized, all CRC operations are transparent to the executing software.
The parent application further provides a separate cache memory for storing recently utilized CRC values. In the parent application, an exemplary preferred embodiment discloses multiple devices coupled through the memory interface each capable of generating transactions involving CRC values. Where all such CRC values are cached together, it remains a problem to rapidly locate a particular cached CRC value entry. Searching through a single hierarchy of the CRC value cache can negatively impact overall system performance.
It is evident from the above discussion that a need exists for an improved method and structure for locating a cached CRC value entry in the CRC value cache memory.
The present invention solves the above and other problems, thereby advancing the state of the useful arts, by providing structure and methods for reducing overhead processing when locating items in the CRC value cache memory. A first feature provides that multiple (preferably all) CRC values for corresponding multiple sub-blocks of a data block are transferred in a single transaction from system memory to CRC value cache memory. This feature serves to reduce the overhead involved in arbitrating for control of the system memory to retrieve each CRC value individually as requested. A second feature of the invention provides for use of a separate cache table for each source of CRC value access in the storage controller. In particular, an exemplary preferred embodiment of the present invention provides a cache entry table for each of three PCI interface controllers, a cache entry table for a parity assist component of the storage controller, and a cache entry table for the DMA controller of the storage controller. Each cache entry table serves to record entries in CRC value cache memory associated with the corresponding device. When a particular device generates a transaction that requires a CRC value, only the cache entry table corresponding to that device is inspected to determine if the required entry is in CRC value cache memory. Cache entry tables corresponding to other devices of the storage controller are not searched for the requested CRC value. This reduction in the search processing needed to locate a CRC value cache entry enhances overall system performance.
A first feature of the invention provides, in a system having a system memory containing multiple data blocks each comprising multiple sub-blocks and containing error control values related to each sub-block, a method for caching error control values comprising the steps of: receiving a request to retrieve an identified error control value corresponding to an identified sub-block of an identified data block; determining that the identified error control value is not present in a cache memory; and transferring, in response to the determination that the identified error control value is not present in the cache memory, a set of error control values from the system memory to the cache memory such that the set includes the identified error control value and includes a related error control value.
Another aspect of the invention further provides that the step of transferring comprises the step of: transferring a set of error control values from the system memory to the cache memory such that the set includes the identified error control value and includes multiple related error control values.
Another aspect of the invention further provides that the step of transferring comprises the step of: transferring a set of error control values from the system memory to the cache memory such that the set includes all error control values corresponding to all sub-blocks of the identified data block.
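By way of illustration only, the miss handling described above might be sketched as follows. The class name `CrcCache`, the nested-list model of system memory, and the transaction counter are hypothetical and are not taken from the text; the sketch assumes the variant in which all error control values of the identified data block are transferred together.

```python
class CrcCache:
    """Toy model: a miss fetches every CRC value of the data block at once."""

    def __init__(self, system_memory_crcs):
        # system_memory_crcs[block][sub_block] -> CRC value (stand-in for system memory)
        self.memory = system_memory_crcs
        self.cache = {}               # (block, sub_block) -> CRC value
        self.memory_transactions = 0  # counts transfers from system memory

    def get(self, block, sub_block):
        key = (block, sub_block)
        if key not in self.cache:
            # Miss: a single transaction brings in the requested CRC value
            # together with the CRC values of all other sub-blocks of the block.
            self.memory_transactions += 1
            for i, crc in enumerate(self.memory[block]):
                self.cache[(block, i)] = crc
        return self.cache[key]
```

In this sketch, later requests for other sub-blocks of the same data block are served from the cache without a further system memory transaction.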
Another aspect of the invention further provides that the error control values are CRC values.
Another aspect of the invention further provides that the step of determining comprises the step of: locating, in a cache table, an entry corresponding to the identified error control value.
Another aspect of the invention further provides that the step of locating includes the step of: determining the presence of the entry using an index number of the identified data block and an index number of the identified sub-block.
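By way of illustration only, locating an entry from the two index numbers might be sketched as follows. The entry layout (a base slot plus a per-sub-block valid mask), the number of sub-blocks per data block, and all names are assumptions made for the sketch and are not specified in the text.

```python
from dataclasses import dataclass

SUB_BLOCKS_PER_BLOCK = 8  # assumed; not specified in the text


@dataclass
class CacheTableEntry:
    block_index: int  # index number of the data block
    base_slot: int    # first cache slot holding this block's CRC values
    valid_mask: int   # bit i is set when the CRC value of sub-block i is cached


def locate(table, block_index, sub_block_index):
    """Return the cache slot of the requested CRC value, or None on a miss."""
    for entry in table:
        if (entry.block_index == block_index
                and (entry.valid_mask >> sub_block_index) & 1):
            return entry.base_slot + sub_block_index
    return None
```

For example, with an entry for data block 7 at base slot 8 whose mask marks sub-blocks 0 and 2 as cached, a request for sub-block 2 resolves to slot 10 and a request for sub-block 1 is a miss.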
Another aspect of the invention further provides that the system includes multiple cache tables such that each cache table includes entries for a corresponding set of devices coupled to the system and such that the devices generate requests to retrieve identified error control values and such that the step of locating includes the step of: selecting the cache table in which to locate the identified error control value in accordance with the identity of a device requesting retrieval of the identified error control value.
A second feature of the invention provides, in a storage controller having system memory containing multiple data blocks each comprising multiple sub-blocks and containing error control values related to each sub-block and having multiple devices that require retrieval of the error control values and having a cache memory for storing copies of selected ones of the error control values, a method for managing the cache memory comprising the steps of: providing multiple cache entry tables such that each cache entry table has at least one entry for identifying an error control value in the cache memory; associating each device of the multiple devices with a cache entry table of the multiple cache entry tables; receiving a request to retrieve an identified error control value from a requesting device of the multiple devices; inspecting only the cache entry table associated with the requesting device to determine whether the identified error control value is present in the cache memory; and transferring, in response to a determination that the identified error control value is not in the cache memory, a set of error control values from the system memory to the cache memory such that the set includes the identified error control value and includes a related error control value.
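By way of illustration only, the per-device cache entry tables might be sketched as follows. The device names, the class name, and the use of sets as tables are hypothetical; the sketch assumes one table per device and a transfer of all error control values of the data block on a miss.

```python
DEVICES = ("pci0", "pci1", "pci2", "parity_assist", "dma")  # illustrative names


class StorageControllerCrcCache:
    """Toy model: one cache entry table per device, one shared CRC cache."""

    def __init__(self, memory):
        self.memory = memory  # memory[block][sub_block] -> CRC value
        self.cache = {}       # shared CRC value cache: (block, sub_block) -> CRC
        self.tables = {device: set() for device in DEVICES}

    def retrieve(self, device, block, sub_block):
        table = self.tables[device]  # only the requesting device's table is inspected
        if (block, sub_block) not in table:
            # Miss: transfer the block's CRC values and record them for this device only.
            for i, crc in enumerate(self.memory[block]):
                self.cache[(block, i)] = crc
                table.add((block, i))
        return self.cache[(block, sub_block)]
```

In this sketch a request from the DMA controller never consults the tables of the PCI interface controllers or the parity assist component, so the search is bounded by the size of a single table.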
Another aspect of the invention further provides that the step of transferring comprises the step of: transferring a set of error control values from the system memory to the cache memory such that the set includes the identified error control value and includes multiple related error control values.
Another aspect of the invention further provides that the step of transferring comprises the step of: transferring a set of error control values from the system memory to the cache memory such that the set includes all error control values corresponding to all sub-blocks of the identified data block.
Another aspect of the invention further provides that the error control values are CRC values.
Another aspect of the invention provides that the step of associating comprises the step of: associating each device of the multiple devices with a different cache entry table of the multiple cache entry tables.
Another aspect of the invention further provides for the steps of: receiving a request from an updating device of the multiple devices to update an error control value previously transferred to the cache memory; updating the error control value in the cache memory; updating a corresponding entry only in the cache entry table associated with the updating device; and invalidating any corresponding entries in all cache entry tables not associated with the updating device.
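By way of illustration only, the update-and-invalidate steps might be sketched as follows. The entry layout (a `slot` number and a `valid` flag) and the function name are assumptions made for the sketch and are not specified in the text.

```python
def update_with_invalidation(cache_memory, tables, updating_device, key, new_crc):
    """Update a cached CRC value and invalidate other devices' entries for it.

    tables maps device name -> {key: {"slot": int, "valid": bool}} (assumed layout).
    """
    entry = tables[updating_device][key]
    cache_memory[entry["slot"]] = new_crc  # update the value in cache memory
    entry["valid"] = True                  # update only the updating device's entry
    for device, table in tables.items():
        if device != updating_device and key in table:
            table[key]["valid"] = False    # invalidate entries in all other tables
```

Under this policy, any other device must miss on its next request for the same value and obtain a fresh entry in its own table.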
Another aspect of the invention further provides for the steps of: receiving a request from an updating device of the multiple devices to update an error control value previously transferred to the cache memory; updating the error control value in the cache memory; and updating a corresponding entry in every cache entry table presently pointing to the error control value.
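By way of illustration only, the alternative in which every table pointing to the value is updated might be sketched as follows. The entry layout (a `slot` number and a `version` counter standing in for whatever per-entry state is updated) and the function name are assumptions made for the sketch and are not specified in the text.

```python
def update_all_tables(cache_memory, tables, updating_device, key, new_crc):
    """Update a cached CRC value and every table entry that points to it.

    tables maps device name -> {key: {"slot": int, "version": int}} (assumed layout).
    Returns the names of the devices whose entries were updated.
    """
    slot = tables[updating_device][key]["slot"]
    cache_memory[slot] = new_crc  # update the value in cache memory
    updated = []
    for device, table in tables.items():
        entry = table.get(key)
        if entry is not None and entry["slot"] == slot:
            entry["version"] += 1  # update the entry in each table pointing here
            updated.append(device)
    return updated
```

Compared with invalidation, this policy keeps the entries of the other devices usable at the cost of visiting every cache entry table on each update.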