A typical data storage system stores and retrieves data for one or more external hosts. It is common for such a data storage system to include front-end circuitry, a cache, back-end circuitry, and a set of disk drives. In general, the cache operates as a buffer for data exchanged between the external hosts and the disk drives. The front-end circuitry operates as an interface for transferring data from the hosts to the cache, and vice versa. Similarly, the back-end circuitry operates as an interface for transferring data from the cache to the disk drives, and vice versa.
Some data storage systems are capable of storing and retrieving data having a count-key-data (CKD) record format (hereinafter referred to as CKD data). Such data consists of a count field containing the number of bytes of data, an optional key field by which particular records can be easily recognized, and the data itself. In general, CKD data does not have a standard size. That is, CKD data does not arrive in complete blocks, i.e., consistently aligned with a block or sector boundary. Rather, CKD data is arbitrary in size, varying from transmission to transmission.
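By way of illustration only, the record format described above can be modeled as a small Python structure. The class name, field names, and the 512-byte default block size are assumptions made for this sketch and do not come from any particular system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CKDRecord:
    """Hypothetical in-memory view of one count-key-data record."""

    data: bytes                  # the data itself, of arbitrary length
    key: Optional[bytes] = None  # optional key used to recognize the record

    @property
    def count(self) -> int:
        # The count field holds the number of bytes of data.
        return len(self.data)

    def is_block_aligned(self, block_size: int = 512) -> bool:
        # True only when the data happens to end on a block boundary.
        return self.count % block_size == 0


record = CKDRecord(data=b"x" * 300, key=b"K1")
print(record.count, record.is_block_aligned())
```

A 300-byte record such as the one above does not end on a 512-byte boundary, which is the situation the remainder of this description addresses.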
Some data storage systems, which are equipped to handle CKD data, associate cyclic redundancy check (CRC) codes with the CKD data for fault tolerance purposes. In one conventional data storage system, when the front-end circuitry receives CKD data for storage, the front-end circuitry associates a CRC code with the CKD data and provides the CKD data and the associated CRC code to the cache. The back-end circuitry then reads the CKD data and the associated CRC code out of the cache, confirms that the CKD data is not corrupt or garbled based on the CRC code, and stores the CKD data and the CRC code on the set of disk drives.
It should be understood that data transfers between components of the above-described conventional data storage system (i.e., between the front-end circuitry and the cache, between the cache and the back-end circuitry, etc.) occur in block-sized or block-aligned operations. The front-end circuitry typically handles conversion of the non-standard-sized CKD data to data blocks. In particular, in response to CKD data received from a host for storage, the front-end circuitry provides, to the cache, a block of data including (i) the CKD data, (ii) an associated CRC code appended to an end of the CKD data, and (iii) old, invalid data remaining in the front-end circuitry for alignment with a block boundary (e.g., a 512-byte boundary). It should be understood that the CRC code applies only to the CKD data, and not to the old, invalid data. Furthermore, in a separate signal (e.g., a message to the back-end circuitry), the front-end circuitry identifies the number of bytes of CKD data in the data block so that the back-end circuitry can use that number as an offset to find the CRC code.
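As a rough illustration of the conventional block layout just described, the following Python sketch builds one 512-byte block and shows the back-end using the separately signalled byte count as an offset to the CRC code. The function names, the 32-bit CRC width, and the use of zlib.crc32 as a stand-in CRC are assumptions made for the example; the description above does not specify them.

```python
import zlib
from typing import Tuple

BLOCK_SIZE = 512  # example block boundary from the text
CRC_LEN = 4       # assumption: a 32-bit CRC (zlib.crc32 as a stand-in)


def build_conventional_block(ckd: bytes, stale_buffer: bytes) -> Tuple[bytes, int]:
    """Return (block, count): CKD data, appended CRC, then leftover bytes."""
    used = len(ckd) + CRC_LEN
    if used > BLOCK_SIZE:
        raise ValueError("this sketch handles a single block only")
    if len(stale_buffer) < BLOCK_SIZE:
        raise ValueError("stale buffer must be at least one block long")
    crc = zlib.crc32(ckd).to_bytes(CRC_LEN, "big")
    # Old, invalid bytes left in the buffer fill out the rest of the block.
    block = ckd + crc + stale_buffer[used:BLOCK_SIZE]
    return block, len(ckd)


def find_crc(block: bytes, count: int) -> int:
    """Back-end side: the signalled count is the offset of the CRC code."""
    return int.from_bytes(block[count:count + CRC_LEN], "big")


stale = bytes([0xAA]) * BLOCK_SIZE
block, count = build_conventional_block(b"hello CKD", stale)
print(len(block), count, find_crc(block, count) == zlib.crc32(b"hello CKD"))
```

Note that the CRC code in this layout covers only the first `count` bytes; the trailing 0xAA bytes stand in for the old, invalid data and are not protected by it.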
As explained earlier, when the back-end circuitry retrieves the block of data from the cache, the back-end circuitry checks the CRC code to confirm that the CKD data is still intact, i.e., verifies that the CKD data is not corrupt or garbled in some manner. To this end, the back-end circuitry generates (i) a second CRC code based on the entire data block, (ii) a third CRC code based only on the old, invalid data, and (iii) an expected value for the initial CRC code for the CKD data (i.e., the CRC code appended to the CKD data within the data block) based on the second and third CRC codes (e.g., by performing an exclusive OR operation on the second and third CRC codes). The back-end circuitry then compares the expected value with the initial CRC code. If there is a match, the back-end circuitry concludes that the CKD data is without error and stores the CKD data (and perhaps the initial CRC code as well) in the disk drives. However, if the generated expected value does not match the initial CRC code, the back-end circuitry concludes that the CKD data includes an error (i.e., that one or more bits of the CKD data are incorrect), and initiates an error handling procedure (e.g., notifies the front-end circuitry that the CKD data includes an error and invites the front-end circuitry to retransmit the CKD data).
Unfortunately, there are deficiencies in the above-identified conventional data storage system, which stores CKD data by including old, invalid data with the CKD data for block alignment purposes. For example, for the back-end circuitry of the above-described conventional data storage system to confirm that the CKD data from the front-end circuitry is not corrupt, the back-end circuitry performs a complex series of operations. In particular, the back-end circuitry generates (i) a second CRC code based on the entire data block containing the CKD data, (ii) a third CRC code based only on the old, invalid data in the data block, and (iii) an expected value based on the second and third CRC codes. The back-end circuitry then compares the expected value with the initial CRC code (i.e., the CRC code appended to the CKD data within the data block) to determine whether the CKD data is corrupt. This complex series of operations, which is typically implemented in software, requires a significant amount of time to complete. As a result, the transfer of CKD data through the back-end circuitry tends to be slow compared to transfer times of other types of data, due to the large amount of error checking overhead performed by the back-end circuitry.
In contrast to the above-described conventional data storage system, the invention is directed to data storage techniques that include an error detection code and cleared bytes (e.g., zeroes) with certain types of data (e.g., CKD data). The use of cleared bytes with CKD data alleviates the need to perform a complex series of software operations at the back-end to detect corrupted CKD data. Rather, when the CKD data is followed by an appended CRC code and cleared bytes to form an aligned block of data, error checking of the CKD data (and the entire data block) can simply involve generating a CRC code based on the entire data block and comparing that generated CRC code with the initial CRC code appended to the CKD data within that data block. Accordingly, the error detection process is simpler and takes less time than the above-described conventional approach.
One arrangement of the invention is directed to a data storage system that includes a circuit (e.g., a front-end interface) having a memory pipeline that (i) receives a stream of data elements (e.g., CKD data), and (ii) provides a series of byte groups that includes the stream of data elements, an error detection code (e.g., a CRC code) and a set of cleared bytes (e.g., zeroes) to a set of storage devices. The circuit further includes a controller, coupled to the memory pipeline, that provides the error detection code and the set of cleared bytes to the memory pipeline such that each of the series of byte groups provided by the memory pipeline has a same byte width (e.g., eight bytes). The inclusion of the error detection code and the set of cleared bytes enables consistent alignment of each byte group in the series. Furthermore, if the series of byte groups is loaded into an initialized memory sector (e.g., a cleared cache of the data storage system), a CRC code can be (i) generated based on the entire sector and (ii) compared to the CRC code within the series of byte groups to determine whether the stream of data elements is without error.
In one arrangement, the memory pipeline includes an output stage that connects to an external memory (e.g., cache memory, dual-ported random access memory leading to the cache memory, a disk drive, etc.), and the controller is configured to direct the memory pipeline to further provide a set of subsequent byte groups exclusively having cleared bytes. In this arrangement, the output stage provides both the series of byte groups and the set of subsequent byte groups to the external memory to exactly fill a sector (e.g., 512 bytes) of the external memory. When a back-end interface receives the sector, the back-end interface can operate on the entire sector. For example, if the error detection code within the series of byte groups is a CRC code, the back-end interface can generate another CRC code on the entire sector, and simply compare the CRC code within the series of byte groups to the generated CRC code to determine whether the data within the series of byte groups is error free.
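The description above does not specify the CRC variant or exactly how the comparison is carried out. The following Python sketch is therefore only an illustrative stand-in for the whole-sector check, not the described circuit. It relies on a textbook property: for a CRC with a zero initial value and no final XOR, a single pass over the data, its appended CRC code, and any number of trailing zero bytes leaves a remainder of zero. The polynomial (0x1021, as used by CRC-16/XMODEM), the 16-bit width, and all function names are assumptions chosen for the example.

```python
SECTOR_SIZE = 512  # example sector size from the text
POLY = 0x1021      # assumption: CRC-16/XMODEM polynomial, for illustration


def crc16(data: bytes) -> int:
    """Bitwise CRC-16 with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_sector(ckd: bytes) -> bytes:
    """Data, then its CRC code, then cleared bytes up to the sector size."""
    body = ckd + crc16(ckd).to_bytes(2, "big")
    if len(body) > SECTOR_SIZE:
        raise ValueError("this sketch handles a single sector only")
    return body + bytes(SECTOR_SIZE - len(body))


def sector_is_intact(sector: bytes) -> bool:
    # One pass over the whole sector; no byte count or offset is needed.
    return crc16(sector) == 0


sector = build_sector(b"example CKD payload")
corrupted = bytes([sector[0] ^ 0x01]) + sector[1:]
print(sector_is_intact(sector), sector_is_intact(corrupted))
```

Under these assumptions the check needs neither the byte count nor a separate CRC over the padding, because cleared bytes leave the remainder untouched, whereas stale nonzero padding would generally disturb it.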
In one arrangement, the memory pipeline includes an input stage that receives the stream of data elements, an output stage that provides the series of byte groups, and an assembly stage interconnected between the input and output stages. The assembly stage has a set of registers. The controller is configured to (i) load a data element from the stream of data elements into each of the set of registers sequentially until all of the registers in the set of registers contain a respective data element thus forming a byte group, and (ii) effectuate transfer of that byte group from the assembly stage to the output stage. The controller is further configured to reset any remaining registers after a last data element at an end of the stream of data elements loads into the set of registers. Preferably, the controller is further configured to load the error detection code into one of the remaining registers after the controller resets the remaining registers. In one arrangement, the controller is configured to load the error detection code into the next register location following the last data element at an end of the stream of data elements. Once the set of registers is provided with the error detection code and cleared remaining bytes, the contents of the set of registers can be provided through the output stage as one of the series of byte groups.
In one arrangement, the controller includes an input that receives a count signal indicating the number of data elements in the stream of data elements. The controller is configured to reset the remaining registers based on the number of data elements in the stream of data elements. When the stream of data elements is CKD data, each data element can be a byte of data, and the count signal can include the number of bytes of data from the count field of the CKD data.
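The register-loading behavior described in the two preceding paragraphs can be approximated in software as follows. This Python sketch is a behavioral model only. It assumes that each register holds one byte, that the byte group is eight bytes wide, and that a multi-byte error detection code occupies consecutive register locations (spilling into a following byte group if necessary). The function and variable names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator

GROUP_WIDTH = 8  # example byte-group width from the text


def assemble_byte_groups(stream: Iterable[int], count: int,
                         edc: bytes) -> Iterator[bytes]:
    """Yield fixed-width byte groups: data, error detection code, zeroes."""
    registers = [0] * GROUP_WIDTH
    idx = 0
    # Load data elements sequentially; a full set of registers is one group.
    for element in islice(stream, count):
        registers[idx] = element
        idx += 1
        if idx == GROUP_WIDTH:
            yield bytes(registers)
            idx = 0  # registers still hold the old bytes at this point
    # After the last data element, reset the remaining registers.
    for i in range(idx, GROUP_WIDTH):
        registers[i] = 0
    # Load the code into the next register location(s) after the data.
    for code_byte in edc:
        registers[idx] = code_byte
        idx += 1
        if idx == GROUP_WIDTH:
            yield bytes(registers)
            registers = [0] * GROUP_WIDTH
            idx = 0
    if idx > 0:
        yield bytes(registers)


groups = list(assemble_byte_groups(b"ABCDEFGHIJ", 10, b"\x12\x34"))
print(groups)
```

In this example the ten data bytes and the two-byte code produce two byte groups, the second of which ends in four cleared bytes. The `count` argument plays the role of the count signal: it tells the model where the stream ends so that the remaining registers can be reset.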
In one arrangement, the controller is further configured to (i) transfer a byte group having the error detection code to the output stage, and (ii) subsequently reset all of the registers of the set of registers of the assembly stage to provide a set of subsequent byte groups exclusively having cleared bytes. As such, the output stage provides both the series of byte groups and the set of subsequent byte groups to an external memory to exactly fill a sector of the external memory. Accordingly, the controller can generate the set of subsequent byte groups by simply clearing each register of the set of registers (e.g., asserting the reset lines of the registers) for a particular number of clock cycles so that the series of byte groups and the set of subsequent cleared byte groups fill a complete sector.
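The "particular number of clock cycles" mentioned above follows from simple arithmetic. The Python sketch below works it out for the example figures used in this description (a 512-byte sector and eight-byte groups). It assumes one byte group per clock cycle and data that fits within a single sector, and the function name is hypothetical.

```python
SECTOR_SIZE = 512                               # example sector size
GROUP_WIDTH = 8                                 # example byte-group width
GROUPS_PER_SECTOR = SECTOR_SIZE // GROUP_WIDTH  # 64 groups fill one sector


def cleared_groups_needed(count: int, edc_len: int) -> int:
    """Number of all-zero byte groups that follow the group holding the code."""
    # Groups occupied by data plus code, rounded up to a whole group.
    used = -(-(count + edc_len) // GROUP_WIDTH)
    if used > GROUPS_PER_SECTOR:
        raise ValueError("this sketch handles a single sector only")
    return GROUPS_PER_SECTOR - used


print(cleared_groups_needed(300, 4))  # 300 data bytes + 4-byte code -> 26
```

For 300 bytes of data and a four-byte code, 38 groups carry data or code, so 26 cleared groups complete the sector.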
The features of the invention, as described above, may be employed in data storage systems, devices and methods, as well as other computer-related components such as those manufactured by EMC Corporation of Hopkinton, Mass.