Most current large-scale data storage systems use storage volumes, most commonly hard disk drives (HDDs), on a standalone basis. That is, when data is transferred to one of the storage volumes, that storage volume stores the data in the same manner it would if it were a volume on, for instance, a personal computer. Accordingly, to achieve more desirable performance characteristics, such as higher access rates, lower error rates, and better failure protection, data storage systems resort to technologies such as redundant array of independent disks (RAID). RAID writes data to multiple volumes simultaneously and adds parity to the data, which increases the speed at which data is stored while allowing the data to be recovered should one of the storage volumes fail or otherwise cause errors in the data. As the capacities of individual volumes increase, the possibility of encountering an unrecoverable error when attempting to access stored data also increases. It therefore becomes more difficult to achieve desirable error rates while keeping the monetary cost per gigabyte of data low using traditional methods of storing data across multiple storage volumes.
Overview
Embodiments disclosed herein provide systems, methods, and computer readable media for storing data to a plurality of physical storage volumes. In a particular embodiment, a method provides identifying first data for storage on the plurality of physical storage volumes. Each of the plurality of storage volumes corresponds to respective ones of a plurality of data channels. The method further provides segmenting the first data into a plurality of data segments corresponding to respective ones of the plurality of data channels and transferring the plurality of data segments as respective bit streams over the respective ones of the plurality of data channels to the respective ones of the plurality of physical storage volumes. The plurality of storage volumes stores the respective bit streams in the exact condition in which the bit streams are received.
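By way of non-limiting illustration only, the segmenting and transferring steps described above might be sketched as follows. The function names `segment_for_channels` and `transfer`, the contiguous split, and the use of in-memory `io.BytesIO` objects as stand-ins for physical storage volumes are all assumptions made for this example and are not taken from the disclosure.

```python
import io


def segment_for_channels(data: bytes, num_channels: int) -> list[bytes]:
    """Split data into one contiguous segment per data channel.

    Hypothetical helper: the last segments may be shorter (or empty)
    when the data length is not a multiple of num_channels.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    seg_len = -(-len(data) // num_channels)  # ceiling division
    return [data[i * seg_len:(i + 1) * seg_len] for i in range(num_channels)]


def transfer(segments: list[bytes], volumes: list[io.BytesIO]) -> None:
    """Send each segment over its own channel to its own volume.

    Each stand-in volume writes the bytes exactly as received,
    without reformatting them.
    """
    for segment, volume in zip(segments, volumes):
        volume.write(segment)


first_data = bytes(range(10))
volumes = [io.BytesIO() for _ in range(4)]  # one stand-in volume per channel
segments = segment_for_channels(first_data, 4)
transfer(segments, volumes)
stored = [v.getvalue() for v in volumes]
```

In this sketch, concatenating the contents of the stand-in volumes in channel order reproduces the first data, because each volume holds its bit stream unmodified.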
In some embodiments, the plurality of data segments comprises a plurality of files. In those embodiments, segmenting the first data comprises splitting the first data into a plurality of subsets and splitting each subset of the plurality of subsets into a plurality of code words contained in the plurality of files if an error correcting code (ECC) is used for data protection.
In some embodiments, splitting each subset of the plurality of subsets into the plurality of code words comprises orthogonally distributing symbols from each subset of the plurality of subsets into the plurality of code words to form the plurality of files.
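One possible reading of orthogonal distribution, offered only as an assumption for illustration, is that a subset is treated as a matrix filled row by row and each code word takes one column, so that consecutive symbols of the subset land in different code words. The helper names `distribute_orthogonally` and `gather` below are hypothetical.

```python
def distribute_orthogonally(subset: bytes, num_code_words: int) -> list[bytes]:
    """Place symbol i of the subset into code word (i mod num_code_words).

    Assumed interpretation: neighbouring symbols never share a code word.
    """
    if num_code_words < 1 or len(subset) % num_code_words:
        raise ValueError("subset length must be a multiple of num_code_words")
    return [subset[k::num_code_words] for k in range(num_code_words)]


def gather(code_words: list[bytes]) -> bytes:
    """Inverse of distribute_orthogonally: rebuild the original subset."""
    return bytes(sym for column in zip(*code_words) for sym in column)


subset = b"ABCDEFGH"
code_words = distribute_orthogonally(subset, 4)
# code_words == [b"AE", b"BF", b"CG", b"DH"]
rebuilt = gather(code_words)
```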
In some embodiments, the method provides individually protecting each code word of the plurality of code words using a C1 Reed-Solomon error-correcting code and interleaving the plurality of code words.
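The benefit of interleaving individually protected code words can be illustrated with a minimal sketch. The Reed-Solomon C1 encoding itself is not implemented here; the example assumes equal-length code words, a simple symbol-by-symbol round-robin interleave, and hypothetical helper names `interleave` and `deinterleave`. It shows only that a burst of consecutive damaged symbols is spread across the code words, leaving each code word with few errors to correct.

```python
def interleave(code_words: list[bytes]) -> bytes:
    """Emit one symbol from each code word in turn (round-robin)."""
    if len({len(cw) for cw in code_words}) != 1:
        raise ValueError("code words must be non-empty in number and equal in length")
    return bytes(sym for column in zip(*code_words) for sym in column)


def deinterleave(stream: bytes, num_code_words: int) -> list[bytes]:
    """Recover the individual code words from an interleaved stream."""
    return [stream[k::num_code_words] for k in range(num_code_words)]


code_words = [b"aaaa", b"bbbb", b"cccc"]
stream = interleave(code_words)

# Simulate a burst error that damages three consecutive symbols.
damaged = bytearray(stream)
damaged[3:6] = b"???"

recovered = deinterleave(bytes(damaged), len(code_words))
errors_per_word = [
    sum(x != y for x, y in zip(original, received))
    for original, received in zip(code_words, recovered)
]
```

Here the three-symbol burst leaves exactly one damaged symbol in each of the three code words, which is the situation a per-code-word error-correcting code is suited to handle.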
In some embodiments, the method provides encoding each code word of the plurality of code words using a C2 Reed-Solomon error-correcting code.
In some embodiments, the number of subsets in the plurality of subsets is equal to twice the number of data channels in the plurality of data channels.
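As a worked illustration of this relationship, the sketch below splits data into twice as many subsets as there are data channels. The helper name `split_into_subsets`, the equal subset length, and the zero-byte padding of the final subset are assumptions made for the example.

```python
def split_into_subsets(data: bytes, num_channels: int) -> list[bytes]:
    """Split data into 2 * num_channels equal-length subsets.

    Assumption: the data is padded with zero bytes so that every
    subset has the same length.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    num_subsets = 2 * num_channels
    subset_len = -(-len(data) // num_subsets)  # ceiling division
    padded = data.ljust(subset_len * num_subsets, b"\x00")
    return [padded[i * subset_len:(i + 1) * subset_len] for i in range(num_subsets)]


data = bytes(range(20))
subsets = split_into_subsets(data, num_channels=4)
# 4 data channels give 8 subsets of 3 bytes each (24 bytes after padding).
```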
In some embodiments, the method provides performing at least one of cyclic redundancy check (CRC), encryption, or compression on the first data.
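For illustration only, a CRC and compression step of this kind could be sketched with Python's standard `zlib` module. The 4-byte big-endian CRC header, the ordering of the operations, and the function names `prepare` and `restore` are assumptions for this example; encryption is omitted because it would require a choice of cipher that the description above does not specify.

```python
import zlib


def prepare(first_data: bytes) -> bytes:
    """Compute a CRC-32 over the data, compress it, and prepend the CRC.

    Assumed layout: 4-byte big-endian CRC followed by the compressed payload.
    """
    crc = zlib.crc32(first_data)
    return crc.to_bytes(4, "big") + zlib.compress(first_data)


def restore(prepared: bytes) -> bytes:
    """Decompress the payload and verify it against the stored CRC."""
    expected = int.from_bytes(prepared[:4], "big")
    data = zlib.decompress(prepared[4:])
    if zlib.crc32(data) != expected:
        raise ValueError("CRC mismatch")
    return data


payload = b"example first data " * 8
prepared = prepare(payload)
round_trip = restore(prepared)
```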
In some embodiments, each of the plurality of channels comprises a Serial ATA (SATA) channel between the storage subsystem and one of the plurality of physical storage volumes.
In some embodiments, the plurality of physical storage volumes comprises a plurality of hard disk drives.
In some embodiments, each hard disk drive of the plurality of hard disk drives writes only control and timing bits in addition to the bit stream.
In another embodiment, a system is provided including one or more computer readable storage media and a processing system operatively coupled with the one or more computer readable storage media. Program instructions stored on the one or more computer readable storage media, when read and executed by the processing system, direct the processing system to identify first data for storage on the plurality of physical storage volumes. Each of the plurality of storage volumes corresponds to respective ones of a plurality of data channels. The program instructions further direct the processing system to segment the first data into a plurality of data segments corresponding to respective ones of the plurality of data channels and transfer the plurality of data segments as respective bit streams over the respective ones of the plurality of data channels to the respective ones of the plurality of physical storage volumes. The plurality of storage volumes stores the respective bit streams in the exact condition in which the bit streams are received.