1. Field of the Invention
The present method generally relates to protection against the inadvertent loss of data on digital storage systems. Specifically, the method describes software running on the CPU of a computer system.
2. Background of the Related Art
The operation of computers is very well known in the art. File systems exist on a computer or across multiple computers, where each computer typically includes data storage, such as one or more hard disks, random access memory (RAM) and an operating system for executing software code. Software code is typically executed to carry out the purpose of the computer. As part of the execution of the software code, storage space on the hard disk or disks and in RAM is commonly used. Also, data can be stored, either permanently or temporarily, on the hard disk or disks and in RAM. The structure and operation of computers are so well known in the art that they need not be discussed in further detail herein.
In the field of computers and computing, file systems are also very well known in the art to enable the storage of data as part of the use of the computer. A computer file system is a method for storing and organizing computer files and the data they contain to make it easy to find and access the data. File systems may use data storage devices, such as hard disks or solid state devices, for storing data. File systems maintain the physical location of the files on these devices, and they might include more sophisticated features such as providing access to data through the computer operating system or on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or FTP clients). Also, they may be virtual and exist only as an access method for virtual data.
Because any physical device is subject to failure, methods have been developed which seek to reduce the probability of data loss should one or more physical storage devices (e.g., hard drives or solid state disks) fail.
A common method of data protection is referred to as RAID (Redundant Array of Independent Disks). RAID methods vary in their details but most have been designed to prevent data loss by creating copies of a data element on more than one physical device.
RAID systems create the redundant copies in a manner which is transparent to the software application utilizing this data. For example, a database application will access a RAID protected collection of storage elements (hard disks or solid state drives) as if they were a single storage device. The common feature of all RAID systems is that the RAID elements, or individual disks, are never separately addressable by an application using the data stored on the RAID system. This access is virtualized.
RAID methods have been well studied and classified. The type of data protection employed is described using a set of well accepted terms. For example, the most basic type of RAID, RAID-1, is a method where two storage devices, each of the same physical capacity, are virtualized to appear as a single storage device. In this case data is protected using a method called data mirroring. Data mirroring is the process by which each data block written to the RAID-1 device is duplicated on both of the disks used to create the assemblage. The result is that if one device fails, the data can still be accessed on the remaining device.
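By way of illustration only, data mirroring can be sketched in a few lines of Python. The class name `MirroredDevice` and its methods are hypothetical and model each disk as an in-memory list of blocks; the sketch is not a description of any particular RAID implementation.

```python
class MirroredDevice:
    """Toy RAID-1 device: two equal-size block stores presented as one.

    Hypothetical model for illustration; real devices are not Python lists.
    """

    def __init__(self, num_blocks):
        # Each member is a list of blocks; a member set to None has failed.
        self.members = [[b""] * num_blocks, [b""] * num_blocks]

    def write(self, index, data):
        # Mirroring: every write is duplicated on each surviving member.
        for member in self.members:
            if member is not None:
                member[index] = data

    def fail(self, member_id):
        # Simulate the loss of one physical device.
        self.members[member_id] = None

    def read(self, index):
        # Any surviving member can satisfy the read.
        for member in self.members:
            if member is not None:
                return member[index]
        raise IOError("all members have failed")
```

In this model the application calls only `read` and `write` on the single virtual device; it never addresses the two members separately, and a read still succeeds after `fail(0)` or `fail(1)`.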
A second very common method of RAID protection involves the use of three or more drives in a RAID array and is commonly called a parity RAID method. Specifically, a mathematical transform is used to compute codes which can be used to reconstruct a data block should one or more of the storage devices fail. This method is commonly employed because, compared with mirroring, it increases the proportion of raw storage capacity available to applications using the RAID array.
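As a minimal sketch of such a transform, the example below assumes byte-wise XOR parity, which is the code commonly used for single parity; the text above does not name a specific transform, and multi-parity schemes use other codes. The function names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and parity.

    Assumes exactly one data block was lost and XOR parity was used.
    """
    return xor_blocks(surviving_blocks + [parity])


# Three data blocks, one stored on each of three data drives.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)  # stored on the extra (parity) drive

# The drive holding data[1] fails; its block is recovered from the rest.
recovered = reconstruct([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the parity block with all surviving data blocks yields the lost block, so `recovered` equals `data[1]` here.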
An example of parity RAID, referred to as RAID-5 or single parity RAID, utilizes one extra storage device in each RAID array to protect the data. In a three-disk RAID-5 array (the smallest possible configuration) with equally sized disks of size N, the usable capacity of the array would be 2*N. In general, a single parity array with M elements of size N will have capacity=(M−1)*N.
This RAID method has commonly been extended to utilize more than one parity drive. These methods are commonly referred to as double or even triple parity RAID methods. The capacity of a parity based method can be computed as follows: capacity=(M−P)*N, where P is the parity cardinality (e.g., 1, 2, 3, etc.), M is the total number of drives, with M being at least P+2, and N is the size of each RAID element (e.g., disk drive or solid state device).
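The capacity relationship above can be expressed as a short Python function. The function and parameter names are hypothetical; the formula and the constraint that M is at least P+2 are taken directly from the text.

```python
def usable_capacity(total_drives, parity_drives, drive_size):
    """Usable capacity of a parity RAID array: (M - P) * N.

    total_drives  -- M, the total number of drives in the array
    parity_drives -- P, the parity cardinality (1, 2, 3, ...)
    drive_size    -- N, the size of each element, in any consistent unit
    """
    if parity_drives < 1:
        raise ValueError("parity cardinality P must be at least 1")
    if total_drives < parity_drives + 2:
        raise ValueError("total drives M must be at least P + 2")
    return (total_drives - parity_drives) * drive_size
```

For instance, `usable_capacity(3, 1, 4)` returns 8, matching the three-disk single parity case of 2*N with N = 4, and `usable_capacity(6, 2, 4)` returns 16 for a double parity array of six drives.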
While RAID reduces the probability of data loss, it incurs a significant data access performance penalty. Because each element in a RAID array stores only part of the total data block, the elements must be accessed in a synchronized manner in order to read data from or write data to the RAID array.
Because all elements must be addressed together, the number of read or write operations that can be performed per second (commonly called IOs per second or IOPS) will be no greater than that of a single element in the array. Therefore, the larger the RAID array, the greater the IOPS penalty becomes.
The common trade-off that must be made is therefore one of capacity versus performance. An array of non-RAID protected storage elements would provide the highest IOPS at the expense of no data protection. A mirror would provide the best performance among protected configurations but would reduce capacity by ½. A five-element single parity RAID array would provide ⅘ of the potential capacity but only ⅕ of the potential performance.
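The fractions quoted above follow from the simplified model used in this section, in which a parity array of M elements delivers the IOPS of a single element. A minimal Python sketch of that arithmetic, using a hypothetical function name and making no claim about measured performance, is:

```python
from fractions import Fraction


def parity_tradeoff(total_drives, parity_drives=1):
    """Return (capacity fraction, IOPS fraction) under the simplified model.

    Capacity fraction is (M - P) / M of the raw capacity. IOPS fraction is
    1 / M of the aggregate IOPS of M independent elements, on the stated
    assumption that the whole array performs like a single element.
    """
    capacity = Fraction(total_drives - parity_drives, total_drives)
    iops = Fraction(1, total_drives)
    return capacity, iops
```

Calling `parity_tradeoff(5)` returns `(Fraction(4, 5), Fraction(1, 5))`, the ⅘ capacity and ⅕ performance figures given for the five-element single parity example.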
Accordingly, there is a need in the industry for a method of improving the performance of data access in data storage systems while preserving the redundancy essential to protecting data stored in these data storage systems.