1. Field of the Invention
The present invention relates, in general, to network data storage, and, more particularly, to software, systems and methods for high availability, high reliability data storage using parity data protection having an arbitrary dimensionality.
2. Relevant Background
Economic, political, and social power are increasingly managed by data. Transactions and wealth are represented by data. Political power is analyzed and modified based on data. Human interactions and relationships are defined by data exchanges. Hence, the efficient distribution, storage, and management of data is expected to play an increasingly vital role in human society.
The quantity of data that must be managed, in the form of computer programs, databases, files, and the like, increases exponentially. As computer processing power increases, operating system and application software becomes larger. Moreover, the desire to access larger data sets such as those comprising multimedia files and large databases further increases the quantity of data that is managed. This increasingly large data load must be transported between computing devices and stored in an accessible fashion. The exponential growth rate of data is expected to outpace improvements in communication bandwidth and storage capacity, making data management tasks increasingly difficult to handle using conventional methods.
High reliability and high availability are increasingly important characteristics of data storage systems as data users become increasingly intolerant of lost, damaged, and unavailable data. Data storage mechanisms ranging from volatile random access memory (RAM), non-volatile RAM, to magnetic hard disk and tape storage, as well as others, are subject to component failure. Moreover, the communication systems that link users to the storage mechanisms are subject to failure, making the data stored behind the systems temporarily or permanently unavailable. Varying levels of reliability and availability are achieved by techniques generally referred to as "parity".
Parity storage, as used herein, refers to a variety of techniques that are utilized to store redundant information, error correcting code (ECC), and/or actual parity information (collectively referred to as "parity information") in addition to primary data (i.e., the data set to be protected). The parity information is used to access or reconstruct primary data when the storage devices in which the primary data is held fail or become unavailable.
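By way of illustration only, the reconstruction of primary data from parity information may be sketched with simple byte-wise XOR parity. XOR is assumed here purely as an example; it is only one of the techniques covered by the term as used above. The function names `xor_parity` and `reconstruct` and the sample blocks are hypothetical.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

# Three primary data blocks, each imagined to live on a different device.
primary = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_parity(primary)

# Suppose the device holding primary[1] fails; the block is rebuilt
# from the two surviving blocks plus the parity block.
recovered = reconstruct([primary[0], primary[2]], parity)
print(recovered == primary[1])
```

A single XOR parity block of this kind tolerates the loss of exactly one member of the group; the loss of two members leaves the data unrecoverable from that group alone.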
Parity may be implemented within single storage devices, such as a hard disk, to allow recovery of data in the event a portion of the device fails. For example, when a sector of a hard disk fails, parity enables the information stored in the failed sector to be recreated and stored at a non-failed sector. Some RAM implementations use ECC to correct memory contents as they are written and read from memory.
Redundant array of independent disks (RAID) technology has developed in recent years as a means for improving storage reliability and availability. The concept, as initially conceived, contemplated the clustering of small inexpensive hard disks into an array such that the array would appear to the system as a single large disk. Simple arrays, however, actually reduced the reliability of the system to that of the weakest member. In response, a variety of methods (i.e., RAID technology) for storing data throughout the array in manners that provided redundancy and/or parity were developed to provide varying levels of data protection.
Conventional RAID systems provide a way to store the same data in different places (thus, redundantly) on multiple storage devices such as hard disk drives. By placing data on multiple disks, input/output (I/O) operations can overlap in a balanced way, distributing the load across disks in the array and thereby improving performance. Since using multiple disks in this manner increases the mean time between failure (MTBF) for the system as a whole with respect to data availability, storing data redundantly also increases fault-tolerance. A RAID system relies on a hardware or software controller to hide the complexities of the actual data management so that RAID systems appear to an operating system to be a single logical volume. However, RAID systems are difficult to scale because of physical limitations on the cabling and controllers. Also, RAID systems are highly dependent on the controllers so that when a controller fails, the data stored behind the controller becomes unavailable. Moreover, RAID systems require specialized, rather than commodity, hardware, and so tend to be expensive solutions.
RAID solutions are also relatively expensive to maintain, as well as difficult and time-consuming to properly configure. RAID systems are designed to enable recreation of data on a failed disk or controller, but the failed disk must be replaced to restore high availability and high reliability functionality. Until replacement occurs, the system is vulnerable to additional device failures. The condition of the system hardware must be continually monitored and maintenance performed as needed to maintain functionality. Hence, RAID systems must be physically situated so that they are accessible to trained technicians who can perform required maintenance. Not only are the man-hours required to configure and maintain a RAID system expensive, but since most data losses are due to human error, the requirement for continual human monitoring and intervention decreases the overall reliability of such a system. This limitation also makes it difficult to set up a RAID system at a remote location or in a foreign country where suitable technicians would have to be found and/or transported to the locale in which the RAID equipment is installed to perform maintenance functions.
RAID systems (levels 0-5) cannot be expanded in minimal increments (e.g. adding a single storage element) while the system is in operation. The addition of a storage element requires that the entire system be brought down, parity recalculated, and then data restored. Hence, expanding the capacity addressed by RAID systems may result in data unavailability for indefinite amounts of time.
Moreover, RAID systems cannot scope levels of parity protection differently for arbitrarily small subsets of data within the overall data set protected. A RAID controller is configured to provide one type of parity protection at a time on a fixed, known set of storage devices. However, different types of data have very different and highly varied protection requirements. Mission critical data may need an extremely high level of protection, whereas data such as program files and seldom used documents may need little or no protection at all. Currently, users must either implement multiple systems to provide varying levels of protection to different types of data, or compromise their data protection needs by either paying too much to protect non-critical data, or by providing less than desired protection for critical data.
Current RAID systems do not provide a practical method by which parity data can be used not only to reconstruct primary data but also to serve data requests in lieu of or in addition to serving those requests directly from the primary data itself. With the exception of mirrored data protection systems, parity information is generally used in the event of a catastrophe to serve requests for lost data only while the primary data is being reconstructed from this parity information. After reconstruction of the primary data, data is once again served from the reconstructed primary only, not the parity information. This increases the effective overhead cost of parity data, as parity information is only passively stored by the storage system rather than actively being used to improve performance during normal operation.
NAS (network-attached storage) refers to hard disk storage that is set up with its own network address rather than being attached to an application server. File requests are mapped to the NAS file server rather than being routed through an application server device. NAS may perform I/O operations using RAID internally (i.e., within a NAS node). NAS may also automate mirroring of data to one or more other NAS devices to further improve fault tolerance. This mirroring may be done synchronously or asynchronously, but in both cases network limitations provide range restrictions on geographic separation. Because NAS devices can be added to a network, they may enable some scaling of the aggregate network storage capacity by adding additional NAS nodes. However, NAS devices are constrained in RAID applications to the abilities provided by conventional hardware and software based RAID controllers. NAS systems do not generally enable mirroring and parity across nodes, and so any single point of failure at a typical NAS node makes all of the data stored at that NAS node unavailable. RAID systems are not designed to provide efficient, redundant, and fault tolerant data storage in distributed network data storage environments.
In general, current parity protection systems provide one-dimensional parity protection, with some systems providing up to two-dimensional parity protection. One-dimensional parity protection means that one set of parity information is created and maintained for a given primary data set. Hence, the system is vulnerable to simultaneous failure of primary data storage and the associated parity data storage. RAID level 6 provides two-dimensional parity using two independent, distributed parity groups. However, there remains a need for systems and methods for efficiently providing greater dimensions, and preferably arbitrarily large dimensions of parity protection.
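The difference between one and two dimensions of parity may be illustrated with a small grid of data elements protected by row parity and column parity. This is a minimal sketch under stated assumptions: XOR parity is assumed, the grid layout and the names `parity` and `recover` are hypothetical, and the sketch is not a model of the encoding used by any particular RAID level.

```python
from functools import reduce
from operator import xor

def parity(values):
    """XOR a sequence of integers together."""
    return reduce(xor, values, 0)

# A 3x3 grid of data elements; each cell is imagined to sit on its own device.
grid = [[0x11, 0x22, 0x33],
        [0x44, 0x55, 0x66],
        [0x77, 0x88, 0x99]]
row_parity = [parity(row) for row in grid]        # first parity dimension
col_parity = [parity(col) for col in zip(*grid)]  # second parity dimension

def recover(grid, row_parity, col_parity, lost):
    """Repair lost cells using whichever dimension has a single missing member.

    Returns the repaired grid, or None if some cell cannot be rebuilt.
    """
    work = [row[:] for row in grid]
    for r, c in lost:
        work[r][c] = None  # simulate the failure
    missing = set(lost)
    progress = True
    while missing and progress:
        progress = False
        for r, c in sorted(missing):
            row_others = [work[r][j] for j in range(len(work[r])) if j != c]
            col_others = [work[i][c] for i in range(len(work)) if i != r]
            if None not in row_others:
                work[r][c] = parity(row_others) ^ row_parity[r]
            elif None not in col_others:
                work[r][c] = parity(col_others) ^ col_parity[c]
            else:
                continue  # both groups currently have another missing member
            missing.discard((r, c))
            progress = True
    return work if not missing else None

# Two failures in the same row defeat row parity alone,
# but each lost cell is still recoverable through its column group.
print(recover(grid, row_parity, col_parity, [(0, 0), (0, 1)]) == grid)
```

In this sketch, four simultaneous failures arranged as a rectangle, such as cells (0, 0), (0, 1), (1, 0), and (1, 1), leave every affected group with two missing members and cannot be repaired, which illustrates why additional parity dimensions may be desirable.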
Philosophically, the way data is conventionally managed is inconsistent with the hardware devices and infrastructures that have been developed to manipulate and transport data. For example, computers are characteristically general-purpose machines that are readily programmed to perform a virtually unlimited variety of functions. In large part, however, computers are loaded with a fixed, slowly changing set of data that limits their general-purpose nature to make the machines special-purpose. Advances in processing speed, peripheral performance and data storage capacity are most dramatic in commodity computers and computer components. Yet many data storage solutions cannot take advantage of these advances because they are constrained rather than extended by the storage controllers upon which they are based. Similarly, the Internet was developed as a fault tolerant, multi-path interconnection. However, network resources are conventionally implemented in specific network nodes such that failure of the node makes the resource unavailable despite the fault-tolerance of the network to which the node is connected. Continuing needs exist for highly available, highly reliable, and highly scalable data storage solutions.
Briefly stated, the present invention involves a data storage system implementing an N-dimensional parity paradigm. A system for parity distribution is preferably implemented in a distributed network storage environment, but may also be implemented in a conventional storage array or a single storage device environment. A mechanism for the dynamic addition and subtraction of storage elements as well as the capability to dynamically modify the degree of redundancy protection enjoyed by individual data elements and sets of elements in an arbitrary way is provided.
In another aspect, the present invention involves a method for data protection with an arbitrary number of parity dimensions in which a data element is selected for entry and a degree of fault tolerance desired for that data element is determined. A number of non-intersecting parity groups (i.e., where no two members of a single parity group reside on the same physical device) are associated with the primary data element from an arbitrarily large pool of available storage locations which reside on an arbitrary number of physical storage devices. A location for the primary data element to be stored is selected based on user-specified or system-specified metrics. The data element is written to its primary location and the parity elements associated with the previously chosen parity groups are updated. Once the primary write operation and associated parity updates are confirmed, the data entry transaction is finalized. System read operations either read the data element directly from its primary location or read an image of the data element reconstructed from one or more of its associated parity groups. The criteria on which this choice is based are arbitrary, but generally performance related. The process by which primary data elements and the parity elements associated with the logical parity groups to which the primary data belongs are maintained, migrated, and reconstructed due to network, server, disk, and human error is preferably automated and fully dynamic.
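The write and read steps described above may be sketched, for illustration only, as a small in-memory model. Everything in this sketch is an assumption rather than the disclosed implementation: XOR parity, a fixed element size `BLOCK`, the class names `ParityGroup` and `Store`, the device names, and the simplification that each key is written once. Transaction confirmation, migration, and automated reconstruction are not modeled.

```python
from dataclasses import dataclass, field

BLOCK = 4  # hypothetical fixed element size in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

@dataclass
class ParityGroup:
    parity_device: str                          # device holding the parity element
    parity: bytes = bytes(BLOCK)                # running XOR of all members
    members: dict = field(default_factory=dict) # element key -> device holding it

    def devices(self):
        return {self.parity_device, *self.members.values()}

class Store:
    def __init__(self, device_names, groups):
        self.devices = {name: {} for name in device_names}
        self.groups = groups
        self.membership = {}  # element key -> parity groups protecting it

    def write(self, key, data, device, dimensions):
        """Store `data` on `device`, protected by `dimensions` parity groups."""
        if len(data) != BLOCK:
            raise ValueError("element must be exactly BLOCK bytes")
        # Only groups with no member (or parity element) already on `device`
        # are eligible, so no two members of a group share a physical device.
        chosen = [g for g in self.groups if device not in g.devices()][:dimensions]
        if len(chosen) < dimensions:
            raise ValueError("not enough non-intersecting parity groups")
        self.devices[device][key] = data
        for g in chosen:
            g.parity = xor(g.parity, data)  # new element, so its old value is zero
            g.members[key] = device
        self.membership[key] = chosen

    def read(self, key, failed=()):
        """Read from the primary location, else reconstruct from a parity group."""
        failed = set(failed)
        for name, contents in self.devices.items():
            if key in contents and name not in failed:
                return contents[key]
        for g in self.membership[key]:
            others = {k: d for k, d in g.members.items() if k != key}
            if g.parity_device in failed or failed & set(others.values()):
                continue  # this group cannot serve the request right now
            data = g.parity
            for k, d in others.items():
                data = xor(data, self.devices[d][k])
            return data
        raise RuntimeError("element unrecoverable with the given failures")

# Six hypothetical devices; two groups keep their parity on d4 and d5.
store = Store(["d0", "d1", "d2", "d3", "d4", "d5"],
              [ParityGroup("d4"), ParityGroup("d5")])
store.write("a", b"AAAA", device="d0", dimensions=2)  # two parity dimensions
store.write("b", b"BBBB", device="d1", dimensions=2)
store.write("c", b"CCCC", device="d2", dimensions=1)  # one parity dimension

# With d0 and d4 both unavailable, "a" is still served from the second group.
print(store.read("a", failed={"d0", "d4"}) == b"AAAA")
```

In this sketch the degree of protection is chosen per element through the `dimensions` argument, and a read falls back to parity only when the primary location is unavailable. A performance-driven choice between the two read paths, as contemplated above, would replace that simple fallback rule.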