Throughout the history of data storage, the size of storage solutions has grown. Computers first stored data at the byte level, then at the disk level. The capacity of disks has grown from hundreds of kilobytes to megabytes to gigabytes and will continue to grow. As computing environments have grown, so has their demand for yet larger storage solutions. At each stage of growth the atomic unit of the storage solution has also grown, from individual disks to multiple disks to complete systems comprising storage farms that include large arrays of numerous disks.
In the world of data storage, RAID stands for “Redundant Array of Inexpensive Disks.” Nothing could be further from the truth due to the high cost of implementing a traditional RAID storage array that meets the criteria for a solid solution. Each storage array comprises a set of array parameters that fits the desired criteria, where array parameters include metrics based on cost, reliability, performance, capacity, availability, scalability, or other values important to a customer. Typically, RAID systems require specialized hardware including SCSI disks, iSCSI equipment, or Fibre Channel switches, forcing consumers to pay a large premium to achieve their desired criteria for a solution. High costs place storage array solutions well beyond the reach of consumers and small to medium businesses (SMB). Only enterprises, where reliability or performance far outweighs cost, can afford an effective solution.
RAID systems and their associated hardware offer customers a very coarse grained approach to storage solutions. Each RAID level, RAID-0, 1, 0+1, 10, 5, 53, and so on, offers one specific configuration of disks handled by a controller or complex software. Such coarse grained approaches map data to physical locations via a storage map at the disk level or, worse yet, at the system level. Consequently, these systems have a single fixed topology as defined by their storage maps, which govern how data sets contained on the array's disks relate to each other. In addition, each system has a specific set of storage array parameters associated with it. For example, RAID-0 striping offers performance determined by the number of disks in the array but does not offer improved reliability through redundant data. RAID-1 offers reliability through data redundancy on multiple disks but does not offer performance gains. This list continues for each RAID level. Once customers deploy a RAID system, they suffer a great deal of pain migrating to a new system that more closely matches their criteria for a solution. Customers have no easy method of altering an array's parameters to fine tune their solution after the array has been deployed.
Storage systems with a fixed topology, coarse grained storage maps, and specific array parameters force customers to decide a priori exactly what their desired criteria are for a solution. Once the customer determines the criteria for an array's parameters the customer must purchase a storage solution that best matches the criteria, forcing the customer to purchase “up to” the RAID level that best fits the solution criteria and hope that it fits any future needs as well. So, the array cost is high because customers must pursue fixed topology solutions at the system level where controllers govern the system rather than at a fine grained level. If customers had fine grained control over their storage solutions, they would manage their costs more effectively and attain greater coverage of their desired storage solution space.
Clearly, customers need a more malleable storage solution where the customer adjusts the array parameters to more closely fit an application's exact needs as those needs are understood or change. Furthermore, the solution should offer customers the ability to adjust an existing solution without requiring replacement of the system or replicating the entire system. Therefore, an improved storage array should have the following characteristics:
    The storage array should be topology independent, allowing the array to change over time without concern for changes in the topology.
    The storage array should offer adjustable reliability, performance, capacity, cost per unit storage, or availability.
    The storage array should scale naturally at or below the disk level, lowering the atomic unit of a storage solution to the smallest identifiable granularity.
    The storage array's storage maps should offer fine grained control of data storage at or below the disk level without aggregation of atomic storage units into larger structures.
    The physical location of data within the array should be dynamic, allowing data to migrate from one physical location to another in a manner transparent to operating systems, file systems, or applications.
A number of attempts have been made in the past to offer such a solution by combining various RAID levels. Unfortunately, all the attempts have failed to fully provide a cost-effective solution to customers while maintaining reliability, performance, or availability. All existing solutions suffer from scalability issues and have coarse grained storage maps at the system level.
Intel offers a Matrix RAID system where two disks are deployed within a server. The Matrix RAID offers a topology where each disk has one striped partition and one mirrored partition. The mirrored partition on a first disk mirrors the striped partition on a second disk. Through this topology the Matrix RAID system offers double the performance of a single disk system because data stripes across two disks and I/O operations are performed in parallel, within the limits of the disk interface. In addition, data is reliable because the data is mirrored, providing redundancy should one disk fail. The Matrix RAID is very similar to a RAID-10 system where the capacity of the system is one half of the total disk space; however, data is mirrored advantageously at a partition level rather than a disk level. Although the Matrix RAID system has a number of benefits from a reliability and performance perspective, it suffers from other limitations. The topology is fixed, which means a customer cannot alter the array configuration once the customer deploys the system. The system does not scale because the Matrix RAID requires specific BIOS hardware and chipsets to realize the system and is further limited to two disks. Customers of the Matrix RAID are not able to fine tune the system to fit their exact needs after the system is deployed without great effort or cost.
InoStor Corporation's RAIDn system, as outlined in U.S. Pat. No. 6,557,123, follows a more traditional RAID route. Disks are combined to create a storage array, and the customer selects a desired reliability as defined by the number of disks in the array that can fail without the array suffering data loss. Data stripes across the disks in the array, similar to a RAID-5 system, along with multiple parity stripes. The number of parity stripes and their arrangement in the array is determined mathematically once the customer selects a desired reliability. InoStor's solution provides a blend of reliability and performance; however, the system suffers from scalability issues because specialized hardware is required to manage and calculate a complex parity. If a customer wishes to increase the capacity of the system, the customer must purchase an additional array. Consequently, InoStor's solution also suffers from the same limitations of a fixed topology as other RAID systems, namely that the array cannot be adjusted easily once deployed.
Unisys Corporation's U.S. Pat. No. 6,785,788 outlines another attempt at offering a flexible storage array. Unisys forgoes parity in favor of mirroring, just as the Intel Matrix RAID does, with the exception that data stripes across disks of a first capacity and then mirrors across disks of a second capacity. This topology, also fixed, offers the advantages of performance and further offers customers the ability to purchase disks of disparate sizes, thereby offering a more economical solution. However, because the data is still bound to complete disks, the system does not upgrade easily. In addition, the system does not scale naturally at the disk level.
Earlier prior art solutions fall short of offering a truly advantageous solution because they are bound to fixed topologies governed by expensive centralized hardware or complex software with coarse grained storage maps. A virtualized approach where data decouples from physical locations allows for the creation of arrays with flexible topologies governed by reconfigurable policies. Topologies based on nodes that map to logical partitions at or below the disk level, rather than nodes that map to disks, have the greatest flexibility. If data is decoupled from physical location, then data can move from one physical location to another transparently from the view of clients using the array. Furthermore, each client stores a different storage map, thereby “seeing” a different array even though the physical storage system is shared among a number of clients. Topology independent arrays have reduced costs because each element in the system behaves independently, which eliminates the need for complex centralized governing systems and allows for expansion at the single disk level. Through an appropriate choice of a topological configuration, reliability of a storage array exceeds RAID-10, RAID-5, and even RAID-6 systems. Even though a topology independent array can employ RAID concepts including parity, employing redundancy for reliability offers greater performance at reduced cost because parity does not need to be maintained with specialized hardware. High performance is a natural result of a desired policy that incorporates data striping and scales as desired even after deployment by adding disks. Capacity also scales naturally at the disk level by adding disks to the array. Customers are always able to purchase disks that have the highest capacity-price (or performance-price) ratio. Data availability remains high because data can be mirrored for redundancy or data can move from an unreliable location to a more reliable location in a manner that is transparent to applications.
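The decoupling of data from physical location described above can be illustrated with a minimal sketch. It is not the disclosed implementation; the names Location, StorageMap, resolve, and migrate, along with the disk and node labels, are hypothetical and chosen only for illustration. The sketch assumes each client holds its own map from logical nodes to partition-level locations, and it models migration as a change to the map only (copying the data itself is left out).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Location:
    """A physical location at partition granularity (hypothetical model)."""
    disk: str
    partition: int


@dataclass
class StorageMap:
    """A client's private map from logical node names to physical locations."""
    nodes: Dict[str, Location] = field(default_factory=dict)

    def resolve(self, node: str) -> Location:
        """Return where the data for a logical node currently lives."""
        return self.nodes[node]

    def migrate(self, node: str, target: Location) -> Location:
        """Repoint a node at a new location and return the old one.

        Copying the data is outside this sketch; only the map changes,
        so the logical name the client uses stays the same.
        """
        previous = self.nodes[node]
        self.nodes[node] = target
        return previous


# Two clients share the same physical disks but hold different maps,
# so each one "sees" a different array.
client_a = StorageMap({"stripe0": Location("disk1", 0),
                       "stripe1": Location("disk2", 0)})
client_b = StorageMap({"mirror0": Location("disk1", 1),
                       "mirror1": Location("disk2", 1)})

# Move client A's second stripe off disk2, for example because disk2
# is judged unreliable. Client B's view is unaffected.
old = client_a.migrate("stripe1", Location("disk3", 0))
```

Because the client addresses data by logical node rather than by disk, the move to disk3 requires no change to the name the client uses, which is the sense in which migration is transparent in this sketch.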
Customers also have the ability to trade one array parameter for another. For example, when establishing the policy for a topology independent storage array, increasing the reliability of the array by adding additional mirroring reduces the available capacity of the array, assuming a fixed number of disks in the array.
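The trade between reliability and capacity can be made concrete with a short calculation. The following sketch rests on simplifying assumptions that are not stated in the text above: all disks have equal size, every block is stored the same number of times, and each copy of a block resides on a distinct disk. The function names and the example figures (six disks of 500 GB) are hypothetical.

```python
def usable_capacity(disk_count: int, disk_size_gb: float, copies: int) -> float:
    """Usable capacity in GB when every block is stored `copies` times.

    Assumes equal-size disks and that each copy lives on a distinct disk.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if disk_count < copies:
        raise ValueError("need at least one disk per copy")
    return disk_count * disk_size_gb / copies


def tolerated_failures(copies: int) -> int:
    """Disk failures that can always be survived under the same assumptions."""
    return copies - 1


# A fixed array of six 500 GB disks under three hypothetical policies.
policies = {
    copies: (usable_capacity(6, 500.0, copies), tolerated_failures(copies))
    for copies in (1, 2, 3)
}
```

Under these assumptions, moving from no mirroring to two copies halves the usable capacity from 3000 GB to 1500 GB while allowing any single disk to fail, and a third copy reduces capacity to 1000 GB while allowing any two disks to fail.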
Thus, there remains a considerable need for methods and apparatus that allow fine grained control of a storage array without requiring customers to spend a great deal of money to achieve their desired reliability, performance, capacity, scalability, or availability criteria.