The present invention relates generally to computer data storage. More specifically, the present invention relates to a high-availability data storage methodology for a computer network.
RAID (Redundant Array of Inexpensive Disks) technology, which uses multiple disk drives attached to a host computer, is one way of making a data store highly available. The RAID controller or host software stores redundant information alongside the data, either by duplicating every write onto a second disk (RAID 1), by maintaining a dedicated parity disk (RAID 3), or by striping data across the disks with the parity distributed among them (RAID 5). Greater levels of redundancy can be achieved by increasing the number of redundant copies.
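The parity schemes mentioned above rely on a simple property: if the parity block is the XOR of all data blocks in a stripe, then any single lost block equals the XOR of the surviving blocks and the parity. The following Python sketch illustrates only that arithmetic. The function names and the four-byte blocks are hypothetical, and a real RAID controller must also handle striping layout, write ordering, and rebuild scheduling.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks):
    """Compute the parity block kept on the parity disk (or parity stripe)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"ABCD", b"EFGH", b"IJKL"]       # one stripe spread over three data disks
parity = parity_block(data)
# Simulate losing the second data disk and rebuilding its block.
print(reconstruct([data[0], data[2]], parity))   # b'EFGH'
```

Because RAID 3 and RAID 5 differ mainly in where the parity blocks are stored, the same XOR calculation applies to both. RAID 1 needs no calculation at all, since each block is simply written twice.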
Although RAID technology provides a highly available disk array, it does not guarantee data availability. If the host computer fails, for instance, the data becomes unavailable regardless of how many redundant disk arrays are used. To provide an even higher level of data availability, dual-ported arrays, which are accessible by two host computers, are used. The two host computers establish a protocol between them so that only one of them writes to a given disk segment at a time. If one host computer fails, the surviving host computer can take over the work of the failed computer. This type of configuration is typical of network file servers and database servers.
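The one-writer-per-segment rule and the failover described above can be pictured as an ownership table shared by the two hosts. This Python sketch is a hypothetical illustration, not any vendor's clustering protocol: the `SegmentOwnership` class, the round-robin initial assignment, and the way a survivor is chosen are all assumptions, and the sketch omits failure detection (for example, heartbeats) and fencing of a host that has only appeared to fail.

```python
class SegmentOwnership:
    """Hypothetical ownership table shared by the hosts of a dual-ported array.

    Every disk segment has exactly one owning host, and only that host may
    write to it, so at most one host writes to a given segment at a time.
    """

    def __init__(self, hosts, segments):
        self.live_hosts = set(hosts)
        ordered = sorted(self.live_hosts)
        # Initial assignment: round-robin across the hosts (an assumption).
        self.owner = {seg: ordered[i % len(ordered)] for i, seg in enumerate(segments)}

    def can_write(self, host, segment):
        return host in self.live_hosts and self.owner.get(segment) == host

    def fail(self, host):
        """Record a host failure and hand its segments to a surviving host."""
        self.live_hosts.discard(host)
        if not self.live_hosts:
            raise RuntimeError("no surviving host: the data is unavailable")
        survivor = min(self.live_hosts)           # deterministic choice (assumption)
        for seg, owner in self.owner.items():
            if owner == host:
                self.owner[seg] = survivor        # survivor takes over the failed host's work


table = SegmentOwnership(["hostA", "hostB"], segments=range(4))
print(table.can_write("hostA", 1))   # False: segment 1 belongs to hostB
table.fail("hostB")
print(table.can_write("hostA", 1))   # True: hostA took over hostB's segments
```

Note that the table itself is a single point of coordination; keeping it consistent between the two hosts is exactly the "complex protocol" whose cost the next paragraph describes.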
A disadvantage of dual-ported disk arrays, however, is that they use a number of rather expensive components. Dual-ported RAID controllers are expensive. Moreover, the hosts must run a complex protocol to determine which of them may write to each disk, and when. Host manufacturers often charge a substantial premium for clustering software.
Besides the high cost of system components, another disadvantage of dual-ported disk array systems is that they restrict the number of host computers that can access the disk array simultaneously. In such systems, data must be accessed through one host computer or the other. Thus, the number of data access requests that the system can service is limited by the processing capacity of the host computers.
Yet another disadvantage of multi-ported disk arrays is that expanding the system requires upgrading the disk array (for storage) or the host computer (for processing). Some RAID arrays can be expanded by adding disks on carrier racks. However, once a carrier rack is full, the only way to expand the array is to buy a new, larger one. The same holds for the host computer. Some host computers, such as the Sun 6500 from Sun Microsystems of Mountain View, Calif., can be expanded by adding more processors and network interfaces. However, once the computer is full of expansion cards, a new computer must be purchased to expand further.
An embodiment of the present invention is a distributed and highly available data storage system. In one embodiment, the distributed data storage system includes a network of data storage units that are controlled by an object management system. Significantly, whenever data is written to one data storage unit, the object management system makes a redundant copy of that data in another data storage unit, so that the data files remain highly available. The object management system preferentially selects which distributed data storage unit performs each file access request according to the external input or output with which the request is associated. In response to a file creation request that is associated with an external input of one distributed data storage unit, the object management system preferentially creates the data file in that distributed data storage unit. In response to a file retrieval request that is associated with a data file and an external output of another distributed data storage unit, the object management system preferentially returns the hostname and pathname of a copy of the data file that is stored within that distributed data storage unit.
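The placement and retrieval preferences of the object management system can be sketched in Python as follows. The source does not specify an implementation; the `StorageUnit` and `ObjectManager` names, the `/store/` pathname layout, the default of two copies, and the rule of falling back to the units with the most free space are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StorageUnit:
    """Hypothetical model of one data storage unit (a commodity PC with disks)."""
    hostname: str
    capacity: int                                 # illustrative capacity, in bytes
    used: int = 0
    files: dict = field(default_factory=dict)     # pathname -> size in bytes

    def has_room(self, size: int) -> bool:
        return self.used + size <= self.capacity

    def store(self, pathname: str, size: int) -> None:
        self.files[pathname] = size
        self.used += size


class ObjectManager:
    """Hypothetical object management system: places files and locates copies."""

    def __init__(self, units, copies=2):
        self.units = {u.hostname: u for u in units}
        self.copies = copies                      # assumed redundancy level
        self.locations = {}                       # file name -> [(hostname, pathname)]

    def create_file(self, name, size, ingress_host):
        """Prefer the unit whose external input received the request."""
        others = sorted(
            (u for h, u in self.units.items() if h != ingress_host),
            key=lambda u: u.capacity - u.used,    # most free space first (assumption)
            reverse=True,
        )
        candidates = [self.units[ingress_host]] + others
        chosen = [u for u in candidates if u.has_room(size)][: self.copies]
        if len(chosen) < self.copies:
            raise RuntimeError("not enough free space for the redundant copies")
        pathname = f"/store/{name}"               # hypothetical path layout
        for unit in chosen:                       # each copy lands on a different unit
            unit.store(pathname, size)
        self.locations[name] = [(u.hostname, pathname) for u in chosen]
        return self.locations[name]

    def locate(self, name, egress_host):
        """Prefer a copy held by the unit whose external output will serve it."""
        copies = self.locations[name]
        for host, path in copies:
            if host == egress_host:
                return host, path
        return copies[0]                          # fall back to any existing copy


units = [StorageUnit("unit-a", 100), StorageUnit("unit-b", 100), StorageUnit("unit-c", 50)]
om = ObjectManager(units)
print(om.create_file("clip1", 40, ingress_host="unit-a"))
# [('unit-a', '/store/clip1'), ('unit-b', '/store/clip1')]
print(om.locate("clip1", egress_host="unit-b"))   # ('unit-b', '/store/clip1')
print(om.locate("clip1", egress_host="unit-c"))   # ('unit-a', '/store/clip1')
```

In this sketch, a retrieval request arriving at a unit that holds no copy simply falls back to the first recorded copy; a fuller design might instead choose the least-loaded unit that holds one. Likewise, a creation request at a unit without enough free space is placed elsewhere, which is why the preference is only "preferential."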
An aspect of the present invention is that expensive RAID servers are not necessary; inexpensive, commodity disk servers can be used instead. The distributed and highly available data storage system is also highly scalable, because additional disk servers can be added as the storage requirements of the network grow.
Another aspect of this invention is that dedicated servers for the disk service functions are not required. Disk service functions can be integrated into each data storage unit. The data storage units may be implemented using relatively low-cost, general-purpose computers, such as so-called desktop computers or personal computers (PCs). This aspect is important for applications in which I/O, CPU, and storage requirements grow in proportion to one another, since each added data storage unit contributes all three resources.
Yet another aspect of the present invention is that users of the system are not tied to any specific data storage unit. Thus, an individual user may store more data than any single data storage unit can hold. Yet another aspect of the present invention is that an expensive TDM (Time-Division Multiplexed) switching infrastructure is not required; an inexpensive high-speed Ethernet network is sufficient to provide the necessary interconnection. Yet another aspect of the present invention is that the data storage system scales with the number of its external I/Os.