This invention relates generally to the field of disk storage subsystems, and more particularly to redundant arrays of independent disks (RAID).
Modern, large-scale computer systems are usually configured with client and server computers connected via a network. The network can include local and wide area (Internet) components. The client computers, typically desk- or lap-top computers, provide a graphical user interface (GUI), a relatively small amount of local processing and storage, and user application programs. However, it is the server computers that provide the heavy duty processing, and bulk storage for files and databases. For data integrity purposes, the storage subsystems are usually in the form of a redundant array of independent disks (RAID).
A RAID subsystem protects against a disk drive malfunction. By using many disk drives, and storing redundant data along with user data, a disk drive failure will not cause a permanent loss of data. The manner in which the RAID subsystem provides data redundancy is called a RAID level. A number of RAID levels are known. RAID-1 includes sets of N data disks and N mirror disks for storing copies of the data disks. RAID-3 includes sets of N data disks and one parity disk. RAID-4 also includes sets of N+1 disks; however, data transfers are performed in multi-block operations. RAID-5 distributes parity data across all disks in each set of N+1 disks. At any level, it is desired to have RAID systems where an input/output (I/O) operation can be performed with minimal operating system intervention.
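As an illustration of how a parity disk allows a lost data block to be recovered, the following minimal sketch computes byte-wise XOR parity over N equally sized blocks and rebuilds one missing block from the survivors. The function names `xor_blocks` and `rebuild_missing` and the sample data are hypothetical and are not taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical example: N = 3 data blocks and one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate the loss of the second data disk and recover its block.
recovered = rebuild_missing([data[0], data[2]], parity)
```

The recovery works because XOR is its own inverse: combining the parity with all surviving data blocks cancels their contributions and leaves only the missing block.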
FIG. 1, in a very general way, shows a model of the interactions between an application program 101 and physical storage media 111 of a computer system, be it a client or a server computer. The application 101 makes non-redundant file I/O requests 102, or “calls,” to a primary file system 104 to access non-redundant file I/O data 103. The application can be a foreground application, for example a word processor, or a background application, e.g., a file back-up system. Generally, the access requests 102 can be for data input (read) or data output (write) operations.
The primary file system 104 typically assumes the physical storage media is in the form of a block mode device 111. The block mode device can be a single disk, multiple disks, or tapes, or other high capacity, relatively low latency, non-volatile memories. Therefore, the primary file system makes non-redundant block I/O requests 105 to a block server 107 of a prior art block mode RAID subsystem 100 to read or write non-redundant block I/O data 106. The RAID subsystem 100 uses a block mode interface 110 and makes redundant block I/O requests 108 to the disks 111 for redundant block I/O data 109.
Clearly, the primary function of the traditional block mode RAID subsystem 100 is to translate non-redundant block I/O requests and non-redundant block data into redundant block I/O requests and redundant block data. Storing at least two copies of each data block on at least two different physical devices provides this redundancy, so that should one device fail, the block can still be recovered. In some RAID levels, parity blocks provide the redundancy.
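The translation from one non-redundant block request into several redundant block requests can be sketched as follows, assuming simple mirroring across in-memory devices. The classes `MemoryBlockDevice` and `MirroredBlockServer` are hypothetical stand-ins for the block mode device 111 and the block server 107; they are not an actual controller interface.

```python
class MemoryBlockDevice:
    """Hypothetical in-memory stand-in for one physical block mode device."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.failed = False  # set to True to simulate a device failure

    def write_block(self, index, data):
        if self.failed:
            raise OSError("device failed")
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = bytes(data)

    def read_block(self, index):
        if self.failed:
            raise OSError("device failed")
        return self.blocks[index]


class MirroredBlockServer:
    """Turns each non-redundant block request into redundant block requests."""

    def __init__(self, devices):
        if len(devices) < 2:
            raise ValueError("mirroring needs at least two devices")
        self.devices = devices

    def write_block(self, index, data):
        # One incoming write becomes one write per mirror device.
        written = 0
        for device in self.devices:
            try:
                device.write_block(index, data)
                written += 1
            except OSError:
                pass  # a failed mirror is skipped; the others still hold the block
        if written == 0:
            raise OSError("no device accepted the write")

    def read_block(self, index):
        # Return the block from the first mirror that can still serve it.
        for device in self.devices:
            try:
                return device.read_block(index)
            except OSError:
                continue
        raise OSError("all mirrors failed")
```

This sketch omits resynchronization: a device that misses writes while failed would hold stale blocks if it returned to service, which a real subsystem would have to rebuild.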
FIG. 2 shows interactions in a client-server type of arrangement of computers with a primary file system 104 configured to work over a network 204. Here, the file system 104 has a client side 201 and a server side 202. The network 204 transports data between the client side 201 and server side 202 of the file system 104. The application 101 directly calls 102 the client side 201 of the file system 104, and the server side 202 makes calls 105 to the traditional block mode RAID subsystem 100 of the server system 203.
In the arrangements shown in FIGS. 1 and 2, the RAID subsystem 100 is used to increase reliability of the system. However, the RAID subsystem 100 protects only against failures in the block mode device 111. Therefore, there are still many other points of failure in the system, namely the components other than the disks used in these arrangements. Some examples of these components are memories, busses, controllers, and processors. To protect against failures by these other components, one must provide redundancy for them as well. The term storage area network (SAN) is typically used to describe this type of redundant arrangement.
FIG. 3 is an example of a SAN 300. Client computers 301-303 communicate with the SAN via the network 204. The SAN 300 appears as one large server computer to the client computers 301-303. The SAN 300 includes server computers 321-323, connected by a redundant bus 331 to shared RAID controllers 341-342, and the RAID controllers 341-342 are connected to a shared block mode device 361 via a shared bus 351 which may also be redundant. Thus, any component in the SAN 300 can fail without losing the ability to serve the client computers.
Large scale SANs are complicated and usually configured for specific mission-critical applications, for example, banking, stock markets, airline-reservation, military command and control, etc. In addition, elaborate schemes are often used to provide redundant block-mode data access via wide area networks (WANs) in case of major disasters. Therefore, SANs usually include many proprietary components, including much one-of-a-kind software that performs system management. The low-volume, proprietary aspects of SANs make them very expensive to build and operate.
Another approach to allowing redundancy across major components is to virtualize files at the file system level, and serve a set of files from that level; see, for example, U.S. Pat. No. 5,689,706 issued to Rao on Nov. 18, 1997, “Distributed Systems;” U.S. Pat. No. 6,163,856 issued to Dion on Dec. 19, 2000, “Method and Apparatus for File System Disaster Recovery;” and U.S. Pat. No. 6,195,650 issued to Gaither on Feb. 27, 2001, “Method and Apparatus for Virtualizing File Access Operations and Other I/O Operations.”
However, these prior art SAN systems still have the following problems. They require the use of a specific proprietary distributed file system. They do not allow the use of file systems that are standard to client processors. They cannot be used with databases or other applications that use a block mode device with no file system. Because of these limitations, systems based on those implementations may never provide the features in widely used file systems, and may be limited to a few expensive operating systems.
Therefore, there still is a need for a system and method that provides data redundancy using standard components, interfaces and networks, and provides block mode access for maximum flexibility of application usage.
The present invention provides data redundancy at the file level, instead of at the block level as in the prior art. The redundancy is provided in a file mode form, rather than a block mode form as in the prior art. Therefore, file data can be located on any system or server, including a local system, or a server on a local area network, or a remote server on a wide area network. Because files are easily shared over networks through standard high volume, low cost hardware, software, and protocols, file mode redundancy provides a level of data redundancy that is as high as or higher than that of a traditional SAN, with more flexibility than a distributed file system. Using the invention, most costs remain consistent with high volume commodity components.
Depending on where files are stored, high performance and reliability can be achieved through disks on the local system that include file systems, and extremely high reliability can be achieved by using disks on network servers that have file systems. With the invention, disaster recovery is trivial to implement because files can be shared over a WAN, using well-known protocols, among any system which uses any operating system for sharing files.
The invention enables application programs to use block mode devices located anywhere for databases or specific file systems. The resulting devices, in combination with a file system, can then be shared out over the network so other application programs can use the devices, enabling a SAN that uses only a file system for connectivity.
More particularly, a method accesses data with a redundant array of independent disks (RAID) subsystem by having an application generate non-redundant file I/O requests for a primary file system. In the RAID subsystem, non-redundant block I/O requests corresponding to the non-redundant file requests received from the primary file system are generated. The non-redundant block I/O requests are then translated into redundant file I/O requests for redundant file I/O data maintained by the RAID subsystem, and in a secondary file system, the redundant file I/O requests are translated into non-redundant block I/O requests for a block mode device.
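The translation of non-redundant block I/O requests into redundant file I/O requests can be illustrated with the following minimal sketch. It assumes plain mirroring, with one backing file per replica directory, where each directory stands in for a location served by a secondary file system (local, LAN, or WAN). The class `FileModeMirror`, the file name `volume.img`, and the 512-byte block size are hypothetical choices made for illustration; the text above does not prescribe a particular redundancy scheme or file layout.

```python
import os

BLOCK_SIZE = 512  # assumed block size, chosen only for this sketch


class FileModeMirror:
    """Hypothetical block server that stores each block redundantly in files."""

    def __init__(self, replica_dirs, name="volume.img"):
        # One backing file per replica directory.
        self.paths = [os.path.join(d, name) for d in replica_dirs]
        for path in self.paths:
            if not os.path.exists(path):
                open(path, "wb").close()

    def write_block(self, index, data):
        """Translate one block write into a file write on every replica."""
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        for path in self.paths:
            with open(path, "r+b") as backing_file:
                backing_file.seek(index * BLOCK_SIZE)
                backing_file.write(data)

    def read_block(self, index):
        """Translate one block read into a file read on the first usable replica."""
        for path in self.paths:
            try:
                with open(path, "rb") as backing_file:
                    backing_file.seek(index * BLOCK_SIZE)
                    data = backing_file.read(BLOCK_SIZE)
            except OSError:
                continue  # replica missing or unreachable; try the next one
            if len(data) == BLOCK_SIZE:
                return data
        raise OSError("block not available from any replica")
```

In this sketch the host operating system's file system plays the role of the secondary file system: it turns each file read or write into block requests for whatever device holds the replica directory, so the mirror itself never addresses a disk directly.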