1. Field of the Invention
This invention relates generally to computer network data and, more particularly, to processing computer network requests for files and for data storage.
2. Description of the Related Art
Network-connected computers provide an efficient means of sharing information. A convenient way of storing information for computer access is to store the information in electronic form in a data storage unit that is attached to the computer network. If the stored data can be accessed by any of the computers that are connected to the network, it is easier for persons to share the data they have created, thereby making collaboration easier.
The Internet is currently the largest and most widely used computer network. Information and data are transported across the Internet according to the Transmission Control Protocol/Internet Protocol (TCP/IP). Many users request documents comprising pages in hypertext mark-up language (HTML) format from among the millions of pages of data that are stored at Internet-connected file servers. As used herein, a server refers to a network computer that services and manages requests for documents or data files from network-connected computers utilizing wired and wireless communication links. Other types of Internet servers may control access to particular types of data files, and may comprise multimedia file (including video and audio) servers, mail servers, and e-commerce servers. Some servers route such requests to an appropriate computer within a cluster of computers, each of which performs one of these server functions. Such clusters of computers are generally referred to as server farms. The Internet, however, is only one type of network for which large capacity data storage solutions have become critically important.
Many companies, universities, and other organizations run their own internal computer networks (commonly called intranets) and share data files using TCP/IP and other data transport protocols. Computer users in such organizations may share access to computer files, such as documents, by sending a request to a network resource, also referred to as pointing their computer to a particular network-connected resource, which those skilled in the art will recognize as network drives. As computer networks have proliferated, the need for network data storage has increased at a rapid rate.
Data storage solutions typically have been relatively large-capacity disk drive units attached to a file server computer. When the need for data storage became too large to make storage at a disk drive unit practical, other storage solutions were developed. It is now common to provide large capacity network storage solutions in the form of a RAID (Redundant Array of Independent Disks) array. A RAID array typically integrates several disk drive data storage units. It is not unusual for a RAID array to have a storage capacity of hundreds of gigabytes. A data management controller of the RAID array handles processing for storage and retrieval of data files between the RAID array and the network.
When a network client accesses the RAID array, the data path to the array of disks appears to the client as a single network resource or drive letter. The RAID data management controller will typically implement a data redundancy strategy to ensure adequate data recovery in the event of a failure of a disk drive unit of the RAID array. Regardless of the storage capacity of any one disk drive making up the RAID array, the capacity of the RAID array may be specified as a number of data volumes. The RAID controller will automatically map the data volumes to the actual physical locations among the disk drive units of the array. Typically, the RAID controller will automatically split a single data file so that it is stored among the RAID disks, storing records of the file in virtual data volumes spread across the multiple disks, with parity checking for a redundancy strategy. Thus, an Internet file server might receive requests from network clients for Web pages, then can retrieve one or more files comprising a Web page by accessing a RAID array through a data path as a single network resource. The retrieved data files from the RAID array can then be served to the network clients.
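The striping and parity strategy described above can be illustrated with a minimal sketch. It assumes a simple XOR parity scheme with a dedicated parity block per stripe; the function names (xor_blocks, stripe_with_parity, rebuild_block) and the block size are hypothetical and are not taken from any particular RAID controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, data_disks, block_size):
    """Split data into stripes of `data_disks` blocks plus one parity block.

    Assumption: the final stripe is zero-padded to a full stripe width.
    """
    stripe_bytes = data_disks * block_size
    stripes = []
    for offset in range(0, len(data), stripe_bytes):
        chunk = data[offset:offset + stripe_bytes].ljust(stripe_bytes, b"\0")
        blocks = [chunk[i * block_size:(i + 1) * block_size]
                  for i in range(data_disks)]
        # The last element of each stripe is the parity block.
        stripes.append(blocks + [xor_blocks(blocks)])
    return stripes


def rebuild_block(stripe, lost_index):
    """Recover the block at `lost_index` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripes = stripe_with_parity(b"network storage!", data_disks=3, block_size=4)
recovered = rebuild_block(stripes[0], lost_index=1)
```

In this sketch, any single block of a stripe (data or parity) can be reconstructed from the remaining blocks, which is the property that allows recovery from the failure of one disk drive unit.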
More recently, network data storage requirements have evolved to the stage where a data storage solution called network attached storage (NAS) has been developed. The NAS solution is a high-capacity network data storage system. The NAS appears to network nodes (network-connected client computers) as a single disk drive resource, and client computers can access and manipulate files at the NAS with local file system commands. An NAS system, however, may include multiple file servers, each of which may comprise a RAID array. To a client computer at a network node accessing an NAS system, the system appears as a single resource, so that data files stored at the NAS system can be accessed by specifying a data path having a single network drive letter. This is true even though the actual data file may be physically located on any one of the drives that make up the system, split across one or more disks of a RAID array, or split across one or more RAID arrays of the NAS. A data management controller of the NAS handles receiving requests for data files and keeping track of the actual drive(s) on which any one data file is located. An NAS system may have a capacity of more than one terabyte (one thousand gigabytes). It should be apparent that the management tasks of an NAS system for data management can be much more complex than is the case with a typical RAID array.
Typically, an NAS system sends and receives data with network computers using a network communications protocol, such as TCP/IP. Thus, the data storage communications are merged with the network communications. The NAS controller interfaces to the machines of the network (i.e., the client and server computers) using a network communications service such that a single network drive letter can be mapped to the file servers. Two services in wide use are the Network File System (NFS) for Unix-compatible machines, and the Common Internet File System (CIFS) for Microsoft Corporation Windows-compatible machines. Thus, network clients can directly access data files stored at the NAS system through a simple data path.
FIG. 1 is an illustration of a typical conventional NAS system 100 in which a data storage computer 102 called a “filer” or “NAS unit” includes multiple disk drive storage units 104 and communicates over a network 106 with multiple computer clients or application servers 108, 110, 112 to exchange data files. The network data transfer between the machines 102, 108, 110, 112 can occur using a data protocol such as, for example, TCP/IP. In this way, the NAS unit 102 interposes a network 106 between itself and the networked data application servers and clients 108, 110, 112.
Each one of the network nodes 108, 110, 112 can map the NAS unit 102 to its local file system so that the NAS unit becomes one of its accessible network resources. Thus, the NAS unit 102 is identified to each of the respective machines 108, 110, 112 as a storage volume, typically identified as a drive letter or volume name. Those skilled in the art will understand that a data file at the NAS unit therefore can be accessed from one of the nodes 108, 110, 112 by reference to a data file pathname that specifies the volume of the NAS unit, such as in the form of a drive letter (e.g., “d:\filename”). While the single volume of the NAS unit might include multiple storage drive units 104, when there are multiple NAS units for access by the network clients and servers, there will typically be multiple volumes, and multiple volumes need to be accessed using multiple drive letters. In other words, each NAS unit is mapped to a volume, and therefore a volume is limited to the storage size of one physical NAS unit.
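The one-volume-per-NAS-unit mapping described above can be sketched as follows. This is an illustrative model only: the NasUnit and ClientNode classes, the unit names, and the file name are hypothetical and do not correspond to NFS, CIFS, or any actual client software.

```python
from dataclasses import dataclass, field


@dataclass
class NasUnit:
    """Hypothetical stand-in for one physical NAS unit and the files it holds."""
    name: str
    files: set = field(default_factory=set)


class ClientNode:
    """Hypothetical network node that maps each NAS unit to a drive letter."""

    def __init__(self):
        self.volumes = {}  # drive letter -> NasUnit

    def map_volume(self, drive_letter, unit):
        self.volumes[drive_letter.lower()] = unit

    def resolve(self, pathname):
        """Return the name of the NAS unit that serves a path like 'd:\\file'."""
        drive, _, filename = pathname.partition(":\\")
        unit = self.volumes.get(drive.lower())
        if unit is None:
            raise KeyError(f"no volume mapped to drive {drive!r}")
        if filename not in unit.files:
            raise FileNotFoundError(pathname)
        return unit.name


nas_a = NasUnit("nas-a", {"report.doc"})
nas_b = NasUnit("nas-b")
client = ClientNode()
client.map_volume("d", nas_a)
client.map_volume("e", nas_b)
served_by = client.resolve("d:\\report.doc")
```

Because the drive letter in the pathname selects exactly one NAS unit in this model, a file that is later moved from one unit to another is no longer reachable under its original pathname and must be addressed through the other volume.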
Unfortunately, the need to reconfigure NAS units, such as to move data between NAS units, is time-consuming and disruptive to the system environment, particularly to the clients that need to access data. It is not unusual for more than half of the storage management cost to be spent in the maintenance of multiple NAS units, expended on tasks such as the movement of data files between the NAS units. Data may need to be moved between NAS units in a variety of circumstances, such as where an NAS unit has no more data storage space, where a new NAS unit is added to the system, or where data is simply being re-balanced among existing NAS units. During the data move process, one or more of the NAS units involved in the move will be unreachable to the clients and servers 108, 110, 112. In addition, if one NAS unit is added or deleted, the network clients must map a new volume to the new NAS unit. Consequently, there is a great inconvenience to network clients when data is moved or distributed among multiple network storage devices, as such data movement requires downtime on the part of the clients and the network storage devices. Because of these limitations, the NAS storage solution is not easily scalable beyond a single NAS unit. Many data storage sites have such great storage demands that they include multiple NAS units.
From the discussion above, it should be apparent that there is a need for a more scalable and manageable way of reconfiguring network storage devices, one that permits data to be moved and distributed among multiple network storage devices in a manner that is transparent to client machines on the network.