1. Field of the Invention
This invention is related to the field of storage management and, more particularly, to software used in storage management.
2. Description of the Related Art
The Network File System (NFS) is a client/server application that lets a computer user view and optionally store and update files on a remote computer as though the files were on the user's own computer. The user's system needs to have an NFS client and the other computer needs an NFS server. NFS was developed by Sun Microsystems and has been designated a file server standard. The NFS protocol provides transparent remote access to shared files across networks. The NFS protocol is designed to be portable across different machines, operating systems, network architectures, and transport protocols. Implementations of NFS exist for a variety of machines, from personal computers to supercomputers.
NFS defines the way in which files are named and where they are placed logically for storage and retrieval. In NFS, a file is placed in a directory (folder in Windows) or subdirectory at the desired place in the tree structure. NFS also specifies conventions for naming files. These conventions may include one or more of, but are not limited to, the maximum number of characters in a name, which characters can be used, and, in some systems, how long the file name suffix can be. NFS also defines a format for specifying the path to a file through the structure of directories.
Using NFS, the user or a system administrator may mount a portion or all of the files available in the NFS file system. The mounted files may be accessed with whatever privileges are associated with access to each file (e.g. read-only or read-write). The mounted files are logically organized in a file system. A file system is a tree on a single server with a specified root. NFS assumes a hierarchical file system in which every level except the bottom level of files consists of directories. Each entry in a file system (file, directory, device, etc.) has a string name. Different operating systems may restrict the depth of the tree or the names used, and may use different syntax to represent the “pathname,” which is the concatenation of all the “components” (directory and file names) in the name.
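As an illustration of how the same components may yield different pathname syntax on different operating systems, the following is a minimal Python sketch. The helper name `render_pathname`, the sample components, and the `C:\` drive root are assumptions made for this example only.

```python
from pathlib import PurePosixPath, PureWindowsPath


def render_pathname(components, style="posix"):
    """Concatenate name components into a pathname in one OS's syntax."""
    if style == "posix":
        # UNIX-like syntax: components separated by "/" below the root "/".
        return str(PurePosixPath("/", *components))
    if style == "windows":
        # Windows syntax: components separated by "\" below a drive root.
        # The drive letter is an assumption for illustration.
        return str(PureWindowsPath("C:\\", *components))
    raise ValueError(f"unknown pathname style: {style}")


# Hypothetical components: three directories and one file at the bottom level.
components = ["export", "home", "alice", "notes.txt"]
posix_name = render_pathname(components, "posix")
windows_name = render_pathname(components, "windows")
```

The components are identical in both cases; only the separator and the representation of the root differ.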
A mount point is a position or node in a directory tree on a server at which a file system is mounted. Mount points on a server may be exported to other systems (e.g. servers). When a file system at a mount point on a server is exported to another system, the file system is mounted at a mount point in the importing system.
The NFS protocol uses file handles to uniquely identify files. An NFS server constructs a file handle using the file system identifier (fsid) and the file identifier (fileid) exported by the local file system. The local file system may guarantee that the file system identifier uniquely identifies a file system on that machine, and may guarantee that the file identifier uniquely identifies a file on the specified file system. Thus, the NFS server may guarantee that the file handle uniquely identifies a file on that server. In addition to the file system identifier and file identifier, the file handle may also include export information about the NFS server mount point. NFS supports a lookup procedure for converting file names into file handles.
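The construction of a file handle from a file system identifier, a file identifier, and export information may be sketched as follows. This is a toy model only: the 20-byte layout, the `ToyServer` class, and all numeric values are assumptions made for illustration and do not reflect the handle format of any actual NFS server, where the handle is opaque to the client.

```python
import struct

# Hypothetical layout: 64-bit fsid, 64-bit fileid, 32-bit export id (big-endian).
HANDLE_FORMAT = "!QQI"


def make_file_handle(fsid, fileid, export_id=0):
    """Pack the identifiers into a fixed-size byte string."""
    return struct.pack(HANDLE_FORMAT, fsid, fileid, export_id)


def parse_file_handle(handle):
    """Recover (fsid, fileid, export_id) from a handle made above."""
    return struct.unpack(HANDLE_FORMAT, handle)


class ToyServer:
    """Toy server that converts file names into file handles."""

    def __init__(self, fsid, export_id=0):
        self.fsid = fsid
        self.export_id = export_id
        self._entries = {}  # (directory handle, name) -> fileid

    def add_entry(self, dir_handle, name, fileid):
        self._entries[(dir_handle, name)] = fileid

    def lookup(self, dir_handle, name):
        """Return the handle of `name` within the directory `dir_handle`."""
        fileid = self._entries[(dir_handle, name)]
        return make_file_handle(self.fsid, fileid, self.export_id)


server = ToyServer(fsid=7, export_id=1)
root = make_file_handle(7, 2, 1)           # handle of the exported root directory
server.add_entry(root, "notes.txt", 1234)  # hypothetical file identifier
handle = server.lookup(root, "notes.txt")
```

Because the file system identifier is unique on the machine and the file identifier is unique within that file system, two distinct files on the toy server cannot produce the same packed handle.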
A MOUNT protocol allows a server to hand out remote access privileges to a restricted set of clients. The MOUNT protocol performs the operating system-specific functions that allow, for example, the attachment of remote directory trees to local file systems. The MOUNT protocol may be used to initiate client access to a server supporting the Network File System (NFS) application. The MOUNT protocol handles local operating system specifics such as path name format and user authentication. Clients desiring access to the NFS program may use the MOUNT protocol to obtain a file handle suitable for use with NFS.
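The role described above, in which a server hands out an initial file handle only to a restricted set of clients, may be sketched as follows. The class `ToyMountService`, the host names, the export path, and the handle bytes are all hypothetical; the sketch does not model the wire protocol or the authentication of an actual MOUNT implementation.

```python
class ToyMountService:
    """Toy model of a mount service with a per-export access list."""

    def __init__(self):
        self._exports = {}  # export path -> (allowed clients, root handle)

    def export(self, path, allowed_clients, root_handle):
        """Make the directory tree at `path` available to the listed clients."""
        self._exports[path] = (frozenset(allowed_clients), root_handle)

    def mount(self, client, path):
        """Return the root file handle for `path` if `client` is permitted."""
        if path not in self._exports:
            raise FileNotFoundError(path)
        allowed, root_handle = self._exports[path]
        if client not in allowed:
            raise PermissionError(f"{client} may not mount {path}")
        return root_handle


service = ToyMountService()
# Hypothetical export: one directory tree, two permitted client hosts.
service.export("/export/home", {"client-a", "client-b"}, b"root-handle-0001")
granted = service.mount("client-a", "/export/home")
```

A client that obtains `granted` would then present it in subsequent file requests, while a host outside the access list receives an error instead of a handle.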
Clustering may be defined as the use of multiple computers (for example, PCs or UNIX workstations), multiple storage devices, and redundant interconnections to form what appears to users as a single highly available system. Clustering may be used for load balancing and parallel processing as well as for high availability. A cluster may thus be defined as a group of servers and other resources that act like a single system and enable high availability and, in some cases, load balancing and parallel processing.
A common use of clustering is to load balance traffic on high-traffic Web sites. A Web page request is sent to a “manager” server, which then determines which of several identical or similar Web servers to forward the request to for handling. Having a Web farm (as such a configuration is sometimes called) allows traffic to be handled more quickly.
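The forwarding decision made by the “manager” server may be sketched as follows. The text above does not specify how the manager chooses a Web server; round-robin selection is assumed here purely for illustration, and the class name `Manager` and the server names are hypothetical.

```python
from itertools import cycle


class Manager:
    """Toy manager that forwards each request to the next server in turn."""

    def __init__(self, web_servers):
        servers = list(web_servers)
        if not servers:
            raise ValueError("at least one Web server is required")
        self._rotation = cycle(servers)  # endless round-robin iterator

    def dispatch(self, request):
        """Choose a server for `request` and return the (server, request) pair."""
        server = next(self._rotation)
        return server, request


manager = Manager(["web-1", "web-2", "web-3"])
# Four page requests arrive; the fourth wraps around to the first server.
assignments = [manager.dispatch(f"GET /page{i}")[0] for i in range(4)]
```

Other policies, such as selecting the least-loaded server, would change only the body of `dispatch`.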
The storage area network (SAN) model places storage on its own dedicated network, removing data storage from the main user network. This dedicated network most commonly uses Fibre Channel technology, a versatile, high-speed transport. The SAN includes one or more hosts that provide a point of interface with LAN users, as well as (in the case of large SANs) one or more fabric switches, SAN hubs and other devices to accommodate a large number of storage devices. The hardware (e.g. fabric switches, hubs, bridges, routers, cables, etc.) that connects workstations and servers to storage devices in a SAN is referred to as a “fabric.” The SAN fabric may enable server-to-storage device connectivity through Fibre Channel switching technology to a wide range of servers and storage devices.
The versatility of the SAN model enables organizations to perform tasks that were previously difficult to implement, such as LAN-free and server-free tape backup, storage leasing, and full-motion video services. SAN deployment promises numerous advantages, including cost management through storage consolidation, higher availability of data, better performance and seamless management of online and offline data. In addition, the LAN is relieved of the overhead of disk access and tape backup, data availability becomes less server-dependent, and downtime incurred by service and maintenance tasks affects more granular portions of the available storage system.
One of the primary goals of a file system is to reduce the latency associated with accessing data. Generally speaking, latency is the period of time that one component in a system spends waiting for another component. For example, in accessing data on a disk, latency includes the time it takes to position the proper sector under the read/write head. In networking, latency includes the amount of time it takes a packet to travel from source to destination. If one considers a human to be a component of a system, then latency measures the amount of time a human wastes waiting for a result. Users will increasingly obtain their desired product or service from sources providing the lowest latency. With that in mind, a file system often aims to reduce the time that people (as well as other elements of the system) spend waiting for data.
Traditional file systems are single-node file systems. In a single-node file system, a single-node storage stack (including, for example, volume management software) resides on each node and allows each respective node to manage data stored in storage devices accessible from that node. In a shared storage environment such as a SAN, multiple networked nodes may access a storage device managed by a particular node running a single-node file system. However, to access a file on the shared storage device, another node must typically use a network file protocol such as NFS. The node that manages the storage may be able to access the file directly, but the other node typically can access the file only indirectly. The indirection caused by the use of NFS may contribute undesirably to latency.
Enterprise Storage Management Application (ESMA) developers are usually expected to support a large variety of SAN storage stack components (both hardware and software) when writing portable, enterprise-class data management applications. However, dozens of file systems may each feature their own proprietary sets of metadata about their underlying data objects. File systems and volume managers may rearrange the physical location of data at unpredictable times, thus rendering snapshot images inconsistent with the actual block locations. Each component of the storage stack may employ a unique method of caching and flushing data buffers, thus making it difficult to ensure synchronization among data objects. Logical volumes could be created across competing brands of volume managers or storage arrays. The arrays themselves may use different methods of creating snapshots and mirrors. For these reasons and more, it is daunting to expect a generic piece of software to communicate with heterogeneous APIs to determine the physical locations of data.