Internet services, such as email, web browsing, gaming, file transfer, and so on, are generally provided using a client-server model of communication. According to the client-server model, a server computer provides Internet services to other computers, called clients. Familiar examples of servers include mail servers and web servers. A server communicates with the client computer to send data and perform actions at the client's request. A computer may be both a client and a server. For example, a web server may contact another computer to synchronize its clock. In this case, the computer providing the clock data is a time server, and the requesting computer is both a time client and a web server.
Conventionally, a service provider, such as a web site, is responsible for creating and making available content for people to consume. Web sites typically following this model include, for example: news sites like CNN.com or BBC.co.uk; sites offering retail sales like Amazon.com or BestBuy.com; search engines with indexed search data like Google.com or MSN.com; and so on. However, a usage model is emerging whereby the users of a service, rather than the service provider, produce content for others to consume. In this “Web 2.0” model, a service provider operates a content creation server, and invites users to create or upload content to be hosted there. Examples of this model include blog providers such as Blogger.com; news aggregators like Digg.com and Reddit.com; and video sharing sites such as YouTube.com. Some websites are a hybrid between the two, in that the website management provides subject matter for users to comment on. An example of a hybrid site is technology news discussion site Slashdot.org, where staff selects news stories from other sites for comment. Traditional websites that originate content seem to be migrating towards becoming such hybrids. News site MSNBC.com may allow readers to comment on posted news stories, for example.
The infrastructure behind the Internet is growing to adapt to these changes from the traditional client-server model. A traditional service provider may be a business, and as such have a limited staff that can create and publish only a relatively small amount of content in any given timeframe. With user-generated content, however, the amount of data that can be created over the same timeframe increases by several orders of magnitude. Thus, a server infrastructure may suffer from problems of scalability, as the volume of data that must be processed and stored grows exponentially. Simply buying larger data storage devices can be prohibitively expensive, as technological limitations typically cause the cost-to-capacity ratio of storage devices to increase as capacity increases. Service providers may instead look for more cost-effective ways to store their data, including purchasing larger numbers of devices with smaller storage capacity. Clusters of such smaller devices are known in the art. For example, techniques have been developed to control redundant arrays of inexpensive disks (RAID). Furthermore, service providers may require a storage solution that integrates tightly with their existing computer infrastructure, rather than a system purchased off-the-shelf. Service providers may also need the ability to deal with data storage interruptions. RAID systems may provide these benefits; however, service providers may require that a storage system be cost-effective to support and maintain. RAID systems tend to be expensive, complex, and require considerable expertise and patience to manage.
Storage systems arrange their data in a filesystem. A filesystem is a system for storing and organizing computer files in a storage system to make it easy to find and access the files. A file, in turn, is a collection of data. FIG. 1 depicts a filesystem directory tree as known in the prior art, for example, as in the UNIX® model (Unix). Files within a filesystem are organized into directories. As with almost everything else in Unix, a directory is a type of file; in this case, one that contains information about other files. As a directory may refer to both (data) files and other directories, directories may nest one within the other. As a result, a filesystem has a tree-like structure, where each directory acts as a branch. Continuing the analogy, a regular data file is sometimes known as a leaf. Like a tree, each filesystem has a root—a root directory 110. The root directory 110 depicted in FIG. 1 contains two directories 120 and 122 (branches), and a file 124 (a leaf). Directory 120 has two files 130 and 132, while directory 122 has three files and a subdirectory 140.
All files in a filesystem may be accessed by specifying a path from the root directory 110 to the file. For example, the location in the filesystem of file 150 is uniquely determined by the path from root directory 110 to directory 122 to directory 140 to file 150. A path is ordinarily represented by a concatenation of the names of the intermediate files, separated by a special character. This written description follows the Unix convention of a forward-slash / as a path separator, although alternate operating systems such as Microsoft® Windows® may use a different path separator. The root directory 110 has the special name /. Thus, if the directories are named as they are labeled in FIG. 1, file 150 has the path /122/140/150. (The Windows equivalent is C:\122\140\150, where C:\ is the name of the root directory.)
FIG. 2 is a block diagram of various operations that may be performed on files located within a filesystem directory tree. There are four major types of operations performed on files: file creation, reading data, updating data, and file deletion. Together these are known as CRUD operations, and provide the core functionality required of any storage system. Operating system architects support these main operations with additional operations. For example, it may be inconvenient for a software developer to continually refer to the full path of a file for each file operation. Thus, an operating system may provide the ability to open a file (that is, to initialize certain data pertaining to the file, including the file path) before performing any of the four major operations. Similarly, an operating system may provide the ability to close the file, to free up system resources when access is no longer required. All of these CRUD and support operations define the capabilities of the filesystem. POSIX®, which is the Portable Operating System Interface, an industry standard (IEEE 1003; ISO/IEC 9945), defines these operations as well.
Different filesystem designers may wish to implement different filesystem capabilities. For example, some filesystems support very large files. Some filesystems support a log of file operations, which can be “replayed” to ensure data consistency in case of a system failure. Some filesystems store data to a network, rather than a hard drive in the local computer. Examples of filesystems with different capabilities include the Windows NT® filesystem NTFS, the Common Internet File System CIFS, the Unix File System UFS2, Sun Microsystems® ZFS and Network File System NFS, Linux filesystems EXT3 and ReiserFS, and many others. Each of these filesystems implements the various filesystem CRUD and support operations. Thus, an NTFS filesystem 210 implements an open function 212 for opening a file, a close function 214 for closing an open file, a read function 216 for reading data from an open file, a write function 218 for writing data to an open file, and others. Similarly, a CIFS filesystem 230 implements an open function 232, a close function 234, a read function 236, and a write function 238. However, these filesystems differ in that NTFS filesystem 210 contains operations that access a local hard disk drive 220, while CIFS filesystem 230 contains operations that access a network 240, such as a local area network (LAN). In a CIFS filesystem 230, network 240 is connected to a file server 250 which may have a hard disk drive 260 that actually stores the file data. CIFS filesystem 230 creates network messages that contain instructions, such as “read one kilobyte of data from file F”, and sends them to file server 250. File server 250 receives the network messages, and translates them into requests on its own filesystem, which may access hard disk drive 260. Once the requests have completed, file server 250 creates a response network message and sends it back to CIFS filesystem 230 using network 240. However, a software application running on a computer supporting CIFS may simply use read function 236 without concerning itself with the details of the underlying network communication. Filesystems other than NTFS and CIFS similarly differ in their implementations, but all POSIX-compliant filesystems provide at least the same minimum filesystem CRUD and support operations.
A computer may support several different filesystems simultaneously. However, this capability raises a problem. Users require a unified method to address files, regardless of the filesystem in which they are stored. The exemplary method to address files is to use a file path, as described above. However, there must be a way to distinguish between the two different root directories of the two filesystems—they cannot both be named /. A common solution to this problem is to attach one filesystem tree to the other, in a process known as mounting. The reverse process of detaching two filesystem trees is known as unmounting, or dismounting.
FIG. 3 shows the relationship between two filesystem directory trees involved in a filesystem mount operation. In a mount operation, one of the filesystems acts as the root of the tree, as before, and is called the root filesystem. Typically, the root filesystem will be one that accesses a local hard disk drive. In the example of FIG. 3, the root filesystem 310 is an NTFS filesystem 210, with associated NTFS filesystem operations that access local hard disk drive 382. The other filesystem is known as the mounted filesystem. Here, the mounted filesystem 340 is a CIFS filesystem 230, with associated CIFS filesystem operations.
As before, root filesystem 310 has several files in it: directory A 330, directory B 332, directory C 334, and so on to directory Z 336. These directories have subdirectories and contain files, as shown. One of these directories, say 336, is chosen by the filesystem user as a point of attachment (also known as a mount point). A user then mounts filesystem 340 onto this directory using an operating system command, such as the Unix mount command. Before mounting, directory path/Z refers to directory 336. After mounting, mounted directory 350 replaces directory 336 in the filesystem tree, so directory path/Z now refers to directory 350, not directory 336. Any files contained in directory 336, such as file 338, are now inaccessible, as there is no way to address them with a path. For this reason, mount points are usually chosen to be empty directories, and may be specially created for that purpose. A typical Unix example is the directory /mnt. A filesystem may simultaneously mount several filesystems. Thus, /mnt may be empty, or it may contain several empty subdirectories for use as mount points if multiple filesystems are to be mounted therein.
As an example, before the filesystem 340 is mounted, directory Z 336 is empty. After mounting, the directory/Z now contains two subdirectories, /Z/D1 and /Z/D2. Path /Z/D1 represents a path containing the root directory 320, the mount point /Z (which refers to the root directory 350 of the second filesystem), and the directory 360. As another example, files 370 and 372 are available after mounting using paths /Z/D2/F1 and /Z/D2/F2 respectively (passing through directory D2 362). When a user is finished, the umount command is available to detach the two filesystems. Once the second filesystem is unmounted, files such as file 338 are accessible to the operating system again.
Which file operations apply to a given file depends on which filesystem the file is located in. This is determined, in turn, by the path of the file. For example, file 331 has path /A/F2, which is located in an NTFS filesystem. Thus, NTFS operations are used on the file. These operations access a person's local hard disk drive 382, according to the design of NTFS. However, file 372 has path /Z/D2/F2, which crosses the mount point /Z. Thus, CIFS file operations are used on the file. These operations send a CIFS message through LAN 392 to another computer 394. Computer 394 supports CIFS, and contains the root directory 350 of filesystem 340. Computer 394 receives the request, which it then applies to filesystem 340. The process then begins again on computer 394. The path of the file on computer 394 is /D2/F2, which may be seen from looking now only at filesystem 340. Computer 394 determines the proper file operation to execute based on this path, itself looking for mount points. Computer 394 may pass along the operation to its local hard disk drive 396, or even to another device using another filesystem type if /D2 is a mount point in filesystem 340. Thus, the operating system of computer 394 provides a further level of abstraction.
Filesystem mounting can be used to increase the amount of file storage space available to a web server. Thus, mounting may be used to alleviate a service provider's needs in this respect. There are generally three paradigms for expanding storage space: adding additional local hard drives, mounting a network-attached storage (NAS), and mounting a storage area network (SAN). A NAS is one or more hardware devices used solely for storage (and not for any other applications), accessible over a network, which may be mounted on a computer using a standard network filesystem such as CIFS or NFS. Under a NAS, a computer will recognize the remote nature of the file, and convert file operations into formatted network messages. A SAN is similar, except that the remote devices are mounted using a proprietary filesystem, such that the core operating system is unaware that the file data are stored remotely.
The first paradigm, adding additional local hard drives, does not scale very well. Modern computers only have a finite number of connections to which to attach additional devices. Thus, this paradigm is not generally used for very large business operations.
The second paradigm requires mounting a NAS. A NAS scales well hardware-wise, as any number of devices may form the NAS, and they may be added easily to an existing setup. (Several versions of Microsoft Windows limit the number of mounted filesystems. Unix systems generally do not have this limitation.) A NAS is also generally less expensive than a SAN, byte-for-byte. However, because CIFS and NFS access a remote computer for each file operation, they have performance penalties. The process of traversing a file path, for example, requires locating a directory, reading its contents, locating the next directory, reading its contents, and so on until the final file is located. In NFS, each of these operations is a network access. On large networks nearing bandwidth saturation, NFS request/response pairs may be delayed enough to cause user frustration. In addition, NFS does not react well to failure conditions. For example, if a server hosting an NFS filesystem becomes unresponsive for any reason, a client that has mounted the filesystem may wait for a considerable period of time to complete an NFS transaction. In some NFS implementations, this delay may spread to other parts of the operating system, causing the client computer to also become unresponsive. As a result, NFS network administrators may be very particular about the order in which computers are restarted or failure conditions addressed.
The third paradigm requires mounting a SAN. A SAN is a proprietary product that can take several different storage devices and pool them, so that a computer sees them as a single, large, local storage unit. Thus, a SAN does not have to rely on off-the-shelf protocols such as CIFS or NFS. For this reason, SAN providers may offer better support for their products than NAS providers, including services to better integrate their product into an existing network infrastructure. A SAN is generally more expensive than a NAS. Each SAN has its own method for dealing with data storage interruptions, and different vendors offer different guarantees and service-level agreements. Of course, using a SAN generally implies the presence of an “intermediary” in the form of a device that adapts the “block” view of the world the SAN provides to the application view (e.g., in the form of software running on one or more clients of the SAN that may coordinate access among clients and implement abstractions such as files, or others, for example mail repositories, DBMSes and so on). Thus a direct comparison between a SAN and NAS devices can be misleading as the two have inherently different capabilities.