§1.1 Field of the Invention
The present invention concerns computer storage and file systems. More specifically, the present invention concerns techniques for managing and using a distributed storage system.
§1.2 Related Art
Data generated by, and for use by, computers is stored in file systems. The design of file systems has evolved over the last two decades, broadly from a server-centric model (which can be thought of as a local file system) to a storage-centric model (which can be thought of as a networked file system). Stand-alone personal computers exemplify a server-centric model: storage has resided on the personal computer itself, initially on hard disks and, more recently, on optical media. As local area networks ("LANs") became popular, networked computers could store and share data on a so-called file server on the LAN. Storage associated with a given file server is commonly referred to as server attached storage ("SAS"). Storage could be increased by adding disk space to a file server. Unfortunately, however, SASs are only expandable internally; there is no transparent data sharing between file servers. Further, with SASs, throughput is limited by the speed of the fixed number of buses internal to the file server. Accordingly, SASs also exemplify a server-centric model.
As networks became more common, and as network speed and reliability increased, network attached storage ("NAS") became popular. NASs are easy to install, and each NAS, individually, is relatively easy to maintain. In a NAS, a file system on the server is accessible from a client via a network file system protocol such as NFS or CIFS. Network file systems like NFS and CIFS are layered protocols that allow a client to request a particular file from a pre-designated server. The client's operating system translates a file access request into the NFS or CIFS format and forwards it to the server. The server processes the request and in turn translates it into a local file system call that accesses the information on magnetic disks or other storage media. The disadvantage of this technology is that a file system cannot expand beyond the limits of a single NAS machine. Consequently, administering and maintaining more than a few NAS units, and therefore more than a few file systems, becomes difficult. Thus, in this regard, NASs can be thought of as following a server-centric file system model.
Storage area networks (SANs) (and clustered file systems) exemplify a storage-centric file system model. SANs provide a simple technology for managing a cluster or group of disk storage units, effectively pooling such units. SANs use a front-end system, which can be a NAS or a traditional server. SANs (i) are easy to expand, (ii) permit centralized management and administration of the pool of disk storage units, and (iii) allow the pool of disk storage units to be shared among a set of front-end server systems. Moreover, SANs enable various data protection and availability functions, such as multi-unit mirroring with failover. Unfortunately, however, SANs are expensive. Although they permit space to be shared among front-end server systems, they do not permit multiple SAN environments to use the same file system. Thus, although SANs pool storage, they basically behave as a server-centric file system; that is, a SAN behaves like a sophisticated disk drive (e.g., one with advanced data protection and availability functions) attached to a system. Finally, various incompatible versions of SANs have emerged.
The article T. E. Anderson et al., "Serverless Network File Systems," Proc. 15th ACM Symposium on Operating System Principles, pp. 109-126 (1995) (hereafter referred to as "the Berkeley paper") discusses a data-centric distributed file system. In that system, manager maps, which map each file to a manager that controls the file, are globally managed and maintained. Unfortunately, the present inventors believe that maintaining and storing a map that covers every file could limit the scalability of the system as the number of files becomes large.
§1.3 Unmet Needs
In view of the foregoing disadvantages of known storage technologies, such as the server-centric and storage-centric models described above, there is a need for a new storage technology that (i) permits storage capacity to be added easily (as is the case with NASs), (ii) permits file systems to be expanded beyond a given unit (as is the case with SANs), (iii) is easy to administer and manage, (iv) permits data sharing, and (v) performs effectively with very large storage capacities and client loads.
The present invention may provide methods, apparatus and data structures for providing a file system which meets the needs listed in §1.3. A distributed file system in which files are distributed across more than one file server and in which each file server has physical storage media may be provided. The present invention can determine a particular file server to which a file system call pertains by (a) accepting a file system call including a file identifier, (b) determining a contiguous unit of the physical storage media of the file servers of the distributed file system based on the file identifier, (c) determining the file server having the physical storage media that contains the determined contiguous unit, and (d) forwarding a request, based on the file system call accepted, to the file server determined to have the physical storage media that contains the determined contiguous unit.
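As a minimal sketch, steps (a) through (d) can be expressed as a single routing function. Everything here is hypothetical: the names `FileSystemCall`, `ForwardedRequest`, and `route_call` are illustrative only, and the two lookups (file identifier to contiguous unit, contiguous unit to file server) are passed in as functions because the paragraph does not fix how either one is computed. Step (d) is modeled by returning the request that would be forwarded, not by sending it over a network.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FileSystemCall:
    """A file system call reduced to the fields this sketch needs (hypothetical)."""
    op: str       # e.g. "read" or "write"
    file_id: int  # the file identifier carried by the call


@dataclass
class ForwardedRequest:
    """The request to be sent to the chosen file server (hypothetical)."""
    server: str
    call: FileSystemCall


def route_call(
    call: FileSystemCall,
    unit_for_file: Callable[[int], int],
    server_for_unit: Callable[[int], str],
) -> ForwardedRequest:
    # (a) accept a file system call that includes a file identifier
    file_id = call.file_id
    # (b) determine the contiguous unit of physical storage from the identifier
    unit = unit_for_file(file_id)
    # (c) determine the file server whose physical media contains that unit
    server = server_for_unit(unit)
    # (d) build the request to forward to that server; a real system would
    #     transmit it, this sketch only returns it
    return ForwardedRequest(server=server, call=call)
```

For example, with a hypothetical rule that places every 1,000 consecutive file identifiers in one unit, `route_call(FileSystemCall("read", 1500), lambda f: f // 1000, {0: "fs-a", 1: "fs-b"}.__getitem__)` returns a request addressed to `"fs-b"`. Injecting the two lookups keeps the question of where a file lives separate from the mechanics of forwarding the call.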
The file identifier may be an Inode number, and the contiguous unit may be a segment. The file server having the physical storage media that contains the determined contiguous unit may be determined by a table, administered globally across the file system, that maps the contiguous unit to the file server (e.g., to the file server's IP address).
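A minimal sketch of the Inode-to-segment-to-server lookup follows, under clearly stated assumptions: the segment number is assumed to be encoded in the upper bits of the Inode number (a hypothetical layout chosen only for illustration; the text does not specify how a segment is derived from an Inode number), and the globally administered table is modeled as an in-memory dictionary from segment number to IP address. `SEGMENT_SHIFT`, `segment_of`, `make_inode`, and `SegmentTable` are hypothetical names, and the addresses used below are placeholders.

```python
from typing import Dict

# Hypothetical layout: bits above SEGMENT_SHIFT hold the segment number,
# the low bits index a file within that segment.
SEGMENT_SHIFT = 32


def segment_of(inode: int) -> int:
    """Return the segment holding this Inode (assumes the layout above)."""
    return inode >> SEGMENT_SHIFT


def make_inode(segment: int, local_index: int) -> int:
    """Build an Inode number from a segment and a per-segment index."""
    if not 0 <= local_index < (1 << SEGMENT_SHIFT):
        raise ValueError("local index out of range")
    return (segment << SEGMENT_SHIFT) | local_index


class SegmentTable:
    """Stand-in for the globally administered segment-to-server table."""

    def __init__(self) -> None:
        self._servers: Dict[int, str] = {}

    def assign(self, segment: int, server_ip: str) -> None:
        # Record which file server's physical media holds this segment.
        self._servers[segment] = server_ip

    def server_for_inode(self, inode: int) -> str:
        # Map Inode -> segment, then segment -> file server address.
        seg = segment_of(inode)
        try:
            return self._servers[seg]
        except KeyError:
            raise LookupError(f"segment {seg} is not assigned to any server") from None

    def __len__(self) -> int:
        # One entry per segment, independent of how many files exist.
        return len(self._servers)
```

For example, after `table.assign(0, "10.0.0.1")` and `table.assign(1, "10.0.0.2")`, `table.server_for_inode(make_inode(1, 12345))` returns `"10.0.0.2"`. Because the table holds one entry per segment rather than one per file, its size does not grow with the number of files, which bears on the scalability concern raised above about maintaining a map having every file.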