1. Field of the Invention
The present invention relates to controlling storage, modification, and transfer of data in a network providing distributed data storage services. In particular, the present invention relates to creation and management of distributed file systems, and network-based file systems, that resolve a namespace (i.e., a fully-qualified path) of a file system object to a stored location of that file system object.
2. Description of the Related Art
A file system is a method for storing and organizing file system objects in a manner that ensures the file system objects are readily accessible. In other words, a file system is a set of abstract data types, referred to herein generally as file system objects, that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data.
A fundamental aspect of a file system is that it maps namespaces to storage locations. Indices overlying the namespace also are used to organize the file system objects into a hierarchical organization referred to as directories. Consequently, a file system will establish a namespace (i.e., a fully qualified path to a file system object) relative to a root directory that is deemed the top-level position (i.e., the origin) within the hierarchical organization, such that all namespaces are structured relative to the root directory. For example, in the case of a Microsoft Windows-based operating system, the root directory is identified by the characters “\\”. In addition, any location within the directory structure can be deemed a new root (“subroot”), for example in the case of organizing file system objects sharing the same subclass attribute.
A fundamental attribute of prior art file systems is that a fully qualified path for a given directory will map to a root on a given physical device, and all fully qualified paths for data objects within that directory will map to the same physical device. For example, the fully qualified name “\\cdrive\foo\bar” maps to a different physical device than “\\ddrive\foo\bar”; however, the fully qualified name “\\cdrive\foo\file1” and the fully qualified name “\\cdrive\foo\bar” both map to the same physical device because both fully qualified names share the same physical device root of “\\cdrive”.
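The binding between a fully qualified name and its physical device root can be sketched as follows. This is a minimal illustration, assuming names of the form “\\device\path”; the function name `device_root` is hypothetical and does not come from any particular file system.

```python
PREFIX = r"\\"  # two backslashes introduce a fully qualified name


def device_root(name: str) -> str:
    """Return the physical device root of a fully qualified name."""
    if not name.startswith(PREFIX):
        raise ValueError("not a fully qualified name: " + name)
    # Everything up to the first separator after the prefix names the device.
    device = name[len(PREFIX):].split("\\", 1)[0]
    return PREFIX + device


# Two names under the same directory share one device root;
# the same relative path under another root maps to another device.
same_a = device_root(r"\\cdrive\foo\bar")
same_b = device_root(r"\\cdrive\foo\file1")
other = device_root(r"\\ddrive\foo\bar")
```

Because the device root is derived purely from the leading component of the name, every object beneath a directory inherits that directory's device.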
In the Unix file system, a “hard link” may be used to create additional links to a directory, as described below, where a hard link references the same inode in the Unix file system. Hard links for a given file can be placed in different directories; however, all the hard links for that given file must remain on the same physical device, and within the same physical partition if a disk has multiple partitions. In addition, the Unix/Linux “mount” command places the hierarchy of some device at an arbitrary location in the global namespace; however, the tree under this arbitrary location still is restricted to the same device, such that files cannot be moved to different locations without updating the mount point.
FIG. 1 is a diagram illustrating a structure of a conventional prior art directory 10. The directory 10 includes a root directory 12 (“\\”), and subdirectories 14a, 14b, 14c, 14d, 14e, 14f. File system objects include three types: collection objects, data objects, and redirect objects. In almost all file systems a collection object is effectively a data object; for example, the Unix or Linux command “opendir” enables a user to observe the contents of a directory (e.g., 14a) as a file.
FIG. 2 is a diagram illustrating a file system used for storage of files on a hard disk 28. FIG. 2 also illustrates a prior art collection object 18. The collection object 18 is illustrated in FIG. 2 as a directory table having a plurality of directory entries 20. Each directory entry 20 includes a name field 22, an attributes field 24, and a location (Loc) field 26 for a corresponding file in the file system. The name field 22 specifies an alphanumeric string assigned as a file name to the corresponding file. The attributes field 24 may store attributes relevant to the corresponding file (e.g., size, read/write permission, time of creation, last modified time, etc.). As illustrated with respect to the directory entry 20a, the Loc field 26 specifies the location (i.e., physical address) 36 on the disk 28 of the first data block (e.g., 512 bytes) (e.g., 30a) that stores the beginning of the corresponding data file having the name specified in the corresponding name field 22 (e.g., “foo”). In addition, the collection object 18 itself is stored on that same physical disk 28. The term “inode” has been used to describe the numeric address (e.g., “550”) 36 on the disk 28 where the corresponding file (e.g., 30a) is stored. As used herein, however, the term “inode” also refers to the metadata that is attached to the file; hence, the inode is considered part of the file, but not part of the data that is in the file.
As illustrated in FIG. 2, the disk 28 also includes a File Allocation Table 32. The File Allocation Table 32, used in both Unix and Windows-based file systems, has a table entry 34 for each and every data block 30 on the disk 28 (Windows-based file systems refer to the table 32 as a “FAT”, e.g., FAT-16 or FAT-32). Each of the entries 34 may be implemented either as a single bit indicating whether the block is allocated, or as a linked list as illustrated in FIG. 2, where a given entry (e.g., 34a at location 550) for a given data block (e.g., 30a) is referenced by the corresponding address 36 of the corresponding data block 30a, and will specify whether another entry (e.g., 34b at location 551) exists for the associated data object.
The file system has a directory entry 20a for the file “foo” having a location field 26 that specifies a corresponding location (“550”) 36 of the first data block 30a. The File Allocation Table 32 has entries 34a, 34b that point to the successive data blocks 30b and 30c at respective locations “551” and “16” 36, enabling the file system to access the successive data blocks 30b and 30c and their respective table entries 34b, 34c. Hence, a file (e.g., having filename “foo”) that utilizes three (3) disk blocks can be stored at disk blocks 30a (at location “550”), 30b (at location “551”), and 30c (at location “16”), where File Allocation Table entries 34a and 34b specify the successive next block locations 36, and the last entry 34c has a null pointer indicating an end of file entry (e.g., “0”).
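The chain described above can be followed in a few lines. This is a minimal sketch, assuming the table is modeled as a mapping from a block location to the next block location, with “0” as the end-of-file marker; the names `fat`, `directory`, and `block_chain` are illustrative only.

```python
END_OF_FILE = 0  # null pointer stored in the last table entry (e.g., 34c)

# Table entries 34a, 34b, 34c keyed by block location 36.
fat = {550: 551, 551: 16, 16: END_OF_FILE}

# Directory entry 20a: name field "foo", Loc field "550".
directory = {"foo": 550}


def block_chain(name, directory, fat):
    """Return the ordered block locations that store the named file."""
    location = directory[name]
    chain = []
    visited = set()
    while location != END_OF_FILE:
        if location in visited:
            raise ValueError("cycle in allocation table at %d" % location)
        visited.add(location)
        chain.append(location)
        location = fat[location]  # follow the pointer to the next block
    return chain


foo_blocks = block_chain("foo", directory, fat)
```

The directory entry supplies only the first location; every later block is reachable solely through the table, which is why the Loc value is meaningful only on the disk that holds that table.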
As illustrated in FIG. 2, the directory table 18 could include another directory entry 20b that points to the same location (e.g., 550) 36 on the disk. Hence, both the directory entry 22a having the name “foo” and the directory entry 22b having the name “bar” point to the same location 36 of entry 34a (and the corresponding data block 30a), even though the entries 22a and 22b have different attributes 24 (e.g., “attr1” and “attr2”), where the entry 20b specifies in the corresponding attribute field 24 that the file “bar” is a file (“F”). Although entries 22a and 22b pointing to the same location (“550”) 36 is a valid example in the Unix file system of a hard link, these entries 22a and 22b referencing the same location (“550”) 36 in a Windows-based file system is referred to as a “cross-linked file”, and is considered illegal in the Windows-based file system. Also note that the entry 20c (referencing the location 36 of entry 34b in the middle of the linked list formed by locations 34a, 34b, and 34c) is deemed illegal by all conventional file systems.
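The three cases just described (a sole reference, a shared starting location, and a reference into the middle of a chain) can be distinguished mechanically. The following is a minimal sketch under the same mapping model of the table; the labels, the function name `classify_entries`, and the name “baz” given to entry 20c are assumptions made for illustration, since the text does not name that entry.

```python
from collections import Counter

END_OF_FILE = 0  # null pointer marking the last block, as in entry 34c


def classify_entries(directory, fat):
    """Label each directory entry by how its Loc value relates to the chains."""
    # Any location named as some entry's "next" block lies inside a chain,
    # so it cannot be the legal starting location of a file.
    mid_chain = {nxt for nxt in fat.values() if nxt != END_OF_FILE}
    shared = Counter(directory.values())
    labels = {}
    for name, loc in directory.items():
        if loc in mid_chain:
            labels[name] = "mid-chain"      # illegal in all conventional file systems
        elif shared[loc] > 1:
            labels[name] = "shared-start"   # Unix hard link, Windows cross-link
        else:
            labels[name] = "single"
    return labels


fat = {550: 551, 551: 16, 16: END_OF_FILE}
directory = {"foo": 550, "bar": 550, "baz": 551}  # "baz" is a hypothetical name
labels = classify_entries(directory, fat)
```

Whether a “shared-start” result is acceptable depends on the file system: it is a valid hard link under Unix and a cross-linked file under a Windows-based file system.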
As apparent from the foregoing, the attributes field 24 and the associated directory entry (e.g., 20a) are stored separately from the referenced file (e.g., composed of the information at blocks 34a, 34b, and 34c). Consequently, since the location field 26 points to a location 36 on the hard disk 28, all the entries 20 of the directory 18 must reside on the same physical device 28. In other words, the location value “550” in the Loc field 26 of directory entry 20a would have no relevance on another disk because it may point to the middle of a linked list, described above as illegal in all existing file systems.
As described above, the directory table 18 is stored as a data object on the disk 28. Hence, the directory table 18 may include an entry 20d specifying in the corresponding attributes field 24 that the data object 30d having the name “Dir1” in the name field 22 has an attribute of being a directory (“D”), and a location field 26 specifying the location (“602”) 36 of the data object 30d storing the directory contents.
Hence, all data elements within a collection must exist on the same physical device.
Redirect objects are references to target destinations. Redirect objects have been implemented using one of two methods: (1) named redirecting without using an additional data block 30 on the disk 28, and (2) redirecting using an additional data block 30 on the disk 28.
In the first example of named redirecting without using an additional data block on the disk, the redirect information is contained within the collection object 18. In particular, the collection object 18 will include an extended attribute field 38 in the directory entry 20e (implemented, for example, by using the next directory entry location) that specifies the location 36 of the target according to the name “Target” specified in the name field. Hence, the directory entry 20e has no corresponding location (“inode”) 36 specified in the location field 26 or attribute in the attribute field 24 because there is no additional data block 30 allocated on the disk 28. However, any movement of the target file requires the extended attribute field 38 to be updated.
In the second example of redirecting using an additional data block 30 on the disk 28, a directory entry 20f specifies a redirect attribute (“R”) in the corresponding attribute field 24 and specifies in the location field 26 a corresponding location 36 for a data block 30e that stores information (e.g., an “inode”) for reaching the target location (e.g., in the form of a text string). Hence, a “shortcut” in the Windows-based file system is an actual file 30e referenced by the directory entry 20f. In addition, the target specified in the file 30e may reference another volume (or device).
In both instances, however, the entries 20e or 20f need to be updated if the target 30e is moved from its location (“570”). Moreover, in the case of a named reference in data block 30e, if the device “X” in the string “\\X\Y\Z” were no longer available (e.g., device “X” was a computer and “Y\Z” was a redirect object to another device A having a file B), the target file would be deemed lost, even if only the redirect object was lost but the target file was still available.
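The failure mode in this example can be reproduced with a small model. This is a sketch only, assuming each device is a mapping from a path to either a data object or a redirect object holding a fully qualified target name; the structure `devices`, the function `resolve`, and the payload bytes are hypothetical.

```python
def resolve(name, devices, max_hops=8):
    """Follow redirect objects from a fully qualified name to the stored data."""
    for _ in range(max_hops):
        device, _, path = name[2:].partition("\\")
        if device not in devices:
            raise LookupError("device %r is unavailable" % device)
        kind, value = devices[device][path]
        if kind == "data":
            return value
        name = value  # a redirect object stores another fully qualified name
    raise LookupError("too many redirects")


devices = {
    "X": {r"Y\Z": ("redirect", r"\\A\B")},  # redirect object on device X
    "A": {"B": ("data", b"file B")},        # target file on device A
}
via_redirect = resolve(r"\\X\Y\Z", devices)

# Device X goes away; file B is untouched on device A.
without_x = {"A": devices["A"]}
direct = resolve(r"\\A\B", without_x)
try:
    resolve(r"\\X\Y\Z", without_x)
    lost = False
except LookupError:
    lost = True  # the file is deemed lost although its data still exists
```

Only a client that already holds the name “\\A\B” can still reach the file; a client that knows only “\\X\Y\Z” cannot, even though nothing happened to the target.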
FIG. 3 is a diagram illustrating a directory structure between two devices in a network. As illustrated in FIG. 3, the network 40 includes devices 42 (“User1”) and 44 (“Server”). The device 42 includes a local directory identifier 46 (“Z:”) that serves as a local substitution for the directory identifier “\DATA\User2\Shared” 48 that identifies a subdirectory 50 on the device 44. As illustrated in FIG. 3, the subdirectory 50 includes a file “File1” 52a. 
If the device 42 advertises the file “File1” 52a on the wide area network (e.g., the Internet) 54 using the expression “Z:\File1” 56, the file 52a might not be visible via the network 54 despite the visibility of the device 42, because the local directory identifier “Z:” 46 is no more than a local resolution within the device 42 of the name “Z:” to the location “\\Server\DATA\User2\Shared”. Hence, the local directory identifier “Z:” is not a fully qualified path. Consequently, if the device 42 is no longer available, the file 52a is no longer accessible via the expression “Z:\File1” 56 even though the file 52a is still available in the device 44 via its fully qualified name “\\Server\DATA\User2\Shared\File1”. If the device 44 is unavailable, then the file 52a is not accessible via any path.
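The purely local nature of the identifier “Z:” can be illustrated as follows. This is a minimal sketch, assuming each device keeps its own private substitution table; the names `local_mappings` and `expand`, and the second device name “User3”, are hypothetical.

```python
# Each device holds its own substitution table; only User1 knows "Z:".
local_mappings = {
    "User1": {"Z:": r"\\Server\DATA\User2\Shared"},
}


def expand(name, device, mappings):
    """Expand a local directory identifier using the given device's table."""
    drive, sep, rest = name.partition("\\")
    table = mappings.get(device, {})
    if drive not in table:
        raise LookupError("%r has no meaning on device %r" % (drive, device))
    return table[drive] + sep + rest


on_user1 = expand(r"Z:\File1", "User1", local_mappings)

try:
    expand(r"Z:\File1", "User3", local_mappings)  # "User3" is a hypothetical peer
    visible_elsewhere = True
except LookupError:
    visible_elsewhere = False
```

The expression “Z:\File1” resolves only where the substitution table exists, so advertising it to other nodes conveys nothing they can use.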
In addition, assume the device 42 had a fully qualified name “\\User1\Public\File2” for a locally-stored file 52b and that the device 44 had a fully qualified name “\\Server\DATA\User1\Shared\File2_Shortcut” for a shortcut file 52c that specified the fully qualified name “\\User1\Public\File2”. In this case, opening the file 52c results in retrieval of the fully qualified name “\\User1\Public\File2”. If the device 44 is unavailable, then the file 52b is still available via its fully qualified path “\\User1\Public\File2” if an accessing node already has the fully qualified path. Note, however, that accessing the files 52a and 52b still requires accessibility of the respective fully qualified paths “\\Server\DATA\User2\Shared\File1” and “\\User1\Public\File2” within their respective file systems, independent of the actual data files 52a and 52b. 
Consequently, there may be multiple paths to a file, but the accessibility to the file depends on the accessibility of the devices that provide context for the corresponding path.
Many remote file systems use only a file name and a dynamically-generated “handle” to refer to the file. For example, the Network File System (NFS) performs a lookup using a file name, where a handle to the file is returned; however, the handle is valid for only one server, and only for one session; further, a different client may receive a different handle for the same file. Plan 9 is similar to NFS in using a handle, with similar restrictions.
The Self-certifying File System (SFS) uses handles that statically map to specific servers, such that the location of the file referenced by the handle cannot be changed. The Cooperative File System (CFS) uses block identifiers; however, the data that is referenced is static in that the block identifier is bound to the content of the object. Hence, if a file needs to be added, removed, or modified, a new block identifier must be created for each corresponding modified block. CORBA also maps file names to handles; however, the handles include the server address, resulting in the handle being necessarily tied to the storage location.
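The consequence of binding an identifier to content can be shown with a generic content hash. This is a sketch of the general idea only, not CFS's actual identifier scheme: the use of SHA-256, the function name `block_id`, and the sample block contents are all assumptions chosen for illustration.

```python
import hashlib


def block_id(content: bytes) -> str:
    """Derive an identifier from block content (illustrative scheme only)."""
    return hashlib.sha256(content).hexdigest()


original = b"first version of the block"
modified = b"second version of the block"

id_original = block_id(original)
id_again = block_id(original)    # same content always yields the same identifier
id_modified = block_id(modified) # any change to the content yields a new identifier
```

Under such a scheme an identifier can never follow a block through a modification, because the identifier is a function of the bytes it names.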
Hence, each of the aforementioned file systems relies on some relationship (context) between the referenced file and its referencing identifier (“handle”), where the relationship may be content, user, storage location, session, server identifier, etc.
Unlike directory entries, Microsoft has used registry entries that specify a globally available name and a 16-byte Globally Unique Identifier (GUUID): a query specifying the global name will return the GUUID. However, use of a GUUID requires: (1) accessing a registry to determine how to process a type of file (e.g., a .GIF file); (2) receiving from the registry a result specifying that a specific file handler should be used (e.g., a GIF file handler); and (3) receiving from the specific file handler the GUUID for the specific file handler. The device must then search the registry for the GUUID to determine whether the GUUID corresponds to a local resource on the device, or whether the resource specifies a name of a remote device configured for processing the file. In addition, each device in a network is required to have a mapping of each GUUID to its corresponding registry entry.