For convenient reference to stored computer data, the computer data is typically contained in one or more files. Each file has a logical address space for addressing the computer data in the file. In a typical general purpose digital computer or in a file server, an operating system program called a file system manager assigns each file a unique numeric identifier called a “file handle,” and also maps the logical address space of the file to a storage address space of at least one data storage device such as a disk drive.
Typically a human user or an application program accesses the computer data in a file by requesting the file system manager to locate the file. After the file system manager returns an acknowledgement that the file has been located, the user or application program sends requests to the file system manager for reading data from or writing data to specified logical addresses of the file.
Typically the user or application program specifies an alphanumeric name for the file to be accessed. The file system manager searches one or more directories for the specified name of the file. A directory is a special kind of file. The directory includes an alphanumeric name and an associated file handle for each file in the directory. Once the file system manager finds the specified name in the directory, it may use the associated file handle for reading data from or writing data to the file.
For referencing a large number of files, the files typically are grouped together in a file system including a hierarchy of directories. Each file is specified by an alphanumeric pathname through the hierarchy. The pathname includes the name of each directory along a path from the top of the hierarchy down to the directory that includes the file. To locate the file, the user or application program specifies the pathname for the file, and the file system manager searches down through the directory hierarchy until finding the file handle. The file system manager may return the file handle to the user or application program with an acknowledgement that the file has been located. The user or application program includes the file handle in subsequent requests to read or write data to the file.
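The downward search through the directory hierarchy can be sketched as follows. This is a minimal sketch that assumes a hypothetical in-memory model: each directory is a dict mapping an alphanumeric name to a file handle, and directories are themselves found by their own handle in a second dict. The function name `lookup_path` and the handle values are illustrative only, not taken from any particular file system manager.

```python
from typing import Dict

# Hypothetical model: a directory maps each entry name to that entry's file handle.
Directory = Dict[str, int]


def lookup_path(dirs: Dict[int, Directory], root_handle: int, pathname: str) -> int:
    """Resolve an absolute pathname such as '/home/alice/notes.txt' to a file handle.

    Starts at the root directory and, for each pathname component, searches the
    current directory for that name, descending until the last component is found.
    """
    handle = root_handle
    for component in pathname.strip("/").split("/"):
        if not component:  # '/' alone (or repeated slashes) adds no component
            continue
        directory = dirs.get(handle)
        if directory is None:
            # A non-final component named a regular file, not a directory.
            raise NotADirectoryError(f"handle {handle} is not a directory")
        if component not in directory:
            raise FileNotFoundError(pathname)
        handle = directory[component]
    return handle
```

For example, with `dirs = {2: {"home": 10}, 10: {"alice": 11}, 11: {"notes.txt": 42}}` and root handle 2, `lookup_path(dirs, 2, "/home/alice/notes.txt")` returns the handle 42, which the caller would then include in its read and write requests.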
For exception reporting and error recovery, a reverse lookup may be performed through the directory hierarchy. For example, as described in Scheer U.S. Pat. No. 7,822,927, when responding to a read or write request specifying a file handle, the file system manager may find that the file is inaccessible in disk storage. An automatic recovery application may then need the pathname of the file in order to recover from the error by accessing a backup copy of the file in backup storage. A reverse lookup finds the parent directory that contains an entry specifying the file handle, and thereby the name of the file in that directory. To obtain the full pathname of the file, the reverse lookup is repeated for each successive parent directory until the root directory of the file system is reached.
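The repeated reverse lookup can be sketched against the same kind of plain name-to-handle directories. This is a minimal sketch under a hypothetical in-memory model (the names `find_parent` and `reverse_lookup_path` are illustrative). Because a conventional directory is indexed by name rather than by handle, each step here has to scan every directory for an entry whose handle matches, which shows why an unaided reverse lookup is expensive.

```python
from typing import Dict, List, Tuple

Directory = Dict[str, int]  # hypothetical: entry name -> file handle


def find_parent(dirs: Dict[int, Directory], child_handle: int) -> Tuple[int, str]:
    """Scan all directories for an entry naming child_handle; return (parent, name)."""
    for dir_handle, entries in dirs.items():
        for name, handle in entries.items():
            if handle == child_handle and name not in (".", ".."):
                return dir_handle, name
    raise FileNotFoundError(f"no directory entry for handle {child_handle}")


def reverse_lookup_path(dirs: Dict[int, Directory], root_handle: int, file_handle: int) -> str:
    """Repeat the parent lookup until the root is reached, then join the names."""
    components: List[str] = []
    handle = file_handle
    while handle != root_handle:
        parent, name = find_parent(dirs, handle)
        components.append(name)
        handle = parent
    return "/" + "/".join(reversed(components))
```

Note that if the file has more than one hard link, this sketch returns only the first pathname it happens to find.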
In the usual case, the conventional file system directory structure is not organized for an efficient reverse lookup, so there have been a number of proposals for modifying or augmenting the conventional directory structure to accelerate a reverse lookup. For example, Scheer U.S. Pat. No. 7,822,927 provides a directory name lookup cache (DNLC) with child hash lists, and a mechanism that finds the parent handle and child name associated with a specified child handle by searching the child hash list indexed by a hash of the specified child handle. This method has the advantage of working without modification of the conventional directory structure, but it can find only pathnames that are present in the DNLC.
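A child-hash-list reverse lookup in a name cache can be sketched as follows. This is a toy model, not the structure claimed by Scheer: the class `DNLC`, its bucket count, and the use of Python's `hash` to pick a bucket are assumptions made for illustration. Each cached entry is reachable both by (parent handle, name) for forward lookup and through a list selected by hashing the child handle for reverse lookup; `cached_pathname` returns `None` as soon as any path component is missing from the cache, which reflects the limitation noted above.

```python
from typing import Dict, List, Optional, Tuple


class DNLC:
    """Toy directory name lookup cache with child hash lists (hypothetical layout)."""

    def __init__(self, nbuckets: int = 64) -> None:
        self.nbuckets = nbuckets
        # Forward index: (parent handle, child name) -> child handle.
        self.forward: Dict[Tuple[int, str], int] = {}
        # Child hash lists: the list is chosen by hashing the child handle.
        self.child_lists: List[List[Tuple[int, int, str]]] = [[] for _ in range(nbuckets)]

    def _bucket(self, child_handle: int) -> int:
        return hash(child_handle) % self.nbuckets

    def enter(self, parent: int, name: str, child: int) -> None:
        """Cache one directory entry in both indexes (no eviction in this sketch)."""
        self.forward[(parent, name)] = child
        self.child_lists[self._bucket(child)].append((child, parent, name))

    def lookup(self, parent: int, name: str) -> Optional[int]:
        return self.forward.get((parent, name))

    def reverse_lookup(self, child: int) -> Optional[Tuple[int, str]]:
        """Return (parent handle, child name) for a cached child handle, else None."""
        for cached_child, parent, name in self.child_lists[self._bucket(child)]:
            if cached_child == child:
                return parent, name
        return None  # not cached: the DNLC cannot answer


def cached_pathname(dnlc: DNLC, root: int, handle: int) -> Optional[str]:
    """Build a full pathname from cached entries only."""
    parts: List[str] = []
    while handle != root:
        hit = dnlc.reverse_lookup(handle)
        if hit is None:
            return None  # some component of the path is not in the cache
        handle, name = hit
        parts.append(name)
    return "/" + "/".join(reversed(parts))
```

Only the single child hash list selected by the hash is searched, so the cost of a reverse lookup depends on the length of that list rather than on the size of the directories.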
Proposals for modifying the conventional directory structure to accelerate a reverse lookup are described in Harmer et al. U.S. Pat. No. 7,228,299 and Passey et al. U.S. Pat. App. Pub. 2008/0046445. Harmer et al. recognizes that in UNIX systems, a directory file includes a second entry, “..” or “dotdot”, that identifies the inode of the parent directory of that directory (except for the root directory), so that a reverse lookup from any directory up to the root directory may be performed to recover the pathname of any given directory. For a reverse lookup from a non-directory file, the inode of the file may include parent directory information (68 in FIG. 2 of Harmer et al.). To support multiple links, a file's inode may include multiple parent directory inode identifiers in the parent information. An inode's parent information may also indicate the current number of links to the associated file, so that all pathnames to the file can be returned.
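The combination of “..” entries for directories and parent information in the inodes of non-directory files can be sketched as follows. This is a simplified model, not the data layout of Harmer et al.: the `Inode` class, the root inode number 2, and the function names are assumptions. The “..” entry yields the parent inode directly, but the parent must still be searched for the child directory's name. For a regular file, each listed parent directory is searched for every entry that links to the file, so all pathnames are returned.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ROOT_INO = 2  # hypothetical root inode number for this sketch


@dataclass
class Inode:
    number: int
    is_dir: bool
    # Directories only: name -> inode number, including "." and "..".
    entries: Dict[str, int] = field(default_factory=dict)
    # Non-directory files only: inode numbers of directories that link to the file.
    parents: List[int] = field(default_factory=list)


def directory_path(inodes: Dict[int, Inode], dir_ino: int) -> str:
    """Follow '..' entries up to the root to build a directory's pathname."""
    parts: List[str] = []
    while dir_ino != ROOT_INO:
        parent = inodes[inodes[dir_ino].entries[".."]]
        # '..' gives the parent inode; the parent is then searched for our name.
        name = next(n for n, ino in parent.entries.items()
                    if ino == dir_ino and n not in (".", ".."))
        parts.append(name)
        dir_ino = parent.number
    return "/" + "/".join(reversed(parts))


def all_pathnames(inodes: Dict[int, Inode], file_ino: int) -> List[str]:
    """Return every pathname of a non-directory file, one per hard link."""
    paths: List[str] = []
    for parent_ino in dict.fromkeys(inodes[file_ino].parents):  # dedupe, keep order
        base = directory_path(inodes, parent_ino).rstrip("/")
        for name, ino in inodes[parent_ino].entries.items():
            if ino == file_ino:
                paths.append(f"{base}/{name}")
    return sorted(paths)
```

With a file linked as `notes.txt` and `n2` in `/home/alice` and as `link` in `/tmp`, and with its inode listing both parent directories, `all_pathnames` returns all three pathnames.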
Passey et al. U.S. Pat. App. Pub. 2008/0046445 proposes including, in the inode of a file (FIG. 6A of Passey et al.), a list of parent identifiers and a link count for each parent of the inode. Passey et al. also shows the inode as including a reverse lookup hint, which is a hash value of a name for the file. The reverse lookup hint is used as a search key when searching a parent directory for the file.
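A per-parent link count and a name-hash hint can be combined as shown below. This is a loose illustration, not the format of Passey et al. Several choices are assumptions: CRC-32 as the name hash, a directory modeled as buckets keyed by name hash, and a hint that records the file's first name. The hint lets the search go straight to one bucket of the parent directory. The per-parent link count tells the search how many names to expect, so a full scan of the parent is needed only when the hinted bucket does not account for all of them.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def name_hash(name: str) -> int:
    # Hypothetical hash choice for this sketch; not prescribed by the source.
    return zlib.crc32(name.encode("utf-8"))


@dataclass
class Directory:
    # Entries grouped by name hash, as a hash-keyed directory index might store them.
    buckets: Dict[int, List[Tuple[str, int]]] = field(default_factory=dict)

    def add(self, name: str, ino: int) -> None:
        self.buckets.setdefault(name_hash(name), []).append((name, ino))


@dataclass
class Inode:
    parents: Dict[int, int] = field(default_factory=dict)  # parent ino -> links from it
    hint: int = 0  # reverse lookup hint: hash of one name of the file


def link(dirs: Dict[int, Directory], inodes: Dict[int, Inode],
         parent: int, name: str, ino: int) -> None:
    """Add a hard link and update the inode's parent list and per-parent count."""
    dirs[parent].add(name, ino)
    node = inodes[ino]
    if not node.parents:
        node.hint = name_hash(name)  # assumption: the hint records the first name
    node.parents[parent] = node.parents.get(parent, 0) + 1


def names_in_parent(dirs: Dict[int, Directory], inodes: Dict[int, Inode],
                    parent: int, ino: int) -> List[str]:
    """Find the file's names in one parent, trying the hinted bucket first."""
    node = inodes[ino]
    wanted = node.parents.get(parent, 0)
    found = [n for n, i in dirs[parent].buckets.get(node.hint, []) if i == ino]
    if len(found) < wanted:
        # The hint covers only one name; fall back to scanning every bucket.
        found = [n for bucket in dirs[parent].buckets.values()
                 for n, i in bucket if i == ino]
    return sorted(found)
```

The sum of the per-parent counts equals the file's total link count, so a caller enumerating all pathnames can stop once that many names have been found.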
The proposals for modifying the conventional directory structure to accelerate a reverse lookup indicate that there is a difficulty associated with finding all of the pathnames for a file when the file system supports multiple hard links. In general, a directory entry for a file is called a hard link to that file. When the file system supports multiple hard links, any file may have one or more hard links to it, either in the same or in different directories. Thus a file is not bound to a single directory and does not have a unique name. In other words, the file name is not an attribute of the file. Instead, in a UNIX-based file system, a file is uniquely specified by its inode number and its device ID, which are attributes of the file. To support multiple hard links, the file has an attribute called a link count, specifying the number of hard links to the file. The file continues to exist as long as its link count is greater than zero. The hard links to the file are equal in all ways and are simply different names for the same file. The file may be accessed through any of its hard links, and there is no way to tell which is the original hard link. See Uresh Vahalia, UNIX® Internals: The New Frontiers, 1996, pp. 220-225, Prentice-Hall, Inc., Upper Saddle River, N.J.
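These hard-link properties can be observed on a POSIX-style system with Python's standard `os` module. This is a small demonstration, assuming the temporary directory lives on a file system that supports hard links; `demo_hard_links` is an illustrative name. The demonstration creates a file and adds a second hard link to it. It then checks that both names share the same (device ID, inode number) pair and reads the link count. Finally, it removes the first name and shows that the file remains accessible through the other name.

```python
import os
import tempfile


def demo_hard_links():
    """Create two hard links to one file, then remove the first one."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)  # second directory entry (hard link) for the same file
        sa, sb = os.stat(a), os.stat(b)
        # A file is identified by (device ID, inode number), not by its name.
        same_file = (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
        nlink_before = sa.st_nlink  # link count with both names present
        os.remove(a)  # remove the "original" name; the file survives
        with open(b) as f:
            contents = f.read()
        nlink_after = os.stat(b).st_nlink
        return same_file, nlink_before, nlink_after, contents
```

Nothing in the result distinguishes `a.txt` as the original name. Once it is removed, the file is reachable only through `b.txt`, and the link count drops by one.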