1. Field of the Invention
The present invention relates to a method of path name resolution for use in a distributed system formed of a plurality of data processing systems interconnected by a network.
2. Description of the Prior Art
In recent years, in the area of data processing, there has been considerable development of distributed systems. A distributed system basically consists of a number of data processing systems, referred to in the following for brevity of description as nodes, which are interconnected to form a network. Each node basically consists of a processor together with a main memory (i.e. random access memory formed of semiconductors) having high access speed and limited storage capacity, and a secondary memory (such as a magnetic "hard disk") having relatively slow access speed and high storage capacity. In the following, the secondary memory will be referred to as the "disk" of a node.
With such a distributed system, files and directories are distributed throughout the network, i.e. are stored on disk at various nodes of the network. Thus it will frequently happen that a file for which access is required by the user of one node of the system (the "client" node) is resident at some other node (the "server" node). In that case it is necessary for the client node to communicate via the network in order to access the file in the server node.
The directories serve to map respective files (or other directories, i.e. subdirectories) to information to be used in locating the files or subdirectories. That is to say, if a file is not named in the root directory of the directory system, but in a subdirectory which may be a "child", "grandchild", etc. descendant from the root directory, then in order to access that file it is necessary to specify the particular subdirectory in which the file is listed as a component name. That is done by entering a path name for the file, which sequentially lists the directories which must be successively searched in order to finally obtain location information for the desired file.
The procedure executed by the system to obtain the required file location information, by using such a path name, is referred to as path name resolution. In the case of a distributed system, problems arise due to the fact that directories, as well as files, are distributed throughout the nodes of the system, rather than being all resident at a single data processing system.
That point will be described referring to FIGS. 1 and 2. In FIG. 1, which represents a part of a distributed system, three nodes of the system, designated by numerals 80, 81 and 82, will be referred to as nodes 1, 2 and 3 respectively. Numeral 83 designates a file, having the file name "c", which is stored on the disk of node 3. It will be assumed that a user of node 1 needs to access the file c. The path name which the user must input to node 1 in order to access the file c is "/a/b/c". This signifies that to obtain the desired location information for file c, the root directory (generally designated by "/") must be searched to find the component name "a", to obtain location information for a subdirectory /a. That subdirectory must then be searched for a component name "b", to obtain location information for a subdirectory /a/b. The subdirectory /a/b must then be searched for a component name "c", to obtain location information for the desired file. That process of path name resolution is illustrated in FIG. 2, in which it is assumed that the root directory is stored at node 1, the directory /a at node 2, and the subdirectory /a/b at node 3, these directories being respectively designated by numerals 90, 91 and 92. For simplicity, the necessary location information for the directories /a and /a/b and for the file c is respectively indicated as "2", "3" and "4".
After obtaining the location information for subdirectory /a by searching the root directory, a network access must be performed (as indicated by numeral 90) before searching the subdirectory /a to obtain location information for the subdirectory /a/b. Another network access 93 must then be performed, before searching for location information for the desired file c can be completed. It has thus been necessary to execute two network accesses in order to achieve path name resolution, in this simple example. In practice, the number of network accesses required to execute path name resolution could be substantially greater.
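The node-by-node procedure of FIG. 2 can be sketched as follows. This is only an illustrative model, not an implementation taken from the present description: the table names `DIRECTORIES` and `FILES` and the function `resolve` are hypothetical, and the location information is simplified to the path of the next directory to be searched.

```python
# Hypothetical layout following the FIG. 2 example:
# directory path -> (node storing that directory, {component name: path of component})
DIRECTORIES = {
    "/": (1, {"a": "/a"}),
    "/a": (2, {"b": "/a/b"}),
    "/a/b": (3, {"c": "/a/b/c"}),
}

# File path -> node on whose disk the file is stored (file c is at node 3).
FILES = {"/a/b/c": 3}


def resolve(path_name, client_node=1):
    """Resolve path_name one directory at a time.

    Returns (node storing the file, number of network accesses needed
    to search directories that are not stored at the client node).
    """
    current = "/"
    network_accesses = 0
    for component in path_name.strip("/").split("/"):
        node, entries = DIRECTORIES[current]
        if node != client_node:
            # The directory to be searched is resident at another node.
            network_accesses += 1
        # A KeyError here means the component name is not listed.
        current = entries[component]
    return FILES[current], network_accesses


file_node, accesses = resolve("/a/b/c")
print(file_node, accesses)
```

Under these assumptions, resolving "/a/b/c" from node 1 searches the root directory locally and then requires one network access for each of the directories /a and /a/b, matching the two accesses counted in the example above.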
As a result, system performance is degraded, due to the network being frequently accessed for the purpose of path name resolution. In addition, if any of the intermediate nodes which must be accessed to perform path name resolution is temporarily inoperative, then path name resolution cannot be achieved. Thus, overall system reliability is reduced.
Each directory consists of a list of component names, which are mapped to location information for the corresponding components (directories or files). Since each directory is identified by a name, in the same way as for a file, directories can be accessed in the same way as files. The term "resident directory" of a node as used herein signifies an original directory of the node, having, as component names of entries, names of files which are currently recorded on disk at that node, as well as names of subdirectories, i.e. "descendant" directories of itself. These subdirectories may be resident at that node, or may be resident at other nodes of the system. The resident directories of a node consist of at least a root directory (to be distinguished from the term "root directory of a path name" as used herein for the first directory of a path name) and may also include one or more of the aforementioned resident subdirectories.
A resident directory of one node may also be replicated on disk at some other node, since it may be convenient to be able to locally search such a replicated directory at the other node.
Various proposals have been made in the prior art for reducing the number of network accesses which are necessary to achieve path name resolution in a distributed system. One method is to use a name cache (sometimes called a directory cache) at each node, i.e. a table which is set up within a region of the main memory of a node, and which relates various frequently-utilized path names to the locations of the corresponding files within the distributed system. By using such a name cache, it becomes unnecessary to execute the multiple network accesses described above in order to achieve path name resolution, so that the system performance can be substantially improved. The structure of such a name cache is illustrated in FIG. 3. As shown, this consists of a set of entries 95, 96, etc., each relating a path name to object location information which specifies the location within the distributed system of a file which is specified by the path name. Such object location information will basically consist of information to indicate the node at which the file is stored, and information for locating the file at that node. The first time that path name resolution is executed for a file, it is performed by the conventional node-by-node directory access method described hereinabove referring to FIG. 1. When the location information for the desired file within the distributed system has thus been obtained, it is written into an entry of the name cache, in conjunction with the corresponding path name, as shown in FIG. 3. Thereafter, when a user wishes to access that file, it is only necessary for the system to read out from the name cache the file location information which corresponds to the path name for the file, whereupon the node at which the file is stored can immediately be directly accessed via the network, and the file located at that node, without the need to access any intermediate nodes.
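The behaviour of such a name cache can be sketched as follows. The class `NameCache`, the counter `slow_resolutions` and the stand-in function `node_by_node` are hypothetical names introduced for illustration, and the pair (3, "4") is assumed here to represent the node storing file c together with its location information at that node.

```python
def node_by_node(path_name):
    """Stand-in for conventional node-by-node resolution (assumed data)."""
    known = {"/a/b/c": (3, "4")}  # (node storing the file, location at that node)
    return known[path_name]


class NameCache:
    """Table in main memory relating path names to object location information."""

    def __init__(self, resolve_slowly):
        self._resolve_slowly = resolve_slowly
        self._entries = {}  # path name -> object location information
        self.slow_resolutions = 0  # how often the node-by-node method ran

    def lookup(self, path_name):
        if path_name not in self._entries:
            # First resolution: use the conventional method, then record
            # the result in an entry of the cache.
            self.slow_resolutions += 1
            self._entries[path_name] = self._resolve_slowly(path_name)
        # Later lookups read the location straight out of the cache.
        return self._entries[path_name]


cache = NameCache(node_by_node)
cache.lookup("/a/b/c")  # resolved node by node, then cached
cache.lookup("/a/b/c")  # served from the cache
print(cache.slow_resolutions)
```

In this sketch only the first lookup of a path name invokes the node-by-node method; the second is answered from the table, so the server node can be contacted directly.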
However, since the amount of main memory available at each node is limited, the size of the memory region available for such a name cache must be small. Hence, when the maximum number of entries of the name cache is exceeded, it becomes necessary to perform replacement processing at each node, thereby deleting one or more entries to make room for new entries. Various types of replacement algorithms have been proposed for that purpose. However, such methods have the basic disadvantage that the system users are not aware of the current contents of the name cache. Hence it is impossible for a user to forecast the amount of time which will be required to access any specific file within the distributed system, since there may be a very large difference between the access time for a file when the name cache is utilized and the access time for that file when the node-by-node method of path name resolution shown in FIG. 2 is applied. Due to that fact, it is difficult to use a name cache in a real-time processing system.
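The effect of replacement processing can be illustrated with a short sketch. No particular replacement algorithm is named above, so a least-recently-used policy is assumed here purely as an example; the class `BoundedNameCache`, its capacity, and the path names "/x" and "/y" with their location values are all hypothetical.

```python
from collections import OrderedDict


class BoundedNameCache:
    """Name cache with a fixed maximum number of entries.

    Assumption: least-recently-used replacement, chosen only for illustration.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest use first, most recent use last

    def get(self, path_name):
        if path_name not in self._entries:
            return None  # miss: caller must fall back to node-by-node resolution
        self._entries.move_to_end(path_name)  # mark as most recently used
        return self._entries[path_name]

    def put(self, path_name, location):
        self._entries[path_name] = location
        self._entries.move_to_end(path_name)
        if len(self._entries) > self.capacity:
            # Replacement processing: delete the least recently used entry.
            self._entries.popitem(last=False)


cache = BoundedNameCache(capacity=2)
cache.put("/a/b/c", (3, "4"))
cache.put("/x", (2, "7"))
cache.put("/y", (1, "9"))  # exceeds the capacity, so "/a/b/c" is deleted
print(cache.get("/a/b/c"))
```

After the third entry is written, the entry for "/a/b/c" has been deleted without the user being informed, so the next access to that file reverts to the slow node-by-node method. This is the unpredictability of access time described above.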