1. Field of the Invention
The present invention relates to a computer file system and method, wherein one or more characteristics (e.g., actual data contained in the file, and/or meta-data associated with the file, such as file name/handle, ownership, related links, size, time of last modification, user access privilege-related data, etc.) of a data file maintained by a computer process (e.g., residing in a network computer node) may be accessed and modified by multiple other computer processes (e.g., residing in computer network client nodes), and a mechanism exists to maintain the coherency of the data file and its characteristics despite their being subject to modification by the multiple processes in the network. As used herein, the term “data file” comprises objects in a distributed computer file system, such as user and system program and data files, directories, and associated objects. Also as used herein, the “modification” of a data file may comprise the creation of the data file.
2. Brief Description of Related Prior Art
Data communication in a computer network involves data exchange between two or more entities interconnected by communication links. These entities are typically software program processes executing on computer nodes, such as endstations and intermediate stations. An intermediate station may be, for example, a router or switch that interconnects the communication links and subnetworks to enable transmission of data between the endstations. A local area network (LAN) is an example of a subnetwork that provides relatively short distance communication among the interconnected stations, whereas a wide area network enables long distance communication over links provided by public or private telecommunications facilities.
Communication software executing on the endstations correlates and manages data communication with other endstations. The stations typically communicate by exchanging discrete packets or frames of data according to predefined protocols. In this context, a protocol consists of a set of rules defining how the stations interact with each other. In addition, network routing software executing on the routers allows expansion of communication to other endstations. Collectively, these hardware and software components comprise a communications network and their interconnections are defined by an underlying architecture.
Modern communications network architectures are typically organized as a series of hardware and software levels or “layers” within each station. These layers interact to format data for transfer between, e.g., a source station and a destination station communicating over the network. Predetermined services are performed on the data as it passes through each layer and the layers communicate with each other by means of the predefined protocols. The lower layers of these architectures are generally standardized and are typically implemented in hardware and firmware, whereas the higher layers are generally implemented in the form of software running on the stations attached to the network. In one example of such a communications architecture there are five layers which are termed, in ascending interfacing order, physical interface, data link, network, transport and application layers. These layers are arranged to form a protocol stack in each communicating station of the network. FIG. 1 illustrates a schematic block diagram of prior art protocol stacks 125 and 175 used to transmit data between a source station 110 and a destination station 150, respectively, of a network 100. As can be seen, the stacks 125 and 175 are physically connected through a communications channel 180 at the interface layers 120 and 160. For ease of description, the protocol stack 125 will be described.
In general, the lower layers of the communications stack provide internetworking services and the upper layers, which are the users of these services, collectively provide common network application services. The application layer 112 provides services suitable for the different types of applications using the network, while the lower interface layer 120 accepts industry standards defining a flexible network architecture oriented to the implementation of LANs.
Specifically, the interface layer 120 comprises the physical interface layer 126, which is concerned with the actual transmission of signals across the communication channel and defines the types of cabling, plugs and connectors used in connection with the channel. The data link layer (i.e., “layer 2”) 121 is responsible for transmission of data from one station to another and may be further divided into two sublayers: Logical Link Control (LLC 122) and Media Access Control (MAC 124).
The MAC sublayer 124 is primarily concerned with controlling access to the transmission medium in an orderly manner and, to that end, defines procedures by which the stations must abide in order to share the medium. In order for multiple stations to share the same medium and still uniquely identify each other, the MAC sublayer defines a hardware or data link address called a MAC address. This MAC address is unique for each station interfacing to a LAN. The LLC sublayer 122 manages communications between devices over a single link of the network.
The network layer 116 (i.e., “layer 3”) provides network routing and relies on transport protocols for end-to-end reliability. An example of a network layer protocol is the Internet Protocol (“IP”). An example of such a transport protocol is the Transmission Control Protocol (TCP) contained within the transport layer 114. The term TCP/IP is commonly used to refer to the Internet architecture. (See, e.g., Tanenbaum, Computer Networks, Third Ed., Prentice Hall PTR, Upper Saddle River, N.J., 1996.)
Data transmission over the network 100 therefore consists of generating data in, e.g., sending process 104 executing on the source station 110, passing that data to the application layer 112 and down through the layers of the protocol stack 125, where the data are sequentially formatted as a frame for delivery onto the channel 180 as bits. Those frame bits are then transmitted over an established connection of channel 180 to the protocol stack 175 of the destination station 150 where they are passed up that stack to a receiving process 174. Data flow is schematically illustrated by solid arrows.
Although actual data transmission occurs vertically through the stacks, each layer is programmed as though such transmission were horizontal. That is, each layer in the source station 110 is programmed to transmit data to its corresponding layer in the destination station 150, as schematically shown by dotted arrows. To achieve this effect, each layer of the protocol stack 125 in the source station 110 typically adds information (in the form of a header) to the data generated by the sending process as the data descends the stack.
For example, the network layer encapsulates data presented to it by the transport layer within a packet having a network layer header. The network layer header contains, among other information, source and destination (logical) network addresses needed to complete the data transfer. The data link layer, in turn, encapsulates the packet in a frame that includes a data link layer header containing information required to complete the data link functions, such as (physical) MAC addresses. At the destination station 150, these encapsulated headers are stripped off one-by-one as the frame propagates up the layers of the stack 175 until it arrives at the receiving process.
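By way of illustration only, the encapsulation just described may be sketched as follows. The class names, function names, and address values are hypothetical and chosen solely for this example; real headers carry many more fields, and only the network and data link layers are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Network layer unit: a header of logical addresses plus transport data."""
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class Frame:
    """Data link layer unit: a header of MAC addresses plus one packet."""
    src_mac: str
    dst_mac: str
    packet: Packet


def send_down(data: bytes, src_ip: str, dst_ip: str,
              src_mac: str, dst_mac: str) -> Frame:
    """Descend the source stack, adding one header per layer."""
    packet = Packet(src_ip, dst_ip, data)    # network layer header added
    return Frame(src_mac, dst_mac, packet)   # data link layer header added


def receive_up(frame: Frame) -> bytes:
    """Ascend the destination stack, stripping one header per layer."""
    packet = frame.packet     # data link layer header stripped
    return packet.payload     # network layer header stripped


# Hypothetical addresses, for illustration only.
frame = send_down(b"hello", "10.0.0.1", "10.0.0.2",
                  "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb")
data = receive_up(frame)
```

In this sketch the receiving process recovers exactly the bytes generated by the sending process, although each layer saw only its own header.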
A computer file system controls the formatting of data files, maintaining the location of the data files in memory, the logical hierarchy of data files, user/process access privileges (e.g., in terms of reading and writing) to the data files, and other file-related tasks, such as house-keeping and administrative functions that keep track of data file statistics (e.g., sizes of the files, dates of creation and last modification of the files, etc.). Computer file systems are frequently integrated with the operating system such that, although a logical or functional distinction may be made between the two systems, they are intertwined with each other from a source code standpoint. When the processes that implement the file system reside in multiple nodes in a computer network, that file system may be termed a “distributed” computer file system.
A “client/server network” is one conventional type of computer network architecture wherein data files stored or residing in one computer node (commonly termed a “server” computer) in the network are shared, using a distributed computer file system, by multiple processes executing/residing in other computer nodes (commonly termed “client” computers) in the network. That is, data files and their characteristics stored or residing in the server computer node may be accessed and modified, via the distributed file system, by multiple processes executing/residing in the client computer nodes.
The client/server network architecture offers advantages over other types of network architectures. For example, since in a client/server network, data files residing in the server computer node may be accessed by processes residing in the client computer nodes, copies of these files need not also reside in the client nodes. This increases the amount of client computers' resources that may be made available for other purposes, and eliminates the cost and time necessary to support and maintain separate copies of these files in the client computers.
In distributed file systems, maintaining the coherency of data files and file characteristics shared among, and subject to modification by, multiple processes residing in the client nodes can be problematic. That is, since multiple processes residing in the client nodes may be able to access and modify the characteristics of data files stored in the server node, it becomes necessary for the file system to ensure coherency of these characteristics despite their being subject to modification by the multiple processes.
In one conventional solution to this problem, a file system management process residing in the server node grants sets (i.e., combinations) of different types of “tokens” to requesting client node processes that grant permission to the processes to modify particular characteristics of files stored in the server node. Each “token” is identified by the particular class/type to which it belongs, and is associated with a respective data file. In order for a process to be able to execute a respective modification to a respective data file characteristic, the process must first be granted permission by the network server's file management process, in the form of a grant of a respective set of different types/classes of tokens associated with that file and the modification.
More specifically, according to this prior art technique, when a client node process desires to modify a respective characteristic of a respective data file stored in the server node, the process transmits separate respective token grant request messages to the server node's file management process that request grant of each of the tokens in the predetermined set of tokens required for permission to make the desired modification. In response to each respective request message, the file management process determines whether the respective token whose grant is being requested by the respective request message is available for grant to the client node process. If the respective token is available for grant, the file management process transmits a token grant message to the client process that grants that respective token to the client process. Conversely, if the respective token is not available for grant, for example, as a result of being currently granted to another client node process, the file management process may transmit a token revocation message to the other client node process to which the respective token is currently granted. In response to the token revocation message, the other client node process forwards to the file management process a token relinquishment message indicating that the other client node process has relinquished its grant of the respective token, thereby returning the respective token to the pool of tokens available for grant to the requesting client node process. The file management process may then transmit the token grant message to the client process. A client node process may execute a desired modification to a respective data file only after, and for as long as, the process has been granted the respective set of tokens required to make the desired modification.
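The message exchange described above may be sketched, purely for illustration, as a small simulation. The class `FileManager`, its methods, the token type names, and the client names are all hypothetical. The sketch further assumes that a client holding a contested token always relinquishes it immediately upon revocation, which the description above permits but does not require.

```python
class FileManager:
    """Hypothetical server-side process granting one token per request."""

    def __init__(self):
        self.holder = {}   # (file, token_type) -> client holding the token
        self.log = []      # every message exchanged, in order

    def request(self, client, file, token_type):
        """Handle one token grant request message for a single token."""
        key = (file, token_type)
        self.log.append(("request", client, key))
        other = self.holder.get(key)
        if other is not None and other != client:
            # Token is granted elsewhere: revoke it, then receive the
            # relinquishment (assumed here to follow immediately).
            self.log.append(("revoke", other, key))
            self.log.append(("relinquish", other, key))
            del self.holder[key]
        self.holder[key] = client
        self.log.append(("grant", client, key))

    def may_modify(self, client, file, required):
        """A modification is permitted only while the full set is held."""
        return all(self.holder.get((file, t)) == client for t in required)


# Hypothetical set of token types required for one modification.
REQUIRED = ("data_write", "status_write")

manager = FileManager()
for token_type in REQUIRED:            # client A: tokens are free
    manager.request("client_a", "file_1", token_type)
a_could_modify = manager.may_modify("client_a", "file_1", REQUIRED)

for token_type in REQUIRED:            # client B: tokens must be revoked
    manager.request("client_b", "file_1", token_type)
```

Under these assumptions, client A needs four messages to obtain two uncontested tokens, and client B needs eight to obtain the same two tokens once they are held by A, so the log ends with twelve entries.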
Although this prior art technique is able to maintain the coherency of data file characteristics stored in the server node, it has certain disadvantages and drawbacks. For example, since only a single respective token may be requested and granted in each token request and grant message, respectively, when the set of tokens required for a desired file characteristic modification comprises more than one token, multiple token request and grant messages must be exchanged between the file management and requesting client node processes to enable the requesting client node process to carry out the desired file characteristic modification. Likewise, since only a single respective token may be revoked and relinquished in each token revocation and relinquishment message, respectively, if multiple tokens must be revoked and relinquished to enable the desired modification to take place, multiple token revocation and relinquishment messages must be exchanged to effect the revocation and relinquishment of such multiple tokens. Thus, since, at any given time, many client node processes may seek to modify, and may presently be engaged in modification of, characteristics of frequently-used data files stored in the server node, this can result in an undesirably large amount of network bandwidth being consumed by tasks related to network file system overhead, and can undesirably increase network congestion.
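The overhead described above can be estimated with a short calculation, offered only as a sketch. The function name and parameters are hypothetical, and the count assumes exactly one request and one grant message per token, plus one revocation and one relinquishment message per token currently granted to another process, with no retransmissions or acknowledgments.

```python
def messages_needed(tokens_required: int, tokens_held_by_others: int) -> int:
    """Count messages exchanged for one modification under the
    one-token-per-message scheme (illustrative assumptions only)."""
    if not 0 <= tokens_held_by_others <= tokens_required:
        raise ValueError("contested tokens must be between 0 and the set size")
    requests_and_grants = 2 * tokens_required
    revocations_and_relinquishments = 2 * tokens_held_by_others
    return requests_and_grants + revocations_and_relinquishments


uncontested = messages_needed(3, 0)   # 3 requests + 3 grants
contested = messages_needed(3, 3)     # plus 3 revocations + 3 relinquishments
```

Under these assumptions the message count grows linearly with the size of the token set and doubles when every token in the set must first be revoked from another process.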