The invention pertains to digital data processing and, more particularly, to the sharing of disk drives and other storage devices on a networked digital data processing system. The invention has application, for example, in the processing of video, graphics, database and other files by multiple users or processes on a networked computer system.
In early computer systems, long-term data storage was typically provided by dedicated storage devices, such as tape and disk drives, connected to a central computer. Requests to read and write data generated by applications programs were processed by special-purpose input/output routines resident in the computer operating system. With the advent of "time sharing" and other early multiprocessing techniques, multiple users could simultaneously store and access data, albeit only through the central storage devices.
With the rise of the personal computer and PC-based workstations in the 1980's, demand by business users led to development of interconnection mechanisms that permitted otherwise independent computers to access one another's storage devices. Though computer "networks" had been known prior to this, they typically permitted only communications, not storage sharing.
Increased power of personal computers and workstations is now opening ever more avenues for their use. Video editing applications, for example, have until recently demanded specialized video production systems. Now, however, such applications can be run on high-end personal computers. By coupling these into a network, multiple users can share and edit a single video work. Reservation systems and a host of other applications also commonly provide for simultaneous access to large files by multiple parties or processes. Still other tasks may require myriad small files to be accessed by multiple different parties or processes in relatively short or overlapping time frames.
Network infrastructures have not fully kept pace with the computers that they interconnect. Though small data files can be transferred and shared quite effectively over conventional network interconnects, such as Ethernet, these do not lend themselves, for example, to sharing of large files. Thus, although users are accustomed to seemingly instantaneous file access over a network, it can take over an hour to transfer a sixty second video file that is 1.2 GBytes in length.
Some interconnects permit high-speed transfers to storage devices. The so-called fiber channel, for example, affords transfers at rates of up to 100 MBytes/sec, more than two orders of magnitude faster than conventional network interconnects. Although a single storage device may support multiple fiber channel interfaces, the industry has only recently set to developing systems to permit workstations connected through those interfaces to share files on the storage device. Moreover, when a file is to be accessed by multiple users, the overhead of server intervention can result in loss of speed advantages and efficiencies otherwise gained from the high-speed interface. In this regard, techniques such as locking, maintaining ghost files, monitoring file changes and undertaking multi-step access, check-in or housekeeping operations may be unworkable when multi-user access to many small files must be provided quickly.
In many situations, and for many specific types of networks, the coherence and security of a centralized shared access system are desirable, but the nature of their storage transactions may be ill-suited to permitting shared access due, for example, to the burden imposed by file management protocols for tracking files, versions, file size changes, and so forth.
In view of the foregoing, an object of the invention is to provide improved digital data processing systems and, particularly, improved methods and apparatus of high-speed access to data in storage devices on a networked computer system.
A related aspect of the invention is to provide such systems that achieve fast operation with files of diverse sizes.
A related aspect of the invention is to provide such systems as can be implemented with minimum cost and maximum reliability.
Yet another object of the invention is to provide such systems as can be readily adapted to pre-existing data processing and data storage systems.
Yet still another object of the invention is to provide such systems as can be readily integrated with conventional operating system software and, particularly, conventional file systems and other input/output subsystems.
One or more of the foregoing and other desirable objects are attained by the invention, which provides novel term- or lease-based methods and apparatus for accessing shared storage on a networked digital data processing system.
A system according to one aspect of the invention includes a plurality of digital data processing nodes and a storage device, e.g., a disk drive, a "jukebox," other mass storage device or other mapped device (collectively referred to hereinafter as "disk drive," "storage device" or "peripheral device"). First and second ones of the nodes, which may be a client and a server node, respectively, are coupled for communication over a LAN, network or other communications pathway. Both the first and the second nodes are in communication with the storage device. This can be over the same or different respective logical or physical communications pathways.
By way of non-limiting example, the first node and the second node can be a client and a server, respectively, networked by Ethernet or other communications media, e.g., in a wide area network, local area network, the Internet interconnect, or other network arrangement. The server and/or client can be connected to the storage device via a SCSI channel, other conventional peripheral device channel, such as a fibre channel, "firewire" (i.e., IEEE 1394 bus), serial storage architecture (SSA) bus, high-speed Ethernet bus, high performance parallel interface (HPPI) bus or other high-speed peripheral device bus.
A file system or other functionality in the second (server) node receives and responds to at least selected requests (e.g., file OPEN requests) from the first (client) node for access to a file on the storage device, by generating a "lease". The lease includes a block map or other administrative data (referred to elsewhere herein as "meta data") for the requested file, as well as an expiry time indicating how long the administrative data is valid.
Upon grant of the lease, the client node accesses the storage device using the block map or other administrative data supplied with the lease. The server node assures that this administrative data remains valid for the period of the lease, e.g., such that the list and order of blocks comprising the file does not change (e.g., shrink, disappear or become reassigned to other files) during the client's use of the file. Correspondingly, the client node ceases utilization of the administrative data (and, presumably, ceases at least direct access of the file) after lease expiry.
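By way of illustration only, the lease described above can be sketched as a small record holding the block map and an expiry time, together with a check that the client makes before each use of the block map. The names `Lease`, `LeaseExpired` and `blocks_for_access`, and the representation of the block map as an ordered tuple of block numbers, are assumptions made for this sketch and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


class LeaseExpired(Exception):
    """Raised when a client tries to use administrative data after expiry."""


@dataclass(frozen=True)
class Lease:
    file_id: str
    block_map: Tuple[int, ...]  # ordered block numbers making up the file (assumed layout)
    expires_at: float           # expiry time in seconds, on the holder's own clock
    writable: bool = False      # read-only unless a read/write lease was granted

    def is_valid(self, now: float) -> bool:
        # The administrative data may be used only strictly before expiry.
        return now < self.expires_at


def blocks_for_access(lease: Lease, now: float) -> List[int]:
    """Return the blocks the client may access directly, or refuse after expiry."""
    if not lease.is_valid(now):
        raise LeaseExpired(f"lease on {lease.file_id} expired at {lease.expires_at}")
    return list(lease.block_map)
```

In this sketch the client would call `blocks_for_access` before each direct read or write to the storage device, so that a stale block map is never consulted once the lease has run out.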
Related aspects of the invention provide a system as described above in which lease expiry is keyed to the time of the initial client request. Hence, both the client and server nodes can accurately determine lease expiry time by reference to their own clocks; network time synchronization is therefore not necessary for effective operation of the system.
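The following sketch illustrates why unsynchronized clocks suffice when expiry is keyed to the request. It assumes, beyond what the text states, that the lease carries a duration rather than an absolute timestamp, that the client measures from the moment it sends the request and the server from the moment it receives it, and that both clocks tick at the same rate. The clock offsets and times are hypothetical values chosen for the example.

```python
def expiry_on_local_clock(local_reference_time: float, duration: float) -> float:
    """Expiry expressed on the caller's own clock: reference time plus duration."""
    return local_reference_time + duration


# Hypothetical scenario: the two clocks disagree by several minutes.
CLIENT_OFFSET = 500.0   # client clock reads true time + 500 s
SERVER_OFFSET = -30.0   # server clock reads true time - 30 s
DURATION = 60.0         # lease duration in seconds

sent_true = 10.0        # true time at which the client sends the request
received_true = 10.2    # true time at which the server receives it

# Each node keys expiry to the request as seen on its own clock.
client_expiry = expiry_on_local_clock(sent_true + CLIENT_OFFSET, DURATION)
server_expiry = expiry_on_local_clock(received_true + SERVER_OFFSET, DURATION)

# Converted back to true time, the client stops using the lease no later
# than the server stops honoring it, despite the clock disagreement.
client_stops_true = client_expiry - CLIENT_OFFSET
server_releases_true = server_expiry - SERVER_OFFSET
```

Because the request is sent before it is received, the client's view of expiry falls at or before the server's, and the difference equals the one-way network delay rather than the clock offset.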
Further aspects of the invention provide a system as described above in which the client issues a request for read-only or read/write access to the file, and in which the server node grants a corresponding lease. For read/write leases, the server effects defragmentation, clean-up or other administration of the file once the lease has expired, e.g., via the server file system or via a file management system or controller on the storage device. The server node can also monitor activity by read/write "leaseholders," e.g., for rapid notification of meta data changes upon expiry of the lease.
Typically, no such administration is required at termination of a read-only lease, since the leaseholder makes no changes to the file. Thus, for read-only access requests that may often constitute the vast preponderance of file requests, only the initial request and lease grant messages are necessary to allow quick and unhindered file access.
Still further aspects of the invention provide a system as described above in which the leases are self-expiring. Client node leaseholders in such a system need not report back to the server when a lease is expired and/or the file is closed. This reduces the number of messages required to be communicated over the network between nodes, while providing direct access to file storage and continuing file security and coherence.
In an alternate aspect of the invention, the client node tracks lease expiry time and assures that all file writes are completed and changes in end-of-file pointers are reported to the server node before lease expiry. Thus, only in this circumstance is a second or further communication over the network between the client and server required. The client node file application also assures that the leased data map is not referenced after lease expiry, and may, for example, cleanse its cache of stale data.
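The client-side behavior just described might be sketched as follows. This is an illustrative outline under several assumptions not found in the text: writes are appends, a hypothetical safety `margin` before expiry triggers the flush, the end-of-file report is delivered through a caller-supplied `report_eof` callback, and the actual write to the storage device is represented only by a byte count.

```python
from typing import Callable, Dict, List


class LeasedWriter:
    """Client-side holder of a read/write lease (hypothetical helper)."""

    def __init__(self, expires_at: float, margin: float,
                 report_eof: Callable[[int], None], eof: int = 0) -> None:
        self.expires_at = expires_at      # lease expiry on the client's clock
        self.margin = margin              # assumed safety margin before expiry
        self.report_eof = report_eof      # sends the new end-of-file to the server
        self.eof = eof                    # current end-of-file offset in bytes
        self.pending: List[bytes] = []    # appended data not yet written out
        self.cache: Dict[int, bytes] = {} # cached blocks read under the lease
        self.closed = False

    def write(self, data: bytes, now: float) -> None:
        # Refuse new writes that could not be completed before expiry.
        if self.closed or now >= self.expires_at - self.margin:
            raise RuntimeError("lease too close to expiry for new writes")
        self.pending.append(data)

    def tick(self, now: float) -> None:
        """Call periodically; wraps up the lease once the margin is reached."""
        if self.closed or now < self.expires_at - self.margin:
            return
        written = sum(len(d) for d in self.pending)  # stand-in for the device write
        self.pending.clear()
        if written:
            self.eof += written
            self.report_eof(self.eof)   # the one extra message to the server
        self.cache.clear()              # stale data must not outlive the lease
        self.closed = True
```

A read-only leaseholder in this sketch would never have pending data, so `tick` would clear the cache and send nothing, consistent with the self-expiring leases described above.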
Yet other aspects of the invention provide systems as described above in which the server node employs a decision table to set lease intervals, based, for example, on whether the request for file access is a read-only or read/write request, as well as on current average network transaction times, requested file size, number or types of outstanding unexpired leases for the requested file, and the like. For example, a request to read a file under 50 kbytes may be automatically granted a ten-second, or a two-minute, lease, while large file read/write requests may be granted leases on the order of minutes or hours.
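One possible form of such a decision table is sketched below. Only the 50 kbyte threshold and the ten-second and hour-scale figures echo the example in the text; the remaining rows, the rule of halving the interval when other leases are outstanding, and the names `LEASE_TABLE` and `lease_seconds` are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Rows: (writable, exclusive upper bound on file size in bytes or None, lease seconds).
# Values are illustrative; a real table would be tuned to the installation.
LEASE_TABLE: List[Tuple[bool, Optional[int], float]] = [
    (False, 50_000, 10.0),    # small read-only request: ten-second lease
    (False, None,   120.0),   # larger read-only request (assumed value)
    (True,  50_000, 60.0),    # small read/write request (assumed value)
    (True,  None,   3600.0),  # large read/write request: on the order of an hour
]


def lease_seconds(writable: bool, size_bytes: int, outstanding_leases: int = 0) -> float:
    """Look up a lease interval for a request; first matching row wins."""
    for row_writable, max_size, seconds in LEASE_TABLE:
        if row_writable == writable and (max_size is None or size_bytes < max_size):
            # Assumed policy: shorten the lease when the file is already leased,
            # so that contended files return to the server's control sooner.
            return seconds / 2 if outstanding_leases > 0 else seconds
    raise ValueError("no table row matches the request")
```

Other inputs named in the text, such as current average network transaction times, could be added as further columns in the same manner.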
Yet other aspects of the invention provide a system as described above in which the server node locks and unlocks data blocks in order to assure compliance with leases by the client node and other nodes. According to these aspects of the invention, when a lease is granted for a set of data blocks, they are locked; when the lease expires, those data blocks are unlocked.
Monitoring of lease activity is facilitated, in related aspects of the invention, by the server node's maintenance of a list of outstanding (or unexpired) leases. The server uses this, for example, to track and control any file size changes. When all leases have expired for at least a given set of blocks, the server can issue an unlock message to the file management system for those blocks and permit them to be defragmented or otherwise administered to.
Still further aspects of the invention provide systems as described above in which the client node accesses leased files directly, without intervention by the server node. In this regard, the latter functions as an "authorizer" (i.e., insofar as it grants leases which effectively authorize access to a file) and not as a "server" per se (though, of course, it can function in the latter role as well). In related aspects, the server may be implemented as a layer over a native file management system in the storage device, interfacing with a native file system meta data controller (FSMDC).
Further aspects of the invention provide a system as described above operating with a shared storage file management system, for example, as described in the aforesaid United States patent, or with another conventional network file server system. This allows the client nodes to access the file system without extraneous network communications for most file access tasks, while the network file server system may be employed to require access through a server or file management system for a limited number of file requests, such as those for large files or files with outstanding leases, and in situations where security, coherence or file integrity concerns are primary.
Further aspects of the invention provide systems as described above including multiple "client" nodes, one or more "server nodes" and one or more storage devices, all operating as described above.
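The bookkeeping described in the two preceding paragraphs can be sketched as a registry of outstanding leases from which the set of locked blocks is derived. The class and method names are hypothetical, and the unlock message to the file management system is represented simply by the set of blocks returned from `reap`.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


class LeaseRegistry:
    """Server-side list of outstanding leases (illustrative sketch)."""

    def __init__(self) -> None:
        # lease id -> (blocks covered by the lease, expiry on the server's clock)
        self._leases: Dict[int, Tuple[FrozenSet[int], float]] = {}
        self._next_id = 0

    def grant(self, blocks: Iterable[int], now: float, duration: float) -> int:
        """Record a new lease; its blocks are locked until it expires."""
        lease_id = self._next_id
        self._next_id += 1
        self._leases[lease_id] = (frozenset(blocks), now + duration)
        return lease_id

    def locked_blocks(self, now: float) -> Set[int]:
        """Blocks covered by at least one unexpired lease."""
        locked: Set[int] = set()
        for blocks, expires_at in self._leases.values():
            if now < expires_at:
                locked |= blocks
        return locked

    def reap(self, now: float) -> Set[int]:
        """Drop expired leases and return the blocks that may now be unlocked."""
        expired = [i for i, (_, exp) in self._leases.items() if exp <= now]
        released: Set[int] = set()
        for lease_id in expired:
            released |= self._leases.pop(lease_id)[0]
        # A block stays locked while any remaining lease still covers it.
        still_locked: Set[int] = set()
        for blocks, _ in self._leases.values():
            still_locked |= blocks
        return released - still_locked
```

Blocks returned by `reap` are those for which every lease has expired, which is the condition under which the text permits defragmentation or other administration.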
Still further aspects of the invention provide methods of operating digital data processing systems paralleling the operations described above.