1. Field of the Invention
This invention relates generally to distributed file systems and, more particularly, to an architecture and implementation of a real-time distributed file system.
2. Discussion of Related Art
Advances in networking and storage technology, along with the digitization of multimedia streams, have created the need for large and fast servers. The servers are typically used as repositories connected to a network. Multiple client hosts are able to use them online over the network. The clients mount the file system on their hosts and use the server functionalities seamlessly.
Multimedia servers can be of three types: centralized, distributed or serverless. In a centralized server, a single dedicated node controls the admission process as well as all other file operations and security issues. In a distributed server environment, a set of designated nodes shares the load and functions of the server. In a serverless system, all the clients and the storage devices are connected directly to the network.
Generally, a distributed file system that is implemented in a server environment includes a distributed directory structure that is independent of the file system associated with the individual computers. The distributed directory structure is replicated and stored on the individual computers. The overhead associated with replicating and storing the distributed directory structure is large, and this degrades the performance of the overall file system.
Additionally, conventional distributed file systems lack a method for bandwidth access control. Therefore, as clients increase the number of accesses to the file system, increasing demands are placed on the system resources of the file system, resulting in an inability to support real-time applications.
Accordingly, there exists a need for an improved method of implementing a distributed file system. The system should reduce the overhead associated with storing the distributed file system directory structure and with storing the application data. The system should also increase the performance of the distributed file system, and it should provide for scalability of the storage system. The system should also be independent of the network and its protocols. There also exists a need for a real-time distributed file system.
The present invention provides a distributed file system for storing and retrieving information to and from one or more storage systems over a network by one or more host systems. The preferred storage system is a device we call the autonomous disk (AD). The AD is a disk or other storage medium that has an associated processing engine. Because the file system advantageously places low processing demands upon this processing engine, the AD can be implemented using a relatively small, low cost processor.
The file system of the invention comprises a storage system kernel or agent residing on the AD storage system. The storage system kernel includes a free list management system that determines the physical storage location of information stored on said storage system. The file system works in conjunction with a directory structure system residing on the host system that defines a logical organization of a plurality of files corresponding to information stored on said storage system. The file system can be implemented using an existing file system associated with the host, if desired.
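By way of illustration only, the free list management performed by the storage system kernel might be sketched as follows. The class name `FreeList`, the fixed-size block model, and the lowest-block-first allocation order are assumptions made for this example; the text does not specify how the free list is represented.

```python
from __future__ import annotations


class FreeList:
    """Hypothetical free list for one autonomous disk (AD).

    Assumes the disk is divided into fixed-size blocks numbered from 0.
    The AD, not the host, decides which physical blocks hold new data.
    """

    def __init__(self, total_blocks: int) -> None:
        # Stored in descending order so that pop() hands out the lowest block first.
        self._free = list(range(total_blocks - 1, -1, -1))

    @property
    def free_blocks(self) -> int:
        """Number of blocks currently available."""
        return len(self._free)

    def allocate(self, count: int) -> list[int]:
        """Reserve `count` blocks and return their physical block numbers."""
        if count > len(self._free):
            raise OSError("not enough free blocks on this disk")
        return [self._free.pop() for _ in range(count)]

    def release(self, blocks: list[int]) -> None:
        """Return previously allocated blocks to the free list."""
        self._free.extend(blocks)
```

Because the bookkeeping amounts to a few list operations, a sketch of this kind is consistent with the stated goal of running the storage system kernel on a relatively small, low-cost processor.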
A legacy attribute data store, coupled to said network, stores meta-data associated with said information stored on said storage system. Host systems can access this meta-data to determine the physical storage location of the information stored on the ADs.
The file system further comprises a client kernel or agent residing on said host system that has access to meta-data from said legacy attribute data store. The client agent is interoperative with the directory structure system to associate the plurality of files with corresponding physical storage locations. Using this information a host may retrieve information from the storage system for delivery to the host system over the network.
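By way of illustration only, the interaction between the client agent and the legacy attribute data store might be sketched as follows. The names `Extent`, `AttributeStore` and `ClientAgent`, the in-memory dictionaries standing in for the network and the disks, and the (disk, block) form of the meta-data are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical meta-data record: one block on one autonomous disk."""
    disk_id: str
    block: int


class AttributeStore:
    """Stand-in for the legacy attribute data store (meta-data only)."""

    def __init__(self) -> None:
        self._meta: dict[str, list[Extent]] = {}

    def put(self, path: str, extents: list[Extent]) -> None:
        self._meta[path] = list(extents)

    def lookup(self, path: str) -> list[Extent]:
        return list(self._meta[path])


class ClientAgent:
    """Stand-in for the client kernel: maps a logical file to physical blocks."""

    def __init__(self, store: AttributeStore, disks: dict[str, dict[int, bytes]]) -> None:
        self._store = store
        self._disks = disks  # disk_id -> {block number -> block contents}

    def read(self, path: str) -> bytes:
        # 1. Ask the attribute store where the file's blocks live.
        extents = self._store.lookup(path)
        # 2. Fetch each block directly from the disk that holds it.
        return b"".join(self._disks[e.disk_id][e.block] for e in extents)


# Usage: a two-block file spread over two disks.
disks = {"ad0": {7: b"hello "}, "ad1": {3: b"world"}}
store = AttributeStore()
store.put("/video/clip", [Extent("ad0", 7), Extent("ad1", 3)])
agent = ClientAgent(store, disks)
data = agent.read("/video/clip")
```

In this sketch the host consults the meta-data once and then retrieves data from the disks directly, so the attribute store is not on the data path.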
Autonomous disks employed in a presently preferred embodiment of the invention provide flexibility in designing a file server. They can be used to build a distributed file system by delegating tasks among multiple ADs. A serverless file system can be implemented by performing file system operations in the AD. It is also possible to build a security module into an AD to prevent unauthorized use of the system. An AD can be implemented using different hardware and software means.
The distributed file system (DFS) architecture described in this invention uses the AD as a building block. The DFS has a distributed architecture with a number of storage devices connected over a network. The user hosts are also connected to the same network. One of the user hosts, called the configuration manager, is equipped to maintain distributed DFS-specific data structures and system configurations and to provide access control. The kernel of the DFS is distributed across the autonomous disks, the user hosts and the configuration manager. The kernel makes the underlying operations of the system transparent to the users.
The AD is a disk or other storage medium with a small programmable memory, and it can be implemented through active network-attached disks, regular workstations or other means. The AD performs some lightweight file system related functions, and these functions are performed as a part of the DFS kernel running at the disk. It also has a network interface that allows it to connect directly to the network.
DFS data is preferably organized in volumes. Each volume consists of one or more autonomous data disks, a type of autonomous disk. A data file is striped across the data disks of the volume. The file system meta-data for the volume is stored in another autonomous disk called the legacy attribute disk (LAD). The distributed file system directory structure is stored on the LAD using its native file system. This scheme allows the DFS to treat the control mechanisms and data separately, thereby reducing overhead. The file system supports real-time applications and provides scalable data storage.
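By way of illustration only, striping a data file across the data disks of a volume might be sketched as follows. The round-robin placement, the fixed stripe unit, and the function names `stripe` and `unstripe` are assumptions made for this example; the text states only that a file is striped across the data disks.

```python
from __future__ import annotations


def stripe(data: bytes, num_disks: int, unit: int) -> list[list[bytes]]:
    """Split `data` into `unit`-byte chunks and deal them round-robin to the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), unit):
        chunk_index = offset // unit
        disks[chunk_index % num_disks].append(data[offset:offset + unit])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the file by reading one chunk from each disk in turn."""
    depth = max((len(d) for d in disks), default=0)
    chunks = []
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
    return b"".join(chunks)


# Usage: a 10-byte file on a 3-disk volume with a 2-byte stripe unit.
layout = stripe(b"abcdefghij", num_disks=3, unit=2)
```

With this placement, consecutive chunks of a file reside on different data disks, so a sequential read can draw on several disks at once while the meta-data remains on the LAD.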
The above-described system is only an example. Systems in accordance with the present invention may be implemented in a variety of ways.