In the early days of computing, a computer had access only to data on that computer's local disk. As time progressed, computers were connected together using networks. Thus, it became possible for each computer to access data on other computers' local disks (hereinafter referred to as “decentralized file systems”). FIG. 1 is a block diagram of a decentralized file system with three computers (110, 120, and 130). Each of these three computers is associated with a corresponding local disk (111, 121, 131). Using a decentralized file system mechanism, computer 110 can access data on disk 131, which is associated with computer 130.
However, the problem with decentralized file systems is that managing the data is difficult because the data is spread out. Decentralized file systems also make file sharing difficult. For example, a department may have files that each person in the department needs to update on a regular basis. If these files are spread among various computer systems, all of those systems must constantly remain online for the files to be available. Furthermore, because of the critical nature of these files, they need to be backed up on a regular basis. Backing up files that are spread across many systems can be a complex, time-consuming, and resource-consuming operation. A decentralized file system is therefore not convenient for sharing or managing files.
The next step in the evolution of file systems was to centralize important data onto one or more centralized shared disks (hereinafter referred to as a “centralized file system”). FIG. 2 is a block diagram of a centralized file system. The shared files reside on the shared disk 240. The computers (210, 220, and 230) have access to the files on shared disk 240. This is a more convenient topology for sharing files among many people and for managing files, for example, by backing up the files. However, there are also problems with centralized file systems. For example, if the bandwidth of the connection 250 between the shared disk 240 and the computer 210 is low, then computer 210 will have slow access to the files on shared disk 240. Worse yet, if connection 250 is down, computer 210 will not be able to access files on shared disk 240 at all.
In a third approach to file sharing, computers not only share data using shared disks, but the data on the shared disks is also replicated. The replicated data is “pushed” to local disks associated with the individual computers (hereinafter referred to as a “hybrid file system”). FIG. 3 is a block diagram of a hybrid file system. The computers (310, 320, 330, 350, and 360) share files that are on the shared disks (341 and 342), but do so by accessing the replicated versions of those files on their local disks (311, 321, 331, 351, and 361).
To address potential problems with speed of access and connections going down, the data on the shared disks (341 and 342) is periodically replicated, and this replicated data is pushed to the individual computers (310, 320, 330, 350, and 360) that are interested in it. The replicated data is then stored on the local disks (311, 321, 331, 351, and 361). Policies are used to determine which computers are interested in which shared files and at what intervals the shared files of interest are to be transmitted to the interested computers. For example, assume there is a slide-presentation file on shared disk 341 that computer 350 is interested in. A policy is set up indicating that this slide-presentation file is replicated every 24 hours and sent to computer 350, where it is then stored on local disk 351.
Such a hybrid file system addresses the problem of access speed because computer 350 accesses the local version. However, hybrid file systems have other problems. For example, the slide-presentation file on shared disk 341 may be transmitted to computer 350 at 2:00 AM and stored locally on disk 351. Then, at 2:05 AM on the same day, someone modifies the slide-presentation file on shared disk 341. The person using computer 350 will be using an old version of the slide-presentation file until the next time the file is transmitted to computer 350. Furthermore, a hybrid file system does not use resources such as CPU cycles and storage space efficiently. Additional CPU cycles are needed to replicate data and transmit the replicated data to the interested parties. Additional storage space is needed to store the various versions of replicated data on all the computers that are interested in this replicated data. The problem of connections having low bandwidth or going down also persists, because the replicated data must still be transmitted over those connections. Policies for replicating the shared data are also needed, increasing the complexity of setup and maintenance.
Based on the foregoing, it is clearly desirable to provide techniques and mechanisms for sharing data that avoid the problems associated with current approaches.