In our modern communication age, business entities and consumers are storing an ever-increasing amount of digitized data. For example, many entities are in the process of digitizing their business records and/or other data. Similarly, web-based service providers generally engage in transactions that are primarily digital in nature. Thus, techniques and mechanisms that facilitate efficient and cost-effective storage of vast amounts of digital data are being implemented.
Clustered computing networks provide users with an ability to quickly and efficiently manage vast amounts of digital data in a cost-effective manner. In particular, clustered computing networks offer users the ability to store and access multiple versions of data from a parent volume (e.g., a traditional or flexible volume). For example, from a parent volume, users can generate a child volume (e.g., a flexible copy of the parent volume, where the flexible copy is based on a snapshot copy of the parent volume at some instant in time). When a plurality of child volumes (“sibling volumes”) are generated from the same snapshot copy, the files in the sibling volumes are initially bit-for-bit replicas of each other and of the snapshot copy. The files in the sibling volumes remain bit-for-bit replicas until a file is modified on one or more of these sibling volumes.
In addition, subsequent snapshot copies can be created based on child volumes, thereby generating “grandchild” volumes, wherein any given file within a grandchild volume can remain a bit-for-bit replica of the parent and child volumes until a change is made to a file on one of the volumes. In this manner, corresponding files between parent, child, grandchild, great-grandchild, cousin, and possibly between all relatives in a hierarchy of related volumes can be bit-for-bit replicas of one another, although these corresponding files will tend to diverge over time as changes are made to the files on the volumes.
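By way of illustration only, the relationship between related volumes described above can be sketched as follows. The `Volume` class and its `clone`, `write`, and `is_replica` methods are hypothetical names introduced for this sketch and do not correspond to any particular storage system; the sketch merely assumes that a clone starts from the file contents of its source volume at the instant of cloning and diverges only when a file is later written.

```python
class Volume:
    """Hypothetical in-memory stand-in for a volume (illustrative only)."""

    def __init__(self, name, files=None, parent=None):
        self.name = name
        self.parent = parent
        # Maps file path -> immutable bytes. Copying the mapping models a
        # snapshot: the clone sees the contents as of this instant.
        self.files = dict(files or {})

    def clone(self, name):
        # A child volume based on a snapshot copy of this volume.
        return Volume(name, files=self.files, parent=self)

    def write(self, path, data):
        # Modifying a file affects only this volume; relatives are unchanged.
        self.files[path] = data

    def is_replica(self, other, path):
        # True when both volumes hold the file with identical contents.
        return (
            path in self.files
            and path in other.files
            and self.files[path] == other.files[path]
        )


parent = Volume("parent", {"/report": b"original contents"})
child_a = parent.clone("child_a")        # sibling volumes
child_b = parent.clone("child_b")
grandchild = child_a.clone("grandchild")

child_a.write("/report", b"modified contents")  # child_a diverges
```

In this sketch, `child_b` and `grandchild` continue to hold replicas of the parent's file after `child_a` is modified, which reflects the tendency of related volumes to diverge file by file rather than all at once.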
When a first volume and a second volume are related in this way (e.g., a first volume is a parent volume and a second volume is a child volume (or vice versa)), a first application on a client (or a first instance of the first application) can initially access a file from the first volume (e.g., the parent volume). The first application can cause the file to be locally cached as one or more data blocks stored in a memory in the client (e.g., a client-side cache). At this time within the client memory, the data blocks of the file are associated with a first volume identifier (e.g., an inode number, generation number, and/or file system identifier (FSID)), which corresponds to the first volume.
A second application on the client (or a second instance of the first application) can likewise access the same file, but from the second volume (e.g., the child volume) in the clustered computing network (e.g., due to preprogramming in the second application indicating that the file is available from the second volume). Like the first application, the second application can cause the file to be locally cached as one or more data blocks stored in the client memory. Thus, at this time, a second copy of the data blocks for the file can be stored in client memory but associated with a second volume identifier. Alternatively, a single copy of the data blocks of the file can be stored in the client memory but associated with both first and second volume identifiers.
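The two caching arrangements just described can be sketched, purely as an example, in the following manner. All names here (`VolumeFileId`, `PerVolumeCache`, `SharedCopyCache`) are hypothetical, and keying the shared copy on the block contents is an assumption made only to keep the sketch short; an actual client could associate identifiers with cached blocks in other ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeFileId:
    """Hypothetical identifier; the field names are illustrative only."""
    fsid: int        # file system identifier of the volume
    inode: int       # inode number of the file within that volume
    generation: int  # generation number of the inode


class PerVolumeCache:
    """First arrangement: one copy of the data blocks per volume identifier."""

    def __init__(self):
        self.entries = {}  # VolumeFileId -> list of data blocks

    def store(self, file_id, blocks):
        self.entries[file_id] = list(blocks)  # separate copy per identifier

    def copies_held(self):
        return len(self.entries)


class SharedCopyCache:
    """Second arrangement: a single copy associated with several identifiers."""

    def __init__(self):
        self.blocks_by_key = {}  # content key -> list of data blocks
        self.key_by_id = {}      # VolumeFileId -> content key

    def store(self, file_id, blocks):
        # Assumption: identical block contents produce an identical key.
        key = tuple(blocks)
        self.blocks_by_key.setdefault(key, list(blocks))
        self.key_by_id[file_id] = key

    def lookup(self, file_id):
        key = self.key_by_id.get(file_id)
        return None if key is None else self.blocks_by_key[key]

    def copies_held(self):
        return len(self.blocks_by_key)


blocks = [b"block-0", b"block-1"]
first_id = VolumeFileId(fsid=1, inode=42, generation=7)   # first volume
second_id = VolumeFileId(fsid=2, inode=42, generation=7)  # second volume

per_volume = PerVolumeCache()
per_volume.store(first_id, blocks)
per_volume.store(second_id, blocks)   # a second copy is held

shared = SharedCopyCache()
shared.store(first_id, blocks)
shared.store(second_id, blocks)       # still a single copy, two identifiers
```

In the first arrangement the client memory holds two copies of the same blocks, whereas in the second arrangement it holds one copy that is reachable under either volume identifier.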
It can be appreciated that the client device may once again need this file for subsequent operations. However, because the client memory has only limited space, at some subsequent time the file may no longer be stored in the client memory with both the first and second volume identifiers. This can cause client-side caching inefficiencies. For example, if the first application has not accessed the file in the first volume for an extended time period, the file in the first volume may be marked as “stale” in the client memory. Thus, where two copies of the data blocks of the file are stored in client memory (e.g., one copy associated with the first volume identifier and the other copy associated with the second volume identifier), this may cause the copy of the data blocks associated with the first volume identifier to be disposed of or overwritten in client memory. Alternatively, where a single copy of the data blocks is stored in the client memory but associated with both first and second volume identifiers, this may cause the file's data blocks to be disassociated from the first volume identifier in client memory.
In this instance, if the first application were to request the file according to the first volume identifier after the extended time period, the file would no longer appear as locally available since the file is no longer associated with the first volume identifier in client memory. However, unbeknownst to the first application, the data blocks of the file may still reside in the client memory, albeit associated with only the second (different) volume identifier. Once again, this is because the file as stored in the first and second related volumes tends to be a bit-for-bit replica in the volumes until the file is modified in one of the volumes.
Thus, if the first application were to request the file according to the first volume identifier in this example, the client would send a request to retrieve the file from the first volume in the clustered computing network, rather than retrieving it from the client memory. It can be appreciated that this is an inefficient operation, since fetching data from an external source requires more resources than obtaining it locally from the client (e.g., from the client-side cache). Therefore, the inventors have appreciated that previous solutions unduly reduce the cache-hit ratio for the client, and correspondingly reduce system performance.
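The inefficiency described above can be illustrated with the following minimal sketch, which assumes a client-side cache that is keyed strictly by volume identifier. The `ClientCache` class, its methods, the `fetch_from_cluster` helper, and the volume names are all hypothetical and are introduced solely for illustration.

```python
class ClientCache:
    """Hypothetical client-side cache keyed strictly by volume identifier."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable that reads from the cluster
        self._entries = {}         # (volume_id, path) -> file data
        self.network_fetches = 0   # counts requests sent to the cluster

    def read(self, volume_id, path):
        key = (volume_id, path)
        if key in self._entries:
            return self._entries[key]            # cache hit
        self.network_fetches += 1                # cache miss
        data = self._fetch(volume_id, path)
        self._entries[key] = data
        return data

    def evict_stale(self, volume_id, path):
        # Drops only the association with this volume identifier.
        self._entries.pop((volume_id, path), None)

    def holds_data(self, data):
        # True if these bytes are cached under any volume identifier.
        return data in self._entries.values()


def fetch_from_cluster(volume_id, path):
    # Assumption: the file is a bit-for-bit replica on both related volumes.
    return b"identical file contents"


cache = ClientCache(fetch_from_cluster)
cache.read("first_volume", "/file")          # first application: miss
cache.read("second_volume", "/file")         # second application: miss
cache.evict_stale("first_volume", "/file")   # entry marked stale and dropped

still_local = cache.holds_data(b"identical file contents")
cache.read("first_volume", "/file")          # miss again, despite local data
```

In this sketch the final read triggers a third request to the cluster even though `still_local` indicates that identical data remains in client memory under the second volume identifier, which is the reduction in cache-hit ratio noted above.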