Caching is a common technique in computer systems to improve performance by enabling retrieval of frequently accessed data from a higher-speed cache instead of having to retrieve it from slower memory and storage devices. Caching occurs not only at the level of the CPU itself, but also in larger systems, up to and including caching in enterprise-sized storage systems or even potentially globally distributed “cloud storage” systems. Access to cached information is faster—usually much faster—than access to the same information stored in the main memory of the computer, to say nothing of access to information stored in non-solid-state storage devices such as a hard disk.
On a larger scale, dedicated cache management systems may be used to allocate cache space among many different client systems communicating over a network with one or more servers, all sharing access to a peripheral bank of mass-storage devices. This arrangement may also be found in remote “cloud” computing environments.
Data is typically transferred between memory (or another storage device or system) and cache as cache “lines”, “blocks”, “pages”, etc., whose size may vary from architecture to architecture. In systems that have a caching hierarchy, relatively fast memory (such as RAM) may be used to cache slower memory (such as storage devices). Just for the sake of succinctness, all the different types of information that are cached in a given system are referred to commonly here as “data”, even if the “data” comprises instructions, addresses, metadata, etc. Transferring blocks of data at a time may mean that some of the cached data will not need to be accessed often enough to provide a benefit from caching, but this is typically more than made up for by the relative efficiency of transferring blocks as opposed to data at many individual memory locations; moreover, because data in adjacent or close-by addresses is very often needed (“spatial locality”), the inefficiency is not as great as randomly distributed addressing would cause. A common structure for each entry in the cache is to have at least three elements: a “tag” that indicates where (generally an address) the data came from in memory; the data itself; and one or more flag bits, which may indicate, for example, if the cache entry is currently valid, or has been modified.
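The three-element entry structure described above (tag, data, flag bits) can be sketched as a simple record. This is an illustrative sketch only; the names `CacheEntry`, `valid`, and `dirty` are assumptions for the example, not terms from any particular architecture.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    """One cache line/block: a tag, the data itself, and flag bits."""
    tag: int            # where the data came from, e.g. a memory address
    data: bytes         # the cached block contents
    valid: bool = True  # flag: entry currently holds usable data
    dirty: bool = False # flag: entry has been modified since being cached

# A 64-byte line cached from (hypothetical) address 0x1F40:
entry = CacheEntry(tag=0x1F40, data=b"\x00" * 64)
```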
Regardless of the number, type or structure of the cache(s), the standard operation is essentially the same: When a system hardware or software component needs to read from a location in storage (main or other memory, a peripheral storage bank, etc.), it first checks to see if a copy of that data is in any cache line(s) that includes an entry that is tagged with the corresponding location identifier, such as a memory address. If it is (a cache hit), then there is no need to expend relatively large numbers of processing cycles to fetch the information from storage; rather, the processor may read the identical data faster—typically much faster—from the cache. If the requested read location's data is not currently cached (a cache miss), or the corresponding cached entry is marked as invalid, however, then the data must be fetched from storage, whereupon it may also be cached as a new entry for subsequent retrieval from the cache.
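The standard hit/miss operation just described can be summarized in a short read-through sketch, in which a dictionary stands in for slow storage and the class and attribute names (`ReadThroughCache`, `hits`, `misses`) are assumptions made for illustration.

```python
class ReadThroughCache:
    """Minimal sketch: check the cache first; on a miss, fetch from
    (slower) storage and cache the result for subsequent reads."""

    def __init__(self, storage):
        self.storage = storage   # stand-in for main memory or a storage bank
        self.entries = {}        # tag -> data
        self.hits = 0
        self.misses = 0

    def read(self, tag):
        if tag in self.entries:        # cache hit: no fetch from storage
            self.hits += 1
            return self.entries[tag]
        self.misses += 1               # cache miss: fetch from storage...
        data = self.storage[tag]
        self.entries[tag] = data       # ...and cache it as a new entry
        return data

store = {0x10: b"block A", 0x20: b"block B"}
cache = ReadThroughCache(store)
cache.read(0x10)   # miss: fetched from storage, now cached
cache.read(0x10)   # hit: served directly from the cache
```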
There are two traditional methods for tagging blocks in a cache. One is to name them logically, such as using a Logical Block Address (LBA) within some data object, file, virtual disk, or other logical entity. One drawback of this method is that when a remote host asks for the block at, say, LBA 18, it is difficult to determine if the block for LBA 18 that the remote host has is current or has been overwritten with new content. This problem of ensuring consistency is especially hard in the face of failures such as a host going out of communication for a while.
The second approach is to name blocks by their storage location. Traditional systems that update data in place have the same consistency issue as LBA-tagged arrangements. Log-structured file systems fare better in this second case because new content is written to a new location, so that if the block stored at address X is needed and the remote host has that block, the correct data will be referenced. But if the block has been moved, for example as part of a garbage-collection process, its storage location will change; although the remote cache may hold the correct data, the address will be wrong. The host will therefore reply that it does not have the data, when it actually does.
A third, more recent approach is to tag data by its content, sometimes called a content-addressable cache. In this approach, the tag depends only on the content of the data, such as, for example, a SHA-1 cryptographic fingerprint of the data.
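A content-derived tag of this kind can be computed with any cryptographic hash; the sketch below uses SHA-1, as in the example above. The function name `content_tag` is an assumption for illustration.

```python
import hashlib

def content_tag(block: bytes) -> str:
    """Content-addressable tag: the tag depends only on the bytes
    themselves, not on where the block is stored, so a block that is
    moved (e.g. by garbage collection) keeps the same tag."""
    return hashlib.sha1(block).hexdigest()

a = content_tag(b"some block contents")
b = content_tag(b"some block contents")
assert a == b                          # identical content, identical tag
assert a != content_tag(b"new data")   # changed content, different tag
```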
Three issues commonly arise when considering the design of a caching system. The first issue is memory hierarchy: Each memory technology represents a different choice on the cost-performance tradeoff spectrum—faster memory tends to be more expensive than slower memory. Host caches will therefore tend to be made of faster, but more expensive, memory. Accessing data from this faster memory as often as possible will make the workloads running on each host, such as virtual machines (VMs), go faster.
The second issue is proximity: Data in a local cache can be accessed more quickly than data stored in a remote system. Each host therefore typically has a local cache so that it has to do a remote fetch as infrequently as possible.
The third issue is scalability. Suppose several hosts are able to address a common storage pool. A host that has a cache miss can go to the pool, which may include its own caching arrangement. If there are many hosts with lots of misses, the combined load could overwhelm the pool and cause queuing delays which would slow down the response back to the hosts. In some systems, each host has its own local cache. Such systems need to ensure that the local host caches remain consistent: they must always serve up the logically correct, current version of the data. The standard approach to solving this problem is for each host to coordinate with a central server to make sure that the version of data it has is up to date. This approach has the advantage of letting the hosts operate without communicating amongst themselves, but it generally does not let one host benefit from the data being in another host's cache. It is possible for the central server to keep track of what every host is caching, and redirect a request from one host to another, but this approach does not scale well.
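The central-server coordination scheme described above can be sketched as a simple version check: the server tracks the current version of each block, and a host serves its locally cached copy only after confirming that copy is still current. The protocol, class names, and version-number scheme here are all assumptions made for illustration, not a description of any particular system.

```python
class CentralServer:
    """Tracks the current version number of each block (hypothetical)."""

    def __init__(self):
        self.versions = {}   # tag -> latest version number

    def write(self, tag):
        """Record that the block was overwritten with new content."""
        self.versions[tag] = self.versions.get(tag, 0) + 1

    def is_current(self, tag, version):
        return self.versions.get(tag) == version

class Host:
    """Caches (version, data) pairs; validates each hit with the server."""

    def __init__(self, server):
        self.server = server
        self.cache = {}      # tag -> (version, data)

    def read(self, tag, fetch):
        if tag in self.cache:
            version, data = self.cache[tag]
            if self.server.is_current(tag, version):
                return data              # local copy confirmed up to date
        version = self.server.versions.get(tag, 0)
        data = fetch(tag)                # go to the shared storage pool
        self.cache[tag] = (version, data)
        return data
```

Note that each `read` still costs a round trip to the central server for validation, which is why this approach avoids inter-host communication but can make the server itself a bottleneck.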
In others of these systems, the hosts pool their local cache resources to create a larger, virtual cache. In such systems, the hosts communicate amongst themselves so that each host knows which other host may have the needed data in its cache, and so that the consistency of the caches is ensured. Such communication can itself have scalability issues, increasing the load on each host just to maintain cache consistency and a form of global location table so that needed data can be found. Further, it can tie the performance of one host to the performance of other hosts. This interdependence can make performance troubleshooting very difficult.
What is needed is thus a system that improves the ability of a storage system to provide data proximity, keeping needed data in a host's local cache whenever possible; that provides scalability, keeping the inter-host communication load to a minimum and not making a single central server a bottleneck; and that nonetheless ensures data consistency, so that each host always serves up the correct data. Ideally, such a system should enable these features even in the presence of different memory technologies.