Much of the data transferred across conventional networks is redundant. In particular, various data items are often delivered to multiple destinations, sometimes simultaneously, and sometimes at different times. Examples of data items that experience such redundancy are media files (movies, videos, audio), executable files, and shared data files. Additionally, sometimes two distinct data items share common subcomponents. For instance, two large videos may contain a common clip, or have a common underlying music theme; documents may quote text from an identical source; virtual machines may have significant common memory components (e.g., operating system kernels), or executables may share similar configuration files. The sharing may also be purely coincidental, e.g., a contiguous chunk of 0's, shared code libraries within distinct executables, etc.
Hence, the data desired by a client, while originating from and made available by an original source, may be available in full or in part at other client locations. These other clients might be explicitly configured to hold the content (e.g., mirror servers), or they might have the content as a result of their own request and retrieval.
The original Internet was not designed to facilitate retrieval of data from any or all locations at which it was available, but rather from a known location. Various technologies have enabled the extension to multiple source locations (e.g., DNS redirection, Peer-to-peer [P2P] networking). What is needed is content-centric networking, where the desired data, in full or in part, may be retrieved from any or every source at which that data is available.
The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete” computer. Depending on how it is implemented, virtualization can also provide greater security, since the virtualization can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
As is well known in the field of computer science, a virtual machine (VM) is a software abstraction, or “virtualization,” of an actual physical computer system.
A virtual machine (VM), which in this system is a “guest,” may be installed on a “host platform,” or simply “host,” which includes system hardware and one or more layers or co-resident components comprising system-level software, such as an operating system (OS) or similar kernel, one or more virtual machine monitors (VMMs), or some combination of these. As software, the code defining the VM will ultimately execute on the actual system hardware.
As in almost all computers, this system hardware will typically include one or more CPUs, some form of memory (volatile and/or non-volatile), one or more storage devices such as one or more disks, and one or more devices, which may be integral or separate and removable. In many existing virtualized systems, the hardware processor(s) are the same as in a non-virtualized computer with the same platform, for example, the Intel x86 platform. Because of the advantages of virtualization, however, some hardware vendors have proposed, developed, or released processors that include specific hardware support for virtualization.
Each VM will typically mimic the general structure of a physical computer and, as such, will usually have both virtual system hardware and guest system software. The virtual system hardware typically includes at least one virtual CPU, virtual memory, at least one storage device such as a virtual disk, and one or more virtual devices. All of the virtual hardware components of a VM may be implemented in software to emulate corresponding physical components. The guest system software typically includes a guest operating system (OS) and drivers as needed, for example, for the various virtual devices.
A significant problem faced by the designers of present-day computer systems employing modern processors operating at high clock rates is to provide a large amount of memory at reasonable cost while achieving high system performance. In particular, modern processors operate at such high clock rates that a memory system oftentimes cannot supply code and/or data at these rates, thereby retarding system performance. And while this problem is acute when it involves relatively quick memory, i.e., Dynamic Random Access Memory (DRAM), it is further exacerbated when it involves slower memory, i.e., disk drives or disk systems, which are often essential in computer systems employing virtual operating systems.
A cost effective, prior art solution to this problem of coupling a computer system to a disk system is to provide a cache for disk within memory. A cache is a relatively small-sized but high-speed memory placed between the computer and the larger-sized but slower disk storage.
The operating principle of the disk cache is the same as that of a central processing unit (CPU) cache. The first time a program or data location is addressed, it must be accessed from the lower-speed disk. Subsequent accesses to the same code or data are then done via the faster cache, thereby minimizing access time and enhancing overall system performance.
A computer system having a central processor (CPU), main system memory, and a host adapter may all be interconnected by a system bus. The host adapter serves as an interface between the computer system and an input/output device, i.e., a disk or array of disks, typically through the use of a standard logical/electrical protocol, e.g., Small Computer System Interface (SCSI), or NFS.
In this prior art system, the computer system is interfaced via NFS to a disk array system having one or more magnetic disks organized as an array, through an array control unit having an array controller and a cache memory system.
In such a prior art system, the processor issues commands (READ, WRITE, etc.) to the disk array system. For example, in the case of a READ command, if the requested information is present in the disk cache, it is immediately forwarded to the processor by the array controller over the NFS bus to the host adapter. If the information is not in the cache, the controller retrieves the information from the disk array, loads it into the cache, and forwards it to the processor.
Since all disk cache memory systems are of limited capacity, the disk cache often fills and some of its contents have to be changed as new code/data are accessed from the slower disk storage. A primary objective for a designer of a system utilizing a disk cache, therefore, is to have the code and data most likely to be needed at a given time available in the disk cache—accesses can then use the fast cache rather than the slower disk storage. When an access to the cache allows retrieval of the necessary data from the disk cache, it is called a “hit”, and when such retrieval cannot be performed, it is called a “miss”. The fraction of accesses that result in hits is called the “hit ratio”.
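The read-through behavior and hit-ratio bookkeeping described above can be sketched as follows. This is a minimal illustrative model, not the controller's actual implementation; the class and method names are assumptions, and a plain dictionary stands in for the slower disk array.

```python
class ReadThroughCache:
    """Minimal sketch of a read-through disk cache with hit/miss counters."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # models the slower disk array
        self.cache = {}                     # block_number -> data
        self.hits = 0
        self.misses = 0

    def read(self, block_number):
        if block_number in self.cache:
            self.hits += 1                  # "hit": serve from the fast cache
        else:
            self.misses += 1                # "miss": load from the slow disk first
            self.cache[block_number] = self.backing_store[block_number]
        return self.cache[block_number]

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

For example, reading the same block twice yields one miss followed by one hit, for a hit ratio of 0.5.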
One of the most important decisions facing the designer of a disk cache, therefore, is the choice of the disk cache replacement strategy. The replacement strategy determines which disk blocks are removed from the disk cache at a given time thereby making room for newer, additional disk blocks to occupy the limited space within the disk cache. The choice of a replacement strategy must be done carefully, because the wrong choice can lead to poor performance of a disk system, thereby negatively impacting an overall computer system performance.
A number of different methods to manage disk cache replacement have been used in the art, for example, J. T. Robinson and M. V. Devarakonda, “Data Cache Management Using Frequency-Based Replacement”, Performance Evaluation Review, Vol. 18, No. 1, May 1990.
Perhaps the simplest replacement strategy employed in the art is the first-in, first-out (FIFO) strategy. This strategy replaces the resident disk block that has spent the longest time in the cache memory. Whenever a block is to be evicted from the disk cache, the oldest block is identified and removed from the cache.
In order to implement the FIFO block-replacement strategy, a cache manager must keep track of a relative order of the loading of the blocks into the disk cache. One prior art method for accomplishing this task is to maintain a FIFO queue of blocks. With such a queue, the “oldest” block is always removed, i.e., the blocks leave the queue in the same order that they entered it.
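The FIFO queue of blocks described above can be sketched as follows. This is an illustrative model under assumed names; a deque holds the loading order so that the "oldest" block is always the one removed.

```python
from collections import deque


class FIFOCache:
    """Sketch of FIFO block replacement: evict the block resident longest."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()   # blocks in loading order; left end is oldest
        self.blocks = {}       # block_number -> data

    def load(self, block_number, data):
        if block_number in self.blocks:
            self.blocks[block_number] = data   # already resident; order unchanged
            return
        if len(self.blocks) >= self.capacity:
            oldest = self.order.popleft()      # evict the longest-resident block
            del self.blocks[oldest]
        self.order.append(block_number)
        self.blocks[block_number] = data

    def get(self, block_number):
        return self.blocks.get(block_number)
```

With a capacity of two, loading blocks 1, 2, and then 3 evicts block 1, regardless of how often block 1 was used in the meantime, which illustrates the drawback discussed next.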
A serious drawback arises through the use of the FIFO strategy however. By failing to take into account the pattern of usage of a given block, FIFO tends to throw away frequently used blocks because they naturally tend to stay longer in the disk cache. Although relatively easy to implement, FIFO is not a first choice replacement strategy for disk cache designers.
As suggested by its name, the least-recently-used (LRU) replacement strategy replaces a least-recently-used resident block. Generally speaking, the LRU strategy performs better than FIFO. The reason is that LRU takes into account the patterns of program behavior by assuming that the block used in the most distant past is least likely to be referenced in the near future. When employed as a disk cache replacement strategy, the LRU strategy does not result in the replacement of a block immediately before the block is referenced again, which can be a common occurrence in systems employing the FIFO strategy.
Unfortunately, implementation of the LRU strategy may impose much more overhead on the disk cache system than can be reasonably handled by software alone. One possible implementation is to record the usage of blocks by means of a structure similar to a stack. Whenever a resident block is referenced, it is retrieved from the stack and placed at its top. Conversely, whenever a block eviction is in order, the block at the bottom of the stack is removed from the disk cache. A similar effect may be achieved by putting the blocks into a circular list and including a “recently used” bit for each block. The latter is set whenever the block is accessed. When it is time to remove a block, a pointer moves along the circular list, resetting all “recently used” bits until finding a block that has not been used since the last time the pointer reached this part of the circular list.
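The circular-list variant described above (often called the "clock" or second-chance scheme) can be sketched as follows. This is an illustrative model, not taken from any particular system; the frame structure and method names are assumptions.

```python
class ClockCache:
    """Sketch of the circular list with a "recently used" bit per block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []   # [block_number, used_bit] entries, treated circularly
        self.hand = 0      # pointer that moves along the circular list
        self.index = {}    # block_number -> position in self.frames

    def access(self, block_number):
        if block_number in self.index:
            # Resident block referenced: set its "recently used" bit.
            self.frames[self.index[block_number]][1] = 1
            return
        if len(self.frames) < self.capacity:
            self.index[block_number] = len(self.frames)
            self.frames.append([block_number, 1])
            return
        # Advance the hand, resetting "recently used" bits, until a block
        # not used since the hand last passed it is found.
        while self.frames[self.hand][1] == 1:
            self.frames[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand][0]
        del self.index[victim]
        self.frames[self.hand] = [block_number, 1]
        self.index[block_number] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

Note that every miss may sweep the hand past several frames, clearing their bits, which is the per-reference maintenance cost discussed next.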
Maintenance of the block-referencing structure requires its updating for each and every block reference. In other words, the overhead of searching the stack, moving the referenced block to the top, and updating the rest of the stack accordingly must be added to all disk references. Similarly, the circular list must be maintained for each block accessed. As a result, cache designers oftentimes implement a pure LRU replacement strategy with extensive and dedicated hardware support for the described stack operations.
A more efficient software implementation and management of a disk cache system is known. It is implemented in a storage subsystem having, preferably, an array of selectively accessible direct access storage devices (disks), a processor, program memory, cache memory, and non-volatile memory, and is responsive to commands received from at least one external source.
In response to the commands received from the external source, i.e., WRITE, READ, the storage system transfers data, preferably organized as blocks, to/from the direct access devices to/from the external source, as indicated. In order to speed access to the data, the blocks are held in an intermediary cache when possible. Blocks which are the subject of a READ request and present in the cache are transferred directly from the cache to the external source. Conversely, blocks which are the subject of a READ request and are not present in the cache are first transferred from the direct access devices to the cache. Finally, blocks which are the subject of a WRITE request are stored in the cache, and subsequently flushed to the direct access devices at a convenient time.
Viewed from one aspect, that prior art is directed to a method and apparatus for determining whether a particular block (i.e., an atomic unit of data) which is the subject of a READ request is present in the cache at a particular time. The method employs a hashing function which takes as its input a block number and outputs a hash index into a hash table of pointers. Each pointer in the hash table points to a list of digests, with each digest having a bit map wherein the bits contained in the map identify whether a particular block of data is contained within the cache. Upon entry into the hash table, the digests are searched. If no digest is found that contains the particular block, a cache “miss” occurs, at which point space is made available within the cache to hold the block, and the block is subsequently retrieved from the direct access device and stored within the cache. Conversely, if a digest is encountered during the search of linked digests having a bitmap confirming that the particular block is a valid block, then a cache “hit” occurs and the block is transferred from the cache.
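The hash-table-of-digests lookup described above can be sketched as follows. This is an illustrative model only: the table size, the number of blocks covered by each digest's bitmap, and all names are assumptions, not details from the prior art system.

```python
BLOCKS_PER_DIGEST = 32   # assumed: each digest's bitmap covers this many blocks
TABLE_SIZE = 1024        # assumed hash table size


class DigestTable:
    """Sketch: hash a block number into a bucket of digests whose bitmaps
    record which blocks are valid in the cache."""

    def __init__(self):
        self.buckets = [[] for _ in range(TABLE_SIZE)]

    @staticmethod
    def _base(block_number):
        # First block number covered by the digest containing this block.
        return block_number // BLOCKS_PER_DIGEST * BLOCKS_PER_DIGEST

    def _bucket(self, base):
        return self.buckets[hash(base) % TABLE_SIZE]

    def insert(self, block_number):
        base = self._base(block_number)
        bucket = self._bucket(base)
        for digest in bucket:
            if digest["base"] == base:
                break
        else:
            digest = {"base": base, "bitmap": 0}
            bucket.append(digest)
        digest["bitmap"] |= 1 << (block_number - base)   # mark block valid

    def lookup(self, block_number):
        """Return True for a cache "hit", False for a "miss"."""
        base = self._base(block_number)
        for digest in self._bucket(base):
            if digest["base"] == base:
                return bool(digest["bitmap"] >> (block_number - base) & 1)
        return False
```

A miss from `lookup` would then trigger making space in the cache and retrieving the block from the direct access device, as described above.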
In this prior art system, the cache lines are organized as a heap, which is maintained in order to ensure that the most likely replacement candidate (least-frequently-used/oldest) is always at the root of the heap. This is accomplished by performing a local reorganization of the heap every time a cache line is utilized. When a cache line is about to be accessed, i.e., data blocks will be read from or written into the cache line, the cache line is not removed from the heap. Instead, the cache line is marked as being in a busy state, thereby preserving its position within the heap and ensuring that the data blocks within the cache line cannot be accessed by another READ or WRITE process simultaneously.
Upon completion of the access, the cache line is freed from its busy state, and a frequency-of-use indicator and a timestamp, both associated with the cache line, are updated to reflect this access. Subsequently, a local reorganization (reheap) of the heap takes place beginning at the current location of the cache line in the heap. Upon completion of the reheap operation, the most likely candidate for replacement occupies the root of the heap.
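The busy-marking and local-reheap behavior described above can be sketched as follows. This is an illustrative min-heap keyed on (frequency-of-use, timestamp); the record fields and method names are assumptions, not the prior art system's actual interfaces.

```python
import itertools


class LFUHeap:
    """Sketch: cache lines in a min-heap so the least-frequently-used/oldest
    line (the best eviction candidate) is always at the root."""

    def __init__(self):
        self.heap = []                  # cache-line records, heap-ordered
        self.pos = {}                   # line id -> index in self.heap
        self.clock = itertools.count()  # monotonically increasing timestamps

    def _key(self, line):
        return (line["freq"], line["stamp"])

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i]["id"]] = i
        self.pos[self.heap[j]["id"]] = j

    def _sift_down(self, i):
        # Local reheap: push a line whose key grew down toward the leaves.
        n = len(self.heap)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._key(self.heap[c]) < self._key(self.heap[smallest]):
                    smallest = c
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def add_line(self, line_id):
        line = {"id": line_id, "freq": 0, "stamp": next(self.clock), "busy": False}
        self.heap.append(line)
        i = len(self.heap) - 1
        self.pos[line_id] = i
        while i > 0 and self._key(self.heap[i]) < self._key(self.heap[(i - 1) // 2]):
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def begin_access(self, line_id):
        # Mark busy but leave the line in place, preserving its heap position.
        self.heap[self.pos[line_id]]["busy"] = True

    def end_access(self, line_id):
        # Free the line, update its usage statistics, then reheap locally
        # from its current position.
        line = self.heap[self.pos[line_id]]
        line["busy"] = False
        line["freq"] += 1
        line["stamp"] = next(self.clock)
        self._sift_down(self.pos[line_id])

    def replacement_candidate(self):
        return self.heap[0]["id"] if self.heap else None
```

After each completed access, only the path below the updated line is touched, which is what makes the reorganization local rather than a full rebuild of the heap.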
An improved method of retrieving data over a network from any or every source of that data would be beneficial.