A continuing problem in computer systems is handling the growing amount of available information or data. The sheer amount of information stored in databases on disks or other storage media has been increasing dramatically. A few decades ago, files and disks were measured in thousands of bytes, then in millions of bytes (megabytes), and later in billions of bytes (gigabytes). Now databases of a million megabytes (terabytes) and even billions of megabytes are being created and employed in day-to-day activities.
As the cost of memory falls, considerably large caches can be configured on desktop and server machines. In addition, in a world where hundreds of gigabytes of storage is the norm, the ability to work with most data in large caches can increase productivity and efficiency, because caches can be configured to retrieve data more quickly than the same data can be retrieved from many mass data stores. A cache is a collection of data that duplicates original value(s) stored elsewhere or computed earlier, where the cached data can be read from the cache in lieu of reading the original value(s). A cache is typically implemented where reading the cached data is more efficient than reading the original value(s), so that use of the cache can increase the overall efficiency of computing systems.
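The definition of a cache given above can be illustrated with a minimal sketch. The class name `ReadThroughCache`, the `load_original` callback, and the hit and miss counters are hypothetical names chosen for illustration; they do not come from any particular cache product. The sketch assumes the original values live in some slower store that can be queried by key.

```python
from typing import Any, Callable, Dict, Hashable


class ReadThroughCache:
    """Holds duplicates of values whose originals are stored elsewhere."""

    def __init__(self, load_original: Callable[[Hashable], Any]) -> None:
        # Hypothetical callback that reads the original value (the slow path).
        self._load_original = load_original
        self._entries: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        # Read the duplicate from the cache in lieu of the original value.
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        # Not cached yet: read the original once and keep a duplicate.
        self.misses += 1
        value = self._load_original(key)
        self._entries[key] = value
        return value


# Usage: a plain dict stands in for the mass data store.
mass_store = {"customer:1": "Ada", "customer:2": "Grace"}
cache = ReadThroughCache(mass_store.__getitem__)
first = cache.get("customer:1")   # read from the original store
second = cache.get("customer:1")  # read from the cache
```

Only the first read of a key reaches the original store in this sketch. Eviction and invalidation are left out, although a practical cache would need both.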
In an effort to scale the size of caches in an organized manner, some caches are configured as distributed partitioned caches. A distributed cache is a cache that is distributed across one or more cache nodes. Typically, a distributed cache is distributed across one or more physical or virtual computing machines. A distributed partitioned cache is a cache that is partitioned across multiple cache nodes, where a primary location for each partition is on a single cache node. As used herein, a cache node refers to a storage process in a cache system. A cache node may be on a single machine or spread across multiple physical machines, and a single physical machine may include multiple cache nodes, such as where a single physical machine hosts multiple virtual machine processes. Thus, the distributed partitioned cache is spread over multiple storage processes, so that the entire set of primary data to be read from the cache is not stored on a single process, and typically is not stored on a single machine.
As used herein, the term “primary” data indicates the data that is currently set up to be accessed in the cache, such as to be read from the cache, as opposed to secondary or replicated data that is currently being stored as a backup. The primary data may itself be replicated from other data outside the data store. For example, in a distributed cache the primary data may be replicated from more authoritative data that is stored in long-term mass storage. The term “primary” is similarly used to refer to a primary region or partition, which is a region or partition currently set up to be accessed, as opposed to a replica of the primary region or partition. The term “primary” can also be used to refer to a primary cache node, which is a cache node that stores the primary data, such as a primary region. Note, however, that a cache node can be a primary node for one set of cache data and a secondary node for another set of cache data.
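The relationship between partitions, primary cache nodes, and secondary cache nodes described above can be sketched as follows. This is an illustrative model only. The names `CacheNode` and `PartitionedCache`, the CRC32-based key hashing, and the round-robin placement of primary and secondary copies are all assumptions made for the example, not a description of how any particular cache system assigns partitions.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CacheNode:
    """A storage process; it may hold primary and secondary partitions."""
    name: str
    primary: Dict[int, Dict[str, str]] = field(default_factory=dict)
    secondary: Dict[int, Dict[str, str]] = field(default_factory=dict)


class PartitionedCache:
    def __init__(self, node_names: List[str], num_partitions: int) -> None:
        if len(node_names) < 2:
            raise ValueError("need at least two nodes to keep a replica")
        self.nodes = [CacheNode(name) for name in node_names]
        self.num_partitions = num_partitions
        for p in range(num_partitions):
            # Each partition has its primary location on exactly one node.
            self.primary_node(p).primary[p] = {}
            self.secondary_node(p).secondary[p] = {}

    def partition_of(self, key: str) -> int:
        # Assumed scheme: a stable hash of the key selects the partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def primary_node(self, partition: int) -> CacheNode:
        # Assumed placement: round-robin over the nodes.
        return self.nodes[partition % len(self.nodes)]

    def secondary_node(self, partition: int) -> CacheNode:
        # Assumed placement: the replica lives on the next node.
        return self.nodes[(partition + 1) % len(self.nodes)]

    def put(self, key: str, value: str) -> None:
        p = self.partition_of(key)
        self.primary_node(p).primary[p][key] = value
        self.secondary_node(p).secondary[p][key] = value  # backup copy

    def get(self, key: str) -> Optional[str]:
        # Reads are served from the primary partition only.
        p = self.partition_of(key)
        return self.primary_node(p).primary[p].get(key)


cache = PartitionedCache(["node-a", "node-b", "node-c"], num_partitions=6)
cache.put("order:42", "shipped")
status = cache.get("order:42")
```

With three nodes and six partitions under this placement, the first node is the primary node for partitions 0 and 3 and a secondary node for partitions 2 and 5, which illustrates how one cache node can play both roles for different sets of cache data.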
A distributed partitioned cache system is a system that is configured to implement such distributed partitioned caches.
The data manager component in a distributed cache is a component that handles the storage of the data.