Mass storage capacity continues to grow at a phenomenal rate. At the same time, the amount of data stored on mass storage devices also continues to grow. Many applications are designed to take advantage of this expanded storage and often install files that would have filled an entire hard drive of only a few years ago. Data centers have also found ways to use the increased storage capacity. Databases on the order of terabytes are common, and much larger databases are increasing in number. Efficiently and cost-effectively differentiating and managing data according to its value to the business has become a challenge for enterprises.
The speed of computer microprocessors and memory has improved exponentially, roughly tracking Moore's law by doubling about every 18 months. The bandwidth of the network between storage devices and CPUs has grown at roughly the same pace. Quickly moving large amounts of data between storage devices and applications nonetheless remains a challenge for both personal and enterprise applications.
Hard disk manufacturers have responded by developing faster hard drives and larger hard drive caches. However, because of the nature of electro-mechanical mechanisms and the very limited amount of disk cache relative to hard disk capacity, hard drive performance has improved only linearly. The performance gap between fast CPUs and slow hard drives has grown and continues to grow, making storing or retrieving data or code from a hard drive one of the most significant bottlenecks to increased system performance. Various forms of caching have been used to speed up the transfer of data and code to and from both local and remote storage. Traditional least recently used (LRU) cache replacement algorithms have some benefits, but are ineffective and inefficient in dealing with some common application data patterns, such as a large stream of sequential data (e.g., a large video stream, or simply loading a large database). In part, this is because the large stream of sequential data can “flood the cache,” that is, invalidate and push out hot data previously residing in the cache. Segmented cache techniques have attempted to address this problem but still have many shortcomings, such as the overhead incurred by caching data that is used only once, latency when a host issues a request that causes a cache miss while the cache is full, and flushing problems.
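The cache-flooding behavior described above can be illustrated with a small simulation. The following is a minimal sketch only, not an implementation taken from this document: the `LRUCache` class, the `hot_blocks_resident` helper, the block names, the cache capacity of 8 blocks, and the stream length of 100 blocks are all hypothetical values chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache keyed by block ID (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def hot_blocks_resident(cache, hot):
    """Count how many of the hot blocks are currently held in the cache."""
    return sum(1 for block_id in hot if block_id in cache.blocks)


cache = LRUCache(capacity=8)  # assumed capacity, in blocks
hot = [f"hot-{i}" for i in range(4)]  # assumed frequently reused working set

# Warm-up: the hot blocks are read repeatedly and become resident.
for _ in range(3):
    for block_id in hot:
        cache.access(block_id)
resident_before = hot_blocks_resident(cache, hot)

# A long sequential stream in which every block is read exactly once.
for i in range(100):
    cache.access(f"stream-{i}")
resident_after = hot_blocks_resident(cache, hot)

print(f"hot blocks resident before stream: {resident_before}")
print(f"hot blocks resident after stream:  {resident_after}")
```

In this sketch every hot block is evicted by stream blocks that are never read again, so the next access to each hot block misses even though the hot set is only half the size of the cache.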