Block caching to flash memory devices (or “caches”) allows fast performance for a dataset that is larger than the capacity of the flash device. Some applications invalidate the cache on startup for a variety of reasons, including to ensure that there is no stale data in the cache in a high-availability environment. This can lead to a “cold cache” that is empty on startup. Cache invalidation can be valuable because it provides a very simple, if not necessary, interface for applications to guarantee consistency among groups of cached nodes.
The initial filling of a “cold cache” with valid data is sometimes referred to as “warming the cache.” One conventional method of warming the cache relies on application use: each block that the application accesses from the data store connected to the cache is then stored in the cache, until the cache is at full capacity. In some embodiments, data stored in the cache is not replaced or rewritten until the cache is at full capacity. A cache at full capacity is more efficient than an empty or partially filled cache because reading from the cache is faster than reading the same data from the data store. Blocks are typically accessed on the application's terms, so applications that issue random input/output to the cache generally take a long time to warm up a large cache. A long warm-up period may result in a less efficient cache because it requires more accesses to the data store than if the cache were warmed faster.
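The application-driven warming described above can be illustrated with a minimal sketch. The class name, function name, and parameters below are hypothetical and are not taken from any particular implementation; the sketch assumes a uniformly random access pattern and a cache that stores each missed block until it is full, with no replacement before that point.

```python
import random


class DemandFilledCache:
    """Toy block cache that is filled only when a read misses (hypothetical)."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self.blocks = {}        # block number -> block data
        self.store_reads = 0    # reads that had to go to the data store

    def read(self, block_no, data_store):
        # A hit is served from the cache without touching the data store.
        if block_no in self.blocks:
            return self.blocks[block_no]
        # A miss goes to the data store; the block is kept while room remains.
        self.store_reads += 1
        data = data_store[block_no]
        if len(self.blocks) < self.capacity_blocks:
            self.blocks[block_no] = data
        return data

    def is_warm(self):
        return len(self.blocks) >= self.capacity_blocks


def accesses_until_warm(capacity_blocks, store_blocks, seed=0):
    """Count random application reads issued before the cache is full."""
    rng = random.Random(seed)
    data_store = [b"block"] * store_blocks   # stand-in for the backing store
    cache = DemandFilledCache(capacity_blocks)
    accesses = 0
    while not cache.is_warm():
        cache.read(rng.randrange(store_blocks), data_store)
        accesses += 1
    return accesses, cache.store_reads
```

In this model, every block that ends up in the cache costs one read from the data store, and the number of application accesses needed to reach full capacity grows with the cache size.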
Conventional approaches typically warm up a cold cache at run time, with the first reference to a block bringing the corresponding data into the cache. Because certain applications may access data based on files associated with the applications, rather than blocks associated with the cache, such cache warming policies may result in significant random input/output (I/O). Random I/O is a relatively slow type of access to spinning storage drives because of the mechanical movement required to reach random locations on the drive. Thus, warming the cache based on random I/O can take a long time, especially as caches continue to grow larger: the larger a cache is, the longer it takes to fill the cache with valid data. Sequential access to spinning storage drives, by contrast, requires much less mechanical movement and, therefore, much less time to access the specified locations.
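The difference between random and sequential access can be illustrated with a rough model. The sketch below assumes, purely for illustration, that the mechanical movement between two reads is proportional to the distance between their block numbers; real drives are more complicated, and the function name and block counts are hypothetical.

```python
import random


def total_head_travel(block_order, start=0):
    """Sum of block-number distances covered when reading in the given order.

    Distance between block numbers is used here as a crude stand-in for
    mechanical movement on a spinning drive (an assumption of this sketch).
    """
    position = start
    travel = 0
    for block_no in block_order:
        travel += abs(block_no - position)
        position = block_no
    return travel


rng = random.Random(0)
blocks = rng.sample(range(1_000_000), 1000)   # blocks needed to fill a cache

random_travel = total_head_travel(blocks)               # application order
sequential_travel = total_head_travel(sorted(blocks))   # ascending order
```

Under this model, reading the same set of blocks in ascending order sweeps across the drive once, so the sequential total is never larger than the total for any other ordering of those blocks.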
Throughout the description, similar reference numbers may be used to identify similar elements.