1. Field
This disclosure relates generally to I/O caching, and more particularly to filtering I/O data to be cached based on data generation and consumption.
2. Description of the Related Art
Caching has long been used in storage environments to enhance the performance of slower storage devices, such as disk drives. In caching, a smaller and faster storage medium is utilized to temporarily store and retrieve frequently used data, while the larger and typically slower mass storage medium is used for long-term storage of data. One caching methodology is write-back caching, wherein data destined for the mass storage device is first stored in a cache and only later written to the mass storage device, typically when the amount of cached data reaches some threshold value or when time permits.
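The write-back methodology described above can be illustrated with a minimal sketch. The class, the dict standing in for the mass storage device, and the threshold parameter are illustrative assumptions, not part of any particular caching product:

```python
class WriteBackCache:
    """Minimal write-back sketch: writes land in the fast cache first
    and are flushed to the slower backing store once a threshold of
    pending (dirty) blocks is reached."""

    def __init__(self, backing_store, threshold=4):
        self.backing_store = backing_store  # dict standing in for the mass storage device
        self.cache = {}                     # fast medium: block -> data
        self.threshold = threshold          # flush once this many blocks are pending

    def write(self, block, data):
        self.cache[block] = data            # write-back: defer the slow write
        if len(self.cache) >= self.threshold:
            self.flush()

    def flush(self):
        # Write all pending blocks to the slower mass storage device at once.
        self.backing_store.update(self.cache)
        self.cache.clear()


disk = {}
cache = WriteBackCache(disk, threshold=2)
cache.write(0, b"alpha")   # held in cache; nothing on disk yet
cache.write(1, b"beta")    # threshold reached, both blocks flushed to disk
```

Deferring the flush lets the slow device absorb several writes in one batch, which is the performance benefit the write-back approach targets.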
FIG. 1 is a block diagram showing an exemplary prior art computer system 100 having caching capability. The exemplary prior art computer system 100 includes a central processing unit (CPU) 102 in communication with system memory 104, a caching device 106, a local target storage device 108, and a network interface card 110. In addition, loaded into system memory 104 is caching software 112, which functions to facilitate caching functionality on the computer system 100.
As mentioned previously, the caching device 106 generally comprises a smaller storage medium with faster access than the local target storage device 108. Because of this enhanced speed, reads and writes directed to the caching device 106 are processed much faster than those directed to the local target storage device 108. Caching takes advantage of this difference by sending selected write requests to the caching device 106 before later transferring the data to the local target storage device 108.
For example, when the CPU 102 processes a write request to write data to the target storage device 108, the caching software 112 intercepts the write request and writes the data to the caching device 106 instead. When the CPU 102 processes a read request, the caching software 112 again intercepts the read request and determines whether the data is stored on the caching device 106. If the data is currently stored on the caching device 106, the caching software 112 reads the data from the caching device 106; otherwise, the data is read from the local target storage device 108.
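The intercept path just described can be sketched as follows. The class and the dicts standing in for the caching device and the local target storage device are hypothetical stand-ins used only to show the control flow:

```python
class CachingLayer:
    """Sketch of the intercept path: writes are redirected to the
    caching device; reads are served from the caching device on a hit,
    and from the local target storage device on a miss."""

    def __init__(self, cache_device, target_device):
        self.cache_device = cache_device    # fast device: block -> data
        self.target_device = target_device  # slower local target storage

    def write(self, block, data):
        # Intercept the write and store the data on the caching device instead.
        self.cache_device[block] = data

    def read(self, block):
        # Intercept the read: serve from the caching device if present (a hit).
        if block in self.cache_device:
            return self.cache_device[block]
        # Cache miss: fall back to the local target storage device.
        return self.target_device[block]


layer = CachingLayer(cache_device={}, target_device={7: b"on-disk"})
layer.write(3, b"cached")   # redirected to the caching device
hit = layer.read(3)         # served from the caching device
miss = layer.read(7)        # not cached; read from the target device
```

In this simplified sketch a miss does not populate the cache; a real caching layer would also apply a selection policy to decide which blocks are worth promoting.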
Data is selected for caching based on many different factors; prior art approaches include, for example, recency, frequency, block size, and file type. However, these approaches assume that the data is both stored and used locally. When that assumption does not hold, selection policies based on recency, frequency, block size, and file type are far less effective.
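As an example of one such prior art factor, a recency-based selection policy can be sketched as a least-recently-used (LRU) cache: when the cache is full, the block that has gone longest without being accessed is evicted. The class and its parameters are illustrative only:

```python
from collections import OrderedDict

class RecencyCache:
    """Sketch of a recency-based (LRU) selection policy: when the cache
    is full, the least recently used block is evicted first."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.entries = OrderedDict()  # block -> data, ordered oldest-first

    def access(self, block, data=None):
        if block in self.entries:
            self.entries.move_to_end(block)   # mark block as most recently used
            if data is not None:
                self.entries[block] = data
            return self.entries[block]
        if data is not None:
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[block] = data
        return data


cache = RecencyCache(capacity=2)
cache.access(1, b"one")
cache.access(2, b"two")
cache.access(1)            # touching block 1 makes block 2 the LRU entry
cache.access(3, b"three")  # cache full, so block 2 is evicted
```

A policy like this works well when the same node generates and consumes the data; as noted above, it is far less effective when the data is produced or consumed remotely.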
For example, in FIG. 1, the computer system 100 is in communication with a remote storage device 114 via the network interface card 110. The remote storage device 114 can be connected to the computer system 100, for example, via the Internet or another network. In such a remote configuration, data generally takes longer to travel from the remote storage device 114 to the CPU 102 than from the local target storage device 108 to the CPU 102. Thus, in this instance the bottleneck is the connection to the remote storage device 114 and not the throughput of the local target storage device 108.
In view of the foregoing, there is a need for systems and methods for filtering I/O data to be cached so as to exclude data whose caching would not benefit the user. Such systems and methods should be able to determine when caching will be effective and avoid caching data inefficiently.