Computers have various types of storage devices with different performance characteristics. For example, a computer may have a small, fast memory (such as a random access memory (RAM) or the like) and a large, slow storage device (such as a hard disk drive (HDD) or the like). Frequently accessed data are therefore kept in a fast storage device as far as possible so that they can be accessed quickly. This technique is called caching. Within the storage area of such a fast storage device, the area that temporarily holds data read from a slow storage device is called a cache area.
When data is to be accessed and that data is stored in the cache area, the data is accessed in the cache area. Finding the data to be accessed in the cache area is called a cache hit. On the other hand, if the data to be accessed is not stored in the cache area, the data is read from the slow storage device. Failing to find the data to be accessed in the cache area is called a cache miss.
In many cases, the capacity of the fast storage device is small, so the capacity of the cache area is limited. Therefore, the data held in the cache area change depending on which data are accessed. When data in the cache area must be replaced, the choice of which data to remove is important for improving the cache hit rate. That is, the strategy for deciding which data to remove (the cache algorithm) greatly affects computer performance.
One example of a cache algorithm is least recently used (LRU). This algorithm discards from the cache area the data that has gone unused for the longest time since its last use.
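The LRU policy, together with the cache hit and cache miss behavior described above, can be sketched in Python as follows. This is a minimal illustration rather than anything taken from the source: the class name LRUCache, its attributes, and the use of a plain dictionary to stand in for the slow storage device are assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache: evicts the entry whose last use is the oldest."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # hypothetical stand-in for the slow device
        self.entries = OrderedDict()        # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: serve from the cache area and mark as most recently used.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Cache miss: read from the slow device, evicting the LRU entry if full.
        self.misses += 1
        value = self.backing_store[key]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # discard the least recently used entry
        self.entries[key] = value
        return value


cache = LRUCache(2, {"a": 1, "b": 2, "c": 3})
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 4
print(list(cache.entries))       # ['c', 'b']
```

In this trace, loading "c" evicts "b" because "a" was used more recently, and the later request for "b" is therefore a cache miss.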
In some situations where computers are used, it is possible to predict to some extent which data will be accessed next, based on the data accessed previously. For example, when browsing information on social networking services (SNSs) or the World Wide Web (WWW), it can be predicted that one of the pages linked from the page currently being browsed will be browsed next. Further, by recording how often each linked page has previously been selected, it is possible to estimate the probability of each linked page being browsed next. When it is possible to predict to some extent which page will be accessed next in this way, the cache hit rate may be improved by making effective use of the probability of each data item being accessed next.
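As a rough illustration of estimating next-page probabilities from recorded selections, the sketch below assumes a hypothetical click log of (current page, selected page) pairs. The function name and log format are illustrative only and do not come from the source.

```python
from collections import Counter, defaultdict


def next_page_probabilities(click_log):
    """Estimate P(next page | current page) from recorded link selections.

    click_log is a hypothetical list of (current_page, selected_page) pairs.
    """
    counts = defaultdict(Counter)
    for current, selected in click_log:
        counts[current][selected] += 1
    probabilities = {}
    for current, selections in counts.items():
        total = sum(selections.values())
        # Share of past selections that went to each linked page.
        probabilities[current] = {page: n / total for page, n in selections.items()}
    return probabilities


log = [("home", "news"), ("home", "news"), ("home", "about"), ("home", "news")]
probs = next_page_probabilities(log)
print(probs["home"])  # {'news': 0.75, 'about': 0.25}
```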
For example, as a cache algorithm technique that uses the probability of each data item being read, a technique using a Markov chain has been disclosed. In this technique, the probability of each data item being used is calculated on the assumption that the locality of a program on a storage device follows a Markov chain, and a data item with a low probability of being used is selected as the data item to be replaced.
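A simplified sketch of probability-based replacement is shown below. It assumes a one-step transition table with hypothetical probabilities and evicts the cached item least likely to be accessed next. It is not the method of the cited publication, whose details are not given here; the function and parameter names are assumptions.

```python
def choose_victim(cached_items, current_item, transition):
    """Pick the cached item least likely to be accessed next.

    transition[current_item][item] is an assumed one-step Markov-chain
    probability that `item` is accessed right after `current_item`;
    items missing from the row are treated as having probability 0.
    """
    row = transition.get(current_item, {})
    return min(cached_items, key=lambda item: row.get(item, 0.0))


# Hypothetical one-step probabilities for the access following A.
transition = {"A": {"B": 0.7, "C": 0.2, "D": 0.1}}
print(choose_victim(["B", "C", "D"], "A", transition))  # D
```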
Please see, for example, Japanese Laid-open Patent Publication No. 2-219147.
However, a conventional cache algorithm that uses the probability of each data item being accessed next considers only the probability for the very next access, and is therefore sometimes not sufficiently effective in improving the cache hit rate. For example, it may be known that, after a certain data item is accessed, a specific data item will not be accessed in the next access but will very likely be accessed in the access after that (the second following access). In this case, the conventional technique assigns the specific data item a probability of 0 of being read in the next access, so the specific data item might be removed from the cache area even though it is very likely to be needed soon. When the specific data item is then requested in the second following access, a cache miss occurs, which reduces the cache hit rate.
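This limitation can be illustrated with a small worked example. The transition probabilities below are hypothetical, and the helper names are assumptions made for illustration. Starting from item A, item C is never the next access, yet it is the access after that with a probability of about 0.96, so a policy that looks only one access ahead would still evict it.

```python
def step(distribution, transition):
    """Advance a probability distribution over items by one access."""
    nxt = {}
    for item, p in distribution.items():
        for successor, q in transition.get(item, {}).items():
            nxt[successor] = nxt.get(successor, 0.0) + p * q
    return nxt


def k_step_distribution(transition, current_item, k):
    """Probability of each item being the k-th following access."""
    distribution = {current_item: 1.0}
    for _ in range(k):
        distribution = step(distribution, transition)
    return distribution


# Hypothetical access pattern: after A, either B or D is accessed,
# and C always follows B and usually follows D.
transition = {
    "A": {"B": 0.6, "D": 0.4},
    "B": {"C": 1.0},
    "D": {"C": 0.9, "A": 0.1},
    "C": {"A": 1.0},
}

next_access = k_step_distribution(transition, "A", 1)
second_access = k_step_distribution(transition, "A", 2)
print(next_access.get("C", 0.0))    # 0.0: C is never the next access
print(second_access.get("C", 0.0))  # about 0.96: C is very likely the one after

# A one-step policy choosing between cached items C and D evicts C.
victim = min(["C", "D"], key=lambda item: next_access.get(item, 0.0))
print(victim)  # C
```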
Thus, a conventional cache algorithm that relies only on the probability of each data item being accessed next is not sufficiently effective in improving the cache hit rate.