This invention relates to data caching. A multi-tier environment typically includes a data server (for example, a relational database server, a multi-dimensional database server, or a file server), an application server, and consumers (such as end users or processes). In some implementations, the multi-tier environment can also include an additional tier for specialized functions. In other implementations, all of these functions may reside on a single tier. In a typical use scenario, a consumer issues requests to an application, which in turn issues requests to a data server. There can be many different types of requests. For example, a relational database application issues SQL requests; a web application can issue Extensible Markup Language (XML) or XQuery requests; a multi-dimensional database application, such as an Online Analytical Processing (OLAP) application, can issue requests in the form of Multi-Dimensional Expressions (MDX); and so on.
Application performance can be improved by pre-fetching data from the data server and storing the pre-fetched data in non-persistent memory, typically referred to as a data cache, either on the data server or on the application server. Although the memory capacity of servers continues to increase, disk storage capacity is increasing at an even faster rate. The result is a declining ratio of memory to disk storage, which necessitates a more efficient way to populate the data cache.
In some implementations, the data cache can be populated either before a consumer begins requesting data or on demand, as each request arrives. Although data caching can significantly improve performance, it does have some drawbacks. First, the data cache may not be able to hold all of the data in memory, especially for applications retrieving large amounts of data. Second, when the data cache is populated on demand, the first request for any given data pays a high price in response time, because that data must still be fetched from the data server. Third, when the data cache is pre-populated, the total amount of data that could be cached may be too large to fit in the available cache memory. Fourth, when the data cache is populated on demand, some data must be removed as the data cache fills up. This cleanup process typically uses a simple first-in first-out or least-recently-used policy to decide which data should be removed. However, such a policy does not ensure that the most important data is kept in the data cache. Thus, more efficient methods are needed to populate the data cache to ensure that necessary data is already available when requested by consumers.
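The second and fourth drawbacks can be illustrated with a minimal sketch of an on-demand cache that uses a least-recently-used policy. The class name `OnDemandLRUCache`, the function `fetch_from_server`, the capacity of two entries, and the request keys are all hypothetical and chosen only for illustration; they do not describe any particular product or the method of this invention.

```python
from collections import OrderedDict


class OnDemandLRUCache:
    """Hypothetical on-demand cache with least-recently-used eviction."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity      # maximum number of cached entries (assumed)
        self.fetch = fetch            # stand-in for a slow request to the data server
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0               # requests that had to go to the data server

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        # Cache miss: the first request for this key pays the fetch cost.
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry, regardless of how
            # often it has been requested in the past.
            self.entries.popitem(last=False)
        return value


def fetch_from_server(key):
    # Hypothetical placeholder for a SQL, XQuery, or MDX request.
    return f"rows for {key}"


cache = OnDemandLRUCache(capacity=2, fetch=fetch_from_server)
for key in ["sales", "sales", "sales", "inventory", "audit", "sales"]:
    cache.get(key)

print(cache.misses)         # number of requests served by the data server
print(list(cache.entries))  # keys remaining in the cache, oldest first
```

In this run, the frequently requested "sales" entry is evicted by a single one-off "audit" request and must be fetched again, because the policy considers only recency and not how important the data is to consumers.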