This invention relates generally to query plan caching, and more particularly to query plan cache management in shared-nothing distributed data stores.
In query-based shared data stores, evaluating a query typically involves parsing, rewriting, planning, and then executing it. For many queries, the parsing, rewriting and planning operations are the most costly, and they consume a significant portion of the query's total run time. Caching query plans allows a shared-nothing data store to skip these operations when a query whose plan has already been generated is run again. This reduces execution time and cost, and improves performance. Caching is particularly effective for queries involving repetitive operations on the same resources.
However, in a busy shared-nothing data store, problems arise in ensuring that only plans likely to remain valid are cached, and in ensuring that the plan cache contains only valid plans. A runtime error will result when a cached plan is reused if the plan involves transient objects that have since changed or disappeared, or if conditions at the time the plan is re-executed differ from those at the time it was generated. The longer a plan is cached, the more likely it is to become invalid because of such changes. There is no cost-effective way to determine which plans have become invalid and should be removed from the cache. One previous approach to this problem was to register all objects and track them, so that when an object was removed or changed, the plans that depended on it could be invalidated. However, this approach is costly and complex to implement, and tracking transient objects is expensive. The problem is even more challenging in a shared-nothing distributed data store environment. There, plans are cached in a distributed fashion, the caches on all nodes must remain synchronized, and all nodes must make the same up-front decision about whether to cache a plan that may become invalid. Presently, there is no simple and effective way to accomplish this.
There is a need to address the foregoing and other problems of plan cache management. In particular, there is a need, in a shared-nothing distributed data store environment, to strategically identify which plans have a higher probability of becoming invalid and should not be cached, and which plans are likely to remain valid and should be cached to improve performance. It is to these ends that the present invention is directed.