In the modern electronic world, information is propagated across many media. For instance, individuals can receive and send information using personal computers or even cell phones. Often, the information is sourced from one or more backend databases that store and propagate it.
With the increasing demand for information bandwidth comes the need to reduce the latency of receiving/sending information from/to the databases. One way to improve latency is to improve the memory access time for accessing data in the database. This can be accomplished through a database cache system.
Constraint-based database caching (CbDBC) provides one approach to implementing a database cache. In general, a database cache system keeps subsets of data (e.g., from the backend) in a cache instance. The cache instances are located close to applications to avoid the high latency that would result from accessing the backend without caching. One aspect of database caching involves providing for the execution of descriptive queries (e.g., SQL for CbDBC) at the cache without having to access the backend (e.g., the database system holding the complete data).
In order to achieve this, CbDBC uses cache groups to determine which values have to be stored value-complete in the cache. The cache groups normally comprise a combination of referential cache constraints (RCCs), filling columns (FCs), and cache tables.
With the help of defined constraints (e.g., FCs and RCCs) within cache groups, the cache can easily determine which predicate extensions (e.g., all records necessary to evaluate the predicate) are completely available at the cache. This task is normally referred to as probing.
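Probing can be illustrated with a minimal sketch. The class and function names below are illustrative assumptions and not part of any actual CbDBC implementation; the sketch only shows the core decision: a value-based predicate is completely evaluable at the cache when its column is a filling column and the queried value is known to be loaded.

```python
# Minimal probing sketch (hypothetical structures, not actual CbDBC code):
# a value-based predicate "table.column = value" is decidable at the cache
# only if the column is a filling column and the value is known to be loaded.

class CacheTable:
    def __init__(self, name, filling_columns):
        self.name = name
        self.filling_columns = filling_columns  # column -> set of loaded values

def probe(table, column, value):
    """Return True if the predicate extension for column = value
    is guaranteed to be completely available at the cache."""
    loaded = table.filling_columns.get(column)
    return loaded is not None and value in loaded

s = CacheTable("S", {"a": {1, 2, 3}})
assert probe(s, "a", 2)      # value 2 was loaded via filling column a
assert not probe(s, "a", 7)  # value 7 was never loaded
assert not probe(s, "b", 2)  # b is not a filling column
```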
A cache group normally includes a root table that has exactly one cache key (CK) and a set of member tables that can be reached from the root table via RCCs. Using CKs can present problems, however, if more than one CK is declared within a root table or within the cache group. FCs have been used to address this issue. With the help of FCs, it is possible to declare more than one starting point for loading (e.g., many FCs) within a root table and, in addition, multiple cache groups can be federated.
FIG. 1 shows some elementary cache group structures. Cache tables are denoted by capital letters (e.g., S, T, and U), and columns are denoted by lowercase letters (e.g., a, b, and c). Each cache table S corresponds to one backend table SB and is able to store records of SB. The arrows are RCCs, and the boxes with black triangles denote FCs.
FIG. 1 shows a design having cache group structures that may be considered relatively difficult to maintain. That is, cache group 4 (CG4) of FIG. 1 shows a heterogeneous cycle, in which more than one column in a cache table belongs to the cycle. In addition, there are some situations where one design may be deemed preferable to another. Thus, a designer may have to decide which cache groups to build and how to build them.
Currently, the design and creation of cache groups is a manual task that requires a database administrator to build the groups. There is no known system that allows automatic creation of cache groups to react immediately and dynamically to workload shifts. Instead, the shifts need to be identified manually and, after that, the administrator has to decide which predicates (e.g., cache groups/cache group federations) need to be adapted, which ones need to be deleted, and which new cache groups should be created.
There are currently no design rules for automatically building cache groups. There is only the basic knowledge that a cache group is built for predicates that are often used or, at an even higher level, that the defined cache groups must be beneficial for caching. Consequently, the time spent maintaining the cache groups can offset the benefits gained through their improved latency.
Thus, it will be appreciated that there is a need for techniques that automatically and dynamically change cache group structures based on predefined design rules.
In certain example embodiments, a method for automatic and dynamic adaptation of cache groups in a database system having one or more processors comprises: analyzing a database query and determining whether a set of predicates in a predicate pattern is suitable for inclusion in one or more cache groups, the one or more cache groups having one or more cache tables; mapping value-based predicates in the predicate pattern to a filling column in the one or more cache tables; and mapping equi-join predicates in the predicate pattern to a referential cache constraint in the one or more cache tables, wherein new cache groups are created for predicate patterns that occur more frequently, and existing cache groups are deleted if the frequency of the predicate pattern falls below a predetermined threshold value.
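The frequency-driven life cycle described above can be sketched as follows. The class name, threshold values, and decay policy are illustrative assumptions, not part of the embodiments; the sketch only demonstrates creation above a frequency threshold and deletion below one.

```python
# Sketch of threshold-driven cache group adaptation (names, thresholds, and
# the halving decay are illustrative assumptions): predicate patterns seen
# often enough get a cache group; groups whose pattern frequency drops
# below a floor are deleted.

class CacheGroupManager:
    def __init__(self, create_threshold=5, delete_threshold=2):
        self.create_threshold = create_threshold
        self.delete_threshold = delete_threshold
        self.counts = {}     # predicate pattern -> observed frequency
        self.groups = set()  # patterns that currently have a cache group

    def observe(self, pattern):
        self.counts[pattern] = self.counts.get(pattern, 0) + 1
        if self.counts[pattern] >= self.create_threshold:
            self.groups.add(pattern)

    def decay(self):
        """Advance logical time: halve all counts and drop stale groups."""
        for p in self.counts:
            self.counts[p] //= 2
        self.groups = {p for p in self.groups
                       if self.counts[p] >= self.delete_threshold}

mgr = CacheGroupManager()
for _ in range(5):
    mgr.observe("S.a = ? AND S.b = T.c")
assert "S.a = ? AND S.b = T.c" in mgr.groups      # frequent pattern -> group
mgr.decay(); mgr.decay()                          # pattern stops occurring
assert "S.a = ? AND S.b = T.c" not in mgr.groups  # stale group deleted
```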
In a non-limiting, example implementation, the frequency relates to a logical time in which the predicate patterns occur.
In another non-limiting, example implementation, cache groups are federated when the cache groups belong to the same predicate pattern.
In yet another non-limiting, example implementation, cache groups are prohibited from further federation when the cache groups belong to different predicate patterns.
In certain example embodiments, there is provided a method for automatic and dynamic adaptation of cache groups in a database system having one or more processors. The method comprises: creating an anchored table set configured to hold one or more cache tables; collecting value-based predicates in a database query; creating a cache table for each value-based predicate and marking a filling column in each respective cache table; adding each cache table to the anchored table set; determining addable referential cache constraints between the cache tables in the anchored table set; determining a cost for each addable referential cache constraint; adding a referential cache constraint based on its determined cost; checking the cache tables in the anchored table set for heterogeneous cycles between tables; and modifying referential cache constraints between cache tables when a heterogeneous cycle exists, wherein the determining of addable referential cache constraints, the determining of costs, the adding, the checking, and the modifying are repeated until all possible addable referential cache constraints have been exhausted.
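The construction steps recited above can be sketched in simplified form. The cost model, helper names, and data structures below are assumptions for illustration only, and the heterogeneous-cycle modification step is elided; the sketch shows the overall loop of seeding the anchored table set from value-based predicates and greedily adding the cheapest addable RCC until none remain.

```python
# Simplified sketch of anchored-table-set construction (cost model and
# helper names are illustrative assumptions, not the claimed method itself).

def build_cache_group(value_predicates, join_predicates, rcc_cost):
    """value_predicates: [(table, column)] from the query's value-based
    predicates. join_predicates: [(t1, c1, t2, c2)] equi-join predicates.
    rcc_cost: cost function over candidate RCCs."""
    anchored = {}  # anchored table set: table -> set of filling columns
    for table, column in value_predicates:
        anchored.setdefault(table, set()).add(column)  # create table, mark FC

    rccs = set()
    while True:
        # RCCs are addable only between tables already in the anchored set.
        addable = [j for j in join_predicates
                   if j[0] in anchored and j[2] in anchored
                   and j not in rccs]
        if not addable:
            break  # all possible addable RCCs have been exhausted
        # Add the cheapest RCC first, then loop again.
        # (Heterogeneous-cycle check/modification omitted in this sketch.)
        rccs.add(min(addable, key=rcc_cost))
    return anchored, rccs

anchored, rccs = build_cache_group(
    [("S", "a"), ("T", "b")],        # value-based predicates -> FCs
    [("S", "c", "T", "d")],          # equi-join predicate -> candidate RCC
    rcc_cost=lambda r: 1)            # trivial illustrative cost model
assert anchored == {"S": {"a"}, "T": {"b"}}
assert ("S", "c", "T", "d") in rccs
```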
In certain example embodiments, there is provided a non-transitory computer-readable storage medium having computer-readable code embodied therein and capable of being stored in a memory as computer program instructions that, when executed by a computer having one or more processors, cause the computer to execute the methods described in the preceding paragraphs.
In certain example embodiments, there is provided a database system having a backend database, which has a memory and one or more processors and stores backend data, and one or more cache instances, each having a memory and one or more processors and operatively communicating with the backend database. The one or more processors in the one or more cache instances are configured to: create an anchored table set configured to hold one or more cache tables; collect value-based predicates in a database query; create a cache table for each value-based predicate and mark a filling column in each respective cache table; add each cache table to the anchored table set; determine addable referential cache constraints between the cache tables in the anchored table set; determine a cost for each addable referential cache constraint; add a referential cache constraint based on its determined cost; check the cache tables in the anchored table set for heterogeneous cycles between tables; and modify referential cache constraints between cache tables when a heterogeneous cycle exists, wherein the determining of addable referential cache constraints, the determining of costs, the adding, the checking, and the modifying are repeated until all possible addable referential cache constraints have been exhausted.
In a non-limiting, example implementation, a referential cache constraint having a minimal cost among the addable referential cache constraints is determined.
In another non-limiting, example implementation, a cost for the reverse of each addable referential cache constraint is calculated.
In yet another non-limiting, example implementation, a switching cost is calculated for each addable referential cache constraint by subtracting the cost of the addable referential cache constraint from the cost of its reverse.
In another non-limiting, example implementation, the switching cost for each addable referential cache constraint is added to a priority list in which smaller switching costs have higher priority.
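The switching-cost priority list described in the last two implementations can be sketched as follows. The concrete cost values and the min-heap realization of the priority list are illustrative assumptions.

```python
# Sketch of the switching-cost priority list (cost values are illustrative):
# switching cost = cost(reverse RCC) - cost(RCC); smaller switching costs
# have higher priority, so they are popped first from a min-heap.
import heapq

def switching_cost(cost, reverse_cost):
    return reverse_cost - cost

costs = {                       # addable RCC -> (cost, cost of its reverse)
    ("S.a", "T.b"): (3, 10),    # switching cost 10 - 3 = 7
    ("T.c", "U.d"): (4, 5),     # switching cost  5 - 4 = 1
}
heap = [(switching_cost(c, rc), rcc) for rcc, (c, rc) in costs.items()]
heapq.heapify(heap)             # min-heap: smallest switching cost on top

first = heapq.heappop(heap)[1]
assert first == ("T.c", "U.d")  # switching cost 1 outranks 7
```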
These features, aspects, advantages, and example embodiments may be used separately and/or applied in various combinations to achieve yet further embodiments of this invention.