1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer program product for using a two-tiered cache for storing and accessing hierarchical data, in order to minimize memory usage and improve performance.
2. Description of the Related Art
In the distributed client/server environment, there is a need to disseminate information within a company or corporation that is required by users for everyday operation. This information typically includes such things as accounting information, computer configuration information, etc., and is typically available on one or more servers which are accessed by the users. Since companies, particularly large corporations, are often organized hierarchically (such as divisions, departments, etc.), each of these hierarchical levels may specify information pertinent to that level, where each lower level of hierarchical information augments or overrides the level(s) above it. In particular, this hierarchical approach to specifying information is often used for computer configuration information. In this scenario, each user of the company systems accesses the information provided by the higher levels of the hierarchy, for example to use when connecting to the company network or when installing software on his computer. The user at this point occupies the lowest level of the hierarchy, and may customize the information obtained from the hierarchy above him. These customized values are often referred to as “user preferences” and may be unique to any user within the hierarchy.
FIG. 1 is an example of this type of hierarchical structure. This structure represents hierarchically stored information, which typically mirrors the hierarchy of the organization. Company X 100 represents information stored for the corporate or highest level of the company. Any information defined at this level typically pertains to all elements of the hierarchy throughout the company, from organizations to individual users, because corporate settings tend to be broad in scope and define boundaries for the subordinate levels to follow. In this example, there is a subordinate level in Company X 100 which represents the development organization 105 and the sales organization 110. At this level, each subordinate organization uses the information from Company X 100 and may specify additional information to be used by its subordinate levels. Either or both of the development and sales organizations may also make changes (i.e. augmenting or overriding defined values) to the original information from the Company X level. This changed information is then made available to the subordinates of the level at which the change was made.
Moving down the hierarchy to the next level under development 105, we have two more organizations or possibly groups (such as a department of employees): (1) the user interface group 115 and (2) the API group 120. Both of these groups will inherit the information provided by the higher levels in this hierarchy. These groups also may add or modify information to further refine the data from the higher levels of the hierarchy, where these changes are then made available to the subordinate levels of the hierarchy. In this example, the subordinates to the user interface group 115 are the users (140, 145) of the company resources. Here, for example, Sue 140 inherits all the information available from following a path from the node 140 representing Sue up through and including all of the higher levels in the hierarchy.
The information provided to Sue 140 may include values for things such as system configuration attributes, including the colors displayed on her screen and the size of the windows on her display. Sue may choose to override these values (or any values provided by a higher level) in order to customize her display to her own preferences. In doing so, Sue has now created her own customized set of values representing a coalesced version of all the higher level values from the highest level 100 down to and including the values Sue 140 provides. This customized set of information is typically stored in a database or other repository on a server for access by the user any time he needs to use the information. (It should be noted that overriding values defined at a higher level of the hierarchy may in some cases be limited to a subset of all the defined values, since some higher level values may be mandatory.)
When a user of a client machine requests data that is stored in a hierarchical structure (such as the example for FIG. 1), what the user wants in response is the “coalesced” set of data. As used herein, a coalesced set of data refers to gathering all the data in the hierarchy, and merging data together such that the value for any given attribute is set by the lowest level in the hierarchy which has that attribute defined. From the user's point of view, he receives a complete set of data, which is really a composite of the actual data from the user level and all levels above it in the hierarchical chain.
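By way of illustration only, the coalescing just described may be sketched as a merge over the levels of one hierarchical path, ordered from the highest level to the lowest. The function name `coalesce` and all attribute names and values below are hypothetical and are not taken from the figures.

```python
def coalesce(levels):
    """Merge attribute dictionaries ordered from highest level to lowest.

    Lower levels are applied last, so each attribute ends up with the value
    set by the lowest level in the path that defines it.
    """
    merged = {}
    for attrs in levels:
        merged.update(attrs)
    return merged


# Hypothetical values along the path Company X -> Development -> User Interface -> Sue.
company_x = {"proxy": "proxy.example.com", "screen_color": "gray", "window_size": "800x600"}
development = {"window_size": "1024x768"}
user_interface = {"screen_color": "blue"}
sue = {"screen_color": "green"}

result = coalesce([company_x, development, user_interface, sue])
```

In this sketch Sue receives the proxy setting from the company level, the window size from the development level, and her own screen color, which overrides the values set at the two levels above her that define it.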
These complex types of stored data may impact overall performance of the server when information is retrieved since the coalescing of the stored data may involve a combination of several memory accesses as well as disk retrievals to collect, and then coalesce, all the necessary information before responding to a user request. For example, if Sue requests information for the hierarchy of FIG. 1, separate disk accesses (which may further require round trips through a distributed network) may be needed to retrieve the information for Company X 100, Development 105, User Interface 115, and finally for Sue 140.
While only a simple hierarchy is shown in FIG. 1, many more levels (with many nodes at each level) may exist in an organization or corporation. Performing multiple disk accesses to retrieve information for each level is a computationally expensive operation. In addition, having to continually recalculate the coalesced image of the data when changes occur is compute-intensive. Conversely, caching the complete set of coalesced data for every user is very storage- or memory-intensive and requires a complete recoalescence any time data represented in the coalescence is changed.
Accordingly, what is needed is a technique that avoids the performance penalty of this continual recalculation, and avoids the storage penalty of storing large amounts of coalesced data.
An object of the present invention is to provide a technique which avoids continual recalculation for coalescing complex hierarchical data and minimizes the impact on performance and memory consumption in a client/server environment.
Another object of the present invention is to provide this technique using a two-tiered cache.
Yet another object of the present invention is to provide this technique in a manner that allows a server to determine if a set of coalesced cached values is out-of-date with the data store it represents.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a method, system, and computer program product for use in a computing environment for using a two-tiered cache for storing and accessing hierarchical data. This technique comprises: providing a hierarchical structure of data comprising a top-level node, one or more intermediate levels having one or more intermediate-level nodes, and one or more user nodes, wherein each of the user nodes is a child node of the top-level node or one of the intermediate-level nodes, wherein the hierarchical structure is stored in a data repository accessible in the computing environment and wherein each of the nodes has a corresponding last update timestamp stored in the repository, the last update timestamp for the user nodes and the top-level node representing a last update to one or more data values of the node and the last update timestamp for the intermediate nodes representing an update of data values of the intermediate node or a parent of the intermediate node; creating coalesced data images (CDIs) for each of the top-level or intermediate-level nodes which is a group node, wherein a particular node is one of the group nodes when the particular node has one or more of the user nodes as a child, and wherein the CDI for the particular node comprises a coalescence of data values for the particular node, the top-level node, and all of the intermediate-level nodes in a hierarchical path from the particular node to the top-level node; storing the created CDIs in a central data cache along with a CDI timestamp for each of the stored CDIs wherein the CDI timestamp for each of the CDIs is set to the corresponding last update timestamp for the corresponding node; and storing user data for each of one or more users in a client cache for the user along with a client cache timestamp, wherein: each of the users is associated with a selected one of the user nodes; the client cache timestamp is set to the corresponding last update timestamp for the corresponding user node; and the stored user data is uncoalesced.
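As a minimal sketch of the two tiers described above, and not a definitive implementation, the central data cache and the client caches may be modeled as two dictionaries populated from an in-memory stand-in for the repository. The `Node` class, the helper names, the integer timestamps, and the attribute values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical repository record for one node of the hierarchy."""
    name: str
    parent: Optional["Node"] = None
    values: dict = field(default_factory=dict)
    last_update: int = 0      # assumed integer timestamp
    is_user: bool = False


def path_from_top(node):
    """Return the nodes from the top-level node down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return list(reversed(path))


def build_cdi(group):
    """Coalesce the values of the group node and every node above it."""
    cdi = {}
    for n in path_from_top(group):
        cdi.update(n.values)
    return cdi


company = Node("Company X", values={"color": "gray", "size": "800x600"}, last_update=1)
dev = Node("Development", parent=company, values={"size": "1024x768"}, last_update=2)
ui = Node("User Interface", parent=dev, values={"color": "blue"}, last_update=3)
sue = Node("Sue", parent=ui, values={"color": "green"}, last_update=4, is_user=True)
nodes = [company, dev, ui, sue]

# A group node is a non-user node that has at least one user node as a child.
group_nodes = [n for n in nodes
               if not n.is_user and any(c.is_user and c.parent is n for c in nodes)]

# Tier 1: central data cache, one coalesced image (CDI) per group node.
central_cache = {g.name: (build_cdi(g), g.last_update) for g in group_nodes}

# Tier 2: client cache, holding only the user's own uncoalesced values.
client_cache = {u.name: (dict(u.values), u.last_update) for u in nodes if u.is_user}
```

In this sketch only the User Interface node qualifies as a group node, so a single CDI is shared by every user beneath it, while each client cache stores only the values that the user has set.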
This technique may further comprise updating one or more of the data values for a selected node, where this updating further comprises: applying an update to the data values in the repository; updating the last update timestamp corresponding to the selected node; determining whether the selected node is one of the group nodes; and propagating the updated timestamp to each of the group nodes subordinate to the selected group node in the hierarchical structure when the determination has a positive result.
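The update steps above may be sketched as follows, under stated assumptions: the `Node` class, the counter used as a timestamp source, and the hierarchy (including the user Bob placed directly under Development so that Development qualifies as a group node) are hypothetical.

```python
import itertools

_clock = itertools.count(1)   # stand-in for a real timestamp source


class Node:
    """Hypothetical in-memory stand-in for a node stored in the repository."""

    def __init__(self, name, parent=None, is_user=False):
        self.name = name
        self.parent = parent
        self.is_user = is_user
        self.values = {}
        self.last_update = 0
        self.children = []
        if parent is not None:
            parent.children.append(self)


def is_group(node):
    """A group node is a non-user node with at least one user node as a child."""
    return not node.is_user and any(c.is_user for c in node.children)


def subordinate_groups(node):
    """Yield every group node below `node` in the hierarchy."""
    for child in node.children:
        if is_group(child):
            yield child
        yield from subordinate_groups(child)


def update(node, changes):
    node.values.update(changes)            # apply the update to the data values
    node.last_update = next(_clock)        # update the node's last update timestamp
    if is_group(node):                     # propagate only from a group node
        for group in subordinate_groups(node):
            group.last_update = node.last_update


company = Node("Company X")
dev = Node("Development", parent=company)
bob = Node("Bob", parent=dev, is_user=True)
ui = Node("User Interface", parent=dev)
sue = Node("Sue", parent=ui, is_user=True)

update(dev, {"size": "1280x1024"})   # group node: timestamp flows down to User Interface
update(sue, {"color": "green"})      # user node: only Sue's own timestamp changes
```

Because the timestamp of User Interface now matches that of Development, a CDI cached for User Interface before the change can later be recognized as out of date without inspecting the nodes above it.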
Preferably, this technique further comprises retrieving a coalesced result in response to a request from a particular one of the users, wherein the request may specify a refreshed result or an unrefreshed result.
Retrieving a refreshed result preferably further comprises: retrieving the user data for the particular user; retrieving the CDI for the group node of which the user node is one of the children; and merging the retrieved user data with the CDI of the parent node. Retrieving the user data preferably further comprises: retrieving the user data from the client cache when (1) the client cache for the particular user exists and (2) the client cache timestamp for the particular user's client cache is not older than the last update timestamp corresponding to the particular user's user node in the repository; and populating the particular user's client cache otherwise. This populating further comprises: retrieving the user data from the particular user's user node in the repository; storing the retrieved user data in the client cache for the user; and setting the client cache timestamp for the user to the last update timestamp for the corresponding user node. Retrieving the CDI for the group node preferably comprises: retrieving the CDI from the central data cache when (1) the CDI for the group node exists and (2) the CDI timestamp for the CDI is not older than the last update timestamp corresponding to the group node in the repository; and creating the CDI otherwise. Creating the CDI further comprises: retrieving the data values from the repository for the group node, the top-level node, and all of the intermediate-level nodes in the hierarchical path from the group node to the top-level node; coalescing the retrieved data values; storing the coalesced data values as the CDI for the group node in the central data cache; and setting the CDI timestamp for the CDI to the last update timestamp for the corresponding group node.
This creating may further comprise repeating this process for each of the group nodes above this group node in the hierarchical path until reaching a first of the group nodes for which the CDI timestamp for the CDI of this first group node is not older than the last update timestamp corresponding to the group node in the repository.
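The refreshed retrieval may be illustrated with the following sketch, which assumes a dictionary-based repository and integer timestamps; every name and value in it is hypothetical. For brevity the sketch always rebuilds a stale CDI from the full path and does not implement the optional repetition for group nodes higher in the path.

```python
# Hypothetical repository: node name -> (parent name, uncoalesced values, last update timestamp).
repository = {
    "Company X": (None, {"color": "gray", "size": "800x600"}, 1),
    "Development": ("Company X", {"size": "1024x768"}, 2),
    "User Interface": ("Development", {"color": "blue"}, 3),
    "Sue": ("User Interface", {"color": "green"}, 4),
}
central_cache = {}      # group name -> (CDI, CDI timestamp)
client_cache = {}       # user name -> (uncoalesced user data, client cache timestamp)
repository_reads = []   # records full reads of node data, for illustration only


def read_node(name):
    repository_reads.append(name)
    return repository[name]


def last_update(name):
    return repository[name][2]   # a timestamp-only lookup, assumed to be cheap


def get_user_data(user):
    cached = client_cache.get(user)
    if cached is not None and cached[1] >= last_update(user):   # exists and not older
        return cached[0]
    _, values, stamp = read_node(user)                          # populate the client cache
    client_cache[user] = (dict(values), stamp)
    return client_cache[user][0]


def get_cdi(group):
    cached = central_cache.get(group)
    if cached is not None and cached[1] >= last_update(group):  # exists and not older
        return cached[0]
    path, name = [], group
    while name is not None:                                     # walk up to the top-level node
        path.append(name)
        name = repository[name][0]
    cdi = {}
    for name in reversed(path):                                 # coalesce from the top down
        cdi.update(read_node(name)[1])
    central_cache[group] = (cdi, last_update(group))
    return cdi


def get_refreshed(user):
    user_data = get_user_data(user)
    cdi = get_cdi(repository[user][0])
    return {**cdi, **user_data}                                 # user values override the CDI


first = get_refreshed("Sue")
reads_after_first = len(repository_reads)
second = get_refreshed("Sue")
reads_after_second = len(repository_reads)

repository["Sue"] = ("User Interface", {"color": "red"}, 5)     # Sue changes a preference
third = get_refreshed("Sue")
```

In this sketch the second request is served entirely from the two caches, and the third request re-reads only Sue's own node, since the CDI for her group is still current.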
In one aspect, the retrieving of user data from the client cache determines whether the client cache timestamp for the particular user's client cache is different from the last update timestamp corresponding to the particular user's user node in the repository rather than whether the client cache timestamp is not older than the last update timestamp, and the retrieving of the CDI from the central data cache determines whether the CDI timestamp for the CDI is different from the last update timestamp corresponding to the group node in the repository rather than whether the CDI timestamp is not older than the last update timestamp.
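The two freshness tests may be contrasted with a short sketch. The function names and the integer timestamps are assumptions; the third case, in which the cached timestamp is newer than the one in the repository, is a purely hypothetical situation chosen because it is the only one in which the two tests disagree.

```python
def fresh_if_not_older(cache_ts, repo_ts):
    """Cached entry is usable when its timestamp is not older than the repository's."""
    return cache_ts >= repo_ts


def fresh_if_equal(cache_ts, repo_ts):
    """Cached entry is usable only when the two timestamps are identical."""
    return cache_ts == repo_ts


# (cache timestamp, repository timestamp) pairs: equal, cache older, cache newer.
cases = [(5, 5), (5, 7), (7, 5)]
results = [(c, r, fresh_if_not_older(c, r), fresh_if_equal(c, r)) for c, r in cases]
```

The equality test needs only a comparison for difference, so it may be applied even where timestamps are opaque values with no meaningful ordering.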
Retrieving an unrefreshed result preferably further comprises: retrieving the user data for the particular user; retrieving the CDI for the group node of which the user node is one of the children; and merging the retrieved user data with a parent CDI associated with a parent node of the user node. Retrieving the user data further comprises: retrieving the user data from the client cache when the client cache for the particular user exists; and populating the particular user's client cache otherwise. This populating further comprises: retrieving the user data from the particular user's user node in the repository; storing the retrieved user data in the client cache for the user; and setting the client cache timestamp for the user to the last update timestamp for the corresponding user node. Retrieving the CDI further comprises: retrieving the CDI from the central data cache when the CDI for the group node exists; and creating the CDI otherwise. Creating the CDI further comprises: retrieving the data values from the repository for the group node, the top-level node, and all of the intermediate-level nodes in the hierarchical path from the group node to the top-level node; coalescing the retrieved data values; storing the coalesced data values as the CDI for the group node in the central data cache; and setting the CDI timestamp for the CDI to the last update timestamp for the corresponding group node.
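The unrefreshed retrieval differs from the refreshed one only in that the caches are tested for existence and no timestamps are compared. A minimal sketch follows, again assuming a dictionary-based repository with hypothetical names and values.

```python
# Hypothetical repository: node name -> (parent name, uncoalesced values, last update timestamp).
repository = {
    "Company X": (None, {"color": "gray", "size": "800x600"}, 1),
    "Development": ("Company X", {"size": "1024x768"}, 2),
    "User Interface": ("Development", {"color": "blue"}, 3),
    "Sue": ("User Interface", {"color": "green"}, 4),
}
central_cache = {}   # group name -> (CDI, CDI timestamp)
client_cache = {}    # user name -> (uncoalesced user data, client cache timestamp)


def get_user_data_unrefreshed(user):
    if user not in client_cache:                 # existence is the only test
        _, values, stamp = repository[user]
        client_cache[user] = (dict(values), stamp)
    return client_cache[user][0]


def get_cdi_unrefreshed(group):
    if group not in central_cache:               # existence is the only test
        path, name = [], group
        while name is not None:                  # walk up to the top-level node
            path.append(name)
            name = repository[name][0]
        cdi = {}
        for name in reversed(path):              # coalesce from the top down
            cdi.update(repository[name][1])
        central_cache[group] = (cdi, repository[group][2])
    return central_cache[group][0]


def get_unrefreshed(user):
    cdi = get_cdi_unrefreshed(repository[user][0])
    return {**cdi, **get_user_data_unrefreshed(user)}


before = get_unrefreshed("Sue")

# A later change in the repository is not picked up by an unrefreshed request.
repository["User Interface"] = ("Development", {"color": "yellow", "font": "serif"}, 9)
after = get_unrefreshed("Sue")
```

In this sketch the second request returns the previously cached result even though the repository has changed, which is the trade-off a requester accepts by asking for an unrefreshed result.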
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.