Efficient transfer of information from one memory device to another is an old problem in the computer arts. The problem arises when there is a need to access information that is stored in some bulk or secondary memory device of the computer, e.g., a disc, and transfer that information to the working or main memory of the computer for processing. Typically, information stored in memory devices of the computer (like bulk or secondary storage) is transferred to main memory in blocks or pages.
Information can be accessed from the system by using one or more queries (query set). If all the information required by a given query is located in one page of memory, the requirements of the query access will be satisfied when that page is retrieved into main memory. However, if the information required by the query is located on more than one page, several pages (or possibly a very large number of pages) may have to be retrieved to satisfy the requirements of the query or query set.
Retrieving more pages than necessary can cause the computer to operate more slowly and inefficiently, because extra pages of memory (perhaps a large number of them) must be accessed to satisfy the query (or query set). Accordingly, system speed and performance can be improved by clustering together, on a page of secondary memory, a set of data objects that are often required by the same query (or query set), so that the number of page accesses required to satisfy the query (or query set) is kept to a minimum.
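The effect of clustering on page accesses can be illustrated with a minimal sketch. The function names, the object identifiers, and the cost model (one access per distinct page touched by a query) are assumptions made for illustration and do not appear in the original text.

```python
from typing import Dict, Iterable, List, Set


def pages_touched(query: Iterable[str], placement: Dict[str, int]) -> Set[int]:
    """Return the set of distinct pages holding the objects a query needs."""
    return {placement[obj] for obj in query}


def total_page_accesses(queries: List[List[str]], placement: Dict[str, int]) -> int:
    """Sum, over all queries, the number of distinct pages each one touches.

    Assumption: every query starts with no pages in main memory.
    """
    return sum(len(pages_touched(q, placement)) for q in queries)


# Hypothetical query set: each query lists the objects it requires.
queries = [["a", "b"], ["c", "d"], ["a", "b"]]

# Two placements of the same four objects on two pages.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}  # co-accessed objects split apart
clustered = {"a": 0, "b": 0, "c": 1, "d": 1}  # co-accessed objects share a page

print(total_page_accesses(queries, scattered))
print(total_page_accesses(queries, clustered))
```

Under these assumptions the clustered placement needs one page access per query, while the scattered placement needs two.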
This problem is encountered in object-oriented databases or persistent object systems. Object-oriented systems use objects which contain "content" such as basic information-bearing constituents like text, image, voice or graphics. Objects may also contain logical or physical relationships to other objects as well as access restrictions. Each object is also associated with executable code, although that code is rarely stored along with the object. Because objects in many systems are typically much smaller than a page of memory, the system can select a subset of objects to cluster on a given page in many ways. However, improper object clustering can cause excess page references and the commensurate inefficient operation.
The prior art has recognized the importance of storing related objects together and devised various schemes to implement object clustering. The simplest schemes for object clustering allow the programmer to specify which objects should be placed together. The problem with this approach is that while the programmer may have a good idea of small sets of objects that are likely to be accessed together, the global problem of optimizing object access over a large set of objects and many queries is extremely complex. For these large complex global cases, strategies based on local (referring to a subset of the set of objects and queries) information are likely to be suboptimal, perhaps severely so.
A better approach in the prior art is to use the structure of the object classes to infer access patterns and assign/cluster objects on a page accordingly. Unfortunately, here again, this improved method relies upon local information (object references based on the structure of the class) and does not optimize the global set of objects and queries. These static techniques also suffer from the shortcoming that access patterns may change as the database evolves and objects are created, deleted, changed in size, and updated to refer to different objects. Thus, even if one of these schemes is used for an initial set of objects and queries, some dynamic algorithm is required to periodically reorganize the database and ensure that good clustering is maintained.
One dynamic clustering heuristic available in the prior art, known to the inventors, is based on collecting reference counts on all objects over some period of time and then using a "greedy" algorithm to assign objects to pages. It is well known that greedy algorithms are not only non-optimal but yield poor performance in complex situations.
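A reference-count-driven greedy assignment of the kind described above might look like the following sketch. The text does not specify the heuristic's details, so the co-reference counting, the seeding rule, the tie-breaking, and all names here are assumptions chosen for illustration, not the prior-art method itself.

```python
from collections import Counter
from itertools import combinations
from typing import List


def greedy_cluster(trace: List[List[str]], page_capacity: int) -> List[List[str]]:
    """Greedily pack objects onto pages using counts gathered from a trace.

    trace: observed queries, each a list of object ids (hypothetical format).
    page_capacity: assumed fixed number of objects per page.
    """
    ref = Counter()  # how often each object was referenced
    co = Counter()   # how often each pair of objects appeared in one query
    for query in trace:
        objs = sorted(set(query))
        ref.update(objs)
        for a, b in combinations(objs, 2):
            co[(a, b)] += 1

    def affinity(obj: str, page: List[str]) -> int:
        # Total co-reference count between obj and the objects already on page.
        return sum(co[tuple(sorted((obj, p)))] for p in page)

    unplaced = set(ref)
    pages: List[List[str]] = []
    while unplaced:
        # Seed each new page with the most-referenced unplaced object.
        seed = max(sorted(unplaced), key=lambda o: ref[o])
        page = [seed]
        unplaced.remove(seed)
        # Fill the page with the locally best candidate, one object at a time.
        while unplaced and len(page) < page_capacity:
            best = max(sorted(unplaced), key=lambda o: (affinity(o, page), ref[o]))
            page.append(best)
            unplaced.remove(best)
        pages.append(page)
    return pages


trace = [["a", "b"], ["a", "b"], ["c", "d"], ["b", "c"]]
print(greedy_cluster(trace, page_capacity=2))
```

Each choice in this sketch is made using only the counts visible at that step and is never revisited, which is why such a procedure can settle on a placement that is far from the global optimum.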
The clustering problem is inherently difficult to solve. Most likely, the cost in terms of computation time of solving the problem, i.e., finding an optimal way to cluster the objects on the pages of secondary memory, increases exponentially with the number of objects. This exponential growth has a critical effect, since for each additional object, the cost of solving the problem increases by a multiplicative factor. For example, a problem involving 20 objects is 11 orders of magnitude more expensive than a problem involving 10 objects. More specifically, a clustering problem of twenty objects gives rise to approximately 10^24 ways of assigning objects to pages.
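The multiplicative growth can be made concrete with a small sketch. The text does not state how the assignments are counted, so the model below is an assumption: each object may go on any one of a fixed number of distinguishable pages, with no capacity limit, giving pages^objects assignments. The page count of 16 is hypothetical, chosen only because 16^20 = 2^80, about 1.2 × 10^24, is of the same order as the figure quoted above.

```python
import math


def count_assignments(num_objects: int, num_pages: int) -> int:
    """Number of ways to place each object on one of num_pages pages.

    Assumed model: pages are distinguishable and have unlimited capacity,
    so every object independently has num_pages choices.
    """
    return num_pages ** num_objects


def orders_of_magnitude(count: int) -> float:
    """Base-10 logarithm of a count, i.e. its size in orders of magnitude."""
    return math.log10(count)


NUM_PAGES = 16  # hypothetical value, not taken from the text

for n in (10, 15, 20):
    ways = count_assignments(n, NUM_PAGES)
    print(f"{n} objects: about 10^{orders_of_magnitude(ways):.1f} assignments")

# Adding one object multiplies the search space by the number of pages.
growth = count_assignments(11, NUM_PAGES) // count_assignments(10, NUM_PAGES)
print(growth)
```

Under this model each additional object multiplies the number of candidate assignments by the page count, which is why exhaustive search becomes impractical so quickly.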
Because the daunting number of possible ways to assign objects to pages for large numbers of objects and queries makes the cost of solving an instance of the clustering problem so expensive, one approach in the prior art is to use Linear or Integer Programming formulations. However, these methods become computationally infeasible for even relatively modest problems because of the large number of variables generated by using Linear Programming or Integer Programming techniques.