Database systems typically store database objects (e.g., tables and indexes) on disk, and load data items from those database objects into volatile memory on an as-needed basis. Once loaded into volatile memory, the data items may remain cached in volatile memory so that subsequent accesses to the same data items will not incur the overhead of accessing a disk. Those data items may be replaced in cache, for example, to make room in volatile memory to store other data items that have been requested.
Rather than load individual data items on a per-item basis, entire database objects, or portions thereof, may be loaded into volatile memory. Various approaches for loading entire database objects, or selected portions thereof, into volatile memory to speed up query processing are described in U.S. patent application Ser. No. 14/377,179, entitled “Mirroring, In Memory, Data From Disk To Improve Query Performance,” filed Jul. 21, 2014, referred to herein as the “Mirroring” application, the contents of which are incorporated herein by reference in their entirety.
According to the approaches described in the Mirroring application, copies of data objects, or portions thereof, are cached in volatile memory. The format of the pre-loaded cached copy of the objects may be different than the on-disk format. For example, the in-memory version of an object may be in a column-major format, while the on-disk version may be stored in a row-major format. The in-memory version of the object (or selected portions thereof), is referred to as an In-Memory Compression Unit (IMCU) because the data contained therein is often compressed.
A clustered database system comprises a plurality of nodes. When a database object is to be pre-loaded into volatile memory, a copy of the database object may be loaded into the volatile memory of a single node in the cluster. To gain the benefit of having pre-loaded the object, work related to the object may be shipped to the node that pre-loaded the object (the “host” node). For example, assume that a cluster has five nodes N1, N2, N3, N4 and N5. N1 may be the designated host for a table T1. As the host of table T1, N1 loads a copy of T1 into its volatile memory. If any of nodes N2, N3, N4 or N5 receives a request to access data from T1, the task of accessing that data may be sent to node N1.
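The work-shipping behavior described above can be sketched as a simple routing rule. This is an illustrative assumption for explanatory purposes only, not an actual database system API; the node and table names follow the five-node example above.

```python
# Hypothetical routing table mapping each hosted object to its designated
# host node. In the example above, node N1 is the designated host of table T1.
HOST_ASSIGNMENTS = {"T1": "N1"}

def route_request(table, receiving_node):
    """Return the node that should execute work against `table`.

    If the table is hosted (pre-loaded) on some node, ship the work there;
    otherwise, the receiving node executes the work itself from disk.
    """
    host = HOST_ASSIGNMENTS.get(table)
    return host if host is not None else receiving_node

# A request for T1 arriving at node N3 is shipped to the host node N1,
# while a request for an unhosted table T2 is handled locally.
print(route_request("T1", "N3"))  # N1
print(route_request("T2", "N4"))  # N4
```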
Rather than have a single node pre-load an entire database object, the database object may be divided into “chunks,” and the chunks may be distributed across the nodes in the cluster. For example, table T1 may be divided into chunks C1, C2, C3, C4 and C5 that are assigned to be hosted by nodes N1, N2, N3, N4 and N5, respectively. In a scenario in which a database object is divided into chunks that are distributed among nodes, each node loads its assigned chunk(s) of the database object into its respective volatile memory in response to a load-triggering event. Work involving any given chunk of T1 can be shipped to the node that has been assigned to host the chunk.
Various approaches for making hosting assignments are described in U.S. patent application Ser. No. 14/565,906, entitled “Distribution Of An Object In Volatile Memory Across A Multi-Node Cluster,” filed Dec. 10, 2014, referred to herein as the “Distribution” application, the contents of which are incorporated herein by reference in their entirety. According to the approaches described in the Distribution application, all of the nodes in a cluster are treated as candidates for hosting chunks, to maximize utilization of the cluster's resources. Thus, hosting assignments may be made using a distribution method such as a hash function that hashes chunk identifiers to N buckets, where N is the total number of nodes in the cluster.
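The hash-based distribution method described above can be sketched as follows. The choice of hash function (CRC-32 here) and the chunk and node names are assumptions made for illustration; any deterministic hash that maps chunk identifiers to N buckets would serve.

```python
import zlib

# All nodes in the cluster are candidates for hosting chunks.
NODES = ["N1", "N2", "N3", "N4", "N5"]

def assign_host(chunk_id, nodes=NODES):
    """Hash a chunk identifier into one of N buckets, one bucket per node."""
    bucket = zlib.crc32(chunk_id.encode()) % len(nodes)
    return nodes[bucket]

# Each chunk of table T1 is deterministically assigned a host node, so every
# node in the cluster computes the same assignments without coordination.
assignments = {chunk: assign_host(chunk)
               for chunk in ["C1", "C2", "C3", "C4", "C5"]}
```

Because the assignment is a pure function of the chunk identifier, any node can independently determine where to ship work involving a given chunk.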
While spreading chunk-hosting assignments throughout all nodes in a cluster helps with load balancing and parallelism, a hosting assignment technique that only considers the nodes of a cluster, and treats all nodes in a cluster as equal candidates, is not always ideal. For example, spreading chunk-hosting assignments among nodes in a cluster does not make use of resources that are outside the cluster, such as the resources of disjoint database systems.
A disjoint database system is a database system that is logically outside another database system. For example, if DBMS1 is disjoint from DBMS2, then the database server instances in DBMS1 do not have direct access to the same database as DBMS2, and vice versa. Conventionally, database systems that are disjoint from the database system that is making hosting assignments are not considered in the hosting assignments. Thus, the resources of DBMS2 would not be used when DBMS1 makes hosting assignments for table T1. The failure to make use of DBMS2's resources in this example may be highly inefficient, because DBMS2 may itself be a clustered database system with a significant amount of resources. If it were possible to host objects in DBMS2, in addition to DBMS1, more database objects could be hosted.
In many situations, involving a disjoint database system in the hosting assignments would not even make sense because the disjoint database system would not have access to the database object that is to be hosted. For example, if DBMS2 does not have access to table T1, it does not make sense to assign a node in DBMS2 to be a host of table T1. However, when the disjoint DBMS is a replica of the database system that is making the hosting assignments, then the disjoint DBMS has access to a replica of the database object in question.
Unfortunately, when the disjoint DBMS is a replica, the hosting assignments are also replicated. That is, the same hosting assignments made in the original database system are made in the replica database system. For example, if table T1 is hosted in DBMS1, then table T1 is also hosted in DBMS2. While this approach does make use of the resources of DBMS2, it does not actually increase the number of objects that can be hosted. Instead, it merely causes DBMS2 to load copies of all the same objects that are already going to be hosted in DBMS1. For example, if both DBMS1 and DBMS2 have 3 terabytes of volatile memory available for hosting objects, then together the two database systems have 6 terabytes of volatile memory for hosting objects. However, because the same objects are hosted in both systems, only 3 terabytes of objects can be hosted.
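The capacity arithmetic above can be made explicit in a brief sketch. The 3-terabyte figures come from the example; the calculation itself is only an illustration of why replicated hosting assignments waste the replica's memory.

```python
# Work in terabyte units, using the figures from the example above.
dbms1_memory_tb = 3  # volatile memory available for hosting in DBMS1
dbms2_memory_tb = 3  # volatile memory available for hosting in DBMS2 (replica)

# Raw memory across both systems:
total_memory_tb = dbms1_memory_tb + dbms2_memory_tb  # 6 TB

# Because the replica duplicates every hosting assignment, the set of
# distinct hosted objects is bounded by a single system's memory:
distinct_hostable_tb = min(dbms1_memory_tb, dbms2_memory_tb)  # 3 TB

print(total_memory_tb, distinct_hostable_tb)  # 6 3
```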
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.