Database systems typically store database objects (e.g. tables, indexes, etc.) on disk, and load data items from those database objects into volatile memory on an as-needed basis. Once loaded into volatile memory, the data items may remain cached in volatile memory so that subsequent accesses to the same data items will not incur the overhead of accessing a disk. Those data items may be replaced in cache, for example, to make room in volatile memory to store other items that have been requested.
Rather than load individual data items on a per-item basis, entire database objects, or portions thereof, may be loaded into volatile memory. Various approaches for loading entire database objects, or selected portions thereof, into volatile memory to speed up query processing are described in U.S. patent application Ser. No. 14/377,179, entitled “Mirror, In Memory, Data from Disk To Improve Query Performance,” filed Jul. 21, 2014, referred to herein as the “Mirroring” application, the contents of which are incorporated herein in their entirety.
According to the approaches described in the Mirroring application, data objects, or portions thereof, are stored in volatile memory in a different format than the format that those same objects have on disk. For example, the in-memory version of the objects may be in a column-major format, while the on-disk version stores data in a row-major format. The in-memory version of the object (or selected portions thereof) is referred to as an In-Memory Compression Unit (IMCU) because the data contained therein is often compressed.
In a clustered database system, multiple “nodes” have access to the same on-disk copy of a database. Typically, each node is a computing device with its own local memory and processors that are running one or more database server instances. The database server instances on each of the nodes may receive queries to access the database. The speed at which a given database server instance is able to answer a query is based, at least in part, on whether the node on which the database server instance is running has the requested data cached within its local volatile memory. Consequently, to improve every node's performance of queries that access data in a Table X, Table X may be loaded into the volatile memory of every node in the cluster.
Unfortunately, loading the same data (e.g. Table X) into the volatile memory of every node in a cluster of N nodes means that the cluster can only cache approximately the same amount of data as a single node, even though a cluster of N nodes has N times the amount of volatile memory as a single node.
Rather than load the same data into the memory of every node in a cluster, the portions of data objects (or “chunks”) may be distributed across the nodes in the cluster. Various approaches for distributing the database objects, or selected portions thereof, among nodes and executing queries in the multi-node database are described in U.S. patent application Ser. No. 14/805,949, entitled “Framework for Volatile Memory Query Execution in a Multi-Node Cluster,” filed Jul. 22, 2015, referred to herein as the “Hashing” application, the contents of which are incorporated herein in their entirety.
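The capacity difference between replicating data on every node and distributing chunks across nodes can be illustrated with a rough calculation. The symbols and the example figures below are illustrative assumptions, not values taken from either application, and the calculation ignores compression and per-node overhead. Let $N$ be the number of nodes and $M$ the volatile memory available for caching on each node:

```latex
\begin{align*}
C_{\text{replicated}}  &\approx M
  && \text{(every node caches the same data)} \\
C_{\text{distributed}} &\approx N \cdot M
  && \text{(each chunk cached on exactly one node)} \\[4pt]
\text{Example: } N = 4,\ M = 64\ \text{GB}:\quad
C_{\text{replicated}}  &\approx 64\ \text{GB}, \qquad
C_{\text{distributed}} \approx 4 \cdot 64\ \text{GB} = 256\ \text{GB}
\end{align*}
```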
Each node in the cluster that has been assigned to load a copy of a particular chunk into the node's volatile memory is referred to herein as a “host node” of the particular chunk. According to the approaches described in the Hashing application, not every chunk is hosted by every node. Rather, any given chunk is hosted by a subset of the nodes in the cluster. By using the same hashing function, each node in the cluster may independently determine that a particular node has been assigned to host a particular chunk. Each node may maintain a chunk-to-node mapping to indicate how chunks are distributed across the volatile memories of the multiple nodes. Using the chunk-to-node mapping, a node may select the node to which it sends work. For example, a node that has received a query that requires access to a particular chunk may send the work to a node, in the cluster, that is hosting that chunk in its volatile memory.
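The independent determination of host nodes described above can be sketched as follows. This is a minimal illustration, not the hashing function of the Hashing application, which is not specified here. It assumes rendezvous (highest-random-weight) hashing as a stand-in technique, and the names `host_nodes`, `build_chunk_to_node_mapping`, and the `copies` parameter are hypothetical.

```python
import hashlib


def _score(chunk_id: str, node_id: str) -> int:
    """Deterministic score for a (chunk, node) pair; identical on every node."""
    digest = hashlib.sha256(f"{chunk_id}|{node_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def host_nodes(chunk_id: str, nodes: list[str], copies: int = 1) -> list[str]:
    """Return the nodes assigned to host a chunk (assumed scheme: rendezvous hashing).

    The result depends only on the chunk id and the set of node ids, so any
    node holding the same membership list computes the same answer.
    """
    ranked = sorted(nodes, key=lambda n: (_score(chunk_id, n), n), reverse=True)
    return ranked[:copies]


def build_chunk_to_node_mapping(
    chunk_ids: list[str], nodes: list[str], copies: int = 1
) -> dict[str, list[str]]:
    """Build a chunk-to-node mapping: chunk id -> list of host node ids."""
    return {chunk: host_nodes(chunk, nodes, copies) for chunk in chunk_ids}


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3", "node4"]  # hypothetical node ids
    chunks = [f"chunk{i}" for i in range(6)]        # hypothetical chunk ids
    for chunk, hosts in build_chunk_to_node_mapping(chunks, cluster, copies=2).items():
        print(chunk, "->", hosts)
```

Because the score depends only on the chunk and node identifiers, two nodes that agree on the cluster membership build identical mappings without exchanging messages. A node receiving a query for a chunk could look up the hosts in its mapping and forward the work to one of them.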
However, when a node is removed from or added to a cluster, the other nodes in the cluster need to update their chunk-to-node mappings to reflect the change. Immediately after such a change, distributing work based on the chunk-to-node mappings may result in inefficiencies. For example, a node that has been newly assigned to host a particular chunk may not have finished storing the chunk in its volatile memory. Sending work involving that chunk to that node will be inefficient, especially if another node has already loaded that same chunk. As another example, an out-of-date mapping may indicate that a particular node is assigned to host a particular chunk, but due to a change the particular node may no longer be in the cluster.
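The two failure cases described above can be made concrete with a small sketch. It only classifies a routing decision made from a possibly out-of-date mapping; it does not represent any particular remedy. All names (`RouteStatus`, `check_route`, and the `live_nodes` and `loaded` arguments) are hypothetical and assume that the current cluster membership and per-node load state are somehow known to the caller.

```python
from enum import Enum


class RouteStatus(Enum):
    OK = "ok"                                    # host is present and has the chunk loaded
    HOST_NOT_IN_CLUSTER = "host_not_in_cluster"  # mapping is out of date
    CHUNK_NOT_LOADED = "chunk_not_loaded"        # host assigned, but still loading


def check_route(
    chunk_id: str,
    mapping: dict[str, str],
    live_nodes: set[str],
    loaded: dict[str, set[str]],
) -> RouteStatus:
    """Classify what happens if work for chunk_id is sent to its mapped host.

    mapping    -- chunk id -> host node id (possibly stale)
    live_nodes -- node ids currently in the cluster (assumed known)
    loaded     -- node id -> chunk ids already stored in that node's memory
    """
    host = mapping[chunk_id]
    if host not in live_nodes:
        return RouteStatus.HOST_NOT_IN_CLUSTER
    if chunk_id not in loaded.get(host, set()):
        return RouteStatus.CHUNK_NOT_LOADED
    return RouteStatus.OK


if __name__ == "__main__":
    mapping = {"c1": "nodeA", "c2": "nodeB", "c3": "nodeC"}  # nodeC has left
    live_nodes = {"nodeA", "nodeB"}
    loaded = {"nodeA": {"c1"}, "nodeB": set()}               # nodeB still loading c2
    for chunk in mapping:
        print(chunk, check_route(chunk, mapping, live_nodes, loaded).value)
```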
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.