Cluster databases provide location transparency to data by allowing multiple systems to serve the same database. One specific type of cluster database is the Oracle Real Application Clusters product, licensed by Oracle Corporation, Redwood Shores, Calif. Sets of two or more computers are grouped into real application clusters. The clusters harness the processing power of multiple interconnected computers to provide a single robust computing environment. Within each cluster, all nodes concurrently execute transactions against the same database, extending the processing power beyond the limits of any individual node. Once the shared database is mounted, the real application cluster processes a stream of concurrent transactions using multiple processors on different nodes. For scale-up, each processor processes many transactions. For speed-up, a single transaction can be executed across multiple nodes.
Cluster databases provide several advantages over databases that use only single nodes. For example, cluster databases take advantage of information sharing by many nodes to enhance performance and database availability. In addition, applications can be sped up by executing across multiple nodes and can be scaled up by distributing more transactions across additional nodes. Multiple nodes also make cluster databases highly available through a redundancy of nodes executing separate database instances. Thus, if a node or database instance fails, the failed database instance is automatically recovered by the other instances, which together serve the cluster database.
Cluster databases can be made more highly available through integration with high availability frameworks for each cluster. The inclusion of these frameworks provides guaranteed service levels and ensures resilient database performance and dependable application recovery. Organizationally, individual database servers are formed into interconnected clusters of independent nodes. Each node communicates with the other nodes over the interconnection. Upon an unplanned failure of an active database server node, the clusterware fails the application over to another node, where it resumes operations, without transaction loss, within a guaranteed time period. Likewise, upon a planned shutdown, the application is gracefully switched over to another node in an orderly fashion.
The guarantee of service level thresholds is particularly crucial for commercial transaction-based database applications, such as those used in the transportation, finance, and electronic commerce industries. System downtime translates to lost revenue and loss of market share. Any time spent recovering from a system failure is measurable in terms of lost transactions. Consequently, high availability systems budget a set time period for recovery to help minimize lost revenue due to unplanned outages. High availability systems also budget for planned service interruptions.
Database servers operating in the database server tier implement memory caches to transiently stage data and instructions to improve overall system performance. These memory caches take advantage of the locality of data and parsed SQL as physically stored in secondary storage. Performance is enhanced by maintaining active sets of data and parsed SQL within the memory cache (system global area), which avoids both the latency of retrieving data and instructions from secondary storage and the cost of reparsing the SQL.
In particular, database servers implement library caches and buffer caches. Library caches store parsed SQL and parsed PL/SQL. These caches employ a cache replacement scheme that stages the most recently used SQL and the SQL having the largest context areas. Within the library cache, parsed SQL is stored as cursors. The cursors are indexed by handlers that reference the memory locations in which parsed statements and information relating to processing are stored. A context area is a shared area of memory that stores the environment and session variables for an instruction. Buffer caches store active data and use a cache replacement scheme that retains the most recently used data.
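A cache replacement scheme that keeps the most recently used entries can be illustrated with a small sketch. The Python below is only a toy model under stated assumptions: the names `LRUCache`, `parse_sql`, and `library_cache` are hypothetical, the "parsing" is a placeholder, and the sketch models recency only, not the context-area size that the library cache also takes into account.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache (hypothetical; not a real database cache)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = load(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value

    def __contains__(self, key):
        return key in self._entries


def parse_sql(text):
    """Hypothetical stand-in for SQL parsing: returns a tuple of tokens."""
    return tuple(text.split())


library_cache = LRUCache(capacity=2)
library_cache.get("SELECT 1 FROM dual", parse_sql)  # miss: parsed and cached
library_cache.get("SELECT 2 FROM dual", parse_sql)  # miss: parsed and cached
library_cache.get("SELECT 1 FROM dual", parse_sql)  # hit: no reparse needed
library_cache.get("SELECT 3 FROM dual", parse_sql)  # miss: evicts "SELECT 2 ..."
```

In this sketch, the statement that was reused stays cached while the statement that was not touched again is evicted first, which is the behavior a recency-based scheme is meant to provide.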
Following a failover or switchover from an active node of a clustered system, the library and buffer caches on the standby node are effectively empty. Response times are slow until these caches are repopulated with SQL cursors and data. This ramp-up period lasts from the time that the application session resumes operation on the new database instance to the time that response times return to normal levels. Processing performed during the ramp-up period is inefficient, as the amount of work required per transaction is higher due to the need to re-initialize the memory caches. Moreover, the extra work is serialized due to locking on the library and buffer caches and duplicates work already accomplished on the failed node.
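The cost of starting from empty caches can be shown with a simplified sketch. The Python below assumes a plain dictionary as the cache and a placeholder `parse` function; the names `run_workload`, `warm_cache`, and `cold_cache` are hypothetical, and the sketch ignores locking, eviction, and the buffer cache entirely.

```python
def run_workload(cache, statements, parse):
    """Run statements against cache (a dict); return how many had to be parsed."""
    parses = 0
    for sql in statements:
        if sql not in cache:
            cache[sql] = parse(sql)  # cache miss: extra work for this request
            parses += 1
    return parses


def parse(sql):
    """Hypothetical stand-in for the work of parsing a statement."""
    return tuple(sql.split())


# A repetitive workload with two distinct statements.
workload = ["SELECT a FROM t", "SELECT b FROM t", "SELECT a FROM t"] * 3

# Active node: the cache already holds the working set.
warm_cache = {sql: parse(sql) for sql in set(workload)}
warm_parses = run_workload(warm_cache, workload, parse)

# Standby node immediately after failover: the cache starts empty.
cold_cache = {}
cold_parses = run_workload(cold_cache, workload, parse)

# After the ramp-up pass, the formerly cold cache behaves like the warm one.
after_ramp_up = run_workload(cold_cache, workload, parse)
```

Here the warm node parses nothing, while the cold node must reparse each distinct statement once before its behavior matches the warm node. That reparsing repeats work the failed node had already done.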