A cluster is a group of independent computers working together as a single system. In a client/server environment, client computers interact with a cluster as though it were a single entity: a single high-performance, highly reliable server. If one computer in a cluster fails, its workload can be automatically redistributed among the surviving computers. Computers in a cluster may be used to execute the software instructions of an application (called a “parallel application”) in parallel. Examples of parallel applications include database servers, application servers, data mining tools, decision support systems, computer-aided design tools, gene sequencing tools, seismic (earthquake prediction) tools, and modeling tools (e.g., climate, combustion, reservoir, structure, molecules, nuclear). Oracle Parallel Server (OPS) adds parallel technology to the Oracle8i™ database, enabling multiple instances of the database server (e.g., Instance1 and Instance2 in FIG. 1) to execute on the computers of a cluster and concurrently access a single shared database that may reside in an array 5 of disks. Disk storage array 5 provides fault-tolerant disk components. Each computer acts as a single node in the configuration, and every computer in a cluster can be connected to the shared array 5 of disks as well as to its own local disk 6. All of the computers in the cluster have concurrent read/write access to the data stored on the shared disks. Oracle Parallel Server is described in detail in Oracle8i Parallel Server Concepts, Release 2 (8.1.6), December 1999, Part No. A76968-01, available from Oracle Corporation, Redwood Shores, Calif., and incorporated by reference herein in its entirety.
If one computer in an Oracle™ Parallel Server fails, the other computers still have uninterrupted access to the data stored on the shared disks. The surviving computer(s) automatically perform recovery by rolling back any incomplete transactions that the failed computer was attempting. This ensures the logical consistency of the database. Disk mirroring of the shared disk drives can also be used to minimize the effect of a disk failure. With disk mirroring, a duplicate copy of the contents of the disk is kept on a different physical drive. If a particular disk fails, the cluster software transparently switches to the mirrored copy of the disk and processing continues.
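The recovery step described above, in which a surviving instance rolls back the failed instance's incomplete transactions, can be illustrated with a minimal sketch. It assumes a toy model that is not part of the original text: the database is a Python dictionary, and each transaction carries hypothetical undo entries (key, previous value) recorded in the order its changes were made. Committed work is left in place; each uncommitted transaction is undone newest-first. The names `Transaction`, `recover`, and `ABSENT` are illustrative only.

```python
from dataclasses import dataclass, field

# Sentinel meaning "the key did not exist before the change".
ABSENT = object()


@dataclass
class Transaction:
    """Hypothetical record of a transaction started by the failed instance."""
    txn_id: int
    committed: bool = False
    # Undo entries in the order the changes were made: (key, previous value).
    undo: list = field(default_factory=list)


def recover(database, transactions):
    """Roll back every uncommitted transaction of a failed instance.

    Committed changes are kept; each uncommitted transaction's undo
    entries are applied newest-first. Returns the rolled-back ids.
    """
    rolled_back = []
    for txn in transactions:
        if txn.committed:
            continue  # committed work is kept
        for key, previous in reversed(txn.undo):
            if previous is ABSENT:
                database.pop(key, None)  # undo an insert
            else:
                database[key] = previous  # restore the before-image
        rolled_back.append(txn.txn_id)
    return rolled_back


# Shared-disk state left behind by the failed instance.
db = {"a": 10, "b": 20, "c": 5}
txns = [
    Transaction(1, committed=True, undo=[("a", 1)]),
    Transaction(2, committed=False, undo=[("b", 2), ("c", ABSENT)]),
]
print(recover(db, txns), db)  # [2] {'a': 10, 'b': 2}
```

The sketch processes transactions independently, which is reasonable under the assumption that row-level locking prevents two uncommitted transactions from modifying the same row.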
Typically, a single instance of a database process (called an “Oracle instance”) executes on each of the computers (also called “nodes”) that form a cluster. An Oracle instance is composed of processes and shared memory. The shared memory contains a buffer cache for the Oracle instance. The buffer cache holds copies of disk blocks and improves performance by avoiding disk I/O for blocks that are already in memory. Because memory is not shared across the nodes of a cluster, each Oracle instance maintains its own buffer cache. A parallel cache manager (PCM) coordinates access to the data resources required by the Oracle instances.
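To make the role of a per-instance buffer cache concrete, here is a minimal sketch of a block cache that serves repeated reads from memory and goes to disk only on a miss. The least-recently-used (LRU) replacement policy, the `BufferCache` class, and the dictionary standing in for the shared disks are assumptions for illustration; the text does not specify a replacement policy. The cross-instance coordination performed by the PCM is deliberately not modeled.

```python
from collections import OrderedDict


class BufferCache:
    """Per-instance buffer cache; LRU replacement is an assumed policy."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block  # reads one block from the shared disks
        self.blocks = OrderedDict()   # block number -> block contents
        self.disk_reads = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # hit: mark most recently used
            return self.blocks[block_no]
        data = self.read_block(block_no)       # miss: perform disk I/O
        self.disk_reads += 1
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data


shared_disk = {n: f"block-{n}".encode() for n in range(10)}
# Each instance keeps its own cache over the same shared disks.
instance1 = BufferCache(2, shared_disk.__getitem__)
instance2 = BufferCache(2, shared_disk.__getitem__)
for block_no in (1, 2, 1, 3):
    instance1.get(block_no)
instance2.get(1)
print(instance1.disk_reads, list(instance1.blocks), instance2.disk_reads)  # 3 [1, 3] 1
```

The second read of block 1 by `instance1` is served from its cache, while `instance2` must read the same block from disk because it has a separate cache.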
In addition to the buffer cache, several other resources require coordination across instances by Oracle Parallel Server, including the data dictionary, rollback segments, and redo logs. Another component, Cluster Group Services (CGS), interacts with a Cluster Manager (CM) to track the status of cluster nodes and keeps the database aware of which nodes form the active cluster. The Cluster Manager is a vendor-supplied component specific to the hardware and operating system configuration, and is independent of the database.
Oracle8i also provides a load-balancing feature that distributes connections from client computers across the cluster to maximize transaction throughput and minimize response time. Load balancing requires monitoring resource utilization on each node in the cluster and directing client connections to the least loaded node. If node 8 (FIG. 1) fails, Oracle Parallel Server can fail over a connection with a client 7 to a functioning, least loaded node 9 of the cluster. For query operations, this failover is transparent, i.e., it occurs without user knowledge or intervention.
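The failover rule above (move a connection off a failed node onto the least loaded functioning node) can be sketched as follows. This is a simplified model under stated assumptions: `Node`, `least_loaded`, and `node_for_connection` are hypothetical names, load is assumed to be a single number such as CPU utilization, and the third node `node10` and all load values are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool
    load: float  # assumed metric, e.g. CPU utilization in [0, 1]


def least_loaded(nodes):
    """Return the functioning node with the lowest load."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        raise RuntimeError("no functioning node in the cluster")
    return min(candidates, key=lambda n: n.load)


def node_for_connection(current, nodes):
    """Keep a connection on its node while it is up; otherwise fail over."""
    return current if current.alive else least_loaded(nodes)


cluster = [Node("node8", False, 0.1), Node("node9", True, 0.3), Node("node10", True, 0.7)]
print(node_for_connection(cluster[0], cluster).name)  # node9
```

Failed nodes are filtered out before the minimum is taken, so `node8` is never chosen even though it reports the lowest load.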
Oracle8i supports large user populations through the Oracle Multithreaded Server (MTS) configuration. MTS is based on a database resource-sharing architecture in which processes called “listeners” route client connections to a group of other processes called “dispatchers,” which interact with server processes to handle the connections. Oracle Parallel Server environments can be configured with MTS, with each node in the cluster configured with one or more dispatchers (such as D1 for Instance1 in node 8, and D2 and D3 for Instance2 in node 9, as illustrated in FIG. 1). In Oracle8i, the listeners (such as L1 and L2 in FIG. 1) can be configured locally or on remote nodes to provide greater scalability and system availability.
To facilitate load balancing, the Oracle instance on each node registers with all the listeners and reports that node's CPU utilization to them. In the example of FIG. 1, load balancing proceeds in phases. First, client connections are distributed randomly across the available listeners, L1 and L2; this randomized policy spreads client requests across the listeners. Assume that listener L1 is chosen to receive a client request. Listener L1 then compares the CPU load on the two computers.
If the second computer (containing dispatchers D2 and D3) is less loaded, listener L1 chooses it, so that the least loaded node processes the incoming client connection. Listener L1 then compares the load, i.e., the number of active connections, on dispatchers D2 and D3. If dispatcher D2 is less loaded than dispatcher D3, listener L1 directs the client request to dispatcher D2, so that the dispatcher with the fewest active connections processes the incoming client connection.
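The three phases described above (random choice of listener, least CPU-loaded node, dispatcher with the fewest active connections) can be sketched as a small simulation. The classes `Dispatcher`, `Instance`, and `Listener` and the functions `route` and `connect` are hypothetical, the load figures are invented, and the sketch assumes every listener holds the same, current load reports; real listeners may act on stale information.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Dispatcher:
    name: str
    active_connections: int = 0


@dataclass
class Instance:
    name: str
    cpu_load: float  # CPU utilization reported to every listener
    dispatchers: list = field(default_factory=list)


@dataclass
class Listener:
    name: str
    instances: list  # every registered instance, local or remote

    def route(self):
        # Phase 2: choose the node with the lowest reported CPU load.
        instance = min(self.instances, key=lambda i: i.cpu_load)
        # Phase 3: on that node, choose the dispatcher with the fewest connections.
        dispatcher = min(instance.dispatchers, key=lambda d: d.active_connections)
        dispatcher.active_connections += 1
        return dispatcher


def connect(listeners, rng=random):
    # Phase 1: spread client requests randomly across the listeners.
    return rng.choice(listeners).route()


instance1 = Instance("Instance1", 0.8, [Dispatcher("D1", 3)])
instance2 = Instance("Instance2", 0.4, [Dispatcher("D2", 1), Dispatcher("D3", 2)])
listeners = [Listener("L1", [instance1, instance2]), Listener("L2", [instance1, instance2])]
print(connect(listeners, random.Random(0)).name)  # D2
```

Because both listeners see the same registrations in this model, the chosen dispatcher does not depend on which listener the random first phase picks; the randomization only spreads the routing work across listeners.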
When one or more new instances are to be added to a running Oracle Parallel Server, one may bring down the database and recreate the entire database from scratch with the new instances included. See, for example, “Adding Additional Nodes to a Cluster” on page 9-7 of Oracle8i Parallel Server Setup and Configuration Guide, Release 2 (8.1.6), December 1999, Part No. A76934-01, which is incorporated by reference herein in its entirety.