1. Field of the Invention
The present invention relates generally to data processing and, more particularly, to optimizing data transfer among various resources in a distributed environment.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Part I (especially Chapters 1-4), Addison Wesley, 2000.
Databases add data to and retrieve data from mass storage devices during normal operation. Such storage devices are typically mechanical devices, such as disks or tape drives, which transfer data rather slowly and thus slow database access to information. To speed up the access process, databases employ a “buffer cache,” which is a section of relatively faster memory (e.g., RAM) allocated to store recently used data objects. This faster memory (simply referred to as “memory,” as distinguished from mass storage devices such as disks) is typically provided on semiconductor or other electrical storage media and is coupled to the CPU via a fast data bus. Because the transfer of data in memory is governed by electronic rather than mechanical operations, data stored in memory can be accessed much more rapidly than data stored on disks.
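By way of illustration only, the role of a buffer cache may be sketched as follows. The sketch assumes a least-recently-used (LRU) replacement policy, which is one common choice and is not stated above; the names BufferCache and read_page_from_disk are hypothetical and do not correspond to any particular database system.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache that keeps recently used pages in memory.

    Assumption: LRU replacement. Real systems may use other policies.
    """

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity                      # max pages held in memory
        self.read_page_from_disk = read_page_from_disk  # hypothetical slow read
        self.pages = OrderedDict()                    # page_id -> page contents
        self.disk_reads = 0                           # count of slow-path reads

    def get(self, page_id):
        if page_id in self.pages:
            # Cache hit: served from memory, mark as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Cache miss: fall back to the slower mass storage device.
        self.disk_reads += 1
        data = self.read_page_from_disk(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            # Evict the least recently used page to stay within capacity.
            self.pages.popitem(last=False)
        return data
```

In this sketch, a repeated request for the same page is satisfied from memory and does not increment disk_reads, which is the effect the buffer cache is intended to achieve.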
In recent years, users have demanded that database systems be continuously available, with no downtime, as they are frequently running applications that are critical to business operations. In response, distributed database systems have been introduced to provide for greater reliability. More recently, “Shared Disk Cluster” database systems have been introduced to provide increased reliability and scalability. A “Shared Disk Cluster” (or “SDC”) database system is a system that has a cluster of two or more database servers having shared access to a database on disk storage. The term “cluster” refers to the fact that these systems involve a plurality of networked server nodes which are clustered together to function as a single system. Each node in the cluster usually contains its own CPU and memory and all nodes in the cluster communicate with each other, typically through private interconnects. “Shared disk” refers to the fact that two or more database servers share access to the same disk image of the database. Shared Disk Cluster database systems provide for transparent, continuous availability of the applications running on the cluster with instantaneous failover amongst servers in the cluster. When one server is down (e.g., for upgrading the CPU) the applications are able to continue to operate against the shared data using the remaining machines in the cluster, so that a continuously available solution is provided. Shared Disk Cluster systems also enable users to address scalability problems by simply adding additional machines to the cluster, without major data restructuring and the associated system downtime that is common in prior SMP (symmetric multiprocessor) environments.
In a distributed shared disk cluster environment, all nodes in the cluster have caches with only one buffer pool of a fixed size. The buffer pool size is the same as the database page size, and is the same across all instances in the cluster. However, that approach is not necessarily optimal for I/O operations involving large contiguous data, nor does it take into account the local access patterns of a given node. Therefore, a better approach is sought.
What is needed is an approach for supporting multiple buffer pools in a cache in a distributed shared disk cluster environment, especially as there is a need in such an environment to handle data transfer among nodes efficiently. The approach should allow each node to support independently configured buffer pools for caching the data. In that manner, buffer pools of sizes larger than the database page size can be configured within a cache. These pools can help read or write large contiguous data and enhance I/O performance. The present invention fulfills this and other needs.
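As a rough illustration of why buffer pools larger than the database page size can help with large contiguous data, the following sketch counts the I/O operations needed to read a run of contiguous pages. It is a simplified model under stated assumptions, not the method of the present invention: the function plan_reads is hypothetical, pool sizes are expressed in pages, a pool of one page is assumed to always exist, and alignment constraints are ignored.

```python
def plan_reads(num_pages, pool_sizes_in_pages):
    """Split a contiguous read of num_pages into I/O operations.

    Greedy sketch: use the largest configured pool size first, then
    fall back to smaller sizes for the remainder. Returns a list of
    I/O sizes, each expressed in pages.
    """
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    sizes = sorted(set(pool_sizes_in_pages), reverse=True)
    if 1 not in sizes:
        # Assumption: a pool equal to the page size is always configured.
        raise ValueError("a pool of one page is required")
    plan = []
    remaining = num_pages
    for size in sizes:
        count, remaining = divmod(remaining, size)
        plan.extend([size] * count)
    return plan


# A node with only a page-sized pool needs one I/O per page.
single_pool = plan_reads(20, [1])
# A node that also has a pool of 8 pages needs far fewer I/Os.
multi_pool = plan_reads(20, [1, 8])
```

Under these assumptions, reading 20 contiguous pages takes 20 I/O operations with only a page-sized pool, but 6 operations (two of 8 pages and four of 1 page) when a pool of 8 pages is also configured.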