1. Field of the Invention
The present invention relates generally to data processing environments, and more particularly to adaptive locking of retained resources in a distributed database processing environment.
2. Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, information may be retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See, e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Part I (especially Chapters 1-4), Addison Wesley, 2000.
In recent years, users have demanded that database systems be continuously available, with no downtime, as they are frequently running applications that are critical to business operations. Shared Disk Cluster systems are distributed database systems introduced to provide the increased reliability and scalability sought by customers. A Shared Disk Cluster database system is a system that has a cluster of two or more database servers having shared access to a database on disk storage. The term “cluster” refers to the fact that these systems involve a plurality of networked server nodes that are clustered together to function as a single system. Each node in the cluster usually contains its own CPU and memory and all nodes in the cluster communicate with each other, typically through private interconnects. “Shared disk” refers to the fact that two or more database servers share access to the same disk image of the database. Shared Disk Cluster database systems provide for transparent, continuous availability of the applications running on the cluster with instantaneous failover amongst servers in the cluster. When one server is down (e.g., for upgrading the CPU) the applications are able to continue to operate against the shared data using the remaining machines in the cluster, so that a continuously available solution is provided. Shared Disk Cluster systems also enable users to address scalability problems by simply adding additional machines to the cluster, without major data restructuring and the associated system downtime that is common in prior SMP (symmetric multiprocessor) environments that provide fast performance by making multiple CPUs available to complete individual processes simultaneously (multiprocessing).
In any database system, distributed or otherwise, data can be organized and accessed as “pages”. When data is brought from the disk into the main memory, the “page” is the basic unit of access. Within the page, the data can be present as “rows”. For a transactional system that uses row-level locking, multiple transactions can be active on a single page at any point in time, each accessing a subset of rows within the page.
In a distributed system such as a shared disk cluster, transactional locks or logical locks are used for transactional consistency. These locks can be page-level locks in which the entire page is locked, row-level locks in which a particular row in a page is locked, or higher-level locks, such as table locks that are used to lock the entire table. These locks are held for a relatively long duration, e.g., until the end of the transaction.
For physical consistency of the page, such as when multiple transactions are modifying different rows in the same page at the same time, physical locks, also called latches in popular SMP terminology, are used. These locks are held for a relatively short duration, e.g., only for the time it takes to modify the data in the page in memory. With the help of physical locks, the physical operations on a particular page are serialized under typical conditions. Commonly, these locks can be acquired in “shared” mode, “exclusive” mode, or “null” mode. A shared physical lock is compatible with other shared physical locks but incompatible with an exclusive physical lock. An exclusive physical lock is incompatible with both shared and exclusive physical locks but compatible with “null” physical locks.
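By way of illustration only, the compatibility rules described above can be sketched as a small lookup. The names `Mode` and `compatible` are hypothetical and do not come from any particular product. The sketch also assumes that a “null” lock is compatible with every other mode; the text above states this only for the exclusive case.

```python
from enum import Enum


class Mode(Enum):
    """Physical lock modes named in the text (identifiers are illustrative)."""
    NULL = "null"
    SH = "shared"
    EX = "exclusive"


def compatible(requested: Mode, held: Mode) -> bool:
    """Return True if a lock in `requested` mode can coexist with one in `held` mode."""
    # Assumption: a null lock conflicts with nothing.
    if requested is Mode.NULL or held is Mode.NULL:
        return True
    # Shared is compatible only with shared; exclusive conflicts with
    # both shared and exclusive.
    return requested is Mode.SH and held is Mode.SH


if __name__ == "__main__":
    for req in Mode:
        for held in Mode:
            print(f"{req.name:>4} vs {held.name:<4} -> {compatible(req, held)}")
```

The relation is symmetric, so the order of the two arguments does not affect the result.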
In a distributed system, the physical locks are retained at each node until they are claimed by other nodes. Retaining the locks in this manner avoids the unnecessary repeated acquisition cycles that might occur if the locks were released immediately. For physical consistency, a two-level lock is often used. The first level is the inter-node synchronization, where the cluster-wide “physical lock” is acquired, and the next level is an intra-node synchronization, where the “latch” is acquired. The cluster-wide physical lock gives the right of access to the particular node that has acquired the lock, while the “latch” gives the right of access to a particular task within the node that holds the physical lock.
Access to the page, i.e., the latches as well as the physical locks, is granted on a “first come, first served” basis. For instance, if a task requests a shared (SH) latch and is granted the latch, a second task requesting the latch in exclusive (EX) mode will be blocked and placed in a wait queue. If a third task then requests the latch in SH mode, it too will be blocked and placed in the wait queue behind the second task requesting the EX latch. The behavior for the physical lock is similar at the node level.
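The “first come, first served” behavior described above can be sketched as follows. This is a minimal single-threaded model intended only to show the queuing order; the class `FifoLatch` and its methods are hypothetical names, and a real latch would involve actual task blocking and wake-up.

```python
from collections import deque


class FifoLatch:
    """Toy model of a latch that grants SH/EX requests strictly in arrival order."""

    def __init__(self):
        self.holders = []       # (task, mode) pairs currently granted
        self.waiters = deque()  # (task, mode) pairs blocked, in arrival order

    def _compatible_with_holders(self, mode):
        # Only SH with SH can coexist; with no holders any mode is grantable.
        return all(mode == "SH" and held == "SH" for _, held in self.holders)

    def request(self, task, mode):
        """Return True if granted immediately, False if the task is queued."""
        # A request is queued if anyone is already waiting, even when it would
        # be compatible with the current holders (no overtaking).
        if not self.waiters and self._compatible_with_holders(mode):
            self.holders.append((task, mode))
            return True
        self.waiters.append((task, mode))
        return False

    def release(self, task):
        """Release the task's latch and return the tasks granted as a result."""
        self.holders = [(t, m) for t, m in self.holders if t != task]
        granted = []
        while self.waiters and self._compatible_with_holders(self.waiters[0][1]):
            entry = self.waiters.popleft()
            self.holders.append(entry)
            granted.append(entry[0])
        return granted


if __name__ == "__main__":
    latch = FifoLatch()
    print(latch.request("T1", "SH"))  # first task is granted SH
    print(latch.request("T2", "EX"))  # second task is blocked behind T1
    print(latch.request("T3", "SH"))  # third task queues behind T2
    print(latch.release("T1"))        # T2 is granted; T3 keeps waiting
```

In this model the third task waits even though its SH request is compatible with the first task's SH latch, which matches the ordering described in the text.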
Whenever an attempt is made to physically lock a particular page p1 in a pool, it is desirable to lock the entire pool, if there is no contention, for I/O benefit. Pools are written out to disk and read from disk in their entirety, which gives a significant I/O benefit, since separate I/O on individual pages is much costlier. Also, pools are used in situations where it is assumed that, in all likelihood, the subsequent pages in the pool will be accessed immediately, so that locking the whole pool is of great benefit.
In a shared disk cluster that supports independently configured large buffer pools at each of its nodes (i.e., the size of the buffer pools can vary at each node independently of the other nodes), a particular page can be present in two nodes in different pools at the same time. By way of background, U.S. patent application Ser. No. 11/675,323, entitled “System and Methods for Optimizing Data Transfer among Various Resources in a Distributed Environment”, assigned to the assignee of the present invention, describes buffer pools in a distributed system. For instance, if a page p1 resides in the cluster at nodes N1 and N2 and is being shared among them with SH physical locks, it is possible that the page is present in a 1-page pool at N1 and a 4-page pool at N2, the 4-page pool including adjacent pages, say p2, p3, and p4. When a task holds a physical lock on one page p1 at node N1 and tries to lock another page p2, locking the pool would also mean having to downgrade locks on p1, p3 and p4 at N2. This can cause deadlocks, for example, if another task at N2 already holds a physical lock on p4 by this time and is requesting a lock on p1. It is difficult for the cluster lock manager to detect such deadlocks, especially since it has to get information about the latches involved, which can change quite quickly. Moreover, such distributed deadlock detection would have to happen for every latch request, which is costly.
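The deadlock in the preceding example can be made concrete with a wait-for graph. The sketch below is illustrative only and is not the cluster lock manager's method; as noted above, such detection is costly in practice. The helper names `build_wait_for` and `has_cycle` and the task labels are hypothetical, and the model simplifies by assuming one blocking holder per page and by ignoring lock modes.

```python
def build_wait_for(holds, requests):
    """Build {task: set of tasks it waits on} from page holders and page requests."""
    graph = {task: set() for task in requests}
    for task, pages in requests.items():
        for page in pages:
            holder = holds.get(page)
            if holder is not None and holder != task:
                graph[task].add(holder)
    return graph


def has_cycle(wait_for):
    """Depth-first search for a cycle in a wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(wait_for))


if __name__ == "__main__":
    # Task A at N1 holds p1; task B at N2 holds p4 (simplified: one holder per page).
    holds = {"p1": "A@N1", "p4": "B@N2"}
    # A's pool-level request covers p1..p4; B requests p1.
    requests = {"A@N1": ["p1", "p2", "p3", "p4"], "B@N2": ["p1"]}
    graph = build_wait_for(holds, requests)
    print(graph)
    print("deadlock:", has_cycle(graph))
```

Here task A waits on task B for p4 while task B waits on task A for p1, so the graph contains a cycle. If task A requested only p2 rather than the whole pool, the cycle would not arise in this model.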
Accordingly, a need exists for an approach to reduce contention and avoid deadlocks in a distributed system that avoids the limitations and shortcomings of prior approaches and provides more optimized control for physical access. The present invention addresses this and other needs.