A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to information processing environments and, more particularly, to computer-implemented methodologies for enhancing concurrent access in a data processing system, such as a Database Management System (DBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC xe2x80x9cclientxe2x80x9d systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these xe2x80x9cclient/serverxe2x80x9d systems include Powersoft(trademark) clients connected to one or more Sybase Adaptive Server(trademark) database servers. Both Powersoft(trademark) and Sybase Adaptive Server(trademark) (formerly Sybase SQL Server(trademark)) are available from Sybase, Inc. of Emeryville, Calif.
To support multiple clients concurrently accessing data, a database system must employ some sort of concurrency control. Concurrency control is the way that a database synchronizes clients"" or users"" access to data to ensure that they do not destroy each other""s work. Locking is the way that most databases handle concurrency control. While one user has a lock on a subset of the database, that data is xe2x80x9cblockedxe2x80x9d or protected from other users"" attempts to make changes to it. One user""s changes have to be xe2x80x9ccommittedxe2x80x9d before another user can see or further change the same data.
One of the characteristics of a locking scheme is the level of the locks. Some products default to physical storage page-level locks, while others default to logical row-level locks. Some products allow the user to specify which level of locking (page, row, or table) will be used for an individual table. Another of the characteristics of a locking scheme is the lock type, whether dynamic or static. A static locking scheme sets up one level of locking for each table in the database, often at table creation time or via database defaults. Dynamic locking attempts to optimize the granularity of locks depending on the particular change being made. A dynamic locking scheme will automatically increase the granularity of locks from row to page to table level when a certain threshold is met in the more fine-grained locks. The goal is to use the least amount of system overhead per lock while getting as many users safely into the database as possible. The usual approach is to start with a higher level lock, then reduce its granularity only when there is contention from another transaction that wants to lock the same page or table for an UPDATE, DELETE, INSERT, or SELECT operation. This minimizes the total number of locks.
The approach of initially using a lower granularity and escalating it when a threshold level is reached has problems. For example, escalation from row-level to page-level locks can cause deadlocks when two transactions attempt to acquire page-level locks on the same page at the same time. Page-level locking is particularly problematic for highly-concurrent data structures whose structures must be maintained, such as B-tree data structures. Conventionally, when two different transactions are attempting to insert into the same B-tree page, for instance, the underlying system employs transaction durational page-level locks, so that the two competing inserts would be serialized. If enough room does not exist for the inserts, the page is split. A split in one part of the tree does not occur in isolation but propagates to its parent. Often, the split is done as a separate actionxe2x80x94a xe2x80x9cnested topxe2x80x9d actionxe2x80x94within the transaction. After the split is completed, the system continues with the current transaction. If two transactions are trying to perform splits at different locations in the B-tree in a conventional system, the transactions are serialized, as a conventional system only allows one top action at a time. The process of serializing page splits leads to much contention among the multiple transactions concurrently executing within the system.
Each day more and more businesses are run from mission-critical systems which store information on server-based SQL database systems, such as Sybase Adaptive Server(trademark). As a result, increasingly higher demands are being placed on server-based SQL database systems to provide enterprise-wide decision support. Since reducing the time required for processing queries substantially impacts system performance, it is highly desirable to optimize query processing. Without further optimization, processing of a query in real-time can be particularly burdensome, especially since a relatively high number of computations and possibly disk or network input/output operations must be performed in order to evaluate each row of the table or tables under examination. Accordingly, there is much interest in improving the performance of such tasks, particularly in terms of execution speed and reliability.
The present invention comprises a Client/Server Database System with improved methods for enhancing concurrency using a xe2x80x9crow updatexe2x80x9d bit and deferred locking. In an exemplary embodiment, the system includes one or more Clients (e.g., Terminals or PCs) connected via a Network to a Server. The Server, operating under a server operating system (e.g., UNIX), includes a Database Server System, such as Sybase Adaptive Server(trademark). In general operation, Clients store data in and retrieve data from one or more database tables resident on the Server by submitting SQL commands, some of which specify xe2x80x9cqueriesxe2x80x9dxe2x80x94criteria for selecting records of a table for a particular operation (e.g., updating). Queries are typically provided using a data manipulation language (DML), such as using the well-known SQL syntax.
The present invention introduces a methodology for using a xe2x80x9crow updatexe2x80x9d (ROW_UPDATE) bit and deferred locking techniques that reduce locking overhead and increase concurrency, during database operations requiring access (e.g., query processing). Each data row employs ROW_DELETE and ROW_UPDATE status bits (flags or indicators). When a transaction updates a row, it sets the ROW_UPDATE bit. When a transaction inserts a row, neither the ROW_UPDATE nor the ROW_DELETE status bits are set. When a transaction deletes a row, it sets the ROW_DELETE bit but the contents of the data row are left intact. Note that if a row gets updated and then deleted, it will have both its ROW_UPDATE and ROW_DELETE status bits set. When an insert (operation) rolls back, it sets the ROW_DELETE bit. When a delete rolls back, it clears the ROW_DELETE bit. When an update (operation) rolls back, it restores the old image of the row, and thus the row gets whatever status bits that existed in the old image.
The ROW_UPDATE and ROW_DELETE bits are cleaned up as follows. When a scan requests a lock on a row with ROW_UPDATE bit set and gets LOCK_GRANTED status from the Lock Manager, it means that the transaction which set the ROW_UPDATE bit has completed and so the ROW_UPDATE bit can be cleared. Similarly a row with ROW_DELETE bit set can be removed from the page if it is determined that the deleting transaction has completed.
The basic approach or methodology adopted for table scans is as follows. The action taken by a scan depends on whether the row qualifies and the status of the row status bits. Qualification is done while holding a latch. First, the method examines the status (bits) of the data row under consideration (for locking). Four possible combinations exist: unset, delete, update, and update/delete. In the case that the status bits are not set (i.e., xe2x80x9cunsetxe2x80x9d), the method proceeds to determine whether the data qualifies (i.e., the scan criterion). If the data qualifies (xe2x80x9cyesxe2x80x9d case), the method proceeds to grant the lock (request) and return the data row. If, however, the data does not qualify (xe2x80x9cnoxe2x80x9d case), the row is instead skipped, as the row will never quailfy.
In the case that the status is xe2x80x9cdeletexe2x80x9d (i.e., only the xe2x80x9cdeletexe2x80x9d status bit is set), the method proceeds to determine whether the data qualifies. If the data qualifies, the method proceeds to process a request for an instant duration lock (xe2x80x9clock instantxe2x80x9d). The instant duration lock is a mechanism that allows the client (requesting the lock) to see whether there exists a conflicting lock already held on the row (i.e., from another concurrent transaction). If no conflict is found, the xe2x80x9clock instantxe2x80x9d request will be granted and the client will know that the xe2x80x9cdeletexe2x80x9d has committed. Accordingly, the row may be skipped as the row is xe2x80x9cdeletedxe2x80x9d (and that deletion has been committed). If a conflicting lock exists, however, the xe2x80x9clock instantxe2x80x9d request will fail. Here, it is possible that the xe2x80x9cdeletexe2x80x9d may in fact roll back (as a result of the roll back of another concurrent transaction). Accordingly, the client will instead block (i.e., sleep on the lockxe2x80x94a waiting for the lock to be granted) so that a rollback of the xe2x80x9cdeletexe2x80x9d may be detected. On the other and, if the data does not qualify (xe2x80x9cnoxe2x80x9d case), the method may proceed to skip the row as it will never qualify (even if the xe2x80x9cdeletexe2x80x9d rolls back).
In the case that the status is xe2x80x9cupdatexe2x80x9d (i.e., only the xe2x80x9cupdatexe2x80x9d status bit is set), the method proceeds to determine whether the data qualifies. If the data qualifies (xe2x80x9cyesxe2x80x9d), the method proceeds to grant the lock and return the row. If the data does not qualify (xe2x80x9cnoxe2x80x9d case), then a instant duration lock is requested. As was previously discussed above, the xe2x80x9clock instantxe2x80x9d request serves as a shortcut method for determining whether the xe2x80x9cupdatexe2x80x9d has committed and, if not, for sleeping on the lock until the lock may be granted. If the xe2x80x9cupdatexe2x80x9d has committed (i.e., the xe2x80x9clock instantxe2x80x9d request can be granted), the row is skipped (since, recall, the data does not qualify the scan criterion). Finally, it is possible that both xe2x80x9cupdatexe2x80x9d and xe2x80x9cdeletexe2x80x9d (i.e., both the xe2x80x9cupdatexe2x80x9d and xe2x80x9cdeletexe2x80x9d status bits) are set. Regardless of whether the data qualifies, the method will process a xe2x80x9clock instantxe2x80x9d request/skip action, in a manner similar to that previously described.
Deferred locking is a technique where the Index Manager returns record IDs (RIDs) to the Data Layer without acquiring locks on them during scans that have scan arguments (SARGS) on columns not in the index; the Data Layer subsequently qualifies the data row and determines whether locking is really necessary. Conventionally, in index scans, locking is done in the Index Manager (based only on whether the index row qualifies). However, in accordance with the present invention, the locking is done after reading the data row, taking into account the status bits of data row and whether the data row qualifies. This optimization is performed when there are some qualification conditions on columns but not in the index.
The Index Manager qualifies index rows. It qualifies both the rows with and without the ROW_DELETE bit set. Rows that do not satisfy SARGS on index columns are discarded. If a qualifying row does not have its ROW_DELETE bit set, the RID from it is returned to the Data Layer. On the other hand, if the qualifying row has its ROW_DELETE bit set, an instant lock is acquired on it to see if the deleting transaction has committed. If the lock is granted (i.e., Lock Manager returns LOCK_GRANTED or LOCK_NOTNEEDED), the row is discarded. Otherwise, RID from the row is returned to the Data Layer.