This invention relates generally to the field of computer data concurrency, and in particular to a dynamic, cost-based, multi-granularity locking mechanism.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 1998, Microsoft Corporation, All Rights Reserved.
Many computer applications use a function referred to as locking to ensure that data the applications are modifying is not modified by another application or process. Locking the data prevents users from changing the same data at the same time. If locking is not used, data may become logically incorrect, and future use of the data may produce unexpected results.
If locking is not used in a database system, multiple transactions trying to use the same data concurrently can give rise to several different types of problems referred to as concurrency problems. Updates made by one application may be lost when overwritten by another application (a lost update). Data may be modified but not yet committed, so that other applications read values that are never made permanent (a dirty read). This can also result in repeated reads of data which should be the same returning different results (a non-repeatable read). Further problems can result from applications using copies of data, which are then changed by a later application.
There are several ways of dealing with the concurrency problem in database servers. Optimistic concurrency works on the assumption that resource conflicts between multiple users are unlikely, and allows transactions to execute without locking any resources. Only when data is about to be changed are resources checked to determine whether any conflicts have occurred; if a conflict occurred, the application must read the data again and reattempt the change. Pessimistic concurrency locks resources as they are required and holds the locks for the duration of the transaction. Unless deadlocks occur, a transaction is assured of successful completion.
While there are several different types of locks, such as exclusive locks, shared locks, and update locks, most provide some degree of concurrency control to ensure that two applications do not modify the same data at the same time, or use data that may be inaccurate. These locks consume system overhead and have associated costs. Fine-grained locks, which lock a small amount of data, such as a single row of a database, each require memory and processing resources to store and manage. If a transaction must lock many rows of the database to ensure concurrency, a significant amount of system resources can be consumed. In prior systems, if too many resources are consumed during processing of the transaction, the level of the lock is increased, or escalated, for example to the page level, to reduce the system overhead. Escalation happens when a threshold related to the number of locks used by the transaction is exceeded. Changing the level of the lock mid-transaction is itself expensive, because the higher-level lock must first be obtained, and it may not be available until other applications complete their operations and release locks at lower levels. This can result in significant delays, and can also create deadlocks if two applications each need to lock the same higher level and will not release their lower-level locks until they gain a lock on the higher level. Such a conflict is complex to resolve; some applications may even time out and restart later, duplicating previous processing as well as incurring further delays.
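The escalation behavior described above can be sketched as follows. This is a hypothetical illustration in Python, not the lock manager of any particular server; the class names, method names, and threshold value are all assumptions for illustration.

```python
# Hypothetical sketch of the prior escalation scheme: row locks
# accumulate until a per-transaction threshold is exceeded, at which
# point a page-level lock must be obtained before the row locks can
# be released.  Names and the threshold value are assumptions.
ESCALATION_THRESHOLD = 1000  # assumed per-transaction row-lock limit

class LockManager:
    """Toy stand-in; a real lock manager tracks lock modes and
    waiters, and these acquire calls may block on conflicts."""
    def acquire_row_lock(self, txn, page_id, row_id):
        pass
    def acquire_page_lock(self, txn, page_id):
        pass

class Transaction:
    def __init__(self):
        self.row_locks = set()    # (page_id, row_id) pairs
        self.page_locks = set()

    def lock_row(self, page_id, row_id, lock_manager):
        if page_id in self.page_locks:
            return  # already covered by a page-level lock
        if len(self.row_locks) >= ESCALATION_THRESHOLD:
            # Escalate: the page lock must be acquired first, which
            # can stall until other transactions release their locks
            # -- the delay and deadlock risk described above.
            lock_manager.acquire_page_lock(self, page_id)
            self.page_locks.add(page_id)
            # Row locks on that page are now redundant.
            self.row_locks = {(p, r) for (p, r) in self.row_locks
                              if p != page_id}
        else:
            lock_manager.acquire_row_lock(self, page_id, row_id)
            self.row_locks.add((page_id, row_id))
```

Note that the escalation happens mid-transaction, after resources have already been consumed, which is precisely the behavior the invention avoids by choosing the granularity up front.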
Still other systems apply a very rough heuristic, such as taking a table lock for a full table scan and row locks for a singleton fetch or partial range scan, in an attempt to ensure completion of a transaction by selecting one of only two levels of locking. The choice is usually between table and page level, or table and row level, depending on the operations to be performed. These systems offer very little flexibility, and must still rely on escalation, if available, to ensure that transactions have a chance of succeeding.
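As a hypothetical sketch, such a two-level heuristic reduces to a single conditional; the scan-kind names below are illustrative, not drawn from any particular system:

```python
def rough_heuristic(scan_kind):
    # Two-level heuristic: a table lock for a full table scan,
    # row locks for a singleton fetch or partial range scan.
    # No intermediate granularity and no cost estimation.
    return "table" if scan_kind == "full_table_scan" else "row"
```

Because the decision ignores the actual cost of each granularity for the query at hand, it cannot adapt when, for example, a "partial" range scan happens to touch most of the table.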
There is a need for a locking mechanism that ensures transactions can take place without escalating locking levels in the middle of a transaction. There is a need for a locking mechanism that ensures once a transaction is started, it has a good chance of succeeding. There is yet a further need for a locking mechanism that provides good concurrency and good performance.
A multilevel data locking mechanism determines the cost of multiple levels, or granularities, of locks for a query prior to initiating a transaction. Locking data means that during the transaction, the locked data is not available for use by other queries, which must wait for execution until the data is unlocked. The granularity of lock used for the transaction is a function of that cost. By using a dynamic locking strategy to determine the most cost-effective granularity of locking for each transaction, overall performance is increased. It also relieves developers of having to consider locking strategies when developing applications.
In an SQL server, the locking selection is performed by a lock manager, and is based on row, page, and table granularities. An SQL server provides access to a database comprising rows of data arranged in tables. Data is stored in memory on pages, each a predetermined number of bytes, such as four thousand in one server. Usually there are multiple rows per page and multiple pages per table, although this relationship may vary from database to database.
Given such a granularity relationship, row locking has a high CPU and memory overhead, or cost, associated with tracking a lock for each row. Page locking has a lower overhead, but may needlessly prevent others from accessing rows that did not really need to be locked, resulting in a higher concurrency cost. Finally, table locks can greatly reduce CPU and memory cost, but at the expense of a much higher concurrency cost. Each of these costs is calculated or estimated prior to execution of a transaction such as a query, and the locking granularity having the lowest cost is selected for use during the query.
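Assuming the three costs have already been estimated, the selection step itself is a simple minimization over the candidate granularities. The function name and example numbers below are illustrative:

```python
def select_lock_granularity(estimated_costs):
    # estimated_costs maps each granularity ("row", "page", "table")
    # to its total estimated cost, computed before the query runs.
    # The lowest-cost granularity is used for the whole transaction,
    # so no mid-transaction escalation is needed.
    return min(estimated_costs, key=estimated_costs.get)

# Row locks: low concurrency cost but high CPU/memory overhead;
# a table lock: the reverse.  Here the page level wins.
chosen = select_lock_granularity({"row": 90.0, "page": 35.0, "table": 60.0})
```

Because the choice is fixed before execution, every lock taken during the scan is at the selected level, avoiding the delays and deadlock risk of acquiring a higher-level lock mid-transaction.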
A number of factors based on the characteristics of the database schema and query are considered in determining which granularity will result in the lowest cost. These factors include the number of rows per page; the number of rows per index scan; the isolation level per scan; an update factor, based on the percentage of scanned rows to be updated; the memory load on the system; the cost of concurrency, based on the number of rows per page and pages per table; and the number of active scans currently on the table.
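One way several of the listed factors could feed into per-granularity estimates is sketched below (isolation level is omitted for brevity). The weights and formulas are assumptions for illustration only, not the actual cost model:

```python
def estimate_costs(rows_per_page, pages_per_table, rows_scanned,
                   update_fraction, memory_load, active_scans,
                   per_lock_overhead=1.0):
    """Illustrative cost model built from the factors listed above;
    all weights and formulas here are assumed, not the patented ones."""
    rows_per_table = rows_per_page * pages_per_table
    pages_scanned = max(1, rows_scanned // rows_per_page)

    # CPU/memory overhead grows with the number of locks acquired;
    # memory pressure makes each held lock more expensive.
    overhead = {
        "row": rows_scanned * per_lock_overhead * (1 + memory_load),
        "page": pages_scanned * per_lock_overhead * (1 + memory_load),
        "table": 1 * per_lock_overhead,
    }
    # Concurrency cost grows with how many rows are locked needlessly
    # and with how many other scans are active on the table; updates
    # block other readers more than shared reads do.
    needlessly_locked = {
        "row": 0,
        "page": pages_scanned * rows_per_page - rows_scanned,
        "table": rows_per_table - rows_scanned,
    }
    concurrency = {g: needlessly_locked[g] * active_scans * (1 + update_fraction)
                   for g in needlessly_locked}
    return {g: overhead[g] + concurrency[g] for g in overhead}
```

Under this toy model, a scan touching the whole of an otherwise idle table favors a table lock, while a small scan on a heavily used table favors row locks, matching the trade-off described above.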
The present invention provides better system performance than prior methods of locking. By providing dynamic, scalable locking based on cost, developers may focus on development of applications rather than on trying to devise an optimal locking scheme. Determining the proper granularity at run time ensures that the current state of the system is considered, which aids in choosing the proper granularity and makes it highly likely that the chosen granularity will result in a successful scan. Further, it can simplify database administration, since locking escalation thresholds need not be readjusted to ensure proper execution of queries.