A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates generally to information processing environments and, more particularly, to concurrent access and maintenance of concurrent data structures in a data processing system, such as a Database Management System (DBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. One or more PC xe2x80x9cclientxe2x80x9d systems, for instance, may be connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these xe2x80x9cclient/serverxe2x80x9d systems include Powersoft(trademark) clients connected to one or more Sybase Adaptive Server(trademark) database servers. Both Powersoft(trademark) and Sybase Adaptive Server(trademark) (formerly Sybase SQL Server(trademark)) are available from Sybase, Inc. of Emeryville, Calif.
To support multiple clients concurrently accessing data, a database system must employ some sort of concurrency control. Concurrency control is the way that a database synchronizes clients"" or users"" access to data to ensure that they do not destroy each other""s work. Locking is the way that most databases handle concurrency control. While one user has a lock on a subset of the database, that data is protected from other users"" attempts to make changes to it. One user""s changes have to be xe2x80x9ccommittedxe2x80x9d before another user can see or further change the same data.
One of the characteristics of a locking scheme is the level of the locks. Some products default to physical storage page-level locks and while others default to logical row-level locks. Some products allow the user to specify which level of locking (page, row, or table) will be used for an individual table. Another of the characteristics of a locking scheme is the lock type, whether dynamic or static. A static locking scheme sets up one level of locking for each table in the database, often at table creation time or via database defaults. Dynamic locking attempts to optimize the granularity of locks depending on the particular change being made. A dynamic locking scheme will automatically increase the granularity of locks from row to page to table level when a certain threshold is met in the more fine-grained locks. The goal is to use the least amount of system overhead per lock while getting as many users safely into the database as possible. The usual approach is to start with a higher level lock, then reduce its granularity only when there is contention from another transaction that wants to lock the same page or table for an UPDATE, DELETE, INSERT, or SELECT. This minimizes the total number of locks. The other approach of initially using a lower granularity and escalating it when a threshold level is reached has problems. For example, escalation from row-level to page-level locks can cause deadlocks when two transactions attempt to acquire page-level locks on the same page at the same time.
Page-level locking is problematic for highly concurrent data structures whose structures must be maintained, such as B-tree data structures. Conventionally, when two different transactions are attempting to insert into the same B-tree page, for instance, the underlying system employs transaction durational page-level locks, so that the two competing inserts would be serialized. If enough room does not exist for the inserts, the page is split. A split in one part of the tree does not occur in isolation but propagates to its parent. Often, the split is done as a separate actionxe2x80x94a xe2x80x9cnested topxe2x80x9d actionxe2x80x94within the transaction. After the split is completed, the system continues with the current transaction. If two transactions are trying to perform splits at different locations in the B-tree in a conventional system, the transactions are serialized, as a conventional system only allows one top action at a time. The process of serializing page splits leads to much contention among the multiple transactions concurrently executing within the system.
Each day more and more businesses are run from mission-critical systems which store information on server-based SQL database systems, such as Sybase Adaptive Server(trademark). As a result, increasingly higher demands are being placed on server-based SQL database systems to provide enterprise-wide decision support. Accordingly, there is much interest in improving the performance of such system, particularly in terms of execution speed and reliability.
The present invention comprises a Client/Server Database System with improved methods for providing access to highly-concurrent data structures, such as B-Trees. In an exemplary embodiment, the system includes one or more Clients (e.g., Terminals or PCs) connected via a Network to a Server. The Server, operating under a server operating system (e.g., UNIX) includes a Database Server System, such as Sybase Adaptive Server. In general operation, Clients store data in and retrieve data from one or more database tables resident on the Server. Each table itself comprises one or more horizontal rows or records divided into columns or fields of information. For enhancing the speed in which the Database Server stores, retrieves, and processes particular data records, the Server maintains one or more database indexes on each table. A database index, which in an exemplary embodiment is maintained as a B-Tree data structure, allows the records of a table to be organized in many different ways, depending on a particular user""s needs, with the speed of retrieving any particular data record significantly improved.
Methodology is provided which affords high-concurrency access to B-tree structures. When the system receives a request to insert a key value into a B-Tree at page that does not have sufficient room, the system must split at the tree at leaf level. This is done by allocating a new page, and moving some of the key values from the old page to the new page. The split itself propagates upward. To do the split itself, the system obtains address locks for the two pages, and marks both as undergoing xe2x80x9csplitxe2x80x9d (i.e., a split operation). Referring back to the previously described data structures, the system would add the address locks to be linked list of address locks, maintained by SDES. Further, for the B-tree pages themselves, the system sets a Boolean flag or xe2x80x9csplit bitxe2x80x9d to xe2x80x9ctrue.xe2x80x9d
When the split is propagated up, a xe2x80x9cside entryxe2x80x9d is added to the old page to point to the newly allocated page. The old page, however, may not have sufficient room for storing this new entry (e.g., when it is already full). Accordingly, the parent page must split also. This is done by allocating a new page, at that level. Both pages, old and new parents, are marked as undergoing split. As before, the system obtains address locks for these pages as well. At this point, a side entry is created in the old parent page. This information serves to let any client which is searching for key value (or greater) know that, instead of going directly down the tree from the old parent page, it should proceed to the parent""s new sibling node or page. Note in particular that the traversal is not blocked by the split which is currently active. In effect, the client knows how to take a detour to arrive at the proper leaf-level page. Similarly, if the client is searching for a lesser key value, it will traverse to the proper page, without being blocked. Again, even though the page is undergoing a split, the client is not blocked from accessing the sought-after key value. After split propagation is complete, the system clears the split flags and releases address locks. Also at this point, the side entry is removed. Now, sufficient room now exists in the tree for inserting the key value.