The present invention relates to read-write lock algorithms, and more specifically to optimizing usage of such locks.
Within the class of passive locking algorithms there is a set known as recursive locks, otherwise known as reentrant or relockable locks. Recursive, reentrant and relockable locks are slightly different in the type of problem they try to address, but the most used implementation, relockable mutexes, addresses all three types of scenarios.
Non-reentrant locks can only be acquired once, meaning that if a process or thread tries to acquire a lock it already owns, it will stall waiting on itself. However, recursive locks can be acquired multiple times by the owner, the lock being relinquished when the owner releases the lock the same number of times it had previously acquired it. The usual implementation of recursive locks adds a lock counter (and a lock owner) to the lock structure, with the lock being available if the counter is zero, and busy otherwise.
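By way of illustration only, the counter-and-owner arrangement described above can be sketched as follows. The class name SimpleRecursiveLock and its methods are hypothetical and are not taken from any particular library; the sketch assumes a monitor-based implementation and ignores fairness and timeouts.

```java
// Hypothetical minimal recursive lock: an owner field plus a lock counter.
final class SimpleRecursiveLock {
    private Thread owner; // thread currently holding the lock, or null
    private int count;    // acquisitions by the owner; zero means available

    public synchronized void lock() {
        Thread me = Thread.currentThread();
        if (owner == me) { // re-acquisition by the owner: only bump the counter
            count++;
            return;
        }
        boolean interrupted = false;
        while (count != 0) { // busy: wait until the owner has fully released
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true; // remember the interrupt, keep waiting
            }
        }
        owner = me;
        count = 1;
        if (interrupted) {
            me.interrupt(); // restore the interrupt status for the caller
        }
    }

    public synchronized void unlock() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException("caller does not own the lock");
        }
        if (--count == 0) { // released as many times as it was acquired
            owner = null;
            notifyAll();
        }
    }

    // Number of times the calling thread currently holds the lock.
    public synchronized int holdCount() {
        return owner == Thread.currentThread() ? count : 0;
    }

    public synchronized boolean isAvailable() {
        return count == 0;
    }
}
```

In this sketch a thread that calls lock() twice must call unlock() twice before isAvailable() reports true again, which mirrors the release rule stated above.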
Proponents of recursive locks highlight flexibility as their best feature. By just switching to recursive locks, an algorithm dealing with recursive structures or, for object-oriented languages, derived classes, can easily be made multithreaded without any other change. Asynchronous routines that deal with a shared structure can be executed without fear of deadlock due to the interrupted thread already owning the lock controlling the resources, and software functions manipulating a specific resource can be implemented to be called both in the case where the lock is already owned by the caller and the case where it is not, which greatly helps in terms of code reuse and maintainability. Conversely, opponents of recursive locks highlight a substantial performance penalty compared to non-recursive locks and the obfuscation of the interaction between participant threads, possibly hiding unintended or unforeseen behavior.
Read-write locks are a class of locks which allow multiple threads to access a controlled structure in read mode concurrently. Modification of the structure entails acquiring the lock in an exclusive mode. Read-write locks offer clear concurrency advantages over non-shared types of locks in environments where structures are modified seldom but accessed very frequently.
Relockable mutexes have a lock count and a mutex owner in order to be able to track the number of times the owner has acquired the lock and avoid self stalls. In a similar way, read-write locks have a reader count, so that a prospective exclusive locker has to wait for the reader count to revert to zero before it is allowed to acquire the lock. In order to promote fairness, while there are waiters in the lock queue, new readers will be made to wait even if readers are already accessing the controlled resource, so that writers have a chance to modify the shared resource in a timely manner.
Since, for performance and space reasons, readers are only tracked in number and not in identity, the logical step of first acquiring the read-write lock in shared mode and then recursively promoting it to exclusive mode, rather than having the intended effect, will in fact produce a self stall.
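The self stall can be illustrated with a minimal sketch of a read-write lock that tracks readers by count only. The class CountingReadWriteLock and its methods are hypothetical. As a simplifying assumption, the sketch uses non-blocking "try" operations so that the stall shows up as a failed attempt rather than as a thread waiting forever, and it omits the waiter queue used for fairness.

```java
// Hypothetical read-write lock that records only HOW MANY readers exist,
// not WHICH threads they are.
final class CountingReadWriteLock {
    private int readers;    // number of shared holders; identities are not kept
    private boolean writer; // true while the lock is held exclusively

    public synchronized boolean tryLockShared() {
        if (writer) {
            return false;
        }
        readers++;
        return true;
    }

    public synchronized void unlockShared() {
        if (readers == 0) {
            throw new IllegalMonitorStateException("no shared holder");
        }
        readers--;
    }

    public synchronized boolean tryLockExclusive() {
        // The lock cannot tell whether the one remaining reader is the caller
        // itself, so it must refuse. A blocking version would wait here
        // forever: the self stall.
        if (writer || readers != 0) {
            return false;
        }
        writer = true;
        return true;
    }

    public synchronized void unlockExclusive() {
        if (!writer) {
            throw new IllegalMonitorStateException("no exclusive holder");
        }
        writer = false;
    }
}
```

A thread that holds the lock in shared mode and then calls tryLockExclusive() is refused for as long as its own shared hold remains, because its own reader count entry is indistinguishable from that of any other reader.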
Prior art relockable read-write mutexes offer no advantage over plain read-write locks in this respect. Keeping track of the number of exclusive locks held by the same thread, plus keeping track of the number of shared lockers, means that relockable mutexes will self stall when recursively mixing shared and exclusive lock attempts, in whatever order they happen.
Consider now the following scenario: a large concurrent application uses a complex shareable resource which exists as a single entity, but can be divided into subcomponents or portions, each of which can be modified individually, but cannot exist on its own. A real world scenario is a dictionary cache for a server. Industrial strength SQL servers, for instance, comprise, among others, a dictionary cache of tables, so that information about individual tables does not have to be accessed from disk every time a statement is parsed.
Individual cache entries comprise several parts, such as table name and owner, columns and types, indexes, check constraints and referential constraints. Some of the parts will make references to other entries, or entries in different dictionary caches, for example, referential constraints will point to dependent tables, or, in extensible SQL engines, column types may point to a user defined types dictionary cache. In terms of code reusability, it makes sense to have individual functions each dealing with individual parts, for instance one function to load table information, one for columns, one for indexes, one for constraints and the like.
When loading information about a table previously missing from the cache, since the entry information has no meaning until all the individual parts are available, a wrapper function would call in turn all the functions dealing with each individual part and, once successful, push the new entry into the cache.
When new indexes are created, or dropped, only the function dealing with indexes needs to be called, but this time with the whole cache entry locked (the information still has no meaning until the new index information is updated).
When preparing a new statement, information about tables referenced by foreign key constraints needs to be checked, because the primary key table might have changed, its dictionary cache entry might be stale and referential constraints information would then need to be reloaded. This activity does not require an exclusive lock. It is only necessary to make sure that the current entry does not change when performing the check, which can be done with a shared lock, an exclusive lock only being needed if the constraint information is found to be stale and needs to be reloaded. Since during normal operation no changes would be expected in any table cache entry, being unable to perform such checks in parallel can constitute a serious performance bottleneck, depending on engine load and the popularity among statements of the table itself.
With current locking technology, there are several options. With plain locks, either code is duplicated (one copy to handle a new table entry as a whole, another to handle individual parts when they are modified on their own), or infrastructure is needed to pass the lock state between the wrapper and inner functions. Since the entry lock can only be acquired in exclusive mode, only one thread at a time can check each individual entry for stale referential constraint information. This means that parsing statements that use the same table is serialized.
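The option of passing the lock state between wrapper and inner functions can be sketched as follows. All names (TableEntrySketch, loadColumns, loadIndexes, loadWholeEntry, the callerHoldsLock flag) are hypothetical, and a single-permit java.util.concurrent.Semaphore is assumed as a stand-in for a plain, non-recursive lock. The version counters merely stand in for the real work of loading dictionary information.

```java
import java.util.concurrent.Semaphore;

// Hypothetical cache entry guarded by a plain (non-recursive) exclusive lock.
final class TableEntrySketch {
    private final Semaphore entryLock = new Semaphore(1);
    int columnVersion; // stands in for loaded column information
    int indexVersion;  // stands in for loaded index information

    // Inner function: usable whether or not the caller already holds the lock.
    void loadColumns(boolean callerHoldsLock) {
        if (!callerHoldsLock) entryLock.acquireUninterruptibly();
        try {
            columnVersion++;
        } finally {
            if (!callerHoldsLock) entryLock.release();
        }
    }

    // Inner function: called alone when an index is created or dropped.
    void loadIndexes(boolean callerHoldsLock) {
        if (!callerHoldsLock) entryLock.acquireUninterruptibly();
        try {
            indexVersion++;
        } finally {
            if (!callerHoldsLock) entryLock.release();
        }
    }

    // Wrapper: locks the entry once, then calls each part with the state flag.
    void loadWholeEntry() {
        entryLock.acquireUninterruptibly();
        try {
            loadColumns(true);
            loadIndexes(true);
        } finally {
            entryLock.release();
        }
    }

    boolean isLocked() {
        return entryLock.availablePermits() == 0;
    }
}
```

The flag avoids duplicating the inner functions, but every call site must pass the correct value; passing false while the lock is already held would stall the caller on itself.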
With recursive locks, code reutilization is fine, albeit with a (possibly small) performance penalty. As the complexity of the code grows, the interaction between nested locks might become obfuscated, making code maintenance more complex. In time, having a single code line might become more complex than maintaining two specialized copies of the same code. Referential constraint information checking is still serialized.
With read-write locks, the same code reutilization problem exists as with plain locks. Contrary to expectations, a referential constraint checking operation in this example cannot be easily parallelized. Assume for a moment that the checking itself is done using shared locks. When the referenced table dictionary entry is found to be stale, meaning that the referential constraint needs to be reloaded, the lock will then have to be promoted to exclusive by releasing it and reacquiring it in exclusive mode. However once the lock has been promoted, there is no guarantee that the structure being checked has not changed, since other threads could have acquired the lock in exclusive mode and modified the structure during our wait for the lock promotion.
This means that after having promoted the lock, the thread must check the shared resource again for changes and possibly restart from the beginning of the outer wrapper function. This may not seem like a major drawback, but when many threads are checking the structure concurrently and all find it out of date, all will have to promote the lock and recheck the structure with the lock held in exclusive mode, which means that any change detected by more than one thread at the same time will result in many serialized rechecks, leading to a substantial bottleneck on the lock. Given the complexity of this operation, an easier choice is often to just acquire the lock in exclusive mode from the beginning, again giving up on parallelism.
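The release-and-reacquire promotion with its mandatory recheck can be sketched as follows, using java.util.concurrent.locks.ReentrantReadWriteLock purely as a convenient read-write lock. The class ConstraintCacheSketch, its fields and the version-number staleness test are hypothetical simplifications of the referential constraint check described above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cache entry whose constraint information may become stale.
final class ConstraintCacheSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int loadedVersion; // version of the referenced table last loaded
    private int reloads;       // how many times the information was reloaded

    void checkConstraints(int currentVersion) {
        boolean stale;
        lock.readLock().lock(); // the check itself runs in shared mode
        try {
            stale = loadedVersion != currentVersion;
        } finally {
            lock.readLock().unlock(); // promotion = release, then reacquire
        }
        if (!stale) {
            return;
        }
        lock.writeLock().lock();
        try {
            // Another thread may have reloaded the entry while this thread
            // waited for the exclusive lock, so the check must be repeated.
            if (loadedVersion != currentVersion) {
                loadedVersion = currentVersion;
                reloads++;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    int reloadCount() {
        lock.readLock().lock();
        try {
            return reloads;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Every thread that observes the stale entry queues for the exclusive lock and repeats the check there, even though only the first one performs the reload; this is the serialized recheck described above.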
The most desirable option would, of course, be a lock that allows both code reutilization and parallel operation during structure validation.
The Java class “ReentrantReadWriteLock”, the details of which can be found at http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html, discloses a simple implementation of a read-write lock that can be locked multiple times and that can be promoted from read to write. It does not allow multiple readers to check a structure in parallel and, on detecting a change, avoid a thundering herd of lock promoters all trying to amend the structure in the same way and at the same time. A reader of the lock can only upgrade the lock to write mode if it is the only reader around, which can lead to stalls.