1. Field
This application relates to synchronization mechanisms, specifically to such synchronization mechanisms which are used to prevent concurrent modification of data structures in shared-memory multi-threaded execution environments.
2. Prior Art
For correct operation, programs written for execution in multi-threaded environments must typically ensure that no data structure is simultaneously read and written by different threads. Usually, a fragment of a computer program (also known as a “critical section”) that performs a calculation over the contents of the data structure operates under the assumption that the contents of the data structure do not change while the critical section is executing (these assumptions are made by the programmer as he or she is writing the code). In general, it is not desirable to eliminate these assumptions as it would greatly increase the complexity of the program if it had to correctly account for concurrent modification of the data structure. Therefore, to prevent concurrent modification, it is necessary to “synchronize” the execution of the critical section with all critical sections that may modify the data structure.
A reader-writer lock is a typical synchronization mechanism known in the prior art. Proper use of a reader-writer lock ensures that the execution of a critical section that writes to the data structure does not overlap with the execution of any critical section that accesses (reads from or writes to) the data structure (including the same critical section executing in a different thread). Each time a thread begins execution of a critical section that accesses the data structure, the reader-writer lock must be “acquired”, for the thread to obtain either a read-lock or a write-lock. During lock acquisition, the reader-writer lock implementation is given the opportunity to stall execution of the thread until a “competing” thread releases a lock (if the reader-writer lock determines a conflict).
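By way of illustration only, the acquire and release discipline described above may be sketched in Java using the standard `java.util.concurrent.locks.ReentrantReadWriteLock` class. The class `GuardedTotals` and its fields are hypothetical names introduced for this sketch and do not come from any particular prior-art system.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical example class; the name and fields are illustrative only.
class GuardedTotals {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int[] values = new int[4];

    // Critical section that writes: excludes all other readers and writers.
    void set(int index, int value) {
        lock.writeLock().lock();
        try {
            values[index] = value;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Critical section that only reads: may overlap with other readers,
    // but never with a writer.
    int sum() {
        lock.readLock().lock();
        try {
            int total = 0;
            for (int v : values) {
                total += v;
            }
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A thread calling `sum()` may be stalled inside `readLock().lock()` while another thread is inside `set()`, which is the stall on conflict described above.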
Toward the goal of building secure systems, it is desirable to separate programs and program code into various “trust domains” and to engineer the implementation of each trust domain so that it does not rely on the correct operation of other trust domains, especially those trust domains containing more complex code or those directly manipulating data that may have originated from an untrusted source (such as an Internet connection).
For example, it is sometimes desirable to separate program code into two trust domains of privileged code and unprivileged code (such large subprograms are also known as “modules”). Frequently, unprivileged code runs in a “security context” in which many systems-related operations (such as disk access or reading and writing files) are disallowed in order to minimize the possible harm due to programming errors (i.e., “bugs”) in the unprivileged code. Ideally, the unprivileged code may perform computations only, with no direct access to shared resources through which sensitive data may pass (such resources might include, for example, filesystems and network interfaces). Usually, the unprivileged code must still have indirect access to shared resources; nevertheless, a security benefit exists if it can be enforced that such access is subject to validation by separate “supervisor” code (supervisor code forms a part of the aggregate privileged code).
Ideally, the security of a whole program utilizing the technique of privilege separation is guaranteed as long as the supervisor code validates only those accesses to shared resources that are in accordance with the program's security policy (in addition to the obvious requirement that the remaining privileged code has no exploitable security holes). Software systems that run privileged and unprivileged code in separate security contexts are said to employ “privilege separation”. A notable example of such a system is the GNU/Linux operating system, in which the kernel code is considered privileged and allowed direct access to hardware resources and application code is considered unprivileged and is not allowed direct access to hardware resources. Application code negotiates access to (virtualized) hardware resources through a “privilege barrier”. Privilege separation is of tremendous practical importance because it allows more efficient allocation of developer effort: the high standards required for the development of security-critical software need be applied only to the privileged code. Proper use of the privilege separation technique results in security and productivity benefits.
Concurrent modification of data structures is a large security risk for systems employing privilege barriers, privilege separation or trust domains. Suppose a less-privileged thread writes a username and password into a memory region and submits a request to a more-privileged thread for the unprivileged thread to be granted access to the named user's files. The code for the privileged thread may be written to first check that the password is correct for that username and then add the username to a list of accounts that the unprivileged thread may request file access for in the future. Security is broken if the username is modified at some point between these two operations: the password was verified to match for the first username, but access was actually granted for the second username, for which a password was not supplied. The modification might be made by the same unprivileged thread that made the request or by a different unprivileged thread having access to the data structure containing the username. If an attacker can trick the code of an unprivileged thread into making such a concurrent modification (for example, by manipulating data supplied to the unprivileged code through an Internet connection), then the attacker can access the files of any other user without supplying the password.
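The window between the password check and the granting of access may be illustrated with a deterministic, single-threaded sketch in Java. All names here (`LoginRequest`, `NaiveSupervisor`, the credential table and the `betweenCheckAndUse` hook) are hypothetical and introduced only for illustration; the hook stands in for whatever a concurrent unprivileged thread might happen to do inside the window.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical request object shared with, and writable by, unprivileged code.
class LoginRequest {
    String username;
    String password;

    LoginRequest(String username, String password) {
        this.username = username;
        this.password = password;
    }
}

// Hypothetical privileged code that checks and then uses the shared fields.
class NaiveSupervisor {
    private final Map<String, String> passwords;
    final Set<String> granted = new HashSet<>();

    NaiveSupervisor(Map<String, String> passwords) {
        this.passwords = passwords;
    }

    boolean grant(LoginRequest req, Runnable betweenCheckAndUse) {
        // Check: the password is verified against the username as it is now.
        if (!req.password.equals(passwords.get(req.username))) {
            return false;
        }
        // Window: simulates a modification made by another thread.
        betweenCheckAndUse.run();
        // Use: the shared field is read a second time and may have changed.
        granted.add(req.username);
        return true;
    }
}
```

If the hook overwrites `username` after the check, access is recorded for an account whose password was never presented, which is the failure described above.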
While the untrusted code for the unprivileged thread should almost certainly be fixed, the whole point of privilege separation is that security should be guaranteed even if such flaws are present in untrusted code. Therefore concurrent modification across a privilege barrier is a problem with the privilege barrier.
The following two solutions to the problem of concurrent modification across a privilege barrier or trust domain are known in the prior art. The first solution, known as “defensive copy”, is for the privileged code to make a copy of the data supplied by the unprivileged code and work with that copy only. The unprivileged code has no access to data structures created privately by the privileged code, so concurrent modification is not possible. The second solution is for the privileged code to suspend all threads that are running unprivileged code that may have access to the data being passed across the privilege barrier. A suspended thread does not execute code while it is suspended and hence cannot modify memory. After the privileged code has finished processing, it resumes the previously suspended threads.
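The first of these solutions, the defensive copy, may be sketched in Java as follows. The class `CopyingValidator` and its validation rule (letters and digits only) are assumptions made for this sketch and are not taken from any specific system.

```java
import java.util.Arrays;

// Hypothetical privileged helper; name and validation rule are illustrative.
class CopyingValidator {
    // Copies the untrusted array first, then validates and returns the copy.
    static char[] acceptName(char[] untrusted) {
        // Defensive copy: unprivileged code holds no reference to 'copy'.
        char[] copy = Arrays.copyOf(untrusted, untrusted.length);
        for (char c : copy) {
            if (!Character.isLetterOrDigit(c)) {
                throw new IllegalArgumentException("unexpected character");
            }
        }
        // Later changes to 'untrusted' cannot affect what was validated.
        return copy;
    }
}
```

The essential point is ordering: validation is performed on the private copy, never on the array that the unprivileged code can still reach.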
In a memory-safe execution environment, it is ensured that even maliciously written code cannot defy a number of basic semantics of the execution environment's supported language(s). These semantics are frequently strong enough that privilege barriers can be implemented using the constructs provided by the language itself (as opposed to the methodology that preceded memory-safe execution environments, in which operating system kernels used processor-specific hardware support to manage security contexts and privilege barriers). The term “memory safety” refers to the important guarantee that arrays and other data structures cannot be accessed “out of bounds” to reach otherwise unreachable memory. In execution environments without memory safety, any memory location can usually be read or written by any thread (unless page-level access protection is used, but page-level access protection is an extremely cumbersome mechanism to employ). In many memory-safe execution environments, unprivileged (or, at least, less-privileged) security contexts can be created and assigned to threads programmatically. In Java, this can be achieved, for example, by using the SecurityManager API. This allows the implementation of application-specific privilege separation mechanisms, with the security and productivity benefits previously described. Note that Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Examples are given with reference to the specifics of Java for concreteness, but it is to be understood that similar facilities may be available in other memory-safe execution environments.
However, the concurrent update problem is not completely solved by the use of type-safe and/or memory-safe execution environments because most such environments support full concurrency. Concurrency can easily be disabled (simply do not create any threads), but this also negates performance advantages available on multi-processor machines. In the prior art, the widely recommended solution for the concurrent update problem remains defensive copying. In Java, the copy can be performed in advance, as soon as the final value of a data structure has been calculated. The container for such a “copy made in advance” is called an immutable wrapper class. For example, the String class in Java accepts a character array as one possible way to construct a String object. The constructor of String makes a copy of the character array and stores the copy in a private field.
By not providing any method that can modify the private character array (or, worse, return the reference of the private character array), the String class ensures that the copy cannot be modified by any thread (hence “immutable”). Therefore, privileged code can process String objects generated by unprivileged code without a security risk due to concurrent update. (Since this depends on proper behavior of the String class, the String class is said to be “trusted”; it forms part of the aggregate privileged code and must be reviewed critically along with all privileged code.) It is important to note that copying has not been avoided by using the String class; it has simply been performed at a different point in the execution of the program. On top of the performance degradation due to copying, attempting to apply the model of immutability in general can result in large “bloated” programs (i.e., programs that have an excessive amount of code) due to the requirement of having a mutable and an immutable variant of every data structure. An excessive amount of code can result in poor run-time performance as well as complicating the software development process.
A commonly used alternative to copying is locking (which is also termed “synchronization”). As mentioned previously, a reader-writer lock allows simultaneous access to the object by either exactly one writer or any number of readers. In the context of privilege separation mechanisms, a privileged thread can acquire a read-lock on the data structure and be assured that no other thread can modify the data structure until the privileged thread releases its read-lock. However, when this must be implemented as a security mechanism (i.e., a “mandatory” security policy rather than a cooperative strategy for correct multi-threading), the unprivileged code must be prevented from modifying the data structure without holding a write-lock.
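An immutable wrapper of the kind described above may be sketched in Java as follows. The class `ImmutableInts` is a hypothetical name chosen for this sketch; it follows the same pattern as the String constructor that accepts a character array, namely a copy made at construction time and no method that exposes or modifies the private array.

```java
import java.util.Arrays;

// Hypothetical immutable wrapper; the class name is illustrative only.
final class ImmutableInts {
    private final int[] values;   // private copy, never handed out

    ImmutableInts(int[] source) {
        // The copy is made here, in advance, at construction time.
        this.values = Arrays.copyOf(source, source.length);
    }

    int length() {
        return values.length;
    }

    int get(int index) {
        return values[index];
    }

    // Deliberately no setter and no accessor returning 'values' itself.
}
```

A mutable counterpart (for example a plain `int[]` or a growable list) is still needed wherever the data is being built up, which is the duplication of variants noted above.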
This can be achieved by creating a logical “encapsulation” of the data structure (or “interface” to the data structure) which acts as a proxy for read and write operations on the data structure. Read and write operations can only be effected through the trusted proxy, which acquires and releases the appropriate type of lock for the operation. Such encapsulations are known in the prior art (for example, the “Collections.synchronizedCollection” facility in Java).
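A minimal sketch of such an encapsulation in Java is given below, assuming a reader-writer lock is taken around every individual operation. The class `LockedIntList` is hypothetical. The standard `Collections.synchronizedCollection` wrapper differs in that it takes a single exclusive lock per call rather than distinguishing readers from writers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical trusted proxy; callers never see the underlying list.
class LockedIntList {
    private final List<Integer> data = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Write operation: acquires and releases the write-lock on every call.
    void add(int value) {
        lock.writeLock().lock();
        try {
            data.add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read operation: acquires and releases the read-lock on every call.
    int get(int index) {
        lock.readLock().lock();
        try {
            return data.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return data.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because the list is private and every access path goes through the proxy, code holding only a `LockedIntList` reference cannot touch the data without the proxy taking the corresponding lock.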
Direct combination of an encapsulation with a prior art synchronization mechanism is highly inefficient as locking must be performed around each read or write request. The cost of locking can greatly exceed the cost of a unitary read or write operation. The high locking overhead results in severe performance degradation, which is why, historically, copying has been perceived as the preferable solution. While locking over longer sequences of operations is possible, and has been used extensively in the prior art (for example, a “synchronized” block in the Java programming language), it cannot be used as a security mechanism, since the unprivileged code cannot be trusted to correctly perform locking.
Some synchronization mechanisms known in the prior art, in particular the “recursive mutex”, operate in part by storing a thread identifier to identify threads, a technique that is also used in the present invention. However, prior-art synchronization mechanisms also require a memory fence on each acquisition or release of the lock, which is the main cause of the aforementioned “locking” overhead. This overhead precludes direct combination of a data structure encapsulation with a prior-art synchronization mechanism unless performance degradation is acceptable (measurements in contemporary Java environments show an approximately ten-fold reduction in performance).
Traditionally, the overhead of locking has been avoided by locking over longer sequences of operations, and this technique is used extensively in the prior art. For example, a “synchronized” block in the Java programming language acquires and holds a lock while executing the entire contents of a code block, which may effect an unlimited number of unitary modification operations. However, this technique cannot be used in security-critical systems, since the unprivileged code cannot be trusted to correctly perform locking.
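The following Java sketch illustrates both halves of this observation: a “synchronized” block that amortizes one lock acquisition over many operations, and a second method that simply omits the lock. The class `BatchAppender` and its method names are hypothetical and serve only as an illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; both methods operate on the same shared list.
class BatchAppender {
    // Cooperative caller: one lock acquisition covers 'count' modifications.
    static void appendRange(List<Integer> shared, int start, int count) {
        synchronized (shared) {
            for (int i = 0; i < count; i++) {
                shared.add(start + i);
            }
        }
    }

    // Uncooperative caller: nothing in the language forces the lock to be
    // taken, so this modification can race with appendRange on another thread.
    static void appendUnlocked(List<Integer> shared, int value) {
        shared.add(value);
    }
}
```

The locking in `appendRange` is effective only if every piece of code that can reach the list follows the same convention, which untrusted code cannot be relied upon to do.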
In view of the foregoing, there is a need for improved data structure synchronization mechanisms.