In systems in which high multiprocessor scalability is desired, such as a main-memory database management system (DBMS), the process execution is divided into a number of threads, each of which can be run on a different processor within the multiprocessor system. These threads access a set of data objects, which form the data and metadata of the system. A single data object is, for example, a single row in the database, or a node in a B-tree used to index the rows.
In the context of a single operation, such as reading or updating a row, a thread is either a reader or a writer. A reader only reads a set of data objects; a writer reads a set of data objects and modifies (writes) at least one of them.
When a writer updates a data object while another writer or a reader is reading the same data object, a method for keeping the simultaneous operations consistent is needed. A typical method is to acquire a mutual exclusion lock (mutex) on a data object whenever a reader or a writer accesses it. However, this limits concurrent access to a single data object to one reader or writer at a time.
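For contrast with the optimistic method below, here is a minimal C++ sketch of the per-object mutex approach, assuming a hypothetical `LockedRow` type and hypothetical helpers `read_balance` and `add_to_balance`. Readers and writers take the same mutex, so even two readers cannot proceed in parallel:

```cpp
#include <mutex>

// Hypothetical data object (e.g., a database row) guarded by its own mutex.
struct LockedRow {
    std::mutex mtx;
    long balance = 0;
};

// Reader: must hold the mutex, so it excludes all other readers and writers.
long read_balance(LockedRow& row) {
    std::lock_guard<std::mutex> guard(row.mtx);
    return row.balance;
}

// Writer: also holds the same mutex for the duration of the update.
void add_to_balance(LockedRow& row, long delta) {
    std::lock_guard<std::mutex> guard(row.mtx);
    row.balance += delta;
}
```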
In a more refined method that uses optimistic reads, only writers acquire a mutex on the data objects they access, and readers rely on version numbers to validate their reads. When a writer updates a data object, it acquires the object's mutex, increments the object's version number by one to an odd number, performs the necessary changes to the data object, increments the version number again to an even number, and releases the mutex. When a reader reads a data object, it first records the object's version number, then performs the read, and finally rechecks the version number. If the version number is unchanged and even, no writer modified the object during the read, and the read was successful. Otherwise, the read is retried.
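The version-number protocol just described resembles what is commonly called a seqlock. Below is a minimal C++ sketch, assuming a hypothetical `VersionedRow` with two fields and hypothetical helpers `write_row` and `read_row`. The fields are `std::atomic` and accessed with relaxed ordering so that concurrent reads and writes are not data races under the C++ memory model; the fences follow the usual C++ seqlock pattern:

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical data object with two fields that must be read consistently.
struct VersionedRow {
    std::mutex writer_mtx;                   // taken by writers only
    std::atomic<std::uint64_t> version{0};   // even = stable, odd = write in progress
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Writer: lock, bump version to odd, modify, bump version to even, unlock.
void write_row(VersionedRow& r, long new_a, long new_b) {
    std::lock_guard<std::mutex> guard(r.writer_mtx);
    std::uint64_t v = r.version.load(std::memory_order_relaxed);
    r.version.store(v + 1, std::memory_order_relaxed);   // now odd
    std::atomic_thread_fence(std::memory_order_release); // odd version visible before field stores
    r.a.store(new_a, std::memory_order_relaxed);
    r.b.store(new_b, std::memory_order_relaxed);
    r.version.store(v + 2, std::memory_order_release);   // even again, publishes the fields
}

// Reader: takes no lock. Record version, read fields, recheck version,
// and retry if the version changed or was odd (a write was in progress).
void read_row(const VersionedRow& r, long& out_a, long& out_b) {
    for (;;) {
        std::uint64_t v1 = r.version.load(std::memory_order_acquire);
        long a = r.a.load(std::memory_order_relaxed);
        long b = r.b.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire); // field loads complete before recheck
        std::uint64_t v2 = r.version.load(std::memory_order_relaxed);
        if (v1 == v2 && v1 % 2 == 0) {
            out_a = a;
            out_b = b;
            return;
        }
    }
}
```

Readers never block writers or each other; a reader that observes interference simply loops. Note that `read_row` still dereferences `r`, so it is only safe while the object has not been freed.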
This method of optimistic reads allows any number of readers to access a data object simultaneously, increasing multiprocessor scalability. However, one special case of an update operation, namely freeing a data object, cannot be handled by this version numbering: a reader may still be accessing the object, including its version number, after a writer has freed it.
Traditionally, with a classic memory allocator (malloc), any piece of freed memory may become inaccessible immediately. Using malloc would thus make it unsafe to access any data object that a writer might free. For example, on a UNIX system, reading a freed piece of memory may result in a segmentation fault.
A first method to ensure consistency is to use reference counts on data objects: whenever a thread accesses a data object, it increments the data object's reference count by one, and when the access finishes, it decrements the reference count by one. When a data object is freed, the freeing thread waits for the reference count to drop to zero before actually freeing the data object. The problem with this method is that the reference count must be protected, typically by a mutex, which again limits the concurrency of both reads and writes on the data object.
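A minimal C++ sketch of mutex-protected reference counting, assuming a hypothetical `RefCountedRow` and hypothetical helpers `acquire_ref`, `release_ref`, and `free_when_unreferenced`. A condition variable lets the freeing thread wait for the count to reach zero; setting a `freed` flag stands in for returning the memory to the allocator:

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical reference-counted data object. The count is protected by
// a mutex, so every reader and writer briefly serializes on it.
struct RefCountedRow {
    std::mutex mtx;
    std::condition_variable zero_refs;
    int refcount = 0;
    bool freed = false;   // stand-in for actually freeing the memory
    long value = 0;
};

// Called before accessing the object.
void acquire_ref(RefCountedRow& r) {
    std::lock_guard<std::mutex> guard(r.mtx);
    ++r.refcount;
}

// Called when the access finishes; wakes a waiting freeing thread at zero.
void release_ref(RefCountedRow& r) {
    std::lock_guard<std::mutex> guard(r.mtx);
    if (--r.refcount == 0) {
        r.zero_refs.notify_all();
    }
}

// Freeing thread: block until no thread holds a reference, then "free".
void free_when_unreferenced(RefCountedRow& r) {
    std::unique_lock<std::mutex> lock(r.mtx);
    r.zero_refs.wait(lock, [&r] { return r.refcount == 0; });
    r.freed = true;
}
```

Every access, including a pure read, locks `mtx` twice (once in `acquire_ref` and once in `release_ref`), which is the serialization point that limits concurrency.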
Finally, a second method to ensure consistency is to use a garbage collector, a process that periodically reclaims unused memory. In a system that uses a garbage collector, references to a data object may be left lingering, and after the final reference to a data object is released, the garbage collector will eventually free the data object. A disadvantage of garbage collectors is that they typically restrict access to the data objects while processing them, introducing a new source of mutual exclusion and thereby limiting the multiprocessor scalability and responsiveness of the system.