Making investigative decisions, especially those that have the potential to impact lives and communities, requires access to up-to-date and accurate investigative information. Unfortunately, investigative information is often spread across multiple databases, computers, geographies, and clearance levels. For investigative organizations such as intelligence, defense, and law enforcement organizations to be successful, they need ways to share and find information quickly so that critical decisions can be made in time for them to have impact.
One possible solution for sharing investigative data between investigative teams is to use a multimaster database system. In a multimaster database system, investigative data is stored in a group of databases which may be geographically distributed and interconnected by one or more data networks. Data changes may be made to any database of the group. Data changes made to one database are propagated over a data network by a software process to the rest of the group. Multimaster database systems typically employ either a “synchronous” or an “asynchronous” replication scheme for propagating database changes.
In synchronous multimaster replication, each change is applied to all databases in the group immediately or to none of the databases if one or more of the databases in the group cannot accept the change. For example, one of the databases may be offline or unavailable.
In contrast, in asynchronous multimaster replication, a change made to a database is immediately accepted by the database but propagation of the change to other databases in the group may be deferred. Because propagation of changes may be deferred, if a database in the group is unavailable, the available databases can still accept changes, queuing the changes locally until they can be propagated. For this reason, multimaster database systems employing an asynchronous replication strategy are considered to be more highly available than multimaster database systems employing a synchronous replication strategy. However, since asynchronous replication raises the possibility of “concurrency conflicts” that occur as a result of concurrent database changes to multiple databases of the group, multimaster database systems employing an asynchronous replication scheme are generally considered to be more complex to design, maintain, and operate than those employing a synchronous replication scheme. Despite the extra complexity, asynchronous replication is often preferred in the investigative analysis context where investigative analysis teams can be dispersed throughout the world and connected to one another by unreliable network connectivity. Using an asynchronous replication scheme allows an investigative team to update investigative data in their local database even if network connectivity is not currently available. When network connectivity becomes available, the team can share their updates with other teams and receive the other teams' updates made in the interim.
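The difference between the two schemes can be illustrated with a minimal sketch. The `Replica` class, its `outbox` queue, and the `synchronous_write` helper are hypothetical names introduced only for illustration, and the sketch assumes a single peer per replica (a real system would track what each peer has already received).

```python
from collections import deque


class Replica:
    """Hypothetical in-memory stand-in for one database of the group."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.outbox = deque()  # changes accepted locally, not yet propagated
        self.online = True

    def write(self, key, value):
        # Asynchronous scheme: accept the change immediately, defer propagation.
        self.data[key] = value
        self.outbox.append((key, value))

    def flush_to(self, peer):
        # Propagate queued changes once the peer is reachable.
        # Returns the number of changes sent.
        if not peer.online:
            return 0
        sent = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            peer.data[key] = value
            sent += 1
        return sent


def synchronous_write(replicas, key, value):
    # Synchronous scheme: apply the change to all replicas or to none.
    if not all(r.online for r in replicas):
        return False
    for r in replicas:
        r.data[key] = value
    return True
```

With one replica offline, `synchronous_write` rejects the change everywhere, whereas `Replica.write` still succeeds locally and the change is delivered by a later `flush_to` call.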
A concurrency conflict can occur in a multimaster system employing an asynchronous replication scheme when the same data is changed in two databases before either one of those data changes can be propagated to the other. For example, assume that at database A, data representing a particular person's eye color is changed to “brown”, and after that data change but before that data change can be propagated to database B, data at database B representing the same particular person's eye color is changed to “green”. Without additional information, it is unclear which data change is the “correct” change that should be adopted by database A and database B.
Typically, a multimaster system employing an asynchronous replication scheme provides a mechanism for “deconflicting” concurrency conflicts. In many cases, deconflicting a concurrency conflict involves detecting and resolving the concurrency conflict such that the resolution of the concurrency conflict is adopted at all databases in the group. In some cases, the multimaster system may be able to deconflict a concurrency conflict automatically without requiring user intervention. In other cases, user intervention is required to decide which of the concurrent data changes should be adopted as the “correct” data change.
One possible approach for detecting concurrency conflicts in a multimaster system employing asynchronous replication is through the use of version vectors (sometimes referred to as vector clocks). A version vector is a mechanism for ordering changes to database data that works by tracking “causality” relationships between changes. In particular, version vectors allow the system to determine if one change “happened before”, “happened after”, or “happened concurrently with” another change, even if the two changes were made to different databases at different times. Further information on using version vectors to track causality relationships between database changes in a multimaster database system is available on the Internet at wiki/Version_vector in the en.wikipedia.org domain, the entire contents of which is hereby incorporated by reference.
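As a rough illustration of how version vectors can order changes, the following sketch represents a version vector as a dictionary mapping a database name to a counter. The function names `increment`, `merge`, and `compare` are assumptions made for this example and are not taken from any particular system.

```python
def increment(vv, node):
    # Record one more local change made at `node`.
    new = dict(vv)
    new[node] = new.get(node, 0) + 1
    return new


def merge(vv_a, vv_b):
    # Element-wise maximum, taken when one database receives another's changes.
    return {n: max(vv_a.get(n, 0), vv_b.get(n, 0)) for n in set(vv_a) | set(vv_b)}


def compare(vv_a, vv_b):
    # Describe how the change tagged vv_a relates to the change tagged vv_b.
    nodes = set(vv_a) | set(vv_b)
    a_ahead = any(vv_a.get(n, 0) > vv_b.get(n, 0) for n in nodes)
    b_ahead = any(vv_b.get(n, 0) > vv_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "happened concurrently with"
    if a_ahead:
        return "happened after"
    if b_ahead:
        return "happened before"
    return "equal"


# Eye color example: database A and database B each change the same data
# before either change has been propagated to the other.
at_a = increment({}, "A")   # eye color set to "brown" at database A
at_b = increment({}, "B")   # eye color set to "green" at database B
print(compare(at_a, at_b))  # happened concurrently with
```

Neither vector dominates the other in the eye color example, so the two changes are flagged as concurrent and must be deconflicted. Had database B first merged A's vector and then made its change, the comparison would instead report that A's change happened before B's.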
“Revisioning” adds an additional layer of complexity to multimaster asynchronous replication on top of the complexity of detecting concurrency conflicts. In particular, “revisioning” databases in the replication group may each maintain an online history of database changes. Maintaining a historical record of changes as opposed to just the latest changes is useful in the investigative analysis context because it allows investigators to determine “what was known when”, where the “when” can be a point in time in the past. For example, a revisioning database may store two change records CR1 and CR2 for a suspect of a criminal investigation where initially it was thought that the suspect was residing in Los Angeles, Calif., USA as indicated by change record CR1 but it is now thought that the suspect resides in Sacramento, Calif., USA as indicated by change record CR2. When replicating a change from one revisioning database to another revisioning database, it may be desirable that the history of changes exist in both databases after the replication has occurred. For example, if the “current possible location” property of the criminal suspect is changed in revisioning database D1 from “Los Angeles, Calif., USA” to “Sacramento, Calif., USA” and that change is replicated to revisioning database D2, it may be desirable that the change records CR1 and CR2 for the suspect in revisioning database D2 indicate that the prior value for the property was “Los Angeles, Calif., USA” and the current value for the property is “Sacramento, Calif., USA” respectively.
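The desired outcome for revisioning databases D1 and D2 can be sketched as follows. The `ChangeRecord` and `RevisioningDatabase` types are hypothetical, and the sketch deliberately ignores concurrency conflicts and the ordering of concurrent changes. It only shows that replication carries change records, not just the latest value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    record_id: str  # e.g. "CR1"
    obj: str        # the object the change applies to
    prop: str       # the property that changed
    value: str      # the value recorded by this change


class RevisioningDatabase:
    def __init__(self):
        self.history = []  # append-only list of change records, oldest first

    def change(self, record_id, obj, prop, value):
        self.history.append(ChangeRecord(record_id, obj, prop, value))

    def values(self, obj, prop):
        # All recorded values for a property, oldest first.
        return [r.value for r in self.history if r.obj == obj and r.prop == prop]

    def replicate_to(self, other):
        # Copy every change record the other database does not yet hold.
        known = {r.record_id for r in other.history}
        for record in self.history:
            if record.record_id not in known:
                other.history.append(record)


d1, d2 = RevisioningDatabase(), RevisioningDatabase()
d1.change("CR1", "suspect", "current possible location", "Los Angeles, Calif., USA")
d1.change("CR2", "suspect", "current possible location", "Sacramento, Calif., USA")
d1.replicate_to(d2)
print(d2.values("suspect", "current possible location"))
```

After replication, D2 holds both the prior and the current value, so an investigator using D2 can still answer “what was known when”.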
Access control adds yet another layer of complexity to multimaster asynchronous replication. In particular, change records in a revisioning database can be associated with an access control list that governs access to the change records. Such access may include reading the change records. For example, an access control list ACL1 associated with the change records CR1 and CR2 for the criminal suspect in revisioning database D1 may specify that both user Alice and user Bob currently have read access to the change records CR1 and CR2. Thus, both Alice and Bob can determine from the change records CR1 and CR2 in database D1 that the prior value for the “current possible location” property was “Los Angeles, Calif., USA” and the current value for the property is “Sacramento, Calif., USA”. If an access control list associated with a set of change records is changed in one database and that change is replicated to another database, it may be desirable for security purposes that the access control list resulting from the change apply to all change records, including historical ones, in the other database. For example, if the access control list ACL1 is changed in database D1 to remove Bob and that change is replicated to database D2, it may be desirable, for security purposes, that after the change is applied to database D2, user Bob can no longer read change records CR1 or CR2 in database D2.
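One way to obtain this behavior, shown here only as a sketch under stated assumptions, is for each change record to refer to its access control list by identifier rather than holding its own copy, so that a replicated change to the list takes effect for historical records as well. The `AclDatabase` class and its methods are hypothetical names introduced for illustration.

```python
class AclDatabase:
    """Hypothetical database whose change records share an ACL by identifier."""

    def __init__(self):
        self.records = {}  # record id -> (acl id, value)
        self.acls = {}     # acl id -> set of users with read access

    def add_record(self, record_id, acl_id, value):
        self.records[record_id] = (acl_id, value)

    def set_acl(self, acl_id, readers):
        self.acls[acl_id] = set(readers)

    def read(self, user, record_id):
        # The ACL is looked up at read time, so it covers historical records too.
        acl_id, value = self.records[record_id]
        if user not in self.acls.get(acl_id, set()):
            raise PermissionError(f"{user} may not read {record_id}")
        return value

    def replicate_acl_to(self, other, acl_id):
        # Propagate the current state of one ACL to another database.
        other.acls[acl_id] = set(self.acls[acl_id])


d1, d2 = AclDatabase(), AclDatabase()
for db in (d1, d2):
    db.add_record("CR1", "ACL1", "Los Angeles, Calif., USA")
    db.add_record("CR2", "ACL1", "Sacramento, Calif., USA")
    db.set_acl("ACL1", {"Alice", "Bob"})

d1.set_acl("ACL1", {"Alice"})     # Bob is removed from ACL1 in database D1
d1.replicate_acl_to(d2, "ACL1")   # the ACL change is replicated to database D2
```

After the replicated change is applied, a call such as `d2.read("Bob", "CR1")` raises `PermissionError`, while Alice can still read both CR1 and CR2 in database D2.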