Embodiments of the present invention relate to graph databases and, more specifically, to efficiently committing large transactions in a graph database.
Graph databases are drawing increased attention from both industry and academia, largely due to the proliferation of Online Social Networks (OSNs) and Linked Data, both of which can be conveniently represented by graph databases. In a graph database, each entity is represented by a vertex of a graph, and an edge between two vertices represents a relationship between the entities those vertices represent. For example, in an OSN, an edge between two vertices may represent a friendship or connection between the two people represented by those vertices.
In many cases, a large and continuous stream of data may flow into the graph database, requiring many graph update operations. One such update is the addition of an edge between two vertices. Because an edge connects two vertices, this single update requires modifying both of them so that each endpoint vertex records the new edge. As with many Database Management Systems (DBMSs), actions may be taken when adding this new data to the graph to ensure consistency and to reduce the risk of data being invalidated.
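To illustrate why one edge insertion touches two vertices, the following is a minimal Python sketch assuming an in-memory adjacency-set model. The `Vertex` and `Graph` classes and the `add_edge` method are hypothetical names chosen for illustration; they do not describe the storage format of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical vertex standing for one entity, e.g. a person in an OSN."""
    vid: str
    neighbors: set[str] = field(default_factory=set)


class Graph:
    """Minimal in-memory adjacency-set graph (illustrative model only)."""

    def __init__(self) -> None:
        self.vertices: dict[str, Vertex] = {}

    def add_vertex(self, vid: str) -> Vertex:
        # Reuse the existing vertex if one with this id is already present.
        return self.vertices.setdefault(vid, Vertex(vid))

    def add_edge(self, a: str, b: str) -> list[str]:
        """Add an undirected edge a-b and return the ids of the modified vertices."""
        va, vb = self.vertices[a], self.vertices[b]
        # One logical update writes two vertices: each endpoint records the other.
        va.neighbors.add(b)
        vb.neighbors.add(a)
        return [a, b]


g = Graph()
for name in ("alice", "bob"):
    g.add_vertex(name)
modified = g.add_edge("alice", "bob")  # e.g. a "friendship" edge
print(modified)  # ['alice', 'bob']
```

Because both endpoints are written, a consistent update must ensure that either both vertex modifications become visible or neither does.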
Some transactions may be large, involving many vertex and edge CRUD operations (operations that create, read, update, or delete data). Many operations can be batched together into a large transaction, which enables the database engine of the graph database to perform optimizations such as parallelizing some of the operations. However, a large transaction may need to lock many different entities, each represented by a vertex, and may keep them in a temporarily inconsistent state for a long time while its other operations execute.
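The following minimal sketch illustrates how the lock footprint of a batched transaction grows with the size of the batch. It assumes strict two-phase locking (every touched vertex stays locked until commit) and, for simplicity, treats all operations, including reads, as taking exclusive vertex locks. `Op`, `Operation`, and `Transaction` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Op(Enum):
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


@dataclass
class Operation:
    kind: Op
    vertices: tuple[str, ...]  # vertices touched; an edge op touches both endpoints


@dataclass
class Transaction:
    """Hypothetical batched transaction: locks are held from first use until commit."""
    ops: list[Operation] = field(default_factory=list)
    held_locks: set[str] = field(default_factory=set)

    def add(self, kind: Op, *vertices: str) -> None:
        self.ops.append(Operation(kind, vertices))
        # Assumption: strict two-phase locking, so each touched vertex
        # stays locked until the whole batch commits.
        self.held_locks.update(vertices)

    def commit(self) -> int:
        """Release all locks at once and return how many were held until commit."""
        n = len(self.held_locks)
        self.held_locks.clear()
        return n


txn = Transaction()
for i in range(1000):
    txn.add(Op.CREATE, f"v{i}", f"v{i + 1}")  # 1,000 edge inserts along a chain
print(len(txn.ops), len(txn.held_locks))  # 1000 1001
locks_released = txn.commit()
```

In this model every lock acquired by the first operation remains held until the thousandth operation finishes, which is the source of the long lock-holding periods described above.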
As a result, large transactions can reduce throughput. Holding locks for an extended period can cause other operations that try to access the locked entities to fail and retry, often repeatedly, which reduces throughput further.
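The failure-and-retry effect can be sketched with a hypothetical non-blocking lock table, assuming that a transaction which cannot obtain a lock releases whatever it holds and retries up to a fixed limit. `LockTable`, `try_lock`, and `run_small_txn` are illustrative names, not an existing API, and retries happen immediately (no backoff delay) so that the example is deterministic.

```python
class LockTable:
    """Hypothetical per-vertex exclusive lock table with non-blocking acquisition."""

    def __init__(self) -> None:
        self.owner: dict[str, str] = {}  # vertex id -> owning transaction id

    def try_lock(self, vertex: str, txn: str) -> bool:
        holder = self.owner.get(vertex)
        if holder is None or holder == txn:
            self.owner[vertex] = txn
            return True
        return False  # held by another transaction: fail instead of waiting

    def release_all(self, txn: str) -> None:
        for v in [v for v, t in self.owner.items() if t == txn]:
            del self.owner[v]


def run_small_txn(locks: LockTable, txn: str, vertices: list[str], max_retries: int) -> int:
    """Try to lock all vertices; return the attempt that succeeded, or -1 on give-up."""
    for attempt in range(1, max_retries + 1):
        if all(locks.try_lock(v, txn) for v in vertices):
            return attempt
        # Drop any partial locks before retrying; each failed attempt is wasted work.
        locks.release_all(txn)
    return -1


locks = LockTable()
for i in range(1000):
    locks.try_lock(f"v{i}", "T_big")  # a long-running batch holds 1,000 locks

first = run_small_txn(locks, "T_small", ["v5", "v2000"], max_retries=3)
locks.release_all("T_big")  # the large transaction finally commits
second = run_small_txn(locks, "T_small", ["v5", "v2000"], max_retries=3)
print(first, second)  # -1 1
```

The small transaction needs only two vertices, yet all of its attempts fail while the large transaction holds `v5`; it succeeds on the first attempt only after the large transaction releases its locks.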