The inventive concept disclosed herein relates to graph databases and, in particular, to partitioning a graph database.
A graph database is a database that uses graph structures with nodes, edges, and properties to represent data. A node may represent an entity such as a person, a business, an organization, or an account. Each node has one or more properties, or information that relates to the node. For example, if a node represents a person, the properties associated with that node may be the person's gender, age, name, and/or identification number of some kind. A graph database provides index-free adjacency, such that each element contains a direct pointer to its adjacent elements and there is no need to reference an external index.
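As an illustration of the index-free adjacency described above, the following is a minimal sketch in which each node stores its properties and direct references to its adjacent nodes. The class name `Node`, its methods, the edge label `WORKS_AT`, and the sample data are all hypothetical and chosen only for illustration; they do not represent any particular graph database product.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical graph node: properties plus direct references to neighbors."""
    node_id: str
    properties: dict = field(default_factory=dict)
    # Each edge is kept on the node itself as a (label, neighbor) pair,
    # so traversal never consults a separate, external index.
    edges: list = field(default_factory=list)

    def connect(self, label, other):
        """Add a directed, labeled edge from this node to `other`."""
        self.edges.append((label, other))

    def neighbors(self, label=None):
        """Return adjacent nodes by following the stored references."""
        return [n for (lbl, n) in self.edges if label is None or lbl == label]


# Illustrative data: a person node and a business node (assumed values).
alice = Node("p1", {"name": "Alice", "age": 34})
acme = Node("b1", {"name": "Acme Corp"})
alice.connect("WORKS_AT", acme)

# Reaching the adjacent node is a pointer dereference, not an index lookup.
employer_names = [n.properties["name"] for n in alice.neighbors("WORKS_AT")]
print(employer_names)
```

In this sketch, the cost of visiting a neighbor depends only on the number of edges stored on the node, not on the total size of the graph.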
Graph databases have various applications. For example, a graph database may be used in healthcare management, hospitality, transport, integrated circuit design, computer architecture design, and social network systems, to name a few.
A graph may be partitioned so that subsets of its data are allocated to machines cooperating in a cluster, where a “machine” is a computer or computing device including a processor and a memory. A cluster of machines is used to handle larger datasets, often involving many millions, and sometimes even billions, of nodes. However, this need for increased storage space conflicts with efficient query processing, because a query is typically processed more efficiently within a single machine. Queries processed across machine boundaries may be orders of magnitude slower than queries that execute on a single machine.
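The tension described above can be made concrete with a small sketch that counts how many edge traversals of a query stay within one machine and how many cross a machine boundary. The toy graph, the two node-to-machine placements, and the function name `count_hops` are assumptions made purely for illustration; the sketch does not measure actual query latency and is not the partitioning method of any particular system.

```python
from collections import deque

# Hypothetical toy graph: adjacency lists keyed by node id.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

# Two hypothetical assignments of nodes to machines 0 and 1.
placement_split = {"a": 0, "b": 0, "c": 1, "d": 1}
placement_grouped = {"a": 0, "b": 0, "c": 0, "d": 1}


def count_hops(graph, placement, start):
    """Breadth-first traversal from `start`.

    Returns (local, remote): the number of traversed edges whose endpoints
    are on the same machine, and the number that cross machines.
    """
    local = remote = 0
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if placement[node] == placement[neighbor]:
                local += 1
            else:
                remote += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return local, remote


split_hops = count_hops(graph, placement_split, "a")
grouped_hops = count_hops(graph, placement_grouped, "a")
print(split_hops, grouped_hops)
```

Under these assumed placements, the same traversal crosses a machine boundary less often when closely connected nodes are placed on the same machine, which is the kind of property a partitioning scheme would aim for.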
A method and system that allow queries to be processed efficiently, even when a large dataset spans multiple machines, are desired.