One way of representing data is with a graph, in which content is represented as a finite set of nodes and the relationships between nodes are represented as edges (graph edges are sometimes referred to as “links” or “arcs”). Each edge represents a relationship between a pair of nodes. For example, given two nodes x and y, edge (x,y) is said to point or go from x to y. A node can have multiple edges pointing to multiple other nodes in the graph. Edges can be used to represent any type of relationship between nodes (e.g., hierarchical, quantified, etc.). Edges can be assigned values that quantify and/or otherwise delineate the relationships between the corresponding nodes. For example, numerical edge values can quantify relationships such as cost, distance, time, etc. Symbolic labels can express relationships such as hierarchies, dependencies, orderings, etc.
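The graph representation described above can be illustrated with a minimal sketch. The class name `Graph`, its methods, and the sample node names and edge values are hypothetical and chosen only for illustration; the sketch assumes a directed graph with at most one edge per ordered pair of nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, List, Tuple, Union

# An edge value is either numerical (cost, distance, time) or a symbolic label.
EdgeValue = Union[float, str]


@dataclass
class Graph:
    """Hypothetical directed graph: nodes carry attributes, edges carry values."""

    # Node name -> attribute dictionary (the node's content).
    nodes: Dict[Hashable, dict] = field(default_factory=dict)
    # (x, y) -> value of the edge pointing from x to y.
    edges: Dict[Tuple[Hashable, Hashable], EdgeValue] = field(default_factory=dict)

    def add_node(self, name: Hashable, **attributes) -> None:
        self.nodes[name] = attributes

    def add_edge(self, x: Hashable, y: Hashable, value: EdgeValue) -> None:
        # Both endpoints must already be nodes of the graph.
        if x not in self.nodes or y not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(x, y)] = value

    def successors(self, x: Hashable) -> List[Hashable]:
        # All nodes y for which an edge (x, y) exists, in insertion order.
        return [dst for (src, dst) in self.edges if src == x]


graph = Graph()
graph.add_node("x", department="sales")
graph.add_node("y", department="finance")
graph.add_node("z")
graph.add_edge("x", "y", 4.2)          # numerical value, e.g. a distance
graph.add_edge("x", "z", "manages")    # symbolic label, e.g. a hierarchy
```

In this sketch, node x has two outgoing edges, one carrying a numerical value and one carrying a symbolic label, so a single node points to multiple other nodes with different kinds of relationships.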
It is often desirable to share graph data between organizations or departments. Such data is often confidential or sensitive, and it is thus important to preserve the privacy of the data that is being shared. Furthermore, the levels of trust and the economic relationships between different data sharing entities vary, and thus the desired level of privacy to be preserved when sharing graph data varies as well. The edges of a graph often reveal sensitive, confidential information concerning the relationships between the nodes. The most useful (and most private) aspect of the data can be in the form of the edges. It would be desirable to be able to share graph data in a manner that preserves its privacy, yet still maintains its utility.
Previous efforts to anonymize graph data for sharing include deletion of node attributes (content) and random deletion of edges. Both of these methods drastically decrease the overall utility of the graph data, without clearly preserving privacy. Removing node attributes while sharing the graph topology or structure leaves the data vulnerable to a privacy breach, because if any particular subgraph is matched against known information, the whole graph, with all of its attributes, can be inferred. With random edge deletion, the most sensitive edges can still be revealed. Also, these approaches do not enable sharing of graph data with varied levels of utility and privacy preservation.
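The two earlier approaches can be sketched as follows, to make their limitations concrete. This is an illustrative sketch only, not a reproduction of any particular prior system: the function names, the `fraction` and `seed` parameters, and the dictionary-based graph layout are all assumptions.

```python
import random
from typing import Dict, Hashable, Optional, Tuple

Edge = Tuple[Hashable, Hashable]


def strip_node_attributes(nodes: Dict[Hashable, dict]) -> Dict[Hashable, dict]:
    """Drop every node's attributes while keeping the node itself."""
    return {name: {} for name in nodes}


def delete_random_edges(
    edges: Dict[Edge, object], fraction: float, seed: Optional[int] = None
) -> Dict[Edge, object]:
    """Remove roughly `fraction` of the edges, chosen uniformly at random."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    keep_count = len(edges) - int(len(edges) * fraction)
    # Sampling is uniform: it does not know which edges are sensitive.
    kept = rng.sample(list(edges), keep_count)
    return {edge: edges[edge] for edge in kept}


# Hypothetical sample data.
nodes = {"x": {"name": "Alice"}, "y": {"name": "Bob"}, "z": {"name": "Carol"}}
edges = {("x", "y"): "reports_to", ("y", "z"): 12.5, ("x", "z"): "supplies"}

anonymous_nodes = strip_node_attributes(nodes)   # topology is left fully intact
thinned_edges = delete_random_edges(edges, fraction=0.34, seed=0)
```

Stripping attributes leaves every edge in place, so the shared structure is unchanged, and uniform deletion treats a sensitive edge no differently from any other, so that edge may remain in the shared graph.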
It would be desirable to address these issues.