The present invention relates to linear network coding in a Dynamic Distributed Federated Database.
A federated database is one where a database engine is able to access local and remote sources of data as if the sources of data were contained in one logical database. A distributed database is one in which there are a number of database engines which are interconnected with each other. In a distributed federated database, queries may be performed by any of the interconnected database engines accessing any of the local or remote sources of data as if the sources of data were contained in one logical database. A query from any database engine propagates through the interconnected database engines and result sets from one or more of the sources of data are returned to the querying database engine. In a Dynamic Distributed Federated Database (DDFD) database engines may be dynamically added or removed in an ad hoc fashion whilst the database is in use.
A problem with allowing database engines to be added dynamically is how to keep the information exchanged in a DDFD confidential without also having to provide a mechanism for assessing the trustworthiness of each node. A further problem is the need for complex key management, on which many modern confidentiality systems rely.
One particular problem faced in DDFDs is how to ensure that when data is being passed through the network, it is not readable by an unauthorized party. Data leakage can occur by an attacker owning or compromising a node in the data path or by ‘wire sniffing’ a link between nodes.
FIG. 1 shows a prior art DDFD 100 having four nodes, A 102, B 104, D 106 and Q 108. In the example of FIG. 1, node Q 108 requests some data, typically by issuing an SQL query, and the data resides on node D 106. At the time of the request, node Q 108 does not know that the data resides on node D 106. As the query is broadcast, it passes 112 from node Q 108 through 114 node A 102 to node D 106 and also passes 116 from node Q 108 through 118 node B 104 to node D 106. Nodes A 102 and B 104 examine the query and pass it on, adding metadata to the query. When the query arrives at node D 106, node D 106 determines that it can answer the query and that the data is to be returned to node Q 108.
Node D 106 returns 120 the data to node A 102, which forwards 122 the data to node Q 108. Attacker 110 can intercept the data flowing 120 between node D 106 and node A 102 as well as the data flowing 122 between node A 102 and node Q 108. Data flowing between these nodes is vulnerable to interception. In addition, node A 102 may intercept the data, which may not be desirable if node A 102 is an untrusted node.
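The exposure described for FIG. 1 can be illustrated with a minimal sketch. The adjacency map follows the four nodes and links of FIG. 1; the function names `flood_paths` and `exposure`, and the choice of the first discovered path as the return path, are assumptions made for illustration only and are not taken from the prior art system.

```python
from collections import deque

# Topology of FIG. 1, treated as bidirectional links: Q-A, Q-B, A-D, B-D.
LINKS = {
    "Q": ["A", "B"],
    "A": ["Q", "D"],
    "B": ["Q", "D"],
    "D": ["A", "B"],
}

def flood_paths(source, target):
    """Return every loop-free path a broadcast query takes from source to target."""
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        for neighbour in LINKS[node]:
            if neighbour not in path:  # do not revisit a node on this path
                queue.append(path + [neighbour])
    return paths

def exposure(return_path):
    """Return the intermediate nodes and the links that carry the unprotected data."""
    nodes = return_path[1:-1]
    links = list(zip(return_path, return_path[1:]))
    return nodes, links

query_paths = flood_paths("Q", "D")           # Q-A-D and Q-B-D
return_path = list(reversed(query_paths[0]))  # assumed return path: D, A, Q
nodes, links = exposure(return_path)
print("intermediate nodes:", nodes)
print("links open to wire sniffing:", links)
```

In this sketch the single return path exposes the complete data to node A and to both links on that path, which corresponds to the flows 120 and 122 available to attacker 110.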
In static networks, a common approach is to encrypt the data so that only the nodes that have the appropriate keys can read it. This concept is well understood and widely implemented. However, in dynamic networks, where nodes leave and join the network in an ad hoc fashion, the overhead of distributing and revoking keys is problematic. It would be desirable to protect data ‘in flight’ through the network without the overhead and complexity of key-based encryption.
Known prior art discloses a method in which a data message is secured in a two-stage process such that at least a first portion of the plurality of fragments is transported along a first communication path of the network and at least a second portion of the plurality of fragments is transported along a second communication path of the network. Because encryption is still used to secure the data, the problems of key distribution and key revocation remain. Although different portions of the plurality of fragments are transported along different communication paths, portions of the different communication paths may use one or more common segments.
Known prior art discloses a secure data parser which splits the data to be secured into two or more portions. Encryption of the data may be done before or after the splitting of the data. Also disclosed is the sending of different portions of the data along different paths thus creating multiple streams of data. Although the data is sent along different paths, portions of the different paths may use one or more common segments.
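The weakness of common segments can be shown with a short sketch. The node names, the two example paths, the even split of the message, and the helper names `links_of`, `common_segments` and `fragments_seen_on` are all hypothetical; the sketch omits the encryption used in the prior art and shows only how path overlap lets a single eavesdropper capture every portion.

```python
def links_of(path):
    """Return the undirected links traversed by a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}

def common_segments(first_path, second_path):
    """Return the links that both paths traverse."""
    return links_of(first_path) & links_of(second_path)

def fragments_seen_on(link, routed_fragments):
    """Return the fragments an eavesdropper on one link would capture."""
    return [frag for frag, path in routed_fragments if link in links_of(path)]

message = b"confidential result set"
half = len(message) // 2

# Hypothetical paths that diverge at D but rejoin at node C.
first_path = ["D", "A", "C", "Q"]
second_path = ["D", "B", "C", "Q"]
routed = [(message[:half], first_path), (message[half:], second_path)]

shared = common_segments(first_path, second_path)
captured = b"".join(fragments_seen_on(frozenset({"C", "Q"}), routed))
print("shared links:", [sorted(link) for link in shared])
print("captured on C-Q:", captured)
```

Under these assumptions both portions cross the C-Q link, so an attacker on that one link collects the whole message even though the portions were sent along different paths.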