In the field of telecommunication, data distribution networks comprising various data generating nodes, such as sensors and devices, are sometimes employed to distribute often huge amounts of data in order to provide knowledge about different locations and environments to parties needing or wanting such sensor-generated information. In this context, the term “sensors” is often used to denote any entities capable of registering or measuring some measurable metric or quantity, and of communicating the results, e.g. at regular intervals, by sending source data through the network. “Source data” is thus original data that has basically not been processed.
The source data may, for example, refer to some physical measure such as temperature or pressure for surveillance of an object or a space, or to some counted metric such as the number of passing cars. Once received, this source data can be processed by data processing nodes in a data distribution network to produce new data derived from the source data, e.g. by performing various calculations and compilations. An illustrative example could be to receive multiple temperature measurements at regular intervals from one or more sensors and then calculate an average temperature for a certain period, which is then delivered to a surveillance centre.
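As an illustrative, non-limiting sketch of the averaging example above, the following Python code computes an average temperature for a given period. The type `Reading`, the function `average_for_period` and the choice of a half-open time window are assumptions made for illustration only and are not taken from the description.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reading:
    """One temperature measurement sent by a source node (hypothetical type)."""
    sensor_id: str
    timestamp: float  # seconds since some reference time (assumed unit)
    celsius: float


def average_for_period(
    readings: Iterable[Reading], start: float, end: float
) -> Optional[float]:
    """Return the mean of readings with start <= timestamp < end.

    Returns None when no reading falls inside the period.
    """
    values = [r.celsius for r in readings if start <= r.timestamp < end]
    return fmean(values) if values else None


readings = [
    Reading("s1", 0.0, 20.0),
    Reading("s2", 30.0, 22.0),
    Reading("s1", 60.0, 24.0),  # falls outside the first 60-second period
]
period_average = average_for_period(readings, 0.0, 60.0)
```

In this sketch, `period_average` is the derived data that a data processing node could deliver to the surveillance centre, while the individual readings are the source data.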
It should further be noted that source data in this context is not limited to measurements of “tangible” physical quantities, but could also relate to more abstract information, such as market or business data, news information, software, media or content such as audio/video/games, etc. For example, in a software development process, data relating to software components may be generated by a plurality of source nodes, which data may then be combined and/or refined, and re-distributed by subsequent data processing nodes. Source data may further be generated and distributed by devices having some operational function. For example, a device may work as an actuator for mechanically operating a moving part such as a door, valve, gate, plunge, ram, etc. In that case, the generated source data may refer to some operational feature of the device, e.g. the number of times it has executed a task.
In the following description, the term “source node” will be used to represent any devices, sensors, detectors, actuators and other entities capable of generating and communicating source data, while a “data processing node” is a node that in some way processes received data, which could comprise source data and/or previously processed data, to generate new data for further distribution through the network. The new data may also depend on local data that has been generated and/or previously stored by the data processing node.
FIG. 1 illustrates how data can propagate through a data distribution network where source nodes denoted “SN” generate and send source data which is received by data processing nodes 102, 104 denoted “PN”. In this example, a data processing node 102 receives source data D from three source nodes 100a. The data processing node 102 then processes the received source data, and possibly also local data L, in order to generate some new data D′ which is thus derived from the received source data D and from local data L if used. The data processing node 102 sends the new data D′ to another data processing node 104 which performs more processing of the received data D′ and possibly also of source data D received from other source nodes 100b, as indicated by dashed lines in the figure, and/or of processed data from other data processing nodes, not shown, and/or of its own local data, depending on configuration. A data processing node 102 may also act as a source node itself by generating its own local source data, which may also be used for generating new data.
In this way, the data processing node 104 generates further new data D″ which is thus derived both from the source data D and the previously processed data D′. In this example, the data processing node 104 delivers the resulting data D″ to a “data receiving node” 106 denoted “RN”. The nodes 102, 104 and 106 can thus be seen as direct or indirect users of the original source data D. It should be noted that both data processing nodes 102, 104 can also be regarded as data receiving nodes in this context, a term used simply to indicate that the nodes receive data from one or more preceding nodes. It can be understood that the above-illustrated distribution of data originating from various source nodes may be cascaded in any number of “hops” in a tree-like fashion along a data distribution path, which could involve any number of nodes.
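The two-hop propagation of FIG. 1 can be sketched as follows. This is a minimal illustration only: the description does not specify what processing the nodes 102 and 104 perform, so the mean and maximum operations, the function names `process_102` and `process_104`, the sample values and the number of source nodes 100b are all assumptions.

```python
from statistics import fmean
from typing import List, Optional


def process_102(source_data: List[float], local_data: Optional[float] = None) -> float:
    """Node 102: derive D' from source data D and, if used, local data L.

    The mean is an arbitrary stand-in for the unspecified processing.
    """
    values = list(source_data)
    if local_data is not None:
        values.append(local_data)
    return fmean(values)


def process_104(d_prime: float, extra_source_data: List[float]) -> float:
    """Node 104: derive D'' from D' and from source data sent by nodes 100b.

    The maximum is an arbitrary stand-in for the unspecified processing.
    """
    return max([d_prime, *extra_source_data])


d_from_100a = [20.0, 21.0, 22.0]  # D from the three source nodes 100a
d_from_100b = [19.5, 20.5]        # D from source nodes 100b (count assumed)

d_prime = process_102(d_from_100a)                  # D'
d_double_prime = process_104(d_prime, d_from_100b)  # D'', delivered to node 106
```

The value received by the node 106 depends on every preceding node, yet nothing in `d_double_prime` itself reveals which nodes contributed to it or what they did.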
In more detail, a given node in the network, e.g. node 104 in FIG. 1, may be regarded as the root of a topological tree, which tree corresponds to a data distribution path comprising the root and the topologically preceding nodes, i.e. the nodes 100a, 100b, 102 in this simplified example, having taken part in the data generation/distribution steps resulting in the data D″ generated at this root node 104. The data D″ may be further distributed to the node 106 as shown in the figure. The set of source nodes having generated source data can be regarded as the “leaves” of this tree, i.e. nodes 100a and 100b correspond to leaves, but the node 102 does not.
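The tree view described above can be sketched by recording, for each node, the nodes from which it received data, and then walking backwards from the chosen root. The map `PREDECESSORS`, the per-node labels such as "100a-1" and the function names are hypothetical; FIG. 1 only names the groups 100a and 100b, and the number of nodes 100b is assumed.

```python
from typing import Dict, List, Set

# Hypothetical map: each node lists the nodes it received data from.
PREDECESSORS: Dict[str, List[str]] = {
    "104": ["102", "100b-1", "100b-2"],
    "102": ["100a-1", "100a-2", "100a-3"],
}


def distribution_tree(root: str, predecessors: Dict[str, List[str]]) -> Set[str]:
    """Return the root plus all topologically preceding nodes."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(predecessors.get(node, []))
    return seen


def leaves(root: str, predecessors: Dict[str, List[str]]) -> Set[str]:
    """Return the nodes of the tree that have no predecessors (the source nodes)."""
    tree = distribution_tree(root, predecessors)
    return {n for n in tree if not predecessors.get(n)}


tree_of_104 = distribution_tree("104", PREDECESSORS)
leaves_of_104 = leaves("104", PREDECESSORS)
```

With the node 104 as root, the node 102 belongs to the tree but is not a leaf, and the node 106 is not part of the tree at all since it only receives the data D″.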
When data is processed and transferred along a distribution path with plural nodes, it may be of interest for any receivers of data to ensure that the received data is really valid and trustworthy and that it has not been manipulated or faked at some point along its distribution path. Today, it is not possible, at least in a simple and efficient way that is practical to implement, to make sure that the received data originates from reliable sources, nor to identify those sources and any processing nodes in-between.
Although various solutions are available for applying authentication and verification of transferred data in a single transfer hop, i.e. from one node to another, based on a trusted relationship between the two nodes, the validity of the data cannot be easily ensured over multiple hops or steps, unless all nodes in the path belong to a trusted “community” where all nodes and data consumers are trustworthy. This model can be quite difficult or even impossible to implement, particularly when a great number of diverse nodes and data consumers are involved in the data distribution network, possibly across multiple different countries. Even if this model is used, where basically all nodes share one or more keys, every node in a data distribution path would have to add its own authentication data to the transferred data to enable tracing of the data, resulting in an excessive increase in bandwidth consumption since the total size of the transferred data would grow with every transfer hop. While some end-to-end security solutions are known today for data distribution, these solutions may not be useful or easily applicable, since it is necessary to allow intermediate nodes to modify the distributed data, thus breaking the end-to-end trust relation.
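The growth in transferred data under the shared-key model can be illustrated with a short sketch. The description does not name any particular authentication scheme, so the use of HMAC-SHA256, the key `SHARED_KEY`, the function `forward` and the payload are assumptions chosen only to make the size increase concrete.

```python
import hashlib
import hmac

SHARED_KEY = b"community-key"  # hypothetical key shared by all nodes
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes per HMAC-SHA256 tag


def forward(message: bytes, node_id: str) -> bytes:
    """Append this node's authentication tag, computed over all data received so far."""
    tag = hmac.new(SHARED_KEY, node_id.encode() + message, hashlib.sha256).digest()
    return message + tag


payload = b"temperature=21.0"
message = payload
sizes = [len(message)]
for node_id in ["102", "104"]:  # one tag is added per transfer hop
    message = forward(message, node_id)
    sizes.append(len(message))
```

Here each hop adds a fixed `TAG_SIZE` bytes, so after two hops the authentication data is already four times the size of the 16-byte payload, and the overhead keeps growing linearly with the number of hops.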
As a result, it is a problem that receivers of data that has been processed and derived from original source data in any number of hops along a distribution path have no satisfactory and practical way of ensuring the authenticity and validity of the received data and/or the original source data, or of verifying the source nodes and any processing nodes in-between. It can therefore not be trusted that the received processed data is really valid.