“Big data”, cloud, and the Internet of Things (IoT) are examples of the rapidly expanding area of distributed data networks and acquisition of distributed data. Data generated at a plurality of source nodes is collected for processing and/or analysis. Examples of source nodes include sensors in sensor networks that perform measurements and provide measurement data, e.g., in home automation data networks or industrial processing data networks. A further example includes servers in a data center generating event log records, e.g., for operational security.
The operation of data networks, such as the above examples, relies upon the integrity of the data received from the distributed data sources and the control processes. This means that as data is collected, it has to be possible to verify that the data has not been tampered with since the data left the source node. Furthermore, the data source has to be authentic. This means that an indicated source, e.g., a source node indicated by the received data or a data packet including the data, is the actual originator of the data.
Depending on operational security requirements, it is not sufficient that only the intended recipient collecting the data can verify aspects of integrity and authenticity. Rather, it is required that third parties can audit the data exchange between the source nodes and the collecting node. Conventional techniques for authenticating the data source implement public-key cryptography, e.g., using a Public Key Infrastructure (PKI) with signatures on all data exchanged between the nodes.
However, generating signatures is resource-consuming in minimalistic source nodes (also referred to as “low-end devices”) such as sensors. Furthermore, the impact of signatures on bandwidth and/or storage is disproportionately large compared to the data to be exchanged (e.g., since the nodes have to be prepared for an audit, a large number of signatures have to be stored for relatively long time periods in the nodes). Moreover, signatures verifiable by a PKI are known to be cumbersome to establish and maintain over time, especially if many sources of data have to be distinguished, i.e., identified by means of different certificates.
Other conventional techniques, e.g., the technique referred to below as QI-KSI, implement Merkle trees. Aggregating hash values of the exchanged data in a Merkle tree is efficient, since the “root” of the Merkle tree provides a compressed digest of all individual hash values, so that the Merkle tree reduces storage requirements. However, considerable effort is needed to arrange for the keys in each leaf of the tree to be used for authentication.
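The aggregation of hash values into a single root can be illustrated with a minimal sketch. It is not the QI-KSI construction itself: the choice of SHA-256, the function names sha256 and merkle_root, and the rule of duplicating the last node on odd-sized levels are all assumptions made only for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Hash function used for leaves and inner nodes (assumed: SHA-256)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Return one digest that commits to every item in `leaves`.

    Hypothetical helper: each item is hashed to form a leaf, then adjacent
    nodes are hashed pairwise until a single root remains.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumed padding rule: duplicate the last node of an odd level.
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

In this sketch only the 32-byte root has to be retained to detect a change in any of the aggregated items, which is the storage reduction referred to above.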
Ahto Buldas, Andres Kroonmaa and Risto Laanoja have disclosed some principles in “Keyless Signatures' Infrastructure: How to Build Global Distributed Hash-Trees”, below referred to as [1], in “Efficient Quantum-Immune Keyless Signatures with Identity”, below referred to as [2], in “Efficient Implementation of Keyless Signatures with Hash Sequence Authentication”, below referred to as [3], and in “Security Proofs for the BLT Signature Scheme”, below referred to as [4]. Ahto Buldas and Sven Laur have disclosed some principles in “Knowledge-Binding Commitments with Applications in Time-Stamping”, below referred to as [5].
Every time the client wants to authenticate itself, a value z_k needs to be recomputed from z_n, as will be further described in this disclosure. This may be a problem if n is large and there is no capacity to store or re-compute the whole hash chain. A solution to this problem is the technique called “hash sequence traversal”. One such technique was proposed by D. Coppersmith and M. Jakobsson in their paper “Almost Optimal Hash Sequence Traversal”, below referred to as [6]. Instead of deriving z_k by sequential hashing from z_n down to z_k, the elements of the hash chain z_0 ← z_1 ← ... ← z_k ← ... can be derived in reverse order in log(n) time on average, provided that log(n) intermediate hash values of the hash sequence are kept.
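The cost of sequential hashing can be made concrete with a minimal sketch. It assumes that the chain is defined by z_{i-1} = h(z_i), as suggested by the arrows in the chain notation, and that h is SHA-256; the function names h and derive_sequential are hypothetical.

```python
import hashlib


def h(x: bytes) -> bytes:
    """One step of the hash chain (assumed: SHA-256)."""
    return hashlib.sha256(x).digest()


def derive_sequential(z_n: bytes, n: int, k: int) -> bytes:
    """Derive z_k from z_n by plain sequential hashing.

    Assumes z_{i-1} = h(z_i), so reaching index k takes n - k hash
    operations, which becomes expensive when n is large and k is small.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    z = z_n
    for _ in range(n - k):
        z = h(z)
    return z
```

Under these assumptions, every request for an element with a small index k repeats almost the whole chain computation, which motivates keeping intermediate values.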
A short description of the technique of D. Coppersmith and M. Jakobsson on the intuitive level can be given as follows. Assume the client can keep the value z_{n/2}; then the derivation of any value z_k would require at most n/2 hashes, instead of n. Now let us assume that the client keeps two intermediate values z_{n/2} and z_{n/4}. Thus, the elements of the first half of the hash chain, z_k for k≤n/2, would require re-computation of at most n/4 hashes. When k becomes larger than n/2, the intermediate value z_{n/4} can be removed and a new value z_{3n/4} is derived from z_n linearly, in n/4 hash operations, so that the elements of the second half of the hash chain, z_k for k>n/2, can be calculated in at most n/4 hashes as well. It has been shown that, with log(n) intermediate hash values kept, each element of the reverse-order hash chain can be derived in log(n) hash operations on average.
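The two-checkpoint intuition can be sketched as follows. This is only an illustration of the simplified description above, not the full algorithm of [6]: the class CheckpointedChain, its methods, the use of SHA-256, and the convention z_{i-1} = h(z_i) are assumptions introduced for the example.

```python
import hashlib


def h(x: bytes) -> bytes:
    """One step of the hash chain (assumed: SHA-256)."""
    return hashlib.sha256(x).digest()


class CheckpointedChain:
    """Hash chain z_{i-1} = h(z_i) with a few stored intermediate values."""

    def __init__(self, z_n: bytes, n: int):
        self.n = n
        self.stored = {n: z_n}  # chain index -> stored chain value
        self.hashes = 0         # hash operations performed so far

    def _walk(self, k: int) -> bytes:
        # Start from the closest stored value whose index is at or above k.
        start = min(i for i in self.stored if i >= k)
        z = self.stored[start]
        for _ in range(start - k):
            z = h(z)
            self.hashes += 1
        return z

    def derive(self, k: int) -> bytes:
        """Return z_k, hashing down from the nearest stored value."""
        if not 0 <= k <= self.n:
            raise ValueError("k must satisfy 0 <= k <= n")
        return self._walk(k)

    def store(self, i: int) -> None:
        """Compute z_i and keep it as an intermediate value."""
        self.stored[i] = self.derive(i)

    def drop(self, i: int) -> None:
        """Discard a stored intermediate value (z_n itself is always kept)."""
        if i != self.n:
            self.stored.pop(i, None)
```

A possible use, under the same assumptions: after store(n // 2) and store(n // 4), every derive(k) with k≤n/2 costs at most n/4 hash operations; once k exceeds n/2, calling drop(n // 4) followed by store(3 * n // 4) costs n/4 hash operations and keeps the bound of n/4 per element for the second half of the chain.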
As the discussion above shows, the calculations and operations involved may become demanding. It is therefore desirable to avoid calculations and other operations when they are not necessary, in order to improve efficiency.