Verifying the authenticity of digital data has become both more difficult and more necessary in the electronic age. Documents (defined broadly as any body of digitized information) in electronic form are everywhere in modern banking, commerce, government, law, and indeed in modern life in general. In a world where documents are created, submitted, processed, stored, and considered entirely electronically, sometimes even in multiple locations in the “cloud” unknown to the users themselves, notary or other official seals, physical signatures, special papers and other such tools are becoming increasingly unsuitable and unreliable.
Perhaps the most common way at present to verify the authenticity of electronic documents is to “sign” them with some form of digital certificate, typically using asymmetric cryptography. Public key cryptography is fast enough to enable almost instantaneous certificate generation. However, there is an inherent weakness in using asymmetric cryptography to create digital signatures: cryptographic signature keys may become compromised. Once a key has become compromised, the certificates created with that key are no longer verifiable. Since the likelihood that a key will become compromised increases over time, certificates created by using keyed cryptography are useful only for a short term.
Key-based systems suffer from other disadvantages as well. For one thing, it becomes necessary to keep track of sometimes very large sets of keys and whether they are still valid.
Many common systems treat each digital record as a free-standing entity unrelated to any other: keys are generated for each record, and security depends on that key set. Nothing that happens to any other record, or at any other time, is reflected in the information associated with a given record. Entire systems can therefore be compromised without an individual user being aware of it.
Some other systems increase verifiability by creating a data structure in which information from more than one record at a time is used to compute a composite, higher-level value that can be used to help detect unauthorized changes to any of the records. For example, a tree structure of hash values (such as a Merkle tree) computed over digital input records can yield a single, highest-level verification value such that even the smallest change to one input record will produce a different highest-level value upon recomputation and thereby reveal that a change has occurred.
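The tamper-detection property of such a hash tree can be illustrated with a minimal sketch. The choice of SHA-256, the function names `h` and `merkle_root`, and the rule of duplicating the last node on odd-sized levels are assumptions made only for illustration; they are not taken from any particular system described here.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an illustrative assumption."""
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Compute the highest-level value of a binary hash tree over the records."""
    if not records:
        raise ValueError("at least one record is required")
    # Lowest level: one hash per input record.
    level = [h(r) for r in records]
    while len(level) > 1:
        # Assumption: pad an odd-sized level by repeating its last node.
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        # Each parent is the hash of its two concatenated children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


records = [b"record A", b"record B", b"record C", b"record D"]
root = merkle_root(records)

# Changing a single character in one record changes the highest-level value.
tampered = [b"record A", b"record B", b"record c", b"record D"]
assert merkle_root(tampered) != root
```

Recomputing the root over unchanged records reproduces the same value, so a mismatch indicates that at least one input differs from the original set.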
When it comes to verifying the authenticity of digital documents, whether or not the user cares about proof of receipt order, most existing methods have the serious flaw that users must in some way trust some service provider at some point. In other words, even with a theoretically trustworthy verification scheme, one must still trust the entity that performs the verification. Trust in such systems is sometimes unwarranted and is always at least a reason for concern. In 2007, for example, it was observed that the BSAFE cryptographic library of RSA Security (a major provider of cryptographic technologies) used as a default the DUAL_EC_DRBG random number generator, which included a “back door” that resulted from use of a set of initiating numbers supplied to RSA by the U.S. National Security Agency. Even with the best keys, therefore, one must still wonder about the trustworthiness of the keymaker.
One alternative to total reliance on keys is to publish a digital record along with some verifying information. This may avoid the need for such trust, but a pure publication-verification scheme is unsuitable for large collections of documents that each may need authentication. In other words, one or both of two common problems beset known authentication schemes: either there must be some “trust authority” or the systems do not scale well.
Guardtime AS of Tallinn, Estonia, provides a distributed, hash tree-based data-verification system that does not rely on keys at all, that is highly scalable, and that, in the most developed embodiment, avoids the need for trust even in the Guardtime system itself: verification of a given data set may be carried out independently, relying only on mathematical procedures open to all.
FIGS. 1-5 illustrate the general structure of the distributed hash tree infrastructure used to validate data in a system such as the one provided by Guardtime, including an embodiment that includes publication for second, higher-level verification. Sets of digital data (“documents” or “digital records”) are input by a set of lowest-level systems that comprise clients 2000. The digital data is transformed using a hash function, which provides a hash output value. This client-level hash output value forms a low-level node in a global hash tree and is, accordingly, then hashed in successively higher levels with other nodes in gateways 3000, which in turn hash their various nodes in respective internal hash tree structures to form inputs to aggregators 4000. The aggregators in turn hash their input nodes in a tree structure to form highest-level aggregator hash values, which are submitted as inputs to input nodes of a hash tree structure in a core system 5000, which in turn hashes at least the current set of input node values together to form a root hash value that forms a current value ti in a calendar 6000. For each client-level input data set, a unique digital signature is returned that encodes, among other optional data, the hash values of “sibling nodes” in the global hash tree, as well as the corresponding calendar value, such that, given the data set and the signature, one can recompute one's way upward through the global hash tree. If the resulting uppermost computation matches the calendar value, there is an exceptionally high level of certainty that the data set is identical to the one that led to the digital signature.
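The recomputation from a data set and its sibling values up to the calendar value can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual signature format. It collapses the gateway, aggregator and core tiers into one binary tree, uses SHA-256, and represents the signature as a list of hypothetical `(side, sibling_hash)` pairs. The names `build_levels`, `make_signature` and `recompute_root` are invented for this example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an illustrative assumption."""
    return hashlib.sha256(data).digest()


def build_levels(records: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from record hashes up to the root."""
    level = [h(r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: pad an odd-sized level by repeating its last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_signature(levels: list[list[bytes]], index: int) -> list[tuple[str, bytes]]:
    """Collect the sibling hash at each level on the path from leaf to root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1  # the other child of the same parent
        side = "L" if sibling < index else "R"
        path.append((side, level[sibling]))
        index //= 2
    return path


def recompute_root(record: bytes, path: list[tuple[str, bytes]]) -> bytes:
    """Recompute upward through the tree from the record and its siblings."""
    value = h(record)
    for side, sibling in path:
        value = h(sibling + value) if side == "L" else h(value + sibling)
    return value


records = [b"doc-0", b"doc-1", b"doc-2", b"doc-3", b"doc-4"]
levels = build_levels(records)
calendar_value = levels[-1][0]  # stands in for the root value kept in the calendar

signature = make_signature(levels, 3)
assert recompute_root(records[3], signature) == calendar_value
assert recompute_root(b"forged document", signature) != calendar_value
```

The verifier in this sketch needs only the record, the sibling values and the calendar value. It does not need the other records or any key.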
The operation of the system shown in FIGS. 1-5 is described in greater detail below, but is summarized here so as to illustrate a potential weakness: if the verification infrastructure is implemented such that a single core 5000 processes all requests for signatures, then this core represents a single point of failure with respect to the entire signature and verification infrastructure. In other words, if, for example, the server that forms the core, or the network connection to the core, were to fail, then no clients would be able to digitally sign input records. At a lower level, if one of the aggregators were to fail, or its network connection to the core were to fail or become too slow, then no client whose requests feed to that aggregator would be able to get digital signatures at all, or at least not within a current period. What is needed is therefore a way to overcome the single-point-of-failure concern while still providing well-determined data signatures that can be used for later data validation.