As remote access to computer systems and applications grows in popularity, the number and variety of transactions that are conducted remotely over public networks such as the Internet have increased dramatically. This popularity has underlined a need for security, in particular:
    a. How to ensure that people who are remotely accessing an application are who they claim they are and how to ensure that the transactions being conducted remotely are initiated by legitimate individuals. This subject is referred to as authentication.
    b. How to ensure that transaction data has not been altered before being received at an application server. This is referred to as data integrity.
    c. How to guarantee that an individual, once having engaged in a transaction, is not in a position to repudiate it. This is referred to as non-repudiation.
In the past, application providers have relied on static passwords to provide the security for remote applications. In the last couple of years it has become evident that static passwords are not sufficient and that more advanced security technology is required.
One solution is to digitally sign data such as electronic documents using an asymmetric digital signing algorithm that is parameterized with the private key of a public-private key pair. This may for example happen using a Public Key Infrastructure (PKI). In a Public Key Infrastructure one associates a public-private key pair with each user. The key pair is associated with a certificate (issued by a trusted Certificate Authority) that binds that public-private key pair to a specific user. By means of asymmetric cryptography this public-private key pair can be used to:
    a. authenticate the user,
    b. sign transactions, documents, e-mails (so as to prevent repudiation), and
    c. set up encrypted communication channels.
In many practical applications a user uses a general purpose computing device (such as a Personal Computer (PC), a tablet or a smartphone) to access an application. The application interacts with the user and at some point the application will present to the user data (e.g., an electronic document such as a contract) that have to be signed. Since the user is a human person, what the user observes is an analog representation of the data being presented. For example the user cannot directly observe an electronic document but can only see an image displayed by the display of a computer whereby that image is supposed to represent the contents of that electronic document. The application on the other hand manages these data typically in a high level abstract digital representation the format of which can depend on the application.
After the user has approved the observed analog representation of the data, the application typically submits the data (in the original application specific high level abstract digital representation) to a cryptographic library on the user's computing device typically through a standard cryptographic application programming interface (API) such as PKCS#11 or MS-CAPI. The cryptographic library then transforms the high level request of the application to provide a digital signature over the data into a series of lower level operations and a digital signature involving the user's private key will be generated.
Stated another way, the data to be signed is presented to the user and observed, reviewed and approved by the user before the data is passed to the cryptographic library to be signed. In other words a high level abstract digital representation of the data to be signed (for example a file in some abstract high level word processing format for representing text and images) is on the one hand transformed through a chain of format transformations into eventually an analog representation (a series of actual images that the user can see or sounds that the user can hear, for example on a display screen) and on the other hand the same high level abstract digital representation is submitted to the cryptographic library to be signed. The process of generating and presenting to the user an analog representation of the high level abstract digital data representation (typically involving a chain of format transformations) on the one hand, and the process of cryptographically signing the same high level abstract digital data representation on the other hand, are done in what are essentially separate independent parallel processes.
When the signature is verified, the signature verification component verifies the signature against the high level abstract digital representation of the data.
This architecture has a number of significant advantages.
On the one hand since the application passes the data to be signed to the actual signing module through a high level standardized API, the application is shielded from the specifics of the signing mechanism. In particular it doesn't need any particular knowledge about the device that performs the actual cryptographic calculations. This in turn means that various, sometimes very different, implementations are possible. For example in some cases all calculations are done in software by the computing device and the private key may be stored in software (e.g., in a file) on the computing device. In another example the private key is stored on a dedicated security device (such as a smart card) that is assumed to be a secure data container and that has certain cryptographic capabilities to do the cryptographic calculations that are central to the creation of a digital signature with that private key.
On the other hand since in this architecture the presentation to the user and the review and approval by the user of the data to be signed happen outside the scope of the cryptographic library, the cryptographic library is independent of this presentation and review process and does not need to have any knowledge of how to present the data to the user. Even more, the cryptographic library is not concerned in any way with the meaning of the data to be signed. The cryptographic library treats the data to be signed simply as an amorphous bit string. This means that any application using any possible digital data format can use the same cryptographic interface, and that new applications and new data formats can easily be added without any impact to the signing system.
In addition, because the digital signature is generated over the high level abstract digital representation, the signature verification component does not require any knowledge about the process to present this high level abstract digital representation to the user. That is, from the application point of view the generation and verification of the data signature is fully independent of the process to present the data to the user and to obtain the user's approval of the data.
Notwithstanding the foregoing advantages, this architecture has a number of security related disadvantages that are not readily apparent. Indeed, while it can be mathematically proven from the electronic signature that a given digital data set has been electronically signed with the user's private key, it is much less clear whether the user actually observed and approved the data that have been signed and approved the signing of these data. More specifically, the following concerns may arise. In general, the user's computing device is a general purpose computing device with an open software architecture that allows easy updates and upgrades of software present on the computing device and installation of new software not yet present. This open software architecture has the advantage that it allows the user to easily install all kinds of new application software and to upgrade the device to handle new data formats. However, the downside of this flexibility is that such an open architecture inherently makes the computing device vulnerable to attacks whereby an attacker may install malicious software. In practice it may turn out to be very difficult to ensure that the user's computing device has not been compromised by malicious software such as viruses, Trojans, rootkits, etc.
For example, in cases where a pure software solution is being used to implement the cryptographic library, it is very difficult to rule out with sufficient certainty that malicious software on the user's personal computing device has been able to get a copy of the private key which is being stored on the user's computing device in a file. After all, copying a file and stealthily logging any passwords that the user enters to unlock such a file are known to be within reach of many malicious software applications. To remedy this problem, many digital signing solutions use secure devices such as smart cards or USB keys to store the private key and to perform the cryptographic calculations involving this private key such that the private key never leaves the secure device and is never present on the user's inherently insecure personal computer. While this already solves a part of the problem, it does not solve all problems.
Since the presentation to the user and the review and approval by the user of the data to be signed happen before the actual signing on the computing device in a way that is independent of the actual signing process, it is very difficult to prove that the data that the user actually reviewed and approved are also the same data that subsequently have been passed to the cryptographic library. Indeed, it is quite conceivable that some malicious software manipulated the software stack and replaced the data that the user reviewed and approved with fraudulent data before the data is signed. Also, a more subtle issue may cast doubt on what exactly was presented to the user and hence what exactly was approved by the user.
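The substitution risk described above can be illustrated with a minimal sketch. It is not a real signing stack: the key, the function names and the sample documents are all hypothetical, and an HMAC from the Python standard library stands in for the asymmetric signature only to keep the example self-contained. The point it shows is that the presentation path and the signing path are independent, so a valid signature says nothing about what the user saw.

```python
import hashlib
import hmac

# Hypothetical stand-in for the user's private key. A real system would use an
# asymmetric key pair; HMAC is used here only to keep the sketch stdlib-only.
SIGNING_KEY = b"demo-key-not-a-real-private-key"


def render_for_review(document: bytes) -> str:
    """Presentation path: turn the digital document into what the user sees."""
    return document.decode("utf-8")


def sign(data: bytes) -> bytes:
    """Signing path: the library treats the data as an opaque byte string."""
    return hmac.new(SIGNING_KEY, data, hashlib.sha256).digest()


def verify(data: bytes, signature: bytes) -> bool:
    """Check a signature against a given byte string."""
    return hmac.compare_digest(sign(data), signature)


reviewed = b"Pay 100 EUR to Alice"
shown_to_user = render_for_review(reviewed)  # the user approves this text

# Malicious software swaps the data after approval but before signing.
submitted = b"Pay 9000 EUR to Mallory"
signature = sign(submitted)

# The signature is valid for the swapped data, not for the reviewed data.
signature_valid_for_submitted = verify(submitted, signature)
signature_valid_for_reviewed = verify(reviewed, signature)
```

In this sketch the verifier can only establish that `submitted` was signed; nothing in the signature ties it to `shown_to_user`.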
In general the process of presenting the data to be signed to the user involves a chain of transformations of the high level abstract digital representation of the data (in a particular high level data format that may be application dependent, such as for example a particular proprietary text processing format) into an analog representation that the user can perceive and interpret (such as for example the image emitted by a computer display). This chain of transformations is typically done on the user's computing device and may involve many software layers and software components each of which may have various versions and configuration options and which may be different depending on the application and/or the high level data format of the data to be signed.
In addition, the actual hardware capabilities of the user's computing device (e.g., color and resolution capabilities of the display) may vary from one user to another and may even vary in time for the same user. As a consequence, the same high level abstract digital representation may in practice be transformed into a wide range of possible analog representations for different users or for the same user at different times depending on the specific hardware and software set-up of these users' systems at different moments in time. For example, a text document may refer to a certain font which on some systems may be absent and may be ‘translated’ to a different font. Alternatively, the piece of the text in the missing font may simply be left out from the analog representation or the font may have been erroneously replaced by another font. This may cause the user to actually see a different text than was intended. For example the data byte that normally represents ‘$’ (i.e., the dollar currency symbol), may in some character sets represent ‘€’ (i.e., the euro currency symbol).
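The character set issue can be reproduced in a few lines. The sketch below uses the byte 0xA4, which is an illustrative choice made here and not taken from the text above: it is the generic currency sign in ISO 8859-1 and the euro sign in ISO 8859-15. The sample string is hypothetical.

```python
# The same bytes, interpreted under two different character sets.
raw = b"Price: 100 \xa4"

as_latin1 = raw.decode("iso8859-1")    # 0xA4 decodes to U+00A4 (currency sign)
as_latin9 = raw.decode("iso8859-15")   # 0xA4 decodes to U+20AC (euro sign)

# The signed bytes are identical, yet the displayed text differs.
same_bytes_different_text = as_latin1 != as_latin9
```

A signature over `raw` verifies equally well on both systems, although the two users would read different currency symbols.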
In another example, a document may contain text in one color against a background in another color. It is conceivable that the color handling on some systems is such that the contrast of a piece of text and the background may be so low that the user does not notice the presence of a certain piece of text.
In yet another example, a document may comprise a mixture of text which is coded as a mixture of references to alphanumerical symbols and text in images (e.g., a word or a sentence in a bitmap). For example, this may occur when mixing texts in different languages with different character sets. How this mixture of characters and images is shown to the user may vary depending on the settings and capabilities of the user's computing device. For example, the document may contain images in a format that on some computing devices may not be supported and in some cases these computing devices may not represent these images and the omission of these images may not always be apparent to the user. In some cases this may alter the meaning of the document that the user actually gets to see on the display for review and approval.
In another example a user may be visually impaired and may rely on a text-to-speech converter to understand the content of a document. In such a case, if the document contains a mix of alphanumerical symbols and pieces of text in images, the text-to-speech converter may just skip the images such that the user only learns about a part of the document that the user is about to sign. A similar example may be documents that are in a format (e.g., using a mark-up language) that uses different tags to indicate different data types or markup information. In some cases interpreters of such mark-up languages may be configured, in order to ensure forward compatibility, to simply ignore unknown tags (that for example may be introduced in more recent versions of the language not yet supported by the version of the interpreter installed in the user's computing device). If certain parts of a text in a document are tagged with such unknown tags, the result may be that the user does not get to see these parts of the text even though these parts may be crucial for the correct interpretation and understanding of the document.
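The unknown-tag scenario can be sketched as follows. This is a toy renderer built on Python's `html.parser`, under the assumption that the interpreter drops the content of any element it does not recognize; the tag names, the `KNOWN_TAGS` set and the sample document are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical set of tags that this (older) interpreter version understands.
KNOWN_TAGS = {"doc", "p", "b"}


class LenientRenderer(HTMLParser):
    """Collects visible text, silently skipping content of unknown tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside an unknown element
        self.visible = []

    def handle_starttag(self, tag, attrs):
        # Once inside an unknown element, every nested element is skipped too.
        if self.skip_depth or tag not in KNOWN_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.visible.append(data)


def render(markup):
    """Return the text a user of this interpreter would get to see."""
    renderer = LenientRenderer()
    renderer.feed(markup)
    renderer.close()
    return " ".join("".join(renderer.visible).split())


markup = (
    "<doc><p>I agree to pay 100 EUR.</p>"
    "<clause>A penalty of 5000 EUR applies on late payment.</clause></doc>"
)
visible_text = render(markup)
```

Here `visible_text` contains only the first sentence, while a signature computed over `markup` would also cover the penalty clause that was never displayed.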
Still another example is the representation of dates. Depending on the configuration of the user's computer a date may be represented as dd/mm/yyyy or as mm/dd/yyyy (wherein dd is the day, mm the month, and yyyy the year, such that for example Mar. 9, 2013 may be presented to the user as 09/03/2013 or 03/09/2013 and therefore could be confused with Sep. 3, 2013).
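The date ambiguity is easy to demonstrate with Python's standard `datetime` module. The variable names are illustrative; the displayed string is the one used in the example above.

```python
from datetime import datetime

displayed = "09/03/2013"

# The same displayed string read under the two conventions.
day_first = datetime.strptime(displayed, "%d/%m/%Y").date()    # dd/mm/yyyy
month_first = datetime.strptime(displayed, "%m/%d/%Y").date()  # mm/dd/yyyy
```

Read day-first, the string denotes March 9, 2013; read month-first, it denotes September 3, 2013.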
Another problem is that the number of screen views that are needed to let the user see the full contents of a specific document may vary from one system to another or simply may vary on the same system depending on certain preferences of the user. As a consequence it is possible that a user does not review all the document contents because the user is unaware that the document actually contains more content than the user has observed and reviewed.
All these problems have in common that a discrepancy may occur between on the one hand what the user observed (for example what the user in reality saw on the display of a computer) and therefore believes to have reviewed and approved, and on the other hand what another party claims or may claim has been presented to the user (for example on the display of the computer) and therefore has been reviewed and approved by that user. If uncertainty exists about what precisely the user observed when reviewing and approving an analog representation of the data to be signed, then the signature cannot be considered to offer true non-repudiation, despite the mathematically guaranteed link between the signature and the high level representation of the data to be signed.
Solutions exist whereby a secure device is provided that has its own secure user interface that is known to resist attacks. The security device takes care of handling the data to be signed (received by the device in a high level abstract digital representation) both on the one hand for review and approval purposes and on the other hand for signing purposes. That is, in these solutions the review and approval step and the actual signing step are integrated into a single action performed by the security device. This means that the security device must now understand the high level abstract digital format of the data such that it knows how to present the data to the user (i.e., how to transform the high level abstract digital representation into an analog representation to present to the user). Since the secure device signs the high level abstract digital representation of the data that it has received, in order to avoid the problems described above, there must be absolute certainty about exactly which analog representation any given high level abstract digital representation will be transformed into by the security device to present the data to the user. This means that the security device's components for presenting the data to be signed to the user (i.e., software and hardware to transform the high level abstract digital representation into an analog representation and to present this analog representation to the user) must be very carefully designed to ensure that no ambiguity with respect to the end result of this transformation to an analog representation can arise for any possible data set in any high level abstract digital data format supported by the security device. This goal is difficult to achieve and it is even more difficult to prove that it has effectively been achieved for a given security device implementation in a given configuration. 
However, it is not sufficient to demonstrate that, for a given security device with a given configuration, there is no ambiguity with respect to the end result of the transformation to an analog representation for any possible data set in any high level abstract digital data format supported by the security device. It must additionally be guaranteed that the security device will comprise at any given time thereafter only such components that are guaranteed not to give rise to any possible ambiguity. This in turn means that adding, removing, replacing, changing or upgrading these components (e.g., to support new high level abstract data formats) must remain under strict control and must be subjected to severe limitations or must even be impossible. In practice, to avoid possible ambiguities in the transformation chain and for cost reasons, only a limited set of relatively simple high level abstract digital representations is supported. For example the secure device may only support text in a limited character set. As a consequence, in this type of solution a major advantage of the architecture explained above is lost: the possibility to support a wide range of current and future sophisticated data formats that can fully exploit the inherent data representation capabilities of the user's computing device.
What is desired is a solution for signing electronic data which may be in a wide variety of high level application data formats and whereby the solution may easily be upgraded to also support extra high level application data formats and whereby there is no ambiguity with respect to the data that the user actually reviewed and approved.