Encryption of information is normally undertaken to ensure privacy, that is, so that no one other than the intended recipient can decipher the information. Encryption is also undertaken to ensure the authenticity of the information, that is, that a message which purports to originate with a particular source actually does originate with that source and has not been tampered with.
"Encrypting" or "enciphering" a message means to scramble it in a way which renders it unreadable to anyone except the intended recipient. In one form, a cryptographic "key" is utilized to encrypt the message and the same key is required to transform it from encrypted form back to plain text by deciphering or decrypting it. An encryption system which operates in this way is known as a "single key" encryption system. In such a system, the key must be available to both the sender and the receiver. If an unauthorized person has access to the key, then that person can decrypt the encoded message and the object of privacy is defeated. The most obvious drawback of single key encryption systems is that it is often not convenient to provide the sender and the receiver with keys. They may be located far apart. A key can be transmitted across a secure channel from the sender to the receiver, but if a secure channel is available, there is no need for encryption.
In a public key encryption system, each participant has two related keys: a public key, which is publicly available, and a related private key or secret key, which is not. The public and private keys are duals of each other in the sense that material encrypted with the public key can only be decrypted using the private key. Material encrypted with the private key, on the other hand, can be decrypted only using the public key. The keys utilized in public key encryption systems are such that knowledge of the public key does not help deduce the corresponding private key. The public key can be published and widely disseminated across a communications network or otherwise, and material can be sent in privacy to a recipient by encoding the material with the recipient's public key. Only the recipient can decrypt material encrypted with the recipient's public key. Not even the originator who does the encryption using the recipient's public key is able to decrypt that which he himself has encrypted.
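The duality of the two keys can be illustrated with a minimal sketch using textbook RSA arithmetic. The tiny primes, the exponent and the helper name apply_key are assumptions chosen for illustration; real keys are far larger and real systems add padding, so this is not a usable cipher.

```python
# Toy RSA key pair (assumed textbook-sized numbers, for illustration only).
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def apply_key(exponent: int, modulus: int, value: int) -> int:
    """Transform a number with one half of the key pair."""
    return pow(value, exponent, modulus)

message = 65                 # a message encoded as a number smaller than n

# Privacy: encrypt with the public key, decrypt with the private key.
sealed = apply_key(e, n, message)
opened = apply_key(d, n, sealed)

# Authenticity: transform with the private key, check with the public key.
signed = apply_key(d, n, message)
checked = apply_key(e, n, signed)
```

In both directions the second key undoes what the first key did, which is the sense in which the keys are duals of each other.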
Message authentication can also be achieved utilizing encryption systems. In a single key system, a sender, by encrypting a message with a key known only to authorized persons, tells the recipient that the message came from an authorized source.
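As a hedged illustration of single key authentication, the sketch below uses a keyed hash (HMAC) from the Python standard library rather than encrypting the whole message. HMAC is a substitute technique chosen here for brevity; the key value and function names are assumptions.

```python
import hashlib
import hmac

# Assumed shared secret, known only to authorized senders and receivers.
shared_key = b"key held by authorized persons only"

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute a keyed authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

message = b"Transfer approved."
tag = make_tag(shared_key, message)

accepted = is_authentic(shared_key, message, tag)
rejected_wrong_key = is_authentic(b"guessed key", message, tag)
rejected_tampered = is_authentic(shared_key, b"Transfer denied.", tag)
```

Note that any holder of the shared key can produce a valid tag, so this shows that a message came from an authorized source but not which one.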
In a public key encryption system, if the sender encrypts information using the sender's secret key, all recipients will be able to decipher the information using the sender's public key, which is available to all. The recipients can be assured that the information originated with the sender, because the public key will only decrypt material encoded with the sender's secret key. Since, presumably, only the sender has the secret key, the sender cannot later disavow that he sent the information.
The use of encryption techniques provides a basis for creating electronic signatures to documents which are even less subject to forgery than handwritten signatures. There are two ways in which encryption can be utilized to "sign" a document. The first method is by encrypting the entire document using the signer's secret key. The document can be read by anyone with the signer's public key and, since the signer alone possesses his secret key, the encrypted document surely originated with the signer.
Encryption of large documents requires considerable computational resources and, to speed up the signing process, a message digest may be used. A message digest of the document is analogous to a cyclic redundancy code (CRC) check sum attached to the end of a packet. The information in the body of the packet is processed mathematically to produce a check sum, which is appended to the end of the packet. The integrity of the body of the packet is checked at the receiving end by recalculating the check sum from the received text and seeing whether it matches the check sum appended to the packet. If it does, one assumes that the contents of the body of the packet are unchanged from those present at the sending end. The same can be done with entire documents.
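The check sum procedure described above can be sketched with the CRC-32 routine in the Python standard library. The four-byte trailer layout and the function names are assumptions made for this example, not a particular packet format.

```python
import zlib

def frame(body: bytes) -> bytes:
    """Append a 4-byte CRC-32 check sum to the end of the packet body."""
    return body + zlib.crc32(body).to_bytes(4, "big")

def is_intact(packet: bytes) -> bool:
    """Recalculate the check sum from the received body and compare."""
    body, trailer = packet[:-4], packet[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == trailer

packet = frame(b"hello, world")

# Simulate a transmission error that alters one byte of the body.
corrupted = b"jello, world" + packet[-4:]

ok = is_intact(packet)
damaged = is_intact(corrupted)
```

A CRC guards against accidental corruption only; anyone who alters the body can recompute the check sum, which is why signatures rely on a cryptographic hash instead.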
In modern implementations, a message digest is created by applying a cryptographically strong one way hash function to the message text, and the resulting digest serves the same purpose as a CRC check sum.
A clear text document may be signed by creating the message digest and then encrypting the message digest using the signer's secret key. To verify that the content of the document has not been changed, the recipient computes the same one way hash function over the received text and compares the result with the message digest decrypted using the signer's public key. If they agree, one may have a high degree of confidence that the document has been unchanged from the time it was signed until the present and, further, that the document received is the same document that the sender "signed".
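The sign-and-verify steps can be sketched as follows. This is a toy, assuming a key pair built from two publicly known Mersenne primes and a SHA-256 digest reduced modulo the key; the names sign and verify are illustrative, and a real system would use a vetted library with proper padding.

```python
import hashlib

# Toy signer key pair (assumption: publicly known primes, so NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # secret exponent (Python 3.8+)

def digest_as_int(text: bytes) -> int:
    """One way hash of the text, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(text).digest(), "big") % N

def sign(text: bytes) -> int:
    """Encrypt the message digest with the signer's secret key."""
    return pow(digest_as_int(text), D, N)

def verify(text: bytes, signature: int) -> bool:
    """Decrypt the signature with the public key and compare digests."""
    return pow(signature, E, N) == digest_as_int(text)

document = b"I agree to pay 100 dollars."
signature = sign(document)

genuine = verify(document, signature)
altered = verify(b"I agree to pay 900 dollars.", signature)
```

Only the short digest passes through the slow public key operation, while the document itself stays in clear text.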
Public key encryption software is widely available. For example, Pretty Good Privacy (TM) public key encryption software is available for non-commercial use over the Internet in a form published by Philip Zimmermann. One version is PGP version 2.6.2 of Oct. 11, 1994. It is available from the Massachusetts Institute of Technology at net-dist.mit.edu, a controlled FTP site that has restrictions and limitations to comply with export control requirements. The software resides in the directory /pub/PGP. A fully licensed version of PGP for commercial use in the U.S.A. and Canada is available through ViaCrypt in Phoenix, Ariz.
Some public key encryption systems encrypt the body of the text with a single key that changes from session to session, and encrypt only that session key with the recipient's public key. Because single key encryption is much faster than public key encryption, the encoding and decoding times are quicker.
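A minimal sketch of this session key arrangement is given below. The XOR keystream cipher and the toy key pair built from publicly known Mersenne primes are assumptions chosen so the example runs with the standard library alone; neither is suitable for real use.

```python
import hashlib
import secrets

# Toy recipient key pair (assumption: publicly known primes, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))

def keystream(key: bytes, length: int) -> bytes:
    """Expand the session key into `length` bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def single_key_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Sender: fresh session key, fast cipher for the body, public key for the key.
message = b"A long message body would go here."
session_key = secrets.token_bytes(16)            # 128 bits, smaller than N
body_ciphertext = single_key_cipher(session_key, message)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)

# Recipient: unwrap the session key with the private key, then decrypt.
unwrapped_key = pow(wrapped_key, D, N).to_bytes(16, "big")
recovered = single_key_cipher(unwrapped_key, body_ciphertext)
```

The slow public key operation touches only the 16-byte session key, regardless of how long the message body is.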
No data security system is impenetrable. In any data security system, one must question whether the information protected is more valuable to an attacker than the cost of the attack. Public key encryption systems are most vulnerable if the public keys are tampered with.
An example will illustrate the problem. Suppose an originator wishes to send a private message to a recipient. The originator could download the recipient's public key certificate from an electronic bulletin board system, encrypt the letter to the recipient with that public key, and send it to him over an E-mail facility such as the Internet. Unfortunately, an interloper has generated a public key of his own with the recipient's user ID attached to it and substituted the phony public key in place of the recipient's real public key. If the originator unwittingly uses the phony key belonging to the interloper instead of the key belonging to the intended recipient, everything will look normal because the phony key has the recipient's user ID. Now the interloper is in a position to decipher the message intended for the recipient because the interloper has the related secret key. The interloper may even go so far as to reencrypt the deciphered message with the recipient's real public key and send it on to the recipient so that no one suspects any wrongdoing. Worse yet, the interloper can make apparently good signatures in the recipient's name using the interloper's own secret key, because everyone will believe the phony public key is authentic and will utilize it to check the recipient's signatures.
Preventing this requires preventing anyone from tampering with public keys. If one obtains the recipient's public key directly from the recipient, there is no doubt about the authenticity of the public key. However, where the public key is acquired from a source of uncertain reliability, there may still be a problem. One way to obtain the recipient's public key would be to obtain it from a trusted third party who knows he has a good copy of the recipient's public key. A trusted third party could sign the recipient's public key, utilizing the trusted third party's private key, thus vouching for the integrity of the recipient's public key. However, being sure that the third party's public key is authentic requires that the sender have a known good copy of the third party's public key with which to check his signature. A widely trusted third party could specialize in providing a service of vouching for the public keys of other parties. This trusted third party could be regarded as a key server or as a certifying authority. Any public key certificates bearing the certifying authority's signature would be trusted as truly belonging to those to whom they appear to belong. Users who desire to participate would need a known good copy of the certifying authority's public key so that the certifying authority's signatures could be verified.
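The vouching step can be sketched as follows: the certifying authority signs the pairing of a user ID and a public key, and a sender checks that signature before trusting the key. All names, the payload layout, and the toy key pairs built from publicly known Mersenne primes are assumptions for illustration.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public, private) toy RSA keys as (exponent, modulus) pairs."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def hash_to_int(payload: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % n

def certificate_payload(user_id: str, public_key) -> bytes:
    """Bind a user ID to a public key (hypothetical layout)."""
    e, n = public_key
    return f"{user_id}|{e}|{n}".encode("utf-8")

def sign(private_key, payload: bytes) -> int:
    d, n = private_key
    return pow(hash_to_int(payload, n), d, n)

def verify(public_key, payload: bytes, signature: int) -> bool:
    e, n = public_key
    return pow(signature, e, n) == hash_to_int(payload, n)

# Toy keys (assumption: publicly known primes, NOT secure).
ca_public, ca_private = make_keypair(2**89 - 1, 2**107 - 1)
recipient_public, _ = make_keypair(2**61 - 1, 2**127 - 1)
interloper_public, _ = make_keypair(2**31 - 1, 2**61 - 1)

# The certifying authority vouches for the recipient's real key.
cert_signature = sign(ca_private, certificate_payload("recipient", recipient_public))

# A sender holding a known good copy of ca_public checks whichever key it was given.
real_key_ok = verify(ca_public, certificate_payload("recipient", recipient_public), cert_signature)
phony_key_ok = verify(ca_public, certificate_payload("recipient", interloper_public), cert_signature)
```

The substituted key fails the check because the authority's signature covers the key itself as well as the user ID.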
Secure data interchange over public networks is currently being actively investigated and tested in many different applications. Electronic commerce over the Internet is a much publicized application of secure data interchange, and a number of solutions are being proposed. However, widespread transmission of messages, documents, and other information via non-secure networks is being delayed by the difficult task of ensuring the identity of the parties involved in a transaction and the integrity of the messages.
There are five key elements that designers are trying to consider when building applications for secure electronic systems.
Encryption--the encoding of messages for secure communications.
Authentication--the ability to ensure that the originators of a message or transaction are who they claim to be.
Certification--the guarantee, via a trusted third party, that authentications are valid.
Confirmation--obtaining an electronic receipt of a transaction.
Non-repudiation--the establishment of an undeniable means of identifying the parties in a transaction.
Of these five elements, the most difficult to achieve is non-repudiation. In digital signature applications, non-repudiation means that the digital signature must be an unforgeable piece of data that asserts the identity of the person named in the signature. For transactions involving digital signatures to be trusted, it must be possible to easily prove to a third party that the signed parties cannot repudiate the veracity of the signatures, or the content of the signed document.
Associating an individual identity with a public/private key pair is the responsibility of the issuing authority. One method of establishing trust in individual identities is to build a "trust hierarchy". In its simplest form, a central issuing authority would be responsible for issuing all certificates and would vouch for the identity of each individual.
In practice, the process must be spread out over several layers. A central authority vouches for the identity of other authorities, which can each then vouch for the identity of other entities within their respective scopes. For example, RSA acts as one top level authority. RSA issues a certificate to Apple Computer, which in turn issues certificates to each of its employees. Thus an individual certificate contains the public key for the individual, as well as the public keys of Apple Computer and RSA.
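A layered hierarchy of this kind can be checked by walking the certificates from the top level authority downward, as in the sketch below. The certificate fields, the generic party names, and the toy key pairs built from publicly known Mersenne primes are assumptions for illustration only.

```python
import hashlib

def make_keypair(p: int, q: int, e: int = 65537):
    """Return (public, private) toy RSA keys as (exponent, modulus) pairs."""
    n = p * q
    return (e, n), (pow(e, -1, (p - 1) * (q - 1)), n)

def hash_to_int(payload: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(payload).digest(), "big") % n

def payload_for(subject: str, public_key) -> bytes:
    return f"{subject}|{public_key[0]}|{public_key[1]}".encode("utf-8")

def issue(issuer_private, subject: str, subject_public) -> dict:
    """The issuer vouches for the subject's public key (hypothetical format)."""
    d, n = issuer_private
    signature = pow(hash_to_int(payload_for(subject, subject_public), n), d, n)
    return {"subject": subject, "public_key": subject_public, "signature": signature}

def validate_chain(trusted_root_public, chain) -> bool:
    """Verify each certificate with the key vouched for one level above it."""
    signer_public = trusted_root_public
    for cert in chain:
        e, n = signer_public
        expected = hash_to_int(payload_for(cert["subject"], cert["public_key"]), n)
        if pow(cert["signature"], e, n) != expected:
            return False
        signer_public = cert["public_key"]
    return True

# Toy keys (assumption: publicly known primes, NOT secure).
root_public, root_private = make_keypair(2**89 - 1, 2**107 - 1)
company_public, company_private = make_keypair(2**61 - 1, 2**127 - 1)
employee_public, _ = make_keypair(2**31 - 1, 2**61 - 1)
rogue_public, _ = make_keypair(2**17 - 1, 2**19 - 1)

chain = [issue(root_private, "company", company_public),
         issue(company_private, "employee", employee_public)]
swapped = [chain[0], dict(chain[1], public_key=rogue_public)]

chain_ok = validate_chain(root_public, chain)
swapped_ok = validate_chain(root_public, swapped)
```

The verifier needs a known good copy of only the top level public key; every other key in the chain is vouched for by the level above it.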
At the top level, governments may wish to establish methods of certifying each other's identity, which would allow validation of signatures on a global scale.
Seventy-five percent of all forms involve either financial transactions or human resource information, both of which are considered very sensitive. For this reason, securing form data is often considered to be a top priority.
This data must be secured against unauthorized viewing (privacy), and must also be signed electronically as a means of both identifying the author and securing the data against tampering.
Interchange of Secured Data
Exchange Between Platforms--Exchanging encrypted and/or authenticated data between platforms, but within a single application, includes two main issues. First is the availability of the cryptographic tools on the various platforms. Second is the representation of the data within the application.
Cryptographic Tools--This is really just a choice of which cryptographic system is used by the application to provide encryption and authentication services. Ideally, the provider of these tools should be making them available on any relevant platforms, and should be ensuring that the format of the data generated and interpreted by the tools is platform independent.
An example of a good, platform independent tool is Entrust, from NorTel.
Data Representation in Applications--This issue relates to the exact format, down to individual bits and bytes, of the data that is to be encrypted or authenticated. For an application to work in a multiple platform environment, the data format must already have been established in a platform independent manner, so encryption of this data would be simple at the file level. However, encryption may also be needed for selected portions of a document, such as particular records in a database, so this must be accounted for in the structure of the document.
When data is to be signed (authenticated), the stream of bytes that is passed to the cryptographic system for digesting must also be platform independent. This may require the specification of a platform independent format for digesting within the application.
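One way to obtain a platform independent byte stream is to serialize the data canonically before digesting it. The sketch below assumes JSON with sorted keys, fixed separators and ASCII output as the canonical form; that choice, and the function names, are illustrative assumptions rather than a prescribed format.

```python
import hashlib
import json

def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so every platform
    produces the same bytes for the same logical content."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return text.encode("ascii")

def digest(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

# The same logical record, built in a different field order on two platforms.
record_a = {"name": "A. Smith", "amount": 100, "currency": "USD"}
record_b = {"currency": "USD", "amount": 100, "name": "A. Smith"}

same_digest = digest(record_a) == digest(record_b)
changed_digest = digest(dict(record_a, amount=900)) != digest(record_a)
```

Digesting the canonical bytes rather than the in-memory or on-disk representation keeps a signature verifiable wherever the application runs.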
Exchange Between Vendors--Additional issues arise when one considers the interchange of secured data between different representations of that data. Interchange of encrypted data can only proceed if the data can be decrypted and then re-encrypted in the new representation. Digital signatures cannot be preserved at all.
Encryption--In order to change the representation of some encrypted data, the data must be decrypted, translated, and then encrypted again in its new representation. This is a simple process as long as the data is simply encrypted for local storage, but some data may be encrypted for transmission to a particular individual using that individual's public key, so it can only be decrypted using the matching private key.
The obvious solution is to establish the data in the required format before encrypting it for transmission. Thus a distinction must be made between encrypting for local storage, and encrypting for secure transmission.
Digital Signatures--A key component of a digital signature algorithm is its ability to ensure the integrity of the signed data. As described above, this is achieved by creating a digest of the actual data being signed.
Almost every application of digital signatures will digest the data in the same format in which it is stored. That is, the data that is digested will be in a form that is proprietary to the application that created it. This creates a problem in exchanging the data with another application, since the data will surely have to be translated to some standard format that is understood by both parties (applications).
One solution to this problem is to agree to a standard format for the data, and to always digest the data in that standard format. Thus, when creating or verifying a signature, the application will have to translate the data to an agreed upon standard, and then digest the data in its standard form.
This solution is weakened by the fact that no agreed upon standard can represent the data with the full richness of every possible application using the standard. Some data will inevitably be lost in the translation, which reduces our confidence that the signature truly represents the data. The reduction of the data allows some forms of tampering to go unnoticed, and the precise extent of possible tampering will be difficult to define.
Another possible solution is to discard the original signature, and instead validate the translated data with a new signature. In this approach, as data is translated to an interchange format, the signatures in the original data set would be verified using the original binary representations of the data. Once verified, the identity of each signer would be incorporated into the translated data as simple text values. A new signature would be applied to the entire translated data set, which vouches for the fact that each signature in the original data set was successfully verified by the individual signing the new data set. This interchange data could then be verified by the receiving individual, and a similar process applied to vouch for the integrity of the data after it is translated to the new native format.
This approach uses a succession of verifications and subsequent guarantees, applied by different individuals, to preserve the trust in the identity of the original signers. Although it is less direct, and the original signatures are lost, it is possible to follow the process backwards and understand that at each point, the data's integrity was intact, and to obtain the identity of each individual involved in the process.
The Problems
A burning issue in the area of securing form data is the authentication of a form's layout. The layout of a form defines how the data is represented to the user, and can include many graphical elements as well as fields, tables, and so on. The basic problem with authenticating form layouts is that they change on a regular basis, as forms are revised.
To fully authenticate a set of form data, one must also include information about how that data was (and is) represented to the user. Without knowing how the data was represented, it is impossible to know exactly what the signing individual thought he or she was signing, and there are many possibilities for fraud.
For example, with access to the design tools, a malicious user could modify the layout of a form to change authorized text or to obscure data with graphical elements, or misrepresent the data in other ways. Signatures obtained with a falsified representation of the data could then be verified with the data in its originally intended representation. Therefore, without ensuring the integrity of the layout, nobody can truly trust in what is seen on the screen, or feel comfortable signing a form document.
Thus in the prior art, when a form layout changes, the old data must be retrieved, associated with a new form, and then the new form with the old data signed and stored. This is required even when the changes to the form layout are minor and insignificant in terms of the semantics of the information contained within the form.
When such a process is followed, there is no guarantee that the new layout does not change the semantics of the information. For example, a field labeled date_account_opened with an entry of Jun. 25, 1985 can be changed by relabeling the field date_of_birth, and the semantics of "Jun. 25, 1985" will change completely. There is thus a problem of maintaining consistent semantics when changing the layout of data in a form. The prior art does not handle well the exchange of foreign data across platforms and across different applications, especially different applications from different vendors. Further, the prior art does not readily incorporate a form layout which includes graphical elements.
Disclosure of the Invention
The invention overcomes the problems of the prior art by providing methods, apparatus, systems and computer program products for the secure handling of forms and form data.
In general, this is accomplished by separating the data from the form layout and by separately signing the data and the form layout. The layout and the data are related in such a way that, if the layout changes, the data file can be applied to a revised layout in a way which guarantees consistent semantics with that which previously existed.
The invention is directed to a method of storing forms by storing at least one form layout, and separately storing form data containing a reference to the form layout. The layout and the form data are preferably signed using encryption of a digest. Signatures to the layout and form data are validated using a public key infrastructure which may or may not be controlled by a trusted third party. The form layout and the form data are linked by a common schema which links field or cell names and data values. One form layout can be replaced by a different form layout, such as a later revision, based on the common schema.
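The separation of layout and data can be sketched as follows. The dictionary structure, the form number "F-100", the field names and the helper functions are hypothetical, and the digests shown are only the input to the signing step, which is omitted here.

```python
import hashlib
import json

def digest(obj) -> str:
    """Digest of a canonical serialization (the value that would be signed)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Form layouts are stored on their own, keyed by form number and revision.
layouts = {
    ("F-100", 1): {"fields": [
        {"name": "account_holder", "label": "Account holder", "row": 1},
        {"name": "date_account_opened", "label": "Date account opened", "row": 2}]},
    ("F-100", 2): {"fields": [
        {"name": "date_account_opened", "label": "Opened on", "row": 1},
        {"name": "account_holder", "label": "Holder", "row": 2}]},
}

# Form data is stored separately and only refers to a layout.
form_data = {
    "layout_ref": {"form": "F-100", "revision": 1},
    "values": {"account_holder": "A. Smith", "date_account_opened": "1985-06-25"},
}

def schema(layout: dict) -> list:
    """The common schema: the set of field names a layout presents."""
    return sorted(field["name"] for field in layout["fields"])

def render(layout: dict, data: dict) -> list:
    """Bind data values to layout fields by name, refusing a schema mismatch."""
    if schema(layout) != sorted(data["values"]):
        raise ValueError("layout and data do not share a schema")
    ordered = sorted(layout["fields"], key=lambda field: field["row"])
    return [(field["label"], data["values"][field["name"]]) for field in ordered]

data_digest = digest(form_data)
view_rev1 = render(layouts[("F-100", 1)], form_data)
view_rev2 = render(layouts[("F-100", 2)], form_data)   # later revision, same data
```

Because values are bound by field name rather than by position or label, the revised layout rearranges and relabels the presentation without the data record, or its digest, having to change.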
The invention is also directed to a method of retrieving form data by retrieving a form layout, retrieving form data which contains a reference to a version of the form layout, and associating values of data elements of the form data with corresponding data elements of the version of the form layout. Retrieved form data can be edited and the edited values of data elements stored in revised form data. The revised form data together with a reference to a version of the form layout is signed.
The invention is also directed to apparatus for processing form data, including a processor and data storage connected to the processor. The data storage stores at least one form layout and data for filling in at least one instance of the form layout. The processor is configured to verify the authenticity of a signature to one of the form layout and the data before permitting the use of the data with the form layout.
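The gating behavior described above can be sketched as a function that refuses to combine layout and data until the signatures check out. This sketch verifies both signatures, which is stricter than verifying one of them; the toy key pair built from publicly known Mersenne primes, the data structures and all function names are assumptions for illustration.

```python
import hashlib
import json

# Toy originator key pair (assumption: publicly known primes, NOT secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))

def digest_as_int(obj) -> int:
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big") % N

def sign(obj) -> int:
    return pow(digest_as_int(obj), D, N)

def verify(obj, signature: int) -> bool:
    return pow(signature, E, N) == digest_as_int(obj)

def open_form(layout: dict, layout_sig: int, data: dict, data_sig: int) -> dict:
    """Permit use of the data with the layout only after verification."""
    if not verify(layout, layout_sig):
        raise PermissionError("form layout signature is not valid")
    if not verify(data, data_sig):
        raise PermissionError("form data signature is not valid")
    return {name: data["values"].get(name) for name in layout["fields"]}

layout = {"form": "F-100", "revision": 1, "fields": ["account_holder", "amount"]}
data = {"form": "F-100", "revision": 1,
        "values": {"account_holder": "A. Smith", "amount": 100}}
layout_sig, data_sig = sign(layout), sign(data)

filled = open_form(layout, layout_sig, data, data_sig)

tampered = {"form": "F-100", "revision": 1,
            "values": {"account_holder": "A. Smith", "amount": 900}}
try:
    open_form(layout, layout_sig, tampered, data_sig)
    tamper_blocked = False
except PermissionError:
    tamper_blocked = True
```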
The invention is also directed to a system for processing form data including a server having a database storing at least one form layout and separately storing at least one record containing form data for filling in at least one instance of a form layout, a client process running on a computer, and a network connecting the computer to the server. The client process is configured to request a copy of a record to be associated with a copy of at least one form layout. The client process is configured to verify the authenticity of a signature to one of the form layout and the data. This can be done using the services of a trusted third party.
The invention is also directed to a computer program product, including a storage medium, and a computer program stored on the storage medium for processing form data stored on the medium comprising at least one of a form layout and form data. The form layout comprises (1) layout information, (2) a form number and revision, (3) a layout originator's signature to the form number and revision. The form data comprises (1) data information, (2) a form number and revision, (3) a layout originator's signature to the form number and revision.
Still other objects and advantages of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein only the preferred embodiment of the invention is shown and described, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.