The invention relates generally to computer systems, and deals more particularly with a technique to determine if changes have been made to data during transmission, either through error or malicious activity.
It is well known today to transmit data across a network such as the Internet or any other internal or external TCP/IP network. Various protocols such as File Transfer Protocol (“FTP”) and Hyper-Text Transfer Protocol (“HTTP”) can be used for the transmission. Typically, before the data is sent, the sender and receiver establish a communication session. The data is ordinarily sent in a single connection, i.e. one or more requests and one or more respective responses through the same socket of both participants. However, in other environments, to speed the data transfer, the data is sent in multiple, asynchronous connections, some of which are concurrent with each other. These multiple, asynchronous connections can be in the same session as, or in different sessions from, each other and the original session. See allowed U.S. patent application entitled “Internet Backbone Bandwidth Enhancement” Ser. No. 09/644,494 filed Aug. 23, 2000 by Bauman, Escamilla and Miller, which patent application is hereby incorporated by reference as part of the present disclosure. The multiple connection mode requires a multithreaded function which can manage and coordinate the multiple connections in parallel. An IBM Download Director program currently transfers data across multiple connections in parallel using Download Director Protocol (“DDP”). The IBM Download Director program begins operation by defining a session which includes all the connections needed to authenticate the client and server to each other and transfer a file in separate segments. The IBM Download Director program is also capable of resuming a file transfer which has been terminated, so that the transmission is restarted at the point in the transfer where it terminated. The IBM Download Director program uses encryption for the transmitted files.
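By way of illustration only, the general idea of transferring a file in separate segments over several concurrent connections can be sketched as follows. This is not the Download Director Protocol, whose details are not given here. The names `split_segments`, `send_segment` and `transfer` are hypothetical, and `send_segment` merely stands in for a real network connection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (byte offset within the file, payload)


def split_segments(data: bytes, segment_size: int) -> List[Segment]:
    """Cut the data into fixed-size segments tagged with their offsets."""
    return [
        (offset, data[offset:offset + segment_size])
        for offset in range(0, len(data), segment_size)
    ]


def send_segment(segment: Segment) -> Segment:
    """Hypothetical stand-in for sending one segment over its own connection."""
    return segment


def transfer(data: bytes, segment_size: int = 4, max_connections: int = 3) -> bytes:
    """Send segments concurrently, then reassemble them by offset."""
    segments = split_segments(data, segment_size)
    received: List[Segment] = []
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        futures = [pool.submit(send_segment, seg) for seg in segments]
        # Segments may complete in any order, as with asynchronous connections.
        for future in as_completed(futures):
            received.append(future.result())
    received.sort(key=lambda seg: seg[0])  # restore the original byte order
    return b"".join(payload for _, payload in received)
```

Because each segment carries its offset, the receiver can reassemble the file regardless of the order in which the connections complete.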
“Public/private” key encryption such as RSA is also well known. The public key (i.e. publicly known key) is used by the sender to encrypt data, and a private key known only to the recipient is used to decrypt the data which was encrypted with the public key. Thus, for each public key, the recipient has a corresponding private key used to decrypt the communication encrypted with the public key.
Symmetric encryption such as AES is also well known. With symmetric encryption, the same key is used for both encryption and decryption, and is kept secret by both the sender and recipient. Typically, the key is randomly generated by the sender or recipient, and sent to the other ahead of the communication. For security, the symmetric key can be sent encrypted using a public/private key encryption.
Neither FTP nor HTTP provides integrity checking or file protection through encryption. However, encryption has been added to both FTP and HTTP by encapsulation of the FTP files and HTTP files with a known Secure Sockets Layer (“SSL”). SSL is an encryption protocol. The secure FTP (called “FTPS”) is not yet standardized. According to FTPS, integrity checking and file protection are performed by encrypting the file data. The secure HTTP (called “HTTPS”) uses certificates to authenticate the server to the client and can also use certificates to authenticate the client to the server. HTTPS uses public/private key encryption during a handshake phase (which includes the sending of a symmetric key encrypted with a public key). HTTPS guarantees file integrity by symmetric key encryption of the entire data stream and message authentication codes (“MAC”). The MAC includes a hash of the transfer data, a sequence number, and other descriptors used in the protocol to identify the content and operations such as compression and encryption. The MAC, however, does not include a file name, file creation date or file size. In HTTPS, there is a hash of each block of data; a file is transmitted as one or more blocks. However, HTTPS does not have a high-performance capability (such as that of IBM Download Director Program) because it cannot manage multiple simultaneous connections. In other words, in HTTPS, all the requests and responses of one session proceed through the same connection.
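By way of illustration only, a per-block message authentication code covering a sequence number and the block data can be sketched as follows. This is a simplified sketch and not the actual SSL record format. The names `compute_mac` and `verify_block`, the eight-byte sequence number encoding, and the choice of HMAC with SHA-256 are all assumptions made for the example.

```python
import hashlib
import hmac
import struct


def compute_mac(key: bytes, sequence_number: int, block: bytes) -> bytes:
    """Keyed hash over the sequence number followed by the block data."""
    # Assumption: the sequence number is encoded as an 8-byte big-endian integer.
    message = struct.pack(">Q", sequence_number) + block
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_block(key: bytes, sequence_number: int, block: bytes, mac: bytes) -> bool:
    """Recompute the MAC on receipt and compare it in constant time."""
    expected = compute_mac(key, sequence_number, block)
    return hmac.compare_digest(expected, mac)
```

Including the sequence number means that a block which is altered, replayed or delivered out of order fails verification, even though nothing about the file as a whole (such as its name or size) is covered.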
An existing IBM Lotus Notes program encrypts data during transfer. Lotus Notes uses the S/MIME protocol to send encrypted messages. S/MIME is a mail protocol that includes both a hash value and encrypted data, but does not include a session ID. S/MIME is intended for content delivery and is used as an asynchronous process. The sender identifies the recipient or recipients, and data encryption and hash values are created. The delivery can be at that time or at a later time. Transfer of the data is over a single connection and the content is not used in the transfer protocol.
“Hashing” is also well known today. Hashing is a process analogous to parity checking or cyclic redundancy checking where a function is performed on a set of bits or bytes to yield a compact “hash” value that is, for practical purposes, unique to that input. Different algorithms can be used for hashing, such as SHA-1 and MD5. Two identical files will yield the same hash value (if they use the same hashing algorithm), and a difference in hash values indicates a difference between the two files. For example, U.S. Pat. No. 6,393,438 discloses a method and apparatus for identifying differences between two files, such as two versions of a Microsoft Windows registry file. Portions of the file are hashed to yield one four-byte value per portion to provide a set of hash results. The set of hash results is combined with a four-byte size of the portion of the file from which the hash was generated to produce a signature of each file. If the two files are different versions of a Windows registry file, the hash signatures of the two files will likely be different. It is also well known to hash data before transmission, hash the received data, and compare the two hash values to determine if any changes occurred to the data during transmission.
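By way of illustration only, the known practice of hashing data before transmission, hashing the received data, and comparing the two values can be sketched as follows. The function names `file_digest` and `data_unchanged` are hypothetical. SHA-256 is used as the default here in place of the SHA-1 and MD5 algorithms named above, and any algorithm name accepted by Python's `hashlib.new` can be substituted.

```python
import hashlib
import hmac


def file_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal hash value of the data."""
    return hashlib.new(algorithm, data).hexdigest()


def data_unchanged(received: bytes, sent_digest: str, algorithm: str = "sha256") -> bool:
    """Hash the received data and compare it with the sender's hash value."""
    # Both sides must use the same algorithm for the comparison to be meaningful.
    return hmac.compare_digest(file_digest(received, algorithm), sent_digest)
```

A sender would compute `file_digest` over the data and deliver the result alongside it; the receiver calls `data_unchanged` on what arrived. A plain hash of this kind reveals accidental changes, but on its own it does not stop a party who can alter both the data and the accompanying hash value.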
An object of the present invention is to expeditiously transfer data and reveal any changes that occur to the data in transit.
A more specific object of the present invention is to apply the foregoing technique to data transmitted during multiple connections in the same session.