1. Field of the Invention
The present invention relates to computer security, and deals more particularly with a technique, system, and computer program for protecting data stored by a computer, by incorporating localized information into encryption and decryption algorithms used when storing and accessing the data.
2. Description of the Related Art
In an environment where computers are connected through insecure public networks, there is a rising concern for the security of private data stored on the computers. Possible intrusion into individual computers through various "hacking" techniques makes each machine vulnerable to attack, thereby exposing the stored confidential data.
The Internet is a well-known example of a public network having these attributes. Users attach computers to the Internet through the services of an Internet Service Provider (ISP). An ISP provides computer users the ability to dial a telephone number using their computer modem, thereby establishing a connection to a remote computer owned or managed by the ISP. This remote computer then makes services available to the user's computer. Typical services include: providing a search facility to search throughout the interconnected computers of the Internet for items of interest to the user; a browse capability, for displaying information located with the search facility; and an electronic mail facility, with which the user can send and receive mail messages from other computer users.
After having connected into the public network in this way, a user may find that data on his machine has been compromised. Or, even more alarmingly, data may have been compromised without the user ever suspecting a problem. There are various ways in which a hacker may access the data on the user's machine. One way is through what is known as a "Trojan horse". This term is used to describe software that masquerades as a useful or interesting application, but that will, if loaded onto the user's computer, perform some type of destructive function in addition to, or instead of, the function the user expects. The user may locate this type of software while using the Internet, and be persuaded to download it, for example by text which describes the software as providing some desirable function for free. Once the software has been downloaded to the user's machine, any conceivable type of destructive action is possible: the Trojan horse may be designed to erase files; to write over existing files with information supplied by the Trojan horse; to locate particular files and forward them out into the Internet, to an address where the intruder (i.e. the Trojan horse creator) can receive them; to monitor the user's keystrokes, and forward those to the intruder through the Internet; and so forth.
The Internet is used by millions of people around the world on a daily basis. Many of these users are business users, performing business functions. Others are individuals, using the Internet for personal reasons. Businesses that make Internet access available to their employees may have computer security programs in place to minimize the likelihood of intrusion by an outsider. Many individuals, however, will be accessing the Internet from their home computers. Very few of them are likely to appreciate the possibility of destructive access to their machines, and thus they will have few or no security mechanisms protecting their data. Hackers have developed software that can find machines which are vulnerable to specific types of attack, by accessing the machines (with a Trojan horse as described above, for example) and then checking for particular security mechanisms. If a vulnerable machine is found, its address and type of security deficiency are forwarded to the hacker by the intruding software.
With the growing popularity of electronic commerce on the Internet, one can begin to better understand the importance of preventing these electronic intruders. Destructive functions are no longer limited to such things as destroying data stored on the user's computer (which will perhaps take many hours to recreate), but may have a very real financial impact as well. Electronic commerce includes on-line shopping, on-line bill paying, inquiry into account information, etc. To participate in electronic commerce, the user will often provide confidential information such as his credit card information or bank checking account number. This information will then be forwarded out into the Internet, to the remote computer providing the electronic commerce service. Depending on the sophistication of the electronic commerce software installed on the user's computer and on the remote computer, this confidential information may or may not be encrypted during transmission. Credit card numbers and bank account numbers tend to have a short, formatted structure. When this type of data is sent through the Internet unencrypted, it can be easily recognized by hacker-written software that monitors Internet transmissions. The hacker then has all the information he needs to use the account as his own. However, encrypting this type of confidential information during transmission does not completely protect it from being electronically stolen: as discussed above, Trojan horse software introduced onto the user's computer may be just as dangerous. The confidential information may have been stored on the user's computer by the electronic commerce application software. Or, for that matter, the user may have financial software packages installed on his machine which never interact with other computers in the Internet, but which store data such as account numbers for use with that package.
It is an easy task for a hacker to determine the file naming conventions used by electronic commerce or financial software applications. Once the Trojan horse is downloaded to the user's computer, it simply searches for known file names, and forwards a copy of the files it finds out into the Internet, to the waiting hacker. Or, it may search through all the computer's stored files for data having the attributes of credit card numbers or account numbers, and forward those files.
If the user's confidential data was encrypted before it was stored on the user's computer, however, then it is of no use to the hacker--unless he can decrypt it. Successful decryption requires the hacker to discover the particular encryption algorithm that was used, in order to reverse the transformations the algorithm made on the data. A number of types of encryption algorithms, and corresponding decryption algorithms, are in common use today. The combination of an encryption algorithm and corresponding decryption algorithm is known as a "cipher". One popular category of ciphers is known as "private key" ciphers. With this type of cipher, the functionality of the encrypting and decrypting algorithms is publicly known. The Data Encryption Standard ("DES") is an example of this type of publicly-known cipher. In order to protect data by encrypting it with a known encryption algorithm, the algorithm requires use of a secret, or private, key. A key is a value entered by a user of the algorithm, and factored into operation of the algorithm in such a way that the resulting encrypted data is dependent upon the key value. The decryption algorithm requires use of a secret key as well. When the same secret key is used for both the encryption and decryption algorithms, the cipher is referred to as "symmetric". DES is a symmetric cipher. If someone knows the key used during encryption, and has a copy of the encrypted information, that person will be able to decrypt the information.
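The symmetric property described above can be illustrated with a minimal sketch. The cipher below is not DES and is not suitable for protecting real data; it is a toy construction (a keystream derived with SHA-256 and combined with the data by exclusive-OR), chosen only because it can be written with a standard library alone. The names keystream, toy_encrypt, and toy_decrypt are hypothetical and introduced here for illustration.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Expand the secret key into `length` bytes (toy construction, not DES)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Hash the key together with a running counter to get the next 32 bytes.
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the key-dependent keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


# Symmetric cipher: the same operation with the same secret key reverses it.
toy_decrypt = toy_encrypt

if __name__ == "__main__":
    secret_key = b"example secret key"
    stored = toy_encrypt(secret_key, b"account 0000-1111")
    print(toy_decrypt(secret_key, stored))          # original data
    print(toy_decrypt(b"some other key", stored))   # unreadable bytes
```

The algorithm itself is public in this sketch, as with the ciphers discussed above; only the key value is secret, and anyone holding both the key and the encrypted data can recover the original.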
If the transformations performed in a cipher were simple, it would be relatively easy to guess what had been done, and then undo it. Or, a software application can be written that performs what is known as a "brute-force attack", whereby the software systematically tries to discover the transformations. Because of this, ciphers tend to be composed of very complex mathematical transformations. The more complex they are, the more difficult it will be for a brute-force attack to succeed. As a result, the number of ciphers in use is fairly small. Thus, even if a hacker does not know which cipher was used when encrypting data, there is only a small number of possibilities to try in the brute-force attack.
Another type of attack that is common when trying to guess the transformations performed by a cipher is known as a "dictionary attack". In this type of attack, a hacker uses a stored file of textual information (for example, a file storing the text of an on-line book), referred to as the dictionary, and systematically supplies each text string in that file to the cipher as the key for decrypting an encrypted file. If readable text results from any of these decryption attempts, then it can be assumed that this text string is a valid key. When a user's ID and password have been used as the key, then it is very likely that there is some file on their computer system where that information is stored. If a hacker locates this file, and uses it in a dictionary attack, then he will have gained access to all stored data that was encrypted using that ID and password. Computer users have a tendency to use common words or names for their user ID and/or password, thereby increasing the chance that a dictionary attack, using a dictionary that contains many common words, will succeed.
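The dictionary attack just described can be sketched as follows, to show why a key chosen from common words offers little protection. The sketch assumes a toy cipher (a SHA-256 keystream combined with the data by exclusive-OR) standing in for whatever cipher was actually used, and it treats "readable text" as meaning that every decrypted byte is a printable ASCII character. The names toy_cipher, looks_readable, and dictionary_attack, as well as the sample word list, are hypothetical.

```python
import hashlib
import string


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (an assumption of this sketch): XOR with a keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ k for d, k in zip(data, stream))


PRINTABLE = set(string.printable.encode())


def looks_readable(data: bytes) -> bool:
    """Crude readability test: every byte is printable ASCII."""
    return all(b in PRINTABLE for b in data)


def dictionary_attack(ciphertext: bytes, dictionary: list):
    """Try each dictionary entry as the key; return the first that yields readable text."""
    for word in dictionary:
        candidate = toy_cipher(word.encode(), ciphertext)
        if looks_readable(candidate):
            return word, candidate
    return None


if __name__ == "__main__":
    # The user chose a common word as the key.
    stored = toy_cipher(b"sunshine", b"checking account 0000-1111-2222")
    words = ["password", "letmein", "sunshine", "dragon"]
    print(dictionary_attack(stored, words))
```

The cost of the attack grows only with the size of the dictionary, not with the complexity of the cipher, which is why a key that appears in a stored file or in a list of common words is easily recovered.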
Software applications that use confidential data often have mechanisms to limit application usage to authorized users. These applications require the user to identify himself, and then use this identification to determine whether the user is authorized. In order to prevent an unauthorized person from gaining access by simply guessing the identification used by someone who is authorized, the applications also typically require the user to enter a password. The application will have stored the identification and password for each authorized user during a configuration step, and compares the information entered by a user requesting access to the stored information: if there is a match, then this user is authorized to use the software, and is given access. But this type of authorization checking presents another vulnerability to intrusion: a Trojan horse may be downloaded that searches for the file in which the authorized user information is stored. Knowing this information allows a hacker to electronically impersonate an authorized user, and gain access to supposedly secure applications and data.
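The authorization check described above can be sketched in a few lines. The table AUTHORIZED_USERS and the function is_authorized are hypothetical names; the table stands in for the file in which an application stores identification and password information during configuration, and its single entry is a made-up example.

```python
import hmac

# Hypothetical stand-in for the file of authorized users written at configuration time.
AUTHORIZED_USERS = {
    "alice": "sunshine",
}


def is_authorized(user_id: str, password: str) -> bool:
    """Compare the entered identification and password against the stored values."""
    stored_password = AUTHORIZED_USERS.get(user_id)
    if stored_password is None:
        return False  # unknown identification
    # Constant-time comparison of the entered and stored passwords.
    return hmac.compare_digest(stored_password.encode(), password.encode())


if __name__ == "__main__":
    print(is_authorized("alice", "sunshine"))  # True
    print(is_authorized("alice", "guess"))     # False
    print(is_authorized("mallory", "sunshine"))  # False
```

The sketch makes the vulnerability concrete: everything needed to pass the check is contained in the stored table, so software that locates and copies that table obtains every valid identification-password combination.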
Encrypting the files where identification and password information is stored prevents a hacker from using one of the valid identification-password combinations, unless the hacker can decrypt the file as discussed above.
When a file is to be encrypted before storing it, a common technique is to use the identification-password combination as the secret key for the encryption algorithm. If the user keeps his password (and optionally the character string he uses for his identification) secret, and uses an encryption algorithm that provides a strong degree of protection, then it will be fairly difficult to decrypt the data using a brute-force attack. However, a security exposure exists due to the fact that the user must type his identification and password on the keyboard to identify himself when he begins using the application software. If a keystroke-monitoring Trojan horse is running on the user's machine, the hacker will be sent a file from which he can deduce what the name of the particular application was, the identification of an authorized user of that application, and the user's password. All that remains unknown for decrypting the encrypted information stored by this application is the particular manipulations of the user identification and password that were performed by the encryption algorithm. If the hacker has access to a copy of the application software, then he does not even need to know what those manipulations were: he simply runs the application, supplies the user's identification and password to it, and the application will perform the decrypting manipulations for the hacker as if he were the authorized user. Even encrypting the user's identification and password will not protect the user's data in this scenario: the hacker has captured the necessary information before it was encrypted. (The term "ID" or "user ID" is used herein as an abbreviation of "user identification".)
Accordingly, a need exists for a technique by which stored information can be protected even in the presence of keystroke monitoring. The proposed technique provides an additional secret value to be used for computing the secret encryption key, where this value is not typed in by the user. The secret value used by the proposed technique is an immutable characteristic of the user's computer, so that a hacker would need to have access to the user's computer in order to decrypt the stored information. Further, a need exists for a system and method by which this information can be protected regardless of whether the information is stored on the computer on which it was originally created, enabling the user to move his encrypted files to a different computer. Optionally, the secret value can be exposed to the user on request, so that he may decrypt the data if the secret value is not otherwise available.
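The idea of computing the secret key from a value that is never typed can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the proposed technique: the function name derive_key is hypothetical, the use of PBKDF2 with SHA-256 and 100,000 iterations is an assumption of this sketch, and the machine value is passed in as a parameter because the text above does not specify which characteristic of the computer is used or how it is obtained.

```python
import hashlib


def derive_key(user_id: str, password: str, machine_value: bytes) -> bytes:
    """Combine the typed ID and password with a machine-specific value into one key.

    Assumption of this sketch: PBKDF2-HMAC-SHA256, with the machine value
    supplied as the salt input so that it influences every bit of the key.
    """
    typed_secret = user_id.encode() + b"\x00" + password.encode()
    return hashlib.pbkdf2_hmac("sha256", typed_secret, machine_value, 100_000)


if __name__ == "__main__":
    # Placeholder machine values; a real value would be read from the computer.
    key_here = derive_key("alice", "sunshine", b"MACHINE-VALUE-A")
    key_elsewhere = derive_key("alice", "sunshine", b"MACHINE-VALUE-B")
    print(key_here.hex())
    print(key_here == key_elsewhere)  # False: same keystrokes, different machine
```

A keystroke monitor captures only the first two inputs, so the captured ID and password alone do not reproduce the key. Passing the machine value explicitly also reflects the optional case described above, in which the value is exposed to the user on request so that files moved to a different computer can still be decrypted.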