1. Field of the Invention
This invention relates to a method, system and computer program product for controlling access to data in a distributed environment.
2. Description of the Related Art
Security of information is an issue for many corporate organizations, whether small or large. A number of conventional mechanisms exist to ensure various degrees of security for corporate data. The most common mechanism involves server-based systems, such as server-based document systems, where access rights are defined for each document (or for groups of documents) and for each user (or for groups of users). There are many examples of such centralized, server-based document management systems, which are readily available commercially from a number of vendors.
One problem with such server-based security systems is the fact that an administrator normally has the highest privilege level in the system, and therefore can gain access to any data on the corporate network. As such, the security, and the level of trust in the system, is only as good as the level of trust in the person of the administrator himself or herself.
Another disadvantage of such server-based systems is that many corporate networks consist of not only servers and terminals (such as, for example, desktop computers that function as terminals), but also laptop and notebook computers, which many users take with them “on the road.” It is frequent practice for a user to download information, in the form of files or folders, to his laptop computer, after which point, security of that information is only as good as the security of the laptop itself.
Another conventional security mechanism involves cryptography, for example, the use of passwords (and encryption keys derived from the passwords) for protection of particular files or other objects that contain data that needs to be protected. However, in conventional systems, the problem of the “omnipotent” administrator remains: usually some entity, such as a system administrator, is in charge of the security aspect as well, and therefore has access to the password files, or at least has access to data not just in encrypted form, but also in unencrypted form. Therefore, the use of encryption in conventional systems does not automatically ensure that information is safe from the administrator. This, in turn, necessitates a very high degree of trust in the administrator, particularly given the fact that this individual presumably has access to all the sensitive and confidential data that exists on the corporate network.
By definition, confidential information is information to which access is restricted to a particular group of people. Restricting access to resources, such as the access control provided by operating systems, is generally insufficient for controlling access to confidential information, because such operating system-based access control generally requires an administrator to manage it. As such, the administrator becomes one individual who has access to all the confidential information. The problem is essentially independent of which operating system is used: all that is necessary is for the malicious user to have administrative rights in order to gain access to the information on a removable storage medium.
Another problem with conventional mechanisms for data security lies in the fact that the object being protected, such as a file or a folder that contains files, can be copied onto some medium, such as a flash drive, after which an attempt to access the medium can be made. Many such conventional mechanisms are vulnerable to a malicious user who acquires the system level of privileges on some machine and then attempts to gain access to encrypted data that is stored on some removable medium.
Another problem with conventional encryption mechanisms is the fact that for many users, remembering more than a small handful of passwords (and/or keys) is impractical. As such, many users use the same one or two passwords for many of their activities. Thus, a malicious hacker or a malicious administrator, after learning one password for one file or object, can often gain access to a great deal of other data.
Another problem with conventional methods is the fact that working with encrypted information is generally very inconvenient for users: before working with an encrypted object, such as a file or a folder, that object needs to be decrypted, and once the user is done working with the information, the object has to be re-encrypted, and any unencrypted copies need to be destroyed. These operations, particularly if performed frequently, often lead to mistakes and errors, for example, users forgetting to erase temporary files, users writing down their passwords, users forgetting to encrypt the final document, and so on. As the volume of such actions increases, particularly for document-intensive operations, guarding against such mistakes and errors becomes nearly impossible.
Yet another problem is that equipment failure, particularly when the encryption process has not been completed, can leave unencrypted data available to a malicious user, typically in some form of nonvolatile storage. Also, in some cases, given equipment failure, it is at times difficult to tell where exactly the encryption operation was interrupted, and therefore difficult to tell which part of the data, if any, did not get encrypted. This, in turn, can lead to a loss of confidential information.
Another problem with conventional systems is that encryption algorithms can be relatively computationally intensive. As such, where large volumes of data, such as large files or entire disk drives, need to be encrypted, the encryption can take a relatively long time. It is therefore desirable to give the user the ability to work with the storage device, such as a disk drive, even while the encryption process is still in progress.
Yet another weakness of conventional encryption systems is that conventional block encryption algorithms typically work with relatively small increments of data, such as 8-, 16-, or 32-byte increments. On the other hand, most storage devices, such as disk drives, typically work with much larger units of data, such as blocks or sectors that are 512 bytes or 1024 bytes in length, and so forth. If long data streams are broken down into very small portions that are encrypted independently, then monitoring the encrypted versions of the data can identify localized spots where data has changed. This gives some indirect information about the contents of the encrypted data. Therefore, it is desirable that a change in one bit somewhere in a protected object's plaintext result in changes to the ciphertext that are not localized.
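The localization effect described above can be illustrated with a short sketch. The following Python fragment uses a keyed hash as a stand-in for a real block cipher (it illustrates diffusion only and is not a reversible cipher): encrypting each block independently leaves a one-block change localized in the ciphertext, whereas chaining each block with the previous ciphertext block propagates the change to all subsequent blocks.

```python
import hashlib

BLOCK = 16  # toy block size; real block ciphers commonly use 8 or 16 bytes

def toy_block_encrypt(block: bytes, key: bytes) -> bytes:
    # Stand-in for a block cipher: a keyed hash truncated to one block.
    # (Not reversible; sufficient only to illustrate diffusion.)
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ecb_like(data: bytes, key: bytes) -> list:
    # Each block is encrypted independently, so changes stay localized.
    return [toy_block_encrypt(data[i:i + BLOCK], key)
            for i in range(0, len(data), BLOCK)]

def cbc_like(data: bytes, key: bytes, iv: bytes) -> list:
    # Each block is first XORed with the previous ciphertext block,
    # so a change in one block alters all subsequent ciphertext blocks.
    out, prev = [], iv
    for i in range(0, len(data), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(data[i:i + BLOCK], prev))
        prev = toy_block_encrypt(mixed, key)
        out.append(prev)
    return out

key, iv = b'k' * BLOCK, b'\x00' * BLOCK
p1 = b'A' * (BLOCK * 4)
p2 = b'A' * BLOCK + b'B' * BLOCK + b'A' * (BLOCK * 2)  # second block changed

diff_ecb = [i for i, (a, b) in enumerate(zip(ecb_like(p1, key),
                                             ecb_like(p2, key))) if a != b]
diff_cbc = [i for i, (a, b) in enumerate(zip(cbc_like(p1, key, iv),
                                             cbc_like(p2, key, iv))) if a != b]
print(diff_ecb)  # [1]        only the changed block differs
print(diff_cbc)  # [1, 2, 3]  the change propagates to all later blocks
```

A mode with the desired property would ensure that a single-bit plaintext change affects the ciphertext of a whole sector or object rather than a single small block.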
Another problem with conventional systems is that the size of the file might not be an even multiple of the block size of the encryption algorithm, and the operating system may not be responsible for security of data that is located outside the file boundary.
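One common way of handling a file whose size is not an even multiple of the cipher's block size is to pad the plaintext to a whole number of blocks before encryption and strip the padding after decryption, so that the data outside the original file boundary is well defined. A minimal PKCS#7-style sketch, assuming a 16-byte block size:

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    # Always pad, so the pad length is unambiguous; each pad byte
    # encodes the number of padding bytes added.
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    n = padded[-1]
    if n == 0 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

msg = b"13-byte file!"              # not a multiple of the block size
padded = pkcs7_pad(msg)
assert len(padded) % 16 == 0        # now encryptable block by block
assert pkcs7_unpad(padded) == msg   # original file boundary recovered
```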
Another practical problem, if not a theoretical problem, is that confidential information is primarily useful if it is accessible by more than one person. For conventionally encrypted data, this means that everyone with access to that data needs to have the same key. The flip side of this is that if all the confidential information is encrypted using the same key, then any user who has access to any portion of that information therefore has access to all the data, even data that is not intended for his use and his access.
On the other hand, if the confidential data is broken down into multiple subsets of data, each with its own encryption key, which is preferable from the perspective of data security, then a user who needs to have access to all the data will have difficulties resulting from having to keep track of numerous keys and passwords. Typically, such multiplicity of keys and passwords leads to compromising the integrity of the data, due to various “human failings” of most users.
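The trade-off above is often resolved by envelope-style key management: each subset of data has its own key, and that key is wrapped separately for each authorized user, so a user keeps a single personal secret while the data remains partitioned. The sketch below uses a toy XOR-based wrap purely for illustration; the names and the wrap scheme are hypothetical, not a real key-wrap algorithm.

```python
import os
import hashlib

def wrap(key: bytes, user_secret: bytes) -> bytes:
    # Toy XOR key wrap (self-inverse); a placeholder for a real
    # key-wrap or public-key encryption of the subset key.
    pad = hashlib.sha256(user_secret).digest()[:len(key)]
    return bytes(a ^ b for a, b in zip(key, pad))

unwrap = wrap  # XOR wrapping is its own inverse

# One key per subset of confidential data...
subset_keys = {"finance": os.urandom(16), "hr": os.urandom(16)}

# ...wrapped separately for each authorized user, so each user holds a
# single personal secret yet reaches every subset granted to that user.
user_secrets = {"alice": b"alice-secret", "bob": b"bob-secret"}
access = {
    "finance": {"alice": wrap(subset_keys["finance"], user_secrets["alice"])},
    "hr": {u: wrap(subset_keys["hr"], s) for u, s in user_secrets.items()},
}

# Alice recovers the finance key with her one secret; Bob, who was never
# granted finance access, holds no wrapped copy of that key at all.
assert unwrap(access["finance"]["alice"], user_secrets["alice"]) == subset_keys["finance"]
assert "bob" not in access["finance"]
```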
Yet another problem in handling confidential data in a multi-user environment results from one, out of several, users who have access to that encrypted data, losing his access rights. For example, this can happen when an employee leaves the company. In conventional systems, this means that to ensure security, the file needs to be re-encrypted using a different key, and a new key needs to be distributed to all the users who still retain access rights to that data. This is burdensome for everyone involved. It is therefore desirable to handle the situation without direct involvement of the remaining users who have access to the data.
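One way to revoke a user without direct involvement of the remaining users is to rotate the data key centrally and re-wrap it for the remaining users only; each remaining user keeps the same personal secret. A minimal sketch, with a toy XOR wrap standing in for a real key-wrap algorithm:

```python
import os
import hashlib

def wrap(key: bytes, user_secret: bytes) -> bytes:
    # Toy XOR key wrap (self-inverse), standing in for a real algorithm.
    pad = hashlib.sha256(user_secret).digest()[:len(key)]
    return bytes(a ^ b for a, b in zip(key, pad))

user_secrets = {"alice": b"a-secret", "bob": b"b-secret", "carol": b"c-secret"}
data_key = os.urandom(16)
wrapped = {u: wrap(data_key, s) for u, s in user_secrets.items()}

def revoke(user: str, secrets: dict):
    # Performed centrally: pick a fresh data key (the data is then
    # re-encrypted once under it) and re-wrap it for the remaining
    # users, whose personal secrets are unchanged.
    remaining = {u: s for u, s in secrets.items() if u != user}
    new_key = os.urandom(16)
    return new_key, {u: wrap(new_key, s) for u, s in remaining.items()}

data_key, wrapped = revoke("bob", user_secrets)
assert "bob" not in wrapped                                       # access removed
assert wrap(wrapped["alice"], user_secrets["alice"]) == data_key  # still works
```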
Confidential information is frequently structured. For example, a great number of users might have access to confidential information of a general nature. The more specific the information, the smaller the circle of people who have access to that information. It is therefore desirable for a system that handles information security to be able to address this fact.
Yet another problem with conventional mechanisms is that various applications have different ways of handling protected files. For example, a word processing application might make a backup copy of the file, save a copy of the file under a different name, rename the original file, create a new file with the same name as the old file at the same location, and so forth. If the software applications are not integrated in any way with the mechanism for handling access control, then such file operations can lead to a loss of security, particularly where frequent decryptions and encryptions need to occur.
Use of a public key infrastructure can lead to substitution of a malicious user's public key for the public key of a legitimate user. A variation on this form of cryptographic attack is the addition of a fictitious user (i.e., another public key) to the access list. The added user does not have actual access to the particular object, but the malicious user hopes that during the next re-encryption of the file, or the creation of a new file in a protected folder (or during the editing of an earlier file with creation of a new copy), the new user will gain “legitimate” access to that file. To address this problem, it is necessary to constantly review the list of people with access to the object, looking for fictitious users.
Yet another problem with conventional systems is the use of fictitious stubs. Normally, use of public key infrastructure encryption and hash functions does not require the knowledge of a secret key. This means that a malicious user who knows the data formats that are used, and who selects a key for file encryption, can correctly encrypt that file, provide the file with an access list of real users, and place it into a protected folder. Such a fictitious object will appear no different than any other real objects, and therefore will be initially trusted by other users. Such a fictitious object can be used for disinformation. It can also be used as a Trojan horse, where the file is used as a template by applications that work with files on the system. As a result, the malicious user can gain access to a final document, which contains real confidential data. Therefore, identification of such fictitious objects, which were not created using legitimate mechanisms, is also desirable.
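One way to identify such fictitious objects is to authenticate every legitimately created object with a keyed message authentication code (MAC), since producing a valid tag requires a secret key that the malicious user lacks. A minimal sketch using Python's standard hmac module; the organization-wide key shown is a hypothetical placeholder for a properly managed secret.

```python
import hmac
import hashlib

ORG_KEY = b"organization-wide signing secret"  # hypothetical placeholder

def tag_object(ciphertext: bytes, access_list: bytes) -> bytes:
    # Keyed MAC over the ciphertext and its access list; producing a
    # valid tag requires the secret key.
    return hmac.new(ORG_KEY, ciphertext + access_list, hashlib.sha256).digest()

def is_legitimate(ciphertext: bytes, access_list: bytes, tag: bytes) -> bool:
    # Constant-time comparison guards against timing attacks.
    return hmac.compare_digest(tag_object(ciphertext, access_list), tag)

real = (b"ciphertext...", b"alice,bob")
real_tag = tag_object(*real)
assert is_legitimate(*real, real_tag)

# A fictitious object forged without the secret key fails verification,
# even if its format and access list look plausible:
forged = (b"trojan ciphertext", b"alice,bob,mallory")
assert not is_legitimate(*forged, real_tag)
assert not is_legitimate(*forged, b"\x00" * 32)
```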
As such, there is a need in the art for a more robust system for security of information, particularly in the context of network-distributed data storage, and particularly where most of the methodology used to implement the security is substantially transparent to the users.