The US company Dropbox, for example, has been active in the field of cloud computing since 2008; by 2011 it already had 25 million customers saving 200 million files every day. Dropbox offers its customers online storage space and a file version control system. What is special about this service is that the online storage space can be used not only as an online data backup: the files stored there may also be kept in sync across any number of desired clients.
Other companies offering cloud services include Amazon, Google, Cisco, Microsoft, and many more. IBM, Microsoft, Oracle, and Apple consider cloud computing to be the IT service expected to have the highest growth rates in the coming years and promote it accordingly.
In spite of the advantages of cloud computing, potential users (from private users to global corporations) may be reluctant to use these services. Among other things, this may be due to the following reasons:
Data sovereignty regulations have to be respected—data have to be made accessible to state authorities upon request.
For reasons of storage space optimization by the provider, security and data protection can only be guaranteed to a limited extent.
Response times and bandwidths cannot always be guaranteed, as the provider's servers serve millions of users.
Companies offering online storage space have hundreds of thousands to millions of users who want to use the company's servers for data backups, synchronization, and file sharing.
Many online storage space providers offer their users version control systems, which means that all the different versions of a file are kept at least for a limited period of time and can be recovered by the user if necessary. As provided by law, data histories are often kept without a time limit.
Security plays a decisive role, particularly for critical data; this includes failure safety and the protection of data against access by third parties. In principle, state authorities are permitted to search every customer's stored data. Third parties thus include both parties authorized to access the data (such as state authorities) and unauthorized parties (such as industrial spies or hackers) who may gain access to the data.
Many service providers therefore use their own keys for encrypting the users' data, in some cases the same key for all their customers, who, however, must not know this key. For this reason, the data have to be sent to the server in unencrypted form via an encrypted connection. The same principle applies if the server generates an individual key for each customer or even for each file: if the server generates the key, the server, of course, knows the key and might have to make it accessible to third parties. These third parties may then obtain the data in their unencrypted form. External attackers who gain unauthorized access to the server, for example via security holes in the provider's system or by social engineering, will then also be able to decrypt the data of all customers.
Current Implementations—Variant 1:
Variant 1 of the current implementations allows users to encrypt their data on the client and to communicate them to the server already in encrypted form. Each user individually chooses a user name and a password, which is not communicated to the server. A program is executed on the users' clients (PCs, MacBooks, etc.), said program executing the following steps:
the users' data are divided into data blocks;
the program determines which data blocks have already been created and uploaded and which data blocks still have to be uploaded to the server;
data blocks which still have to be uploaded to the server are all encrypted using the same key, which is only known to the user;
a data block list listing the data blocks of the respective data and files is created;
the data block list is also encrypted using the same key known to the customer, and at least changes to this list are uploaded to the server.
All the generated data are encrypted using a key generated based on the user's password and are only then uploaded to the server. For this reason, it is never necessary to communicate a password to the server.
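The derivation of a key from the user's password on the client can be sketched as follows. This is a minimal illustration, not the method of any particular provider: the choice of PBKDF2 with SHA-256, the iteration count, the use of the user name as salt, and the function name `derive_user_key` are all assumptions made for this example.

```python
import hashlib


def derive_user_key(user_name: str, password: str, iterations: int = 200_000) -> bytes:
    """Derive a 256-bit key on the client; neither input is sent to the server."""
    # Assumption: the user name serves as salt, so two users who happen to
    # choose the same password still obtain different keys.
    salt = user_name.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


key_alice = derive_user_key("alice", "correct horse")
key_bob = derive_user_key("bob", "correct horse")
```

Because the key is recomputed from the user name and password whenever it is needed, it never has to be stored on or transmitted to the server.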
This process can be considered very secure, as it is based on a zero knowledge approach: the company providing the storage space does not possess any useful information concerning the data content on its server.
Another important aspect of this approach is that the servers do not have to individually encrypt the unencrypted data blocks of the users and decrypt them again, which reduces the load on the server resources. When catering for millions of users, this allows the service provider to save a lot of resources and, thus, costs.
Compared to the current implementations according to variant 2, this approach has one big disadvantage, however: as the data blocks are encrypted by the individual users on their clients, it is not possible to determine whether data blocks created by a user have already been created by another user.
In summary, this variant offers optimal security and saves computing capacity.
Current Implementations—Variant 2:
The aim of this approach is the optimum use of storage space resources for filing data on the server. This is achieved by intelligent compression. A data block of any size (such as 8 KB) is replaced by a value of 256 bits or fewer, assigned unambiguously to one data block. Identifying the data blocks present on the server in this way can be viewed as dictionary-based compression: a short sequence of characters may represent a very long sequence of characters, as in an index. Of course, a minimal overhead is created in the case of blocks which are only used once by a single user. But this overhead is compensated by the advantages, as this variant allows for an additional high-level compression of data blocks which may already have been compressed.
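The replacement of data blocks by short identifiers can be illustrated with a small sketch. It assumes SHA-256 as the function producing the 256-bit value and fixed 8 KB blocks; the class `BlockStore` and the function `store_file` are hypothetical names introduced only for this example.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # fixed 8 KB blocks (assumption for this sketch)


class BlockStore:
    """Simulated server-side store that keeps each distinct block only once."""

    def __init__(self):
        self.blocks = {}  # 256-bit block ID -> block content

    def put(self, block: bytes) -> bytes:
        block_id = hashlib.sha256(block).digest()
        # An already known block is not stored a second time.
        self.blocks.setdefault(block_id, block)
        return block_id


def store_file(store: BlockStore, data: bytes) -> list:
    """Split data into blocks and return the list of block IDs (the 'index')."""
    return [store.put(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


store = BlockStore()
list_a = store_file(store, b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
list_b = store_file(store, b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE)
```

Both files consist of two blocks, but the store holds only three blocks in total, since the first block is shared; each file is represented by a list of 32-byte identifiers.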
Providers using the approach based on variant 2 may charge users for the storage space they use, although the provider may actually need significantly less storage space for these users. The version control makes it highly likely that identical data blocks are found in the data of the same user, but due to the large amount of data from a large number of users, it also becomes highly likely that identical data blocks are found in the data of different users. This follows the principle behind the birthday attack: any data block of any user may be identical to any data block of any other user. Due to the resulting high degree of compression, non-user-specific data blocks compensate for the overhead of many unique data blocks. Compared to the above described variant 1, this process allows a double-digit reduction of the required storage space, which is a huge advantage for a service provider using variant 2.
The disadvantage of this variant is that it is not able to provide as much security as variant 1. In addition, this process requires the data blocks to be sent to the server in unencrypted form and the server to encrypt these data blocks using a key known only to the service provider. Using this variant, it is not possible for the users to encrypt the data themselves, for they would have to know the general key, which would allow them to decrypt the data of all users if they gained access to the server. Variant 1 is completely different from variant 2, as one and the same data block is encrypted by a first user using the first user's key and by a second user using the second user's key. Unless the two users coincidentally have the same password, it can thus be excluded that one and the same data block is recognized as identical for several users. The server is not able to determine whether two users use the same password; for this reason, the user key should include the user name. Two identical encrypted data blocks could then only arise if two different unencrypted data blocks happened to be mapped to one and the same encrypted data block by the respective user keys; as this can be practically excluded, it will not be possible to achieve any compression based on identical data blocks.
Aim of the Invention:
The aim of the present invention consists in providing a process according to the second variant which still allows for a zero knowledge approach and suitably protects the customers' data from third parties—although the entire information concerning user activities and the data stored by the users can be made available to these third parties.
Another aim of the present invention consists in providing the storage space providers with an opportunity to increase their end customers' confidence by providing a process which is easy to understand and offers reliable security. This will allow end customers to overcome their reluctance to using cloud computing.
A further aim of the invention consists in providing a process which transfers many operations, particularly compressing, encrypting, and decrypting operations, to the clients, as in the above described variant 1. At the same time, the computational effort at the clients also is to be reduced, as it will be possible, based on an unencrypted block and a minimum upload to the server, to determine whether a data block is already present on the server and whether it has to be uploaded to the server in preferably compressed, but at least in encrypted form.
The invention is also intended to provide a process which will absolutely prevent any user from damaging the server by transferring manipulated data.
In summary, the inventive process is to offer the best possible protection of data against third parties under the following conditions:
a) These third parties have access to all keys of the storage space provider, know all the used algorithms and the overall process which is applied on the server.
b) These third parties may at any point in time view the information stored on the server, particularly all the data and data blocks and the data block list relating to a user applying the process. The process, thus, has to make sure that the data stored on the server are encrypted using a key which is not known to the storage space provider at any point in time or using a key which is only known to those who have saved the information concerning each and every bit of the unencrypted form of the encrypted data blocks.
c) These third parties may at any point in time view the settings and the user information of all users, particularly user names and passwords, which means that the process has to make sure that it does not become necessary to communicate the password and the user name to the server in any useable form.
d) These third parties may view a log file or a protocol chronologically containing every request made to the server, every upload, and every download as well as every change made by the users. The process, thus, has to make sure that third parties do not obtain any information if a user, for example, only changes a small section of a file and only saves this changed section on the server.
e) The process is intended to guarantee the protection of the data, although it also has to provide an opportunity for dividing data into data blocks of a fixed or variable size and for recognizing whether this data block has already been created and uploaded by the same user or by any other user. This allows the storage space provider to reduce the load on the server resources, which is very important.
f) The most important requirement is that the user password must not be stored on the server; preferably, not even the user name is stored there, but only, for example, its hash value. It is thus possible to use the user name in combination with the user password for the generation of the user key without delivering any information to the server. The hash value of the user name on the server unambiguously identifies each user. It is, for example, possible to create a sequence of characters containing the user name, the user password, and a constant sequence of characters determined by the server and to create a hash value of this sequence to identify a user and his/her password on the server. This hash value is communicated to the server, allowing the user to log in using his/her password.
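The identification scheme described in condition f) might look as follows. This is a sketch under stated assumptions: SHA-256 as the hash function, a zero byte as separator, and the names `SERVER_CONSTANT`, `user_id_hash` and `login_hash` are all hypothetical. A deployed system would more likely use a deliberately slow password hashing function instead of a single hash.

```python
import hashlib

# Hypothetical constant sequence of characters determined by the server.
SERVER_CONSTANT = "example-server-constant"


def user_id_hash(user_name: str) -> str:
    """Value stored on the server in place of the user name."""
    return hashlib.sha256(user_name.encode("utf-8")).hexdigest()


def login_hash(user_name: str, password: str, server_constant: str = SERVER_CONSTANT) -> str:
    """Value sent to the server for login; the password itself never leaves the client."""
    # A separator avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    combined = "\x00".join([user_name, password, server_constant])
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


stored_id = user_id_hash("alice")
login_value = login_hash("alice", "secret")
```

The server compares the received login value with the one recorded at registration; it learns neither the user name nor the password in a usable form.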
g) There are third parties trying to damage the system or to abuse the system, i.e. trying to attack the system. The process should be immune against such attacks to the greatest possible extent.
In this connection, the present invention provides for the following steps to be carried out for each unencrypted data block on the client in order to determine the data blocks and the data block list to be uploaded to the central server:
Alternative 1 (as Described in the Unpublished Patent Application PCT/AT 2011/000216)
generating a data block key using a key generation rule which uses the unencrypted data block as an integral component for generating said data block key;
encrypting the unencrypted data block using the generated data block key to create an encrypted data block based on an encryption rule;
generating a unique data block ID value based on the unencrypted data block and, optionally, based on the key generation and/or the encryption rule, and assigning this data block ID value to the encrypted data block using a data block ID generation rule predetermined for each user in a data block ID value generating step;
communicating this unique data block ID value from the client to the central server in a data block ID value communication step in order to receive a response from the server as to whether the encrypted data block assigned to this data block ID value has to be uploaded to the server, and, based on this response, uploading at least parts of the encrypted data block or not;
saving said unique data block ID value in the data block list, which is to be uploaded to the server;
saving said unique data block ID value and the data block key in a list of data block keys, unless the data block ID value is already part of said list of data block keys;
and for carrying out the following steps after having completed the above steps for each unencrypted data block:
encrypting the data block list using a key for lists of data and data blocks, said key being generated based on a data block list key rule, said rule being managed by and only known to the user;
encrypting the list of data block keys using the user key;
uploading the encrypted data block list and the encrypted list of data block keys from the client to the central server;
the unencrypted data blocks, the unencrypted data block list, the unencrypted list of data block keys and the generated data block keys and the user key remaining exclusively on the client;
and, in the data recovery step, the server sending the data which are stored on the server in their encrypted form, the server being not able to decrypt them, to the client, so that data recovery is only carried out on the client.
The sequence of the above mentioned steps is not subject to any limitation and any possible option may be chosen freely.
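The per-block steps of alternative 1 can be sketched as follows. This is only an illustration under several assumptions: SHA-256 stands in for both the key generation rule and the data block ID generation rule, the function `keystream_xor` is a toy cipher used in place of a real encryption rule (it is not suitable for production use), and the dictionary `server_blocks` simulates the server's response as to whether an upload is required.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (stand-in for a real cipher); encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def process_block(block: bytes):
    # Data block key generated from the unencrypted data block itself.
    block_key = hashlib.sha256(b"key:" + block).digest()
    encrypted = keystream_xor(block_key, block)
    # Data block ID also derived from the unencrypted block, but with a different
    # prefix, so that knowing the ID does not reveal the key.
    block_id = hashlib.sha256(b"id:" + block).digest()
    return block_id, block_key, encrypted


server_blocks = {}  # simulated server: data block ID -> encrypted data block


def upload(block: bytes):
    block_id, block_key, encrypted = process_block(block)
    if block_id not in server_blocks:  # server answers "upload required"
        server_blocks[block_id] = encrypted
    return block_id, block_key  # entry for the client's list of data block keys


content = b"shared content" * 100
id_alice, key_alice = upload(content)
id_bob, key_bob = upload(content)
```

Two users holding the same unencrypted block arrive at the same ID and the same encrypted block, so the server stores it once, while only clients that possess the unencrypted block can derive the key needed to decrypt it.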
Alternative 2 (Invention as Claimed)
A file key is generated based on a key generation rule which uses the unencrypted file, which consists of at least one data block, as an integral component for the generation of the file key.
The following steps are carried out for each unencrypted data block on the client in order to determine which data blocks and which lists of data and data blocks have to be uploaded to the central server:
encrypting an unencrypted data block using the generated file key to obtain an encrypted data block, using a predefined encryption rule;
generating a unique data block ID value from the encrypted data block by a method known to the server and assigning this data block ID value to the encrypted data block using a data block ID generation rule predetermined for each user in a data block ID value generating step;
communicating this unique data block ID value from the client to the central server in a data block ID value communication step in order to receive a response from the server as to whether the encrypted data block assigned to this data block ID value has to be uploaded to the server, and, based on this response, uploading at least parts of the encrypted data block or information generated therefrom or not;
saving said unique data block ID value in the data block list, which is to be uploaded to the server;
and carrying out the following steps after having completed the above steps for each unencrypted data block:
encrypting the list of data blocks using the file key;
generating a unique data block list ID value from the encrypted list of data blocks and assigning said value to the encrypted list of data blocks;
encrypting the data block list ID value using the user key, which is generated based on a data block list ID value key rule, said rule being managed by and only known to the user;
encrypting the file key using the user key, which is generated based on a data block list ID value key rule, said rule being managed by and only known to the user;
uploading the encrypted list of data blocks and the encrypted ID value of the list of data blocks as well as the encrypted file key from the client to the central server and assigning the encrypted file key to the encrypted data block ID value;
the unencrypted data blocks, the unencrypted data block list, the unencrypted file keys, the data block list ID value and the generated data block keys and the user key remaining exclusively on the client;
and, in the data recovery step, the server sending the data which are stored on the server in their encrypted form, the server being not able to decrypt them, to the client, so that data recovery can only be carried out on the client.
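The flow of alternative 2 can be illustrated with the following sketch. It rests on several assumptions that are not taken from the text: SHA-256 stands in for the key generation rule and for the ID generation method known to the server, `keystream_xor` is a toy cipher replacing the predefined encryption rule (not suitable for production use), the labels appended to the keys merely avoid reusing the same toy keystream, and `prepare_file` is a hypothetical name.

```python
import hashlib


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (stand-in for a real cipher); encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def prepare_file(data: bytes, user_key: bytes, block_size: int = 8192) -> dict:
    file_key = hashlib.sha256(data).digest()  # file key derived from the unencrypted file
    blocks, block_ids = {}, []
    for index, start in enumerate(range(0, len(data), block_size)):
        # Each block is encrypted under the file key (block index mixed in as a label).
        block_key = file_key + b"block" + index.to_bytes(8, "big")
        encrypted = keystream_xor(block_key, data[start:start + block_size])
        block_id = hashlib.sha256(encrypted).digest()  # ID computed from the ENCRYPTED block
        blocks[block_id] = encrypted
        block_ids.append(block_id)
    encrypted_list = keystream_xor(file_key + b"list", b"".join(block_ids))
    list_id = hashlib.sha256(encrypted_list).digest()
    return {
        "blocks": blocks,                  # uploaded only if the server lacks the ID
        "encrypted_list": encrypted_list,  # data block list encrypted with the file key
        "encrypted_list_id": keystream_xor(user_key + b"list-id", list_id),
        "encrypted_file_key": keystream_xor(user_key + b"file-key", file_key),
    }


data = b"example file content " * 1000
upload_alice = prepare_file(data, b"alice-user-key")
upload_bob = prepare_file(data, b"bob-user-key")
```

In this sketch two users storing the same file produce identical encrypted blocks and an identical encrypted block list, which the server can deduplicate because it can recompute the IDs from the encrypted data, whereas the encrypted file key and the encrypted list ID differ per user and are readable only with the respective user key.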
The following applies to both alternatives 1 and 2: