Cloud data storage is an increasingly popular way for a user to store data remotely from their own devices. For example, a user may have a large collection of music files and wish to store them in networked data storage that allows the user to access those files regardless of which device they are using. Data is stored at a networked server, typically operated by a third party that provides a cloud data storage system. Examples of such systems include Google Cloud Storage, iCloud, Dropbox and FilesAnywhere.
There are several advantages to using cloud data storage. The user is not required to operate and maintain physical storage media themselves, but can leave that to the operator. Furthermore, as cloud storage providers typically have redundancy built into their cloud storage systems, the data is more secure than if it were stored locally in one location, where it could be lost in the event of, say, a fire or a failure of the storage medium.
Of course, it is in the interest of the cloud storage operator to minimize the amount of data that they need to store at their servers. One way to do this is to avoid duplication of certain data. For example, two different users may each own a copy of a particular music file. Rather than storing separate copies of the same file for each user, the cloud storage operator stores a single copy of the music file and gives both users permission to access the file. This is described in, for example, U.S. Ser. No. 12/751,850.
One way that cloud data storage systems attempt to avoid duplication and optimize upload speeds is by the use of one or more hashes of the file to be uploaded. A hash of the file is uploaded from the user to the cloud data storage system. A server in the cloud data storage system compares the uploaded hash value with the hash values of files (or portions of files) that are already stored in the cloud data storage system. If the hash values match, then it is assumed that the user has a duplicate of a file already stored in the cloud data storage system, and there is no need to upload the file from the user. Instead, the user is granted access to the already stored file, as the matching hash values are taken as confirmation that the user already has a copy of the file. This has the advantage that upload speeds are greatly improved, as there is no need to upload the entire file (or portions of the file) if a copy is already stored in the cloud data storage system.
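The hash-based upload procedure described above can be sketched as follows. This is a minimal illustration only, not the method of any particular cloud storage provider: the class `DedupStore`, the function `client_store`, the in-memory dictionaries and the choice of SHA-256 as the hash function are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each distinct file."""

    def __init__(self):
        self.blobs = {}    # hex digest -> file contents
        self.access = {}   # hex digest -> set of users allowed to read the file

    def claim(self, user, digest):
        """Hash-only path: grant access if a file with this hash is already stored."""
        if digest not in self.blobs:
            return False
        self.access[digest].add(user)
        return True

    def upload(self, user, data):
        """Full-upload path: store the file contents and grant access."""
        digest = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        self.blobs.setdefault(digest, data)
        self.access.setdefault(digest, set()).add(user)
        return digest


def client_store(store, user, data):
    """Send the hash first; upload the contents only if the hash is unknown."""
    digest = hashlib.sha256(data).hexdigest()
    if store.claim(user, digest):
        return "deduplicated"
    store.upload(user, data)
    return "uploaded"
```

In this sketch, the first user to store a given file takes the full-upload path, while any later user presenting the same hash is granted access without transferring the file contents, so only one copy is held by the store.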
However, it is possible for a malicious user, who does not have a valid copy of the file, or the right to access a file, to obtain a copy of the hash from an online source. The file name and hash can then be presented to the cloud data storage service as part of an upload procedure. As the hash value presented by the malicious user matches the hash value of the file stored by the cloud data storage service, the cloud data storage service assumes that the malicious user is entitled to access the file and so grants access. The presentation of the hash value by the malicious user only proves that the user knows the hash value; it does not prove that the malicious user has possession of the file. There is therefore a need to improve the security of uploading files to a cloud data storage system.
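The weakness can be illustrated with a short sketch. It assumes a hypothetical store, `HashOnlyStore`, that grants access on presentation of a matching hash alone; the class, its methods, the user names and the use of SHA-256 are assumptions made for the example and do not describe any real service.

```python
import hashlib


class HashOnlyStore:
    """Hypothetical store that grants access on a matching hash alone."""

    def __init__(self):
        self.blobs = {}    # hex digest -> file contents
        self.access = {}   # hex digest -> set of users allowed to read the file

    def upload(self, user, data):
        digest = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
        self.blobs[digest] = data
        self.access.setdefault(digest, set()).add(user)
        return digest

    def claim(self, user, digest):
        # The store checks only that the hash is known, not that the
        # caller actually possesses the file contents.
        if digest in self.blobs:
            self.access[digest].add(user)
            return True
        return False

    def download(self, user, digest):
        if user not in self.access.get(digest, set()):
            raise PermissionError("user has no access to this file")
        return self.blobs[digest]


def demo_attack():
    store = HashOnlyStore()
    song = b"example bytes standing in for a music file"

    # A legitimate user uploads the file in full.
    digest = store.upload("alice", song)

    # The malicious user never holds the file, only its hash value,
    # obtained here by assumption from some online source.
    leaked_digest = digest
    granted = store.claim("mallory", leaked_digest)

    # Having been granted access, the malicious user can fetch the contents.
    stolen = store.download("mallory", leaked_digest)
    return granted, stolen == song
```

Here `demo_attack()` shows that knowledge of the hash value is sufficient for the malicious user to obtain the file contents, because the store treats the hash as proof of possession.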
An obvious way to address this problem is to require that a user upload the entire file to the cloud data storage system, rather than just a hash derived from the file. The entire uploaded file can be compared with a file already stored in the cloud data storage system. If the files match, then the uploaded file is deleted and the user is granted access to the file already stored at the cloud data storage system. This ensures that the user has possession of a copy of the file. However, in the case of many uploaded files, or large files, a large amount of time and bandwidth may be required to upload entire files. This increases the bandwidth resources required by the upload server at the cloud data storage system, thereby increasing costs, and also increases the time a user must wait for an upload to be confirmed, thereby giving the user a less satisfactory experience.
There is a need to improve the security of cloud data storage systems when granting users access to files, while still allowing quick uploads of duplicate files that many different users are entitled to access.