The evolution of large networks and computing resources in the cloud is posing new challenges to file security in the era of the highly mobile user. Files that are served to such users from resources in the cloud can be accessed with various types of user devices. These devices may be mobile or stationary. The devices themselves can be the user's primary devices, but more often they are devices that the user treats as secondary, tertiary or even sporadic.
Under these parameters, users who can access files stored in the cloud from many different devices present a particular problem, since their devices are less trustworthy than the users themselves. Hence, securing files served up through computer clusters in conformance with this new threat model has become a pressing need. More precisely, the problem concerns securing the distributed file systems of high-performance compute clusters. The file system must allow access from many different client devices that are not trusted, even though their users can be authenticated. The file system data must also be available for processing on nodes inside the cluster (cluster computing).
Existing encrypted distributed file systems do not meet these requirements. They are not designed for “big data” cluster computing and do not offer the required performance. This applies in particular to Tahoe and JigDFS. As explained by Bian, J. and Seker R., “The Jigsaw Secure Distributed File System”, Computers & Electrical Engineering, Feb. 1, 2013, the class of secure distributed file systems such as Tahoe and Cleversafe uses an Information Dispersal Algorithm (IDA) to achieve fault tolerance as well as to introduce a certain level of security. JigDFS falls into this category as well. In Tahoe, as in JigDFS, files to be uploaded are encrypted and then split up into slices. Each slice is uploaded to a different server to spread the load uniformly, avoid correlated failures, and limit each node's knowledge about the original file. However, unlike Tahoe and Cleversafe, JigDFS employs a decentralized peer-to-peer structure that enhances the system's scalability and improves the system's availability in the event of a server failure. Moreover, in JigDFS, file segments are encrypted recursively using keys derived from the hashed-key chain algorithm and then sliced further through the IDA encoder. By doing so, JigDFS not only increases the system's fault tolerance and data availability, but also makes attacks on the file system and its data more difficult. However, a JigDFS cluster is organized as a decentralized peer-to-peer network without a central server.
Systems that do offer commensurate performance levels, on the other hand, require that client devices be trusted and are hence not well suited to the new threat model. Still others, such as NFS, AFS and HDFS, do not protect data cryptographically.
Encryption usually imposes a heavy burden on a file system. To limit this burden, there is a need to avoid duplicated encryption work, such as block-level encryption for data at rest combined with separate storage-network encryption for data in flight or in transit. This issue is identified by Pletka R., et al., “Cryptographic Security for a High-Performance Distributed File System”, IBM Zurich Research Laboratory, Switzerland, 2006, pp. 1-13. The authors also point out that an optimally secure distributed storage architecture should minimize the use of cryptographic operations and avoid unnecessary decryption and re-encryption of data as long as the data does not leave the file system.
Of course, encryption of data by block and with per-page keys is known in the art. For example, U.S. Pat. No. 8,121,294 to Ciet et al. discloses systems, methods and media that split input data into blocks and derive per-page keys. These are obtained by using a master key in conjunction with still other keys.
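By way of illustration only, the general idea of deriving per-page keys from a master key can be sketched as follows. This is a generic sketch and not the method disclosed by Ciet et al.; the use of HMAC-SHA256 as the derivation function, the function names `derive_page_key` and `split_into_pages`, the file identifier and the 4096-byte page size are all assumptions introduced here for the example.

```python
import hashlib
import hmac
from typing import List


def split_into_pages(data: bytes, page_size: int) -> List[bytes]:
    """Split input data into fixed-size pages; the last page may be shorter."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def derive_page_key(master_key: bytes, file_id: bytes, page_index: int) -> bytes:
    """Derive a 32-byte key for one page of one file (hypothetical scheme).

    Assumption: HMAC-SHA256 keyed with the master key serves as the
    derivation function; the cited reference may use a different construction.
    """
    # Length-prefix the file id so that distinct (file_id, page_index)
    # pairs can never serialize to the same message.
    message = (
        len(file_id).to_bytes(4, "big")
        + file_id
        + page_index.to_bytes(8, "big")
    )
    return hmac.new(master_key, message, hashlib.sha256).digest()


# Example values, chosen only for illustration.
master_key = bytes(range(32))
pages = split_into_pages(b"x" * 10000, 4096)
page_keys = [derive_page_key(master_key, b"file-42", i) for i in range(len(pages))]
```

In such a scheme each page is protected under its own key, so that exposure of one page key does not by itself reveal the master key or the keys of other pages.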
In EP 2,511,848 Van der Veen teaches encryption of data that is appropriate for large scale file systems. Van der Veen deploys a per-data key or an “object” cryptographic key that is itself encrypted with a different key, thus providing a level of indirection to the encrypted files. The second key is a per “domain” key. These teachings are specifically concerned with file metadata that could be stored anywhere in the system.
Despite the many useful teachings outlined in the above references and many more contained in the literature, there exists an unmet need for proper safeguarding of files in file systems served in the cloud. Here, the prior art teachings do not provide for a method and system that can be deployed in a distributed file system (DFS) on a computer cluster that is accessed under the threat model of an untrusted device used by an authenticable user (semi-trusted user threat model).