Cloud computing-related definitions and standards have been developed by the US National Institute of Standards and Technology (NIST). The present document is to be interpreted with reference thereto, specifically to “The NIST Definition of Cloud Computing” (September 2011) by Peter Mell and Timothy Grance, NIST Special Publication 800-145.
NIST defines cloud computing as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Cloud storage in its current form is dominated by service providers which constitute a third party that is trusted by customers to host their data securely. In other words, the trusted central authority model is being used, as in traditional Internet banking, or by government agencies providing central databases for vehicle registrations, real property ownership and so forth.
FIG. 1 is a schematic drawing of a standard cloud storage architecture. Two example users 14 are shown with respective computer devices 25 having files 16 which they wish to store to the cloud, i.e. on a virtualized server 12 owned by a cloud provider. The cloud provider supplies an application 18 for the users 14 via which the users 14 may upload (i.e. store) and download (i.e. access) their files 16 to the virtualized server 12. In turn, the virtualized server 12 uses at least three physical servers 20 on which to store the data of each client file to ensure at least a minimum of redundancy. There is no standard way of carrying out end-to-end encryption within this architecture, so it is vulnerable to security threats. In its basic form, client data being transferred between user computers 25, the application 18, and the cloud provider's servers 12, 20 is not encrypted, so attacks on the transmission lines (snooping), the user computers 25, the application 18 and the servers 12, 20 are all possible.
To get away from this traditional architecture and provide a more secure environment, it has been proposed to use blockchain technology, which is based on a peer-to-peer network structure.
FIG. 2 shows an example peer-to-peer network 55 as would be part of a cloud computing environment. The network includes network nodes 10, each of which is one or more physical or virtual hardware network entities with which users may interact. The network nodes 10 are connected by communication lines 15. The network entities may be, for example, mainframes 61; servers of different types 62, 63, 64; or mass storage devices 65; or may be more consumer-oriented devices, such as personal computers, tablets, smart phones, or devices associated with the internet of things (IoT), such as white goods (refrigerators, freezers, washing machines), IP cameras, printers, factory equipment or television recorders.
Blockchain technology does not use the trusted central authority model, but rather provides a distributed database of records comprising a publicly accessible ledger of all transactions or events that have taken place between participating parties, i.e. the members of the blockchain who are at respective nodes of a peer-to-peer network as illustrated in FIG. 2. Each member has a public key, which serves as that member's address, and a private key, which the member uses to digitally sign transactions. A transaction is effected by one member digitally signing the transaction with his private key and sending it to another member using that member's public key as the address. The recipient member then uses the public key of the sending member to verify the digital signature on the transaction. The transaction is placed in a block which is broadcast to every node on the blockchain network. Through a process of puzzle solving called ‘mining’, performed by other members, the transaction and any other transactions in the same block are verified as being valid. The block is then permitted to be added to the chain, thereby becoming part of the public transaction ledger. The blocks of a blockchain are the elements of an ever-extensible sequence of linked events stored on the ledger. What links one block to the previous block in the chain is a hash of the previous block. The hash may be considered as being a token which is proof of the work that the mining member has performed in solving the puzzle and determining the transaction hash by brute force. Each block contains a timestamp and a number of transactions. Each block, once embedded in the chain, becomes effectively impossible to edit, whether intentionally or by hacking, with the security against editing increasing rapidly as a block moves further away from being the end block.
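By way of illustration only, the brute-force puzzle solving referred to above as mining may be sketched as follows. The sketch assumes SHA-256 as the hash function and a puzzle that requires the block hash to begin with a given number of zero hexadecimal digits; the function names `mine` and `is_valid`, the `difficulty` parameter and the payload layout are hypothetical and are not taken from any particular blockchain implementation.

```python
import hashlib


def mine(previous_hash: str, transactions: list[str], difficulty: int = 3) -> tuple[int, str]:
    """Try nonces one by one until the block hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    # Assumed payload layout: previous block hash followed by the transactions.
    payload = previous_hash + "|" + ",".join(transactions)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{payload}|{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest  # the digest acts as the proof of work
        nonce += 1


def is_valid(previous_hash: str, transactions: list[str], nonce: int,
             digest: str, difficulty: int = 3) -> bool:
    """Verification is cheap: recompute one hash and compare."""
    payload = previous_hash + "|" + ",".join(transactions)
    recomputed = hashlib.sha256(f"{payload}|{nonce}".encode()).hexdigest()
    return recomputed == digest and digest.startswith("0" * difficulty)
```

Finding the nonce requires many hash evaluations on average, whereas checking it requires only one, which is the asymmetry on which the proof of work relies. Changing any transaction in the block invalidates the previously found nonce.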
Blockchain technology is described in the article “BlockChain Technology Beyond Bitcoin” by Michael Crosby et al, Oct. 16, 2015 (http://scet.berkeley.edu/wp-content/uploads/BlockchainPaper.pdf), the entire contents of which are incorporated herein by reference.
FIG. 3 is a schematic drawing of a fragment of a blockchain 57 as described above. Three adjacent blocks 2, labeled N, N+1 and N+2 in the middle of the blockchain 57 are illustrated, with Block N being the oldest block of the three and Block N+2 the youngest. Block N includes a ledger portion 4 with a timestamp which records consecutive blockchain transactions TxS ‘c’ to ‘f’. Block N+1 records the next group of transactions ‘g’ to ‘m’ and Block N+2 the transactions after that, ‘n’ to ‘w’. The obliquely oriented arrows labeled HASH are what connect the adjacent blocks, i.e. a subsequent block, e.g. Block N+2, stores the hash of the previous block, Block N+1.
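The hash linkage between adjacent blocks may be sketched as follows, purely as an illustrative example. The `Block` class, its field names, the JSON serialization and the helper functions are assumptions made for the sketch and do not reproduce any specific blockchain format.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    timestamp: float
    transactions: list
    previous_hash: str  # the HASH arrow: hash of the preceding block

    def hash(self) -> str:
        body = json.dumps([self.index, self.timestamp,
                           self.transactions, self.previous_hash])
        return hashlib.sha256(body.encode()).hexdigest()


def build_chain(groups, genesis_hash="0" * 64):
    """Create one block per group of transactions, each storing its predecessor's hash."""
    chain, prev = [], genesis_hash
    for i, txs in enumerate(groups):
        block = Block(i, float(i), list(txs), prev)
        chain.append(block)
        prev = block.hash()
    return chain


def chain_is_intact(chain) -> bool:
    """Each block must store the current hash of the block before it."""
    return all(later.previous_hash == earlier.hash()
               for earlier, later in zip(chain, chain[1:]))


# Transactions 'c'-'f', 'g'-'m' and 'n'-'w', mirroring Blocks N, N+1 and N+2.
chain = build_chain(["cdef", "ghijklm", "nopqrstuvw"])
```

In this sketch, editing a transaction in the oldest block changes that block's hash, so the hash stored in the next block no longer matches and `chain_is_intact` returns `False`. An attacker would therefore have to recompute every later block, which is why blocks further from the end of the chain are harder to alter.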
Trust is automated by the public nature of the ledger and by the security measures built into such systems to ensure the blocks of the blockchain cannot be changed. The need for a single trusted authority to mediate between non-trusting transacting parties is done away with.
FIG. 4 is a schematic drawing of a blockchain cloud storage system as offered by Storj. The Storj system uses MetaDisk. In the Storj architecture, a peer-to-peer network 55 of the kind shown in FIG. 2 includes a blockchain 57 and associated public ledger 26. Two example members 14 of the blockchain have associated computer devices 25 which each are part of a node of the network 55. If one member 14 wishes to store a data file 16 to the blockchain cloud storage network, that member subdivides the file into manageable portions, referred to as shards, and encrypts them using his private key 22 before they are distributed out for storage in the network among fellow members of the same blockchain. A blockchain member providing space to the cloud store is called a farmer. Blockchain members can be cloud storage users and/or farmers. MetaDisk is an intermediary which receives and routes customer data to storage locations in the network and can apply a suitable data redundancy rule, such as 3-locations.
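A data redundancy rule such as storing each shard at three locations may be sketched as follows. This is a hypothetical placement scheme given for illustration only and is not asserted to be the method used by Storj or MetaDisk; the function name `choose_farmers` and the farmer identifiers are assumptions. Farmers are ranked by a hash of the shard hash combined with the farmer identifier, and the first three are selected.

```python
import hashlib


def choose_farmers(shard_hash: str, farmers: list[str], copies: int = 3) -> list[str]:
    """Pick `copies` distinct farmers for a shard, deterministically."""
    if copies > len(farmers):
        raise ValueError("not enough farmers to satisfy the redundancy rule")
    # Rank farmers by a per-shard score so different shards spread differently.
    ranked = sorted(
        farmers,
        key=lambda f: hashlib.sha256((shard_hash + f).encode()).hexdigest(),
    )
    return ranked[:copies]


farmers = ["farmer-a", "farmer-b", "farmer-c", "farmer-d", "farmer-e"]
locations = choose_farmers("example-shard-hash", farmers)
```

Because the ranking depends only on the shard hash and the farmer identifiers, any party that knows both can recompute where the copies of a shard were placed.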
FIG. 5 is a schematic drawing showing the sharding process used by the Storj system. As mentioned above, a shard is an encrypted portion of a file that is to be stored. The shards are kept to a fixed size so that the shard size contains no information that may be of assistance to a hacker, with unused space being padded out. Taking the example of a data file 16 with a size of 70 MB, this is split into multiple smaller files 30 of fixed size, illustrated as 32 MB, which are the proto-shards. Each smaller file 30 is encrypted using the member's private key 22 and then sent with its hash to the blockchain network 55, 57 for storage.
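The splitting and padding steps described above may be sketched as follows, scaled down from megabytes to bytes so that the example is quick to run. The function name `make_proto_shards` and the use of zero bytes as padding are assumptions. The encryption step is deliberately omitted, since the sketch uses only the standard library, and a practical system would also need to record the original file length so that the padding can be removed on retrieval.

```python
import hashlib


def make_proto_shards(data: bytes, shard_size: int) -> list[bytes]:
    """Split `data` into fixed-size pieces, padding the last one with zero bytes."""
    shards = []
    for offset in range(0, len(data), shard_size):
        piece = data[offset:offset + shard_size]
        shards.append(piece.ljust(shard_size, b"\0"))  # assumed padding scheme
    return shards


# A 70-byte "file" with 32-byte shards stands in for 70 MB and 32 MB.
file_data = bytes(range(70))
proto_shards = make_proto_shards(file_data, 32)

# Each piece would be encrypted here (not shown), then sent with its hash.
shard_hashes = [hashlib.sha256(s).hexdigest() for s in proto_shards]
```

With these inputs the file yields three pieces of 32 units each, the last of which carries 6 units of data and 26 units of padding, so that all stored pieces have the same size.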
The Storj system is described in the article: “Storj A Peer-to-Peer Cloud Storage Network” by Shawn Wilkinson et al, Dec. 15, 2014 v1.01 (https://storj.io/storj.pdf), the entire contents of which are incorporated herein by reference.
The files which have been referred to above may be container files, referred to as containers in the following. As more and more systems are moved onto cloud-based infrastructure, the data and services that run on containers have become increasingly important. Software containers or containers are virtual machine instances which are isolated from each other and run concurrently on a hardware node as an intermediate layer between the hardware and the operating system. The software, firmware or hardware that creates and runs such a virtual machine instance is referred to as a hypervisor or virtual machine monitor (VMM). In this document, we mainly use the term hypervisor.
This approach is referred to as container-based virtualization, server virtualization or operating system virtualization. Each container represents a virtual environment which has a defined set of hardware resources which are unrelated to the actual available hardware and serve as a basis for configuring the operating system. An example of a container is a Docker container. Docker containers run on Linux® applications and wrap up a self-contained piece of software that includes everything needed to run, such as an operating system, system tools and system libraries, so that it will always run in the same way. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
It is well known that there is an inherent tension between the cloud and legal compliance with national data protection laws. The distributed nature of cloud storage or cloud-based applications has as its essence flexible distribution of data over a network without geographical limitation. On the other hand, national laws on data protection are by definition territorially defined. In Europe, the “Article 29 Working Party” was tasked specifically with looking at how cloud storage could fit within the EU Data Protection Directive (95/46/EC) and the EU e-privacy Directive (2002/58/EC) and issued its opinion in 2012. In Germany, there are also strong criminal sanctions against failure to safeguard personal data (Section 203 of the German Criminal Code StGB) which need to be taken into account. Legal compliance and legal risk management by cloud providers and cloud users, such as businesses and government agencies, may entail ensuring that specific data and services that run on cloud-based infrastructures are located in a specific country or region, and/or that only certain parties have access to specific data and services.
Take the example of a business which has embraced container-based systems for part of its IT infrastructure and uses not just one, but multiple cloud service providers. Such a business will have containers or other file types spread across multiple cloud providers, each container being distributed over multiple servers. It can be appreciated that managing legal data protection issues can become very complex very quickly.
FIG. 6 shows an example system in which three groups of containers are associated with respective cloud storage providers. Suppose a single business owns these container groups and needs them to operate within a system boundary 28, such as a geographical boundary. Three groups of containers 40_1, 40_2 and 40_3 are stored respectively with three different cloud storage providers that have respective virtual servers 12_1, 12_2 and 12_3, each storing their customer data in some complex fashion over multiple physical servers (not illustrated). How can the owner of the containers exercise control over how and where the cloud storage providers store the data that makes up these containers?