There is a universal trend toward storage outsourcing through the cloud (e.g., Google Drive, Amazon S3, Microsoft OneDrive), bringing advantages such as cost savings, global access to data, and reduced management overhead. Yet the most important disadvantage is that the data owner (client), by outsourcing her data to a cloud storage provider (server), loses direct control over her data. Therefore, the client expects authenticated storage and guaranteed retrievability. The former means that the client wants each data access to return the correct value; i.e., the most recent version of the data written by the client herself. The latter means that the client wants to make sure that her data is retrievable; i.e., she can retrieve all her data correctly. These authenticity and retrievability checks should be much more efficient than downloading the whole data.
A simple mechanism to provide authenticated storage is to compute a digest (e.g., hash, MAC, or signature) of the data and keep it locally after transferring the data to the server (or, in the case of a MAC or signature, the key is kept locally while the tags can be stored at the server). However, the client needs to download the whole data and check it against the locally stored digest to verify the authenticity of her data, which is prohibitive given current trends of outsourcing tens of gigabytes of data.
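The digest-based check described above can be sketched as follows. This is a minimal illustration only, assuming SHA-256 as the hash and an in-memory byte string standing in for the outsourced file; the function names `make_digest` and `verify_download` are hypothetical and do not come from any particular scheme.

```python
import hashlib
import hmac

def make_digest(data: bytes) -> bytes:
    """Client side: compute a digest to keep locally before uploading the data."""
    return hashlib.sha256(data).digest()

def verify_download(downloaded: bytes, local_digest: bytes) -> bool:
    """Client side: re-hash the WHOLE downloaded file and compare to the stored digest."""
    recomputed = hashlib.sha256(downloaded).digest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(recomputed, local_digest)
```

The sketch makes the cost visible: `verify_download` needs every byte of the file, so the check is linear in the file size in both communication and computation.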
Provable Data Possession (PDP) is a closely related line of work, providing probabilistic guarantees of possession of the outsourced file using a challenge-response mechanism. Similar schemes were later proposed targeting public verifiability and availability. A PDP does not use erasure-correcting codes (ECC) and guarantees that the client can retrieve most of the outsourced data but not the whole data. PDP can be improved by adding ECC to enhance the possession guarantee.
As a first attempt to create dynamic PDP, techniques were developed that enable clients to update a single block; in these techniques, the client pre-computes a limited number of random challenges with the corresponding answers and stores them at the server. Therefore, the number of challenges is limited, and later updates affect all remaining answers.
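The idea of pre-computed challenges can be sketched as below. This is a simplified illustration under stated assumptions, not the construction of any specific scheme: the answer is taken to be a SHA-256 hash over a random nonce and a sampled subset of blocks, the tokens are kept by the client in the clear (whereas the text above has them stored at the server), and all names (`answer`, `precompute`, `audit`) are hypothetical.

```python
import hashlib
import secrets

def answer(blocks, nonce, indices):
    """Hash the nonce together with the challenged blocks, in challenge order."""
    h = hashlib.sha256(nonce)
    for i in indices:
        h.update(blocks[i])
    return h.digest()

def precompute(blocks, num_challenges, blocks_per_challenge):
    """Before outsourcing: fix a limited list of (nonce, indices, expected answer)."""
    rng = secrets.SystemRandom()
    tokens = []
    for _ in range(num_challenges):
        nonce = secrets.token_bytes(16)
        indices = rng.sample(range(len(blocks)), blocks_per_challenge)
        tokens.append((nonce, indices, answer(blocks, nonce, indices)))
    return tokens

def audit(tokens, server_blocks):
    """Spend one token: the server's answer must match the pre-computed one."""
    nonce, indices, expected = tokens.pop()  # each token is usable only once
    return answer(server_blocks, nonce, indices) == expected
```

Both limitations named above show up directly: the list `tokens` shrinks with every audit, and changing any block invalidates every remaining token whose `indices` include that block, so those answers would have to be recomputed.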
Then, a fully dynamic PDP scheme was developed, which uses rank-based authenticated skip lists providing O(log(n)) complexity for updates and audits, where n is the number of blocks. Later variants use other data structures, supply additional properties, distribute and replicate, or enhance efficiency.
The PDP framework does not use erasure-correcting codes (ECC) and hence is more efficient. However, PDP does not provide the same retrievability guarantee as other schemes. One of these schemes is Proofs of Retrievability (PoR) which will be discussed below. The security guarantee a PDP gives is weaker than a PoR in the sense that it guarantees that the client can retrieve most of the outsourced data. In contrast, the PoR guarantees retrieving the whole data. PoR construction can further be improved. The compact PoR may be created from a PDP combined with erasure-correcting code. Yet, this only shows the relationship between the static versions of PDP and PoR.
Static techniques include Proofs of Retrievability (PoR). In a PoR scheme, the client, before outsourcing her data, encodes it with an erasure-correcting code (ECC), and then applies an encryption and a permutation on the result to make it difficult for the server to locate the related parts of the encoded data. Using erasure-correcting codes brings some redundancy while giving the guarantee that the server should manipulate a significant portion of the outsourced (encoded) data in order to impose a data loss or corruption.
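The encode-then-permute preparation described above can be illustrated with a toy sketch. It makes several simplifying assumptions that should not be read as part of any actual PoR scheme: a single XOR parity block stands in for the erasure-correcting code (so only one erasure is tolerated), the encryption step is omitted entirely, and the permutation is derived from Python's non-cryptographic `random` module. All function names are hypothetical.

```python
import random
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(blocks):
    """Toy erasure code: append one XOR parity block over equal-sized blocks."""
    return blocks + [reduce(xor_blocks, blocks)]

def permute(encoded, key):
    """Shuffle block positions with a key-derived order (toy, not secure)."""
    order = list(range(len(encoded)))
    random.Random(key).shuffle(order)
    # stored[pos] holds the encoded block with index order[pos]
    return [encoded[i] for i in order], order

def recover(stored, order, lost_position):
    """Rebuild one lost stored block from the others, then undo the permutation."""
    others = [b for p, b in enumerate(stored) if p != lost_position]
    repaired = list(stored)
    repaired[lost_position] = reduce(xor_blocks, others)
    encoded = [None] * len(order)
    for pos, src in enumerate(order):
        encoded[src] = repaired[pos]
    return encoded[:-1]  # drop the parity block
```

With a real erasure-correcting code the tolerated loss is a sizeable fraction of the encoded data rather than a single block, which is what forces the server to tamper with a large portion before any data is actually lost.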
There are different PoR schemes, one using pseudorandom functions that is secure in the standard model and the other using BLS signatures that is secure in the random oracle model. The former supports only private verifiability while the latter allows public verifiability. The main advantage over previous schemes is that it supports an unlimited number of challenges. Using PoR systems, server misbehavior resulting in a large data loss or corruption will be caught with very high probability. Efficiency and security are seemingly two conflicting properties with respect to updates in PoR schemes.
PoR schemes fail to provide efficient and secure updates on the outsourced data. For an update to be efficient, it should affect as small a part of the data as possible. But then the server can erase or modify all affected blocks with only a small probability of being caught. To prevent such misbehavior, a small change should affect a significant fraction of the outsourced data, which is not efficient.
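The tension described above can be made concrete with a short calculation. The sketch below assumes a spot-checking audit in which the client challenges blocks chosen uniformly at random without replacement; this audit model, the function name `detection_probability`, and the example parameters are illustrative assumptions rather than details taken from a particular scheme.

```python
def detection_probability(n: int, corrupted: int, challenged: int) -> float:
    """Chance that at least one of `challenged` distinct random blocks
    (out of `n`) falls among the `corrupted` ones."""
    if challenged > n - corrupted:
        return 1.0  # more challenges than intact blocks: a hit is certain
    p_miss = 1.0
    for i in range(challenged):
        # Probability that the (i+1)-th challenged block is also intact.
        p_miss *= (n - corrupted - i) / (n - i)
    return 1.0 - p_miss

# Hypothetical numbers: one million blocks, 460 blocks challenged per audit.
small_update = detection_probability(1_000_000, 1, 460)
large_update = detection_probability(1_000_000, 10_000, 460)
```

Under these assumptions, tampering with a single updated block is caught with probability 460/1,000,000, whereas tampering with 1% of the blocks is caught with probability of about 0.99. This is why an update that touches only a few blocks is cheap but insecure, and an update that touches a significant fraction is secure but expensive.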
The first real dynamic PoR scheme has constant client storage and polylogarithmic communication and computation. As a building block, it uses an ORAM satisfying a special property called next-read-pattern-hiding. Although it achieves asymptotic efficiency, ORAM is a complicated and heavy cryptographic primitive that is (currently) not practically efficient. Later, locally updatable and locally decodable codes were used to construct a dynamic PoR scheme. The data is erasure-coded, and the client stores it remotely inside a hierarchical authenticated data structure similar to ORAM. Later updates are also erasure-coded and stored in the same data structure. Because reading through that structure costs O(n), the plain data and subsequent updates are stored in another similar structure.
Later improvements, at a high level, separate the updated data from the original data, and store the update logs in a hierarchical data structure similar to ORAM. On the other hand, the first dynamic PDP protocol was created before the first dynamic PoR. The reason is that, since PDP-type schemes do not employ erasure-correcting codes, the above-mentioned problems did not exist.
Another dynamic PoR scheme, similar to the previous one, uses fast incrementally-constructible codes to achieve efficiency. Later, the dynamic PoR was improved by outsourcing part of the computation to the server, reducing communication and client computation. Using the introduced homomorphic checksum, the client performs computation only on these checksums, leaving computation on the data itself to the server.
One important difficulty in existing DPoR schemes is the excessive volume of communication.
It has also been pointed out in prior art that adding local storage to the client has no effect on the asymptotic costs.