Solutions for data archival (e.g., long-term archival) based on secret sharing offer a way to store data that is resilient to insider threats and able to operate securely even while parts of the archival system are compromised. In such an architecture, a file d can be split into a plurality of shares, d1-dn, that are distributed across a plurality of repositories to create a secure data archive. Each repository can be configured to store a single share (e.g., any of d1-dn) created from the file d. Accordingly, the data (e.g., the contents of the file d in its initial form) is most secure while it remains split into the shares that are spread across the repositories.
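By way of illustration only, the splitting of a file d into shares d1-dn can be sketched as follows. The sketch assumes a simple n-of-n XOR-based sharing scheme, which is one well-known form of secret sharing and is not necessarily the scheme used in any particular archive; the function names split_into_shares, reassemble, and xor_bytes are hypothetical.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_shares(d: bytes, n: int) -> List[bytes]:
    """Split d into n shares (assumed n-of-n XOR scheme, illustrative only).

    The first n-1 shares are uniformly random; the last share is d XORed
    with all of them, so any n-1 shares alone reveal nothing about d.
    """
    if n < 2:
        raise ValueError("at least two shares are required")
    random_shares = [secrets.token_bytes(len(d)) for _ in range(n - 1)]
    last_share = reduce(xor_bytes, random_shares, d)
    return random_shares + [last_share]


def reassemble(shares: List[bytes]) -> bytes:
    """Recover d by XORing all n shares together at a single location."""
    return reduce(xor_bytes, shares)
```

In this sketch, each element of the returned list would be stored at a different repository, and the original contents exist in whole form only where reassemble is invoked on all n shares.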
The data becomes vulnerable again when an entity requires the data to be reassembled from the respective shares, whereby the shares are brought together at a single location. For example, an entity having access to the single device at which the shares are reassembled can access the data in its totality, e.g., to perform a search. In such a configuration, however, the data can also be accessed by an entity with malicious intent, or by a disgruntled employee who may later wish to distribute the data to expose the type of data stored and thereby raise awareness (e.g., public awareness) of what sort of data is being collected.
A specific set of data may require long-term archival in a very secure manner, whereby storage of the data is to be in accordance with particular legislation, e.g., the data is medical information pertaining to one or more individuals and is to be stored long-term in accordance with government legislation. While an encryption technology may be deemed “secure” at the time the data is archived, the encryption technology may subsequently be breached, at which point the data can be accessed. Such a breach of the encryption technology can result from a long-term attack on the single device, whereby the attack can utilize unlimited computing power and/or storage.
Accordingly, in comparison with conventional technologies that store data on a single archive device and “secure” the data by utilizing encryption, applying authorization and/or authentication technologies, etc., storage of data as shares across a plurality of repositories requires the development of technologies and systems that minimize data vulnerability, e.g., when a user is to query the data to analyze one or more items in the data.