1. Technical Field of the Invention
The present invention relates generally to systems, apparatus, and methods for securely storing data, and more particularly to systems, apparatus, and methods for secure distributed data storage using an information dispersal algorithm so that no one location will store an entire copy of stored data.
2. Description of Related Art
Storing data in digital form is a well-known problem associated with all computer systems, and numerous solutions to this problem are known in the art. The simplest solution involves merely storing digital data in a single location, such as a punch film, hard drive, or FLASH memory device. However, storage of data in a single location is inherently unreliable. The device storing the data can malfunction or be destroyed through a natural disaster, such as a flood, or through a malicious act, such as arson. In addition, digital data is generally stored in a usable file, such as a document that can be opened with the appropriate word processing software, or a financial ledger that can be opened with the appropriate spreadsheet software. Storing an entire usable file in a single location is also inherently insecure, as a malicious hacker need only compromise that one location to obtain access to the usable file.
To address reliability concerns, digital data is often “backed up,” i.e., an additional copy of the digital data is made and maintained in a separate physical location. For example, a backup tape of all network drives may be made by a small office and maintained at the home of a trusted employee. When a backup of digital data exists, the destruction of either the original device holding the digital data or the backup will not compromise the digital data. However, the existence of the backup exacerbates the security problem, as a malicious hacker can choose between two locations from which to obtain the digital data. Further, the site where the backup is stored may be far less secure than the original location of the digital data, as is the case when an employee stores the tape in her home.
Another method used to address reliability and performance concerns is the use of a Redundant Array of Independent Drives (“RAID”). RAID refers to a collection of data storage schemes that divide and replicate data among multiple storage units. Different configurations of RAID provide increased performance, improved reliability, or both increased performance and improved reliability. In certain configurations of RAID, when digital data is stored, it is split into multiple stripes, each of which is stored on a separate drive. Data striping is performed in an algorithmically certain way so that the data can be reconstructed. While certain RAID configurations can improve reliability, RAID does nothing to address security concerns associated with digital data storage.
One way that prior art solutions have addressed security concerns is through the use of encryption. Encrypted data is mathematically coded so that only users with access to a certain key can decrypt and use the data. While modern encryption methods are difficult to break, numerous instances of successful attacks are known, some of which have resulted in valuable data being compromised. Furthermore, if a malicious hacker should gain access to the encryption key associated with the encrypted data, the entirety of the data is recoverable.
While modern encryption tends to utilize symmetric ciphers, such as, for example, 3-way, AES, Anubis, Blowfish, BMGL, CAST, CRYPTON, CS-Cipher, DEAL, DES, DESede, DESX, DFC, DFCv2, Diamond2, E2, FROG, GOST, HPC-1, HPC-2, ICE, IDEA, ISAAC, JEROBOAM, LEVIATHAN, LOKI91, LOKI97, MAGENTA, MARS, MDC, MISTY1, MISTY2, Noekeon, Noekeon Direct, Panama, Rainbow, RC2, RC4, RC4-drop, RC5, Rijndael, SAFER-K, SAFER-SK, SAFER+, SAFER++, SAFER++-64, Sapphire-II, Scream, Scream-F, SEAL-3.0, Serpent, SHARK, SKIPJACK, SNOW, SOBER, SPEED, Square, TEA, Twofish, WAKE-CFB, WiderWake4+1, WiderWake4+3, PBE-PKCS5, PBE-PKCS12, etc., other methods have been used in the past. One early form of encoding is transposition. Transposition involves the deterministic swapping of members within a set. For example, if a five-member set X is defined as X = {a, b, c, d, e}, a transposition function σ may be defined as follows:
    σ(0) = a
    σ(1) = e
    σ(2) = c
    σ(3) = d
    σ(4) = b
Therefore, the application of the transposition function to the entire set X would yield a new set X′ = {a, e, c, d, b}.
By transposing information transmitted in a message, the usability of the transposed information is reduced or eliminated. However, transposition schemes are easily broken by modern computers.
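The transposition described above can be sketched in a few lines of Python. This is an illustration only: the names `transpose`, `invert`, and `SIGMA` are hypothetical, and σ is represented as a list in which entry i gives the index of the member of X placed at position i.

```python
def transpose(items, sigma):
    """Return a new list whose position i holds items[sigma[i]]."""
    if sorted(sigma) != list(range(len(items))):
        raise ValueError("sigma must be a permutation of the item indices")
    return [items[source] for source in sigma]


def invert(sigma):
    """Build the permutation that undoes sigma."""
    inverse = [0] * len(sigma)
    for position, source in enumerate(sigma):
        inverse[source] = position
    return inverse


X = ["a", "b", "c", "d", "e"]
# Assumed encoding of the example: sigma(0)=a, sigma(1)=e, sigma(2)=c,
# sigma(3)=d, sigma(4)=b, written as indices into X.
SIGMA = [0, 4, 2, 3, 1]

encoded = transpose(X, SIGMA)
decoded = transpose(encoded, invert(SIGMA))
```

Because the mapping is fixed and carries no secret beyond the permutation itself, anyone who learns or guesses `SIGMA` can apply `invert` and recover the original ordering.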
In 1979, two researchers independently developed a method for splitting data among multiple recipients called “secret sharing.” One of the characteristics of secret sharing is that a piece of data may be split among n recipients, but cannot be known unless at least t recipients share their data, where n ≥ t. For example, a trivial form of secret sharing can be implemented by assigning a single random byte to every recipient but one, who would receive the actual data byte after it had been bitwise exclusive-ORed with the random bytes. In other words, for a group of four recipients, three of the recipients would be given random bytes, and the fourth would be given a byte calculated by the following formula:
    s′ = s ⊕ ra ⊕ rb ⊕ rc,
where s is the original source data, ra, rb, and rc are random bytes given to three of the four recipients, and s′ is the encoded byte given to the fourth recipient. The original byte s can be recovered by bitwise exclusive-ORing all four bytes together.
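The trivial exclusive-OR scheme above can be sketched as follows. This is a minimal illustration, not a general threshold scheme: every one of the n shares is required for recovery (t = n), and the function names `split_byte` and `combine` are hypothetical.

```python
import secrets
from functools import reduce


def split_byte(secret, n):
    """Split one byte into n shares: n - 1 random bytes plus one encoded byte."""
    if n < 2:
        raise ValueError("at least two recipients are required")
    if not 0 <= secret <= 255:
        raise ValueError("secret must be a single byte value (0-255)")
    random_shares = [secrets.randbelow(256) for _ in range(n - 1)]
    # Encoded share: the secret XORed with every random share.
    encoded = reduce(lambda acc, r: acc ^ r, random_shares, secret)
    return random_shares + [encoded]


def combine(shares):
    """Recover the secret by XORing all shares together."""
    return reduce(lambda acc, share: acc ^ share, shares, 0)


shares = split_byte(0x5A, 4)   # four recipients, as in the example above
recovered = combine(shares)    # equals 0x5A only when all four shares are used
```

Any three of the four shares are statistically independent of the secret, which is why the scheme leaks nothing below the threshold. Note also that each share is as large as the secret itself.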
A cryptosystem, such as secret sharing, is called information-theoretically secure if its security derives purely from information theory, meaning that its security can be proven even if an adversary has access to unlimited computing power. As a secret sharing scheme can guarantee that no usable information can be recovered unless an attacker gains access to a threshold number of shares, secret sharing is information-theoretically secure. However, each data share is the same size as the original data, so secret sharing makes for an inefficient storage mechanism.
All-or-nothing encryption is a recent development in cryptography, with the property that the entire ciphertext must be decrypted before even a portion of the original data can be recovered. The original motivation behind all-or-nothing encryption was to increase the time required by brute force attacks to successfully compromise an encrypted ciphertext by a factor equal to the number of message blocks within the ciphertext. All-or-nothing encryption is described in “All-Or-Nothing Encryption and the Package Transform,” by Ronald L. Rivest, which is hereby incorporated by reference. Additional properties of all-or-nothing encryption are described in “Exposure-Resilient Functions and All-Or-Nothing Transforms,” by Ran Canetti, Yevgeniy Dodis, Shai Halevi, Eyal Kushilevitz, and Amit Sahai, as well as “On the Security Properties of OAEP as an All-or-Nothing Transform,” by Victor Boyko, both of which are hereby incorporated by reference.
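The all-or-nothing property can be illustrated with a short sketch in the spirit of the package transform. Several simplifications are assumptions made here and are not taken from the cited papers: HMAC-SHA256 stands in for the block cipher, the block size is 32 bytes, the public key is all zeros, and the names `package` and `unpackage` are hypothetical. It is not suitable for production use.

```python
import hashlib
import hmac
import secrets

BLOCK = 32                      # assumed block size (SHA-256 output length)
PUBLIC_KEY = b"\x00" * BLOCK    # fixed, publicly known key (assumption)


def _prf(key, index, data=b""):
    """HMAC-SHA256 used as a stand-in for a block cipher keyed by `key`."""
    return hmac.new(key, index.to_bytes(8, "big") + data, hashlib.sha256).digest()


def _xor(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def package(message):
    """Mask every block with a random inner key, then hide that key in a tail block."""
    inner_key = secrets.token_bytes(BLOCK)
    blocks = [message[i:i + BLOCK] for i in range(0, len(message), BLOCK)]
    masked = [_xor(b, _prf(inner_key, i)) for i, b in enumerate(blocks, start=1)]
    tail = inner_key
    for i, block in enumerate(masked, start=1):
        # The tail depends on every masked block.
        tail = _xor(tail, _prf(PUBLIC_KEY, i, block))
    return masked + [tail]


def unpackage(blocks):
    """Recover the inner key from all blocks, then unmask the message."""
    *masked, tail = blocks
    inner_key = tail
    for i, block in enumerate(masked, start=1):
        inner_key = _xor(inner_key, _prf(PUBLIC_KEY, i, block))
    return b"".join(_xor(b, _prf(inner_key, i)) for i, b in enumerate(masked, start=1))
```

In this sketch the inner key can only be recomputed from the tail block together with every masked block, so withholding or altering any single block prevents recovery of all of the others.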
Dispersed data storage systems utilize an information dispersal algorithm to slice data. Schemes for implementing dispersed data storage systems, such as dispersed data storage networks (“DDSNs”), are also known in the art. For example, U.S. Pat. No. 5,485,474, issued to Michael O. Rabin, describes a system for splitting a segment of digital information into n data slices, which are stored in separate devices. When the data segment must be retrieved, only m of the original data slices are required to reconstruct the data segment, where n>m.
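The m-of-n idea can be illustrated with its simplest special case, n = m + 1, in which m data slices are accompanied by one exclusive-OR parity slice. This sketch is not the algorithm of the cited patent, which supports general n and m; the names `disperse` and `reconstruct` and the zero-byte padding are assumptions made for illustration.

```python
from functools import reduce


def _xor(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(segment, m):
    """Split a segment into m equal data slices plus one parity slice (n = m + 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    slice_len = -(-len(segment) // m)                 # ceiling division
    padded = segment.ljust(slice_len * m, b"\x00")    # pad so slices are equal length
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(m)]
    parity = reduce(_xor, slices)
    return slices + [parity]


def reconstruct(slices, original_len):
    """Rebuild the segment from n slices, at most one of which is None (lost)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("this sketch tolerates the loss of only one slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of the surviving slices equals the lost slice.
        slices[missing[0]] = reduce(_xor, present)
    return b"".join(slices[:-1])[:original_len]


segment = b"dispersed storage!"
stored = disperse(segment, 4)          # five slices, one per storage device
stored[2] = None                       # simulate the loss of one device
restored = reconstruct(stored, len(segment))
```

Note that each data slice in this sketch holds a verbatim fragment of the segment, so a party holding a single slice can read that fragment directly.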
Generally, dispersed data storage systems provide some level of security, as each data slice will contain less information than the original digital information. Furthermore, as each slice is stored on a separate computer, it will generally be harder for a malicious hacker to break into m computers and gather enough data slices to reconstruct the original information. However, depending on the information dispersal algorithm utilized, each data slice may contain up to 1/n of the original data. Generally, the information will be retained in the data slice as it existed in the original digital information. Accordingly, by compromising a single storage node, a malicious hacker could access up to 1/n of the original data.