Technical Field of the Invention
The present invention relates generally to systems, apparatus, and methods for distributed data storage, and more particularly to systems, apparatus, and methods for distributed data storage using an information dispersal algorithm so that no one location will store an entire copy of stored data, and more particularly still to systems, apparatus, and methods for block based access to a dispersed data storage network.
Description of the Related Art
Storing data in digital form is a well-known problem associated with all computer systems, and numerous solutions to this problem are known in the art. The simplest solution involves merely storing digital data in a single location, such as a punch film, hard drive, or FLASH memory device. However, storage of data in a single location is inherently unreliable. The device storing the data can malfunction or be destroyed through natural disasters, such as a flood, or through a malicious act, such as arson. In addition, digital data is generally stored in a usable file, such as a document that can be opened with the appropriate word processing software, or a financial ledger that can be opened with the appropriate spreadsheet software. Storing an entire usable file in a single location is also inherently insecure, as a malicious hacker need only compromise that one location to obtain access to the usable file.
To address reliability concerns, digital data is often “backed-up,” i.e., an additional copy of the digital data is made and maintained in a separate physical location. For example, a backup tape of all network drives may be made by a small office and maintained at the home of a trusted employee. When a backup of digital data exists, the destruction of either the original device holding the digital data or the backup will not compromise the digital data. However, the existence of the backup exacerbates the security problem, as a malicious hacker can choose between two locations from which to obtain the digital data. Further, the site where the backup is stored may be far less secure than the original location of the digital data, such as in the case when an employee stores the tape in her home.
Another method used to address reliability and performance concerns is the use of a Redundant Array of Independent Drives (“RAID”). RAID refers to a collection of data storage schemes that divide and replicate data among multiple storage units. Different configurations of RAID provide increased performance, improved reliability, or both increased performance and improved reliability. In certain configurations of RAID, when digital data is stored, it is split into multiple stripes, each of which is stored on a separate drive. Data striping is performed in an algorithmically certain way so that the data can be reconstructed. While certain RAID configurations can improve reliability, RAID does nothing to address security concerns associated with digital data storage.
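The striping described above can be illustrated with a minimal sketch. It assumes a simple round-robin layout with no parity or mirroring; the function names `stripe` and `reassemble` and the parameters `drives` and `stripe_size` are hypothetical and are not drawn from any particular RAID implementation.

```python
def stripe(data: bytes, drives: int, stripe_size: int) -> list[list[bytes]]:
    """Distribute fixed-size stripes round-robin across `drives` lists."""
    if drives < 1 or stripe_size < 1:
        raise ValueError("drives and stripe_size must be positive")
    layout: list[list[bytes]] = [[] for _ in range(drives)]
    for offset in range(0, len(data), stripe_size):
        # Stripe k goes to drive k mod drives, so placement is deterministic.
        drive = (offset // stripe_size) % drives
        layout[drive].append(data[offset:offset + stripe_size])
    return layout


def reassemble(layout: list[list[bytes]]) -> bytes:
    """Read the stripes back in the same round-robin order."""
    depth = max(len(drive) for drive in layout)
    out = []
    for row in range(depth):
        for drive in layout:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)
```

Because placement follows a fixed rule, the data can be rebuilt from the drives without any per-stripe bookkeeping. Every stripe remains readable plaintext on its drive, which is consistent with the observation that striping alone does not address security.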
One way in which prior art solutions have addressed security concerns is through the use of encryption. Encrypted data is mathematically coded so that only users with access to a certain key can decrypt and use the data. Common forms of encryption include DES, AES, RSA, and others. While modern encryption methods are difficult to break, numerous instances of successful attacks are known, some of which have resulted in valuable data being compromised.
Files are usually organized in file systems, which are software components usually associated with an operating system. Typically, a file system provides means for creating, updating, maintaining, and hierarchically organizing digital data. A file system accepts digital data of arbitrary size, segments the digital data into fixed-size blocks, and maintains a record of precisely where on the physical media data is stored and what file the data is associated with. In addition, file systems provide hierarchical directory structures to better organize numerous files.
Various interfaces to storage devices are also well known in the art. For example, Small Computer System Interface (“SCSI”) is a well-known family of interfaces for connecting and transferring data between computers and peripherals, including storage. There are also a number of standards for transferring data between computers and storage area networks (“SAN”). For example, Fibre Channel is a networking technology that is primarily used to implement SANs. Fibre Channel SANs can be accessed through SCSI interfaces via Fibre Channel Protocol (“FCP”), which effectively bridges Fibre Channel to higher level protocols within SCSI. Internet Small Computer System Interface (“iSCSI”), which allows the use of the SCSI protocol over IP networks, is an alternative to FCP, and has been used to implement lower cost SANs using Ethernet instead of Fibre Channel as the physical connection. Interfaces for both FCP and iSCSI are available for many different operating systems, and both protocols are widely used. The iSCSI standard is described in “Java iSCSI Initiator,” by Volker Wildi, and Internet Engineering Task Force RFC 3720, both of which are hereby incorporated by reference.
In 1979, two researchers independently developed a method for splitting data among multiple recipients called “secret sharing.” One of the characteristics of secret sharing is that a piece of data may be split among n recipients, but cannot be known unless at least t recipients share their data, where n ≥ t. For example, a trivial form of secret sharing, in which t = n, can be implemented by assigning a single random byte to every recipient but one, who would receive the actual data byte after it had been bitwise exclusive-ORed with the random bytes. In other words, for a group of four recipients, three of the recipients would be given random bytes, and the fourth would be given a byte calculated by the following formula: s′ = s ⊕ ra ⊕ rb ⊕ rc, where s is the original source data, ra, rb, and rc are random bytes given to three of the four recipients, and s′ is the encoded byte given to the fourth recipient. The original byte s can be recovered by bitwise exclusive-ORing all four bytes together.
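The four-recipient example above can be sketched as follows. This is a minimal illustration of the trivial exclusive-OR scheme only, in which every share is required for recovery; the function names `split_byte` and `recover_byte` are hypothetical, and Python's `secrets` module is assumed as the source of random bytes.

```python
import secrets
from functools import reduce
from operator import xor


def split_byte(s: int, n: int = 4) -> list[int]:
    """Split one byte into n shares; all n shares are needed to recover it."""
    if not 0 <= s <= 255:
        raise ValueError("s must be a single byte (0..255)")
    if n < 2:
        raise ValueError("at least two recipients are required")
    # n - 1 recipients receive independent random bytes (ra, rb, rc, ...).
    randoms = [secrets.randbelow(256) for _ in range(n - 1)]
    # The last recipient receives the source byte XORed with every random byte.
    encoded = reduce(xor, randoms, s)
    return randoms + [encoded]


def recover_byte(shares: list[int]) -> int:
    """XOR all shares together; the random bytes cancel, leaving s."""
    return reduce(xor, shares, 0)
```

Recovery works because each random byte appears exactly twice in the combined exclusive-OR, once on its own and once inside the encoded byte, and therefore cancels. Any subset of fewer than all the shares is statistically independent of the source byte.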
The problem of reconstructing data stored on a digital medium that is subject to damage has also been addressed in the prior art. In particular, Reed-Solomon and Cauchy Reed-Solomon coding are two well-known methods of dividing encoded information into multiple slices so that the original information can be reassembled even if not all of the slices are available. Reed-Solomon coding, Cauchy Reed-Solomon coding, and other data coding techniques are described in “Erasure Codes for Storage Applications,” by Dr. James S. Plank, which is hereby incorporated by reference.
Schemes for implementing dispersed data storage networks (“DDSNs”), which are also known as dispersed data storage grids, are also known in the art. In particular, U.S. Pat. No. 5,485,474, issued to Michael O. Rabin, describes a system for splitting a segment of digital information into n data slices, which are stored in separate devices. When the data segment must be retrieved, only m of the original data slices are required to reconstruct the data segment, where n>m.
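The property that any m of n slices suffice to rebuild a data segment can be illustrated with a deliberately simple stand-in. The sketch below is not the algorithm of the Rabin patent, nor is it Reed-Solomon coding. It assumes the special case n = m + 1, using a single exclusive-OR parity slice, and its data slices are plain fragments of the segment, so it demonstrates only the reconstruction threshold and none of the security properties. The names `disperse`, `reconstruct`, and `xor_bytes` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(segment: bytes, m: int) -> list[bytes]:
    """Return n = m + 1 slices: m data slices followed by one parity slice."""
    if m < 1:
        raise ValueError("m must be at least 1")
    slice_len = -(-len(segment) // m)  # ceiling division
    padded = segment.ljust(slice_len * m, b"\0")
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(m)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def reconstruct(available: dict[int, bytes], m: int, length: int) -> bytes:
    """Rebuild the segment from any m of the m + 1 slices.

    `available` maps slice index (0..m, where index m is parity) to its bytes;
    `length` is the original segment length, used to strip padding.
    """
    missing = [i for i in range(m) if i not in available]
    if len(missing) > 1 or (missing and m not in available):
        raise ValueError("at least m slices are required")
    data = dict(available)
    if missing:
        # XOR of the m surviving slices (data + parity) equals the lost slice.
        data[missing[0]] = reduce(xor_bytes, available.values())
    return b"".join(data[i] for i in range(m))[:length]
```

With m = 3, for example, a segment is stored as four slices and survives the loss of any one of them. Tolerating the loss of more than one slice requires a more general code, such as those described in the references cited above.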
However, prior art dispersed data storage networks have had limited application. In particular, the prior art dispersed data storage networks have not been generally accessible by commonly used operating systems. Rather, dispersed data storage networks have been used to accomplish specific tasks, such as securing extremely sensitive information, or for experimental purposes. Nonetheless, a generally accessible dispersed data storage network would offer significant utility to a wide variety of users. For example, a dispersed data storage network could be interfaced to servers implementing an online store and used to warehouse customer information, such as credit card numbers. This would give the online store the advantages of a dispersed data storage network without having to implement a special interface to the dispersed data storage network. Other uses of a block-based interface to a dispersed data storage network are apparent to a person of ordinary skill in the art.