Field of the Disclosure
The disclosure relates to methods and architectures of distributed data storage and data streaming using Wavefront multiplexing (WF muxing). It focuses on data redundancy, storage reliability, and survivability. The WF muxing techniques use less memory space to achieve better redundancy, reliability, and survivability than conventional techniques of (1) segmenting, or striping, a stream of data into M substreams, (2) creating N additional redundant substreams from the M substreams via parity or equivalent techniques, and (3) encrypting all M+N substreams before storing them in M+N separate data storage spaces. In addition, these techniques enable monitoring of the integrity of stored data sets without scrutinizing the stored data sets themselves. The same techniques can be extended to data streaming via the cloud.
Brief Description of the Related Art
The existing RAID (redundant array of independent disks) techniques have been used extensively in data storage technologies that combine multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called “RAID levels”, depending on what level of redundancy and performance (via parallel communication) is required. RAID is an example of storage virtualization and was first defined in 1987 by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley, as a redundant array of inexpensive disks. Marketers representing industry RAID manufacturers later redefined the term as a redundant array of independent disks, as a means of dissociating a low-cost expectation from RAID technology. Redundancy in a RAID array is provided through the use of mirroring or parity.
Mirroring is one of the two data redundancy techniques used in RAID (the other being parity). In a redundant array system using mirroring, all data in the system is written simultaneously to two hard disks instead of one; thus the “mirror” concept. The principle behind mirroring is that this 100% data redundancy provides full protection against the failure of either of the disks containing the duplicated data. Because drives are paired, mirroring setups always require an even number of drives. The chief advantage of mirroring is that it provides not only complete redundancy of data, but also reasonably fast recovery from a disk failure. Since all the data is on the second drive, it is ready to use if the first one fails. The chief disadvantage of mirroring is expense: data duplication means half the space in the redundant array is “wasted”, so a user must buy twice the capacity that the user wants to end up with in the array. Performance is also not as good as with some other techniques.
Duplexing is an extension of mirroring that is based on the same principle. As in mirroring, all data is duplicated onto two distinct physical hard drives. Duplexing goes one step beyond mirroring, however, in that it also duplicates the hardware that controls the two hard drives (or sets of hard drives). If mirroring on two hard disks is implemented, the two hard disks would both be connected to a single host adapter or controller. If duplexing is implemented, one of the drives would be connected to one adapter and the other to a second adapter.
The main performance-limiting issues with disk storage relate to the slow mechanical components that are used for positioning and transferring data. Since a RAID array has many drives in it, an opportunity presents itself to improve performance by using the hardware in all these drives in parallel. For example, if a large file is to be read, instead of reading it from a single hard disk, it is much faster to have it chopped up into pieces, with some of the pieces stored on each of the drives in an array, so that all the disks are used in parallel to read back the file when needed. This technique is called striping, after the pattern that might be visible if one could see these “chopped up pieces” on the various drives with a different color used for each file. It is similar in concept to the memory performance-enhancing technique called interleaving. Striping can be done at the byte level, or in blocks. Byte-level striping means that the file is broken into “byte-sized pieces”. The first byte of the file is sent to the first drive, the second byte to the second drive, and so on. Sometimes byte-level striping is done in units of a 512-byte sector. Block-level striping means that each file is split into blocks of a certain size and those blocks are distributed to the various drives. The size of the blocks used is also called the stripe size (or block size, or several other names), and can be selected from a variety of choices when the array is set up.
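The round-robin distribution described above can be illustrated with a minimal Python sketch. The function names stripe and unstripe, and the representation of each drive as an in-memory byte string, are assumptions made for illustration only; an actual RAID controller operates on physical sectors and in parallel.

```python
def stripe(data: bytes, num_drives: int, stripe_size: int) -> list[bytes]:
    """Distribute data across drives in round-robin blocks of stripe_size bytes.

    A stripe_size of 1 corresponds to byte-level striping; larger values
    correspond to block-level striping.
    """
    drives = [bytearray() for _ in range(num_drives)]
    for block_index, offset in enumerate(range(0, len(data), stripe_size)):
        drives[block_index % num_drives] += data[offset:offset + stripe_size]
    return [bytes(d) for d in drives]


def unstripe(drives: list[bytes], stripe_size: int, total_len: int) -> bytes:
    """Reassemble the original data by reading one block from each drive in turn."""
    out = bytearray()
    offsets = [0] * len(drives)
    drive = 0
    while len(out) < total_len:
        chunk = drives[drive][offsets[drive]:offsets[drive] + stripe_size]
        if not chunk:
            raise ValueError("drives hold less data than total_len")
        out += chunk
        offsets[drive] += stripe_size
        drive = (drive + 1) % len(drives)
    return bytes(out)


if __name__ == "__main__":
    pieces = stripe(b"ABCDEFGHIJ", num_drives=3, stripe_size=1)
    print(pieces)
    print(unstripe(pieces, stripe_size=1, total_len=10))
```

In this sketch the choice of stripe_size is the only difference between byte-level and block-level striping; the distribution order is the same in both cases.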
Mirroring is a data redundancy technique used by some RAID levels, in particular RAID level 1, to provide data protection on a RAID array. While mirroring has some advantages and is well-suited for certain RAID implementations, it also has some limitations. It has a high overhead cost, because fully 50% of the drives in the array are reserved for duplicate data; and it doesn't improve performance as much as data striping does for many applications. For these reasons, a different way of protecting data is provided as an alternative to mirroring. It involves the use of parity information, which is redundancy information calculated from the actual data values. The term “parity” has been used in the context of system memory error detection; in fact, the parity used in RAID is very similar. The principle behind parity is simple: take “N” pieces of data, and from them, compute an extra piece of data. Take the “N+1” pieces of data and store them on “N+1” drives. If any one of the “N+1” pieces of data is lost, it can be recomputed from the “N” remaining pieces, regardless of which piece is lost.
Parity protection is used with striping, and the “N” pieces of data are typically the blocks or bytes distributed across the drives in the array. The parity information can either be stored on a separate, dedicated drive, or be mixed with the data across all the drives in the array.
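The recovery principle described above can be sketched in Python under the assumption that the extra piece is the byte-wise exclusive-OR (XOR) of the N data pieces, which is the usual choice for single-parity RAID. The function names xor_blocks, add_parity, and recover are hypothetical and are used for illustration only.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def add_parity(data_blocks: list[bytes]) -> list[bytes]:
    """Append one parity block to N data blocks, giving N+1 stored pieces."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored: list[bytes], lost_index: int) -> bytes:
    """Recompute the piece at lost_index from the N surviving pieces.

    Because XOR is its own inverse, the same operation rebuilds either a
    lost data block or the lost parity block.
    """
    survivors = [block for i, block in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stored = add_parity([b"\x01\x02", b"\x0f\xf0", b"\xaa\x55"])
    # Simulate the loss of each piece in turn and rebuild it from the rest.
    for lost in range(len(stored)):
        assert recover(stored, lost) == stored[lost]
    print("every single-piece loss was recovered")
```

The sketch protects against the loss of exactly one piece; if two of the N+1 pieces are lost, a single XOR parity block is not sufficient to rebuild them.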
Compared to mirroring, parity (used with striping) has some advantages and disadvantages. The most obvious advantage is that parity protects data against any single drive in the array failing without requiring the 50% “waste” of mirroring; only one of the “N+1” drives' worth of capacity contains redundancy information. (The overhead of parity is equal to 100/(N+1) %, where N+1 is the total number of drives in the array.) Striping with parity also retains the performance advantages of striping. The chief disadvantages of striping with parity relate to complexity: all those parity bytes have to be computed (millions of them per second), and that takes computing power.
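As a worked illustration of the overhead comparison above, the following Python sketch computes the share of raw capacity consumed by redundancy for mirroring and for single parity. The function names and the example drive counts are assumptions chosen for illustration.

```python
def mirroring_overhead_pct() -> float:
    """Mirroring duplicates every byte, so half of the raw capacity is redundancy."""
    return 50.0


def parity_overhead_pct(total_drives: int) -> float:
    """Single parity uses one drive's worth of capacity out of total_drives."""
    if total_drives < 2:
        raise ValueError("parity needs at least two drives")
    return 100.0 / total_drives


def usable_capacity(total_drives: int, drive_capacity: float) -> float:
    """Capacity left for data in a single-parity array of identical drives."""
    return (total_drives - 1) * drive_capacity


if __name__ == "__main__":
    for drives in (3, 5, 10):
        print(drives, "drives:", round(parity_overhead_pct(drives), 1), "% parity overhead")
```

With two drives the parity overhead equals the 50% overhead of mirroring, and it decreases as more drives are added to the array.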
Norman Ken Ouchi at IBM was awarded U.S. Pat. No. 4,092,732, titled “System for recovering data stored in failed memory unit”, in 1978. The claims of this patent describe what would later be termed RAID 5 with full stripe writes. This 1978 patent also mentions that drive mirroring or duplexing (what would later be termed RAID 1) and protection with dedicated parity (what would later be termed RAID 4) were prior art at that time.
Cloud storage refers to saving data to a storage system maintained by a third party. Instead of storing information on a user computer's hard drive or other local storage device, the user saves it to a remote database. The Internet provides the connection between the user's computer and the database. In general, cloud storage is convenient and offers more flexibility. However, the two biggest concerns about cloud storage are reliability and security [1, 2, 3, 4]. Clients aren't likely to entrust their data to another company without a guarantee that they'll be able to access their information whenever they want and that no one else will be able to get at it. To secure data, most systems use a combination of techniques, including (1) encryption, using a complex algorithm to encode information without increasing its size in memory, (2) authentication, requiring a user name and password, and (3) authorization, listing the people who are authorized to access information stored on the cloud system. As to reliability, it is generally true that cloud storage system reliability is significantly enhanced with a redundant storage site. Redundancy assures clients that they can access their information at any given time, even if one of many data sites fails.
There are two more concerns. First, many operators offer secured and encrypted storage services; however, secured files are only encrypted on the server side, and therefore a client has to rely on the honesty of the server operator. Second, the ownership rights to stored data remain under debate.