Let's first study data block processing.
A block data storage consists of a set of fixed-size blocks in which information is stored. Each block has its own identifier, usually a sequence number. This sequence number is commonly known as the block number.
A typical example of a block data storage is a computer mass memory, such as a hard disk drive (HDD) or a semiconductor-based persistent memory (SSD, Solid State Disk). When information is written to or read from a mass memory storage, the write or read position is based on a logical block address (LBA). With LBA, the blocks are numbered sequentially starting from zero, a typical block size being 512, 1,024 or 2,048 bytes.
In addition to LBA, other block identifiers are also used. For example, MFM (Modified Frequency Modulation) hard drives in old PC equipment used CHS (cylinder-head-sector) addressing, in which the disk track, read head and disk sector had individual identifiers. Thus, a CHS address has three parameters. CHS addressing can be converted to LBA whenever the number of read heads and the number of sectors per track are known. The following equation holds between a CHS address and the LBA block number $\mathit{lba}$:

$$\mathit{lba} = \big((c \cdot \mathit{MAX}_H) + h\big) \cdot \mathit{MAX}_S + s - 1 \qquad \text{(i)}$$

where $\mathit{MAX}_H$ is the number of read heads, $\mathit{MAX}_S$ the number of sectors per track, and $c$, $h$ and $s$ are the track (cylinder), read head and disk sector parameters of the CHS address.
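Equation (i) translates directly into code. The following is a minimal sketch; the function name `chs_to_lba` and the parameter names `max_h` and `max_s` (standing for $\mathit{MAX}_H$ and $\mathit{MAX}_S$) are illustrative choices, and the range check assumes heads are numbered from 0 and sectors from 1, as the "$- 1$" term implies.

```python
def chs_to_lba(c: int, h: int, s: int, max_h: int, max_s: int) -> int:
    """Convert a CHS address to an LBA block number using equation (i).

    max_h: number of read heads (MAX_H); max_s: sectors per track (MAX_S).
    CHS sector numbers start from 1, hence the final "- 1".
    """
    if c < 0 or not (0 <= h < max_h) or not (1 <= s <= max_s):
        raise ValueError("CHS address out of range for the given geometry")
    return ((c * max_h) + h) * max_s + s - 1


# Example geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))   # first sector of the disk -> 0
print(chs_to_lba(1, 0, 1, 16, 63))   # first sector of cylinder 1 -> 1008
```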
In general, there is a wide range of data storages in which block addressing can be converted, in one way or another, to a logical block number.
A file system, which is usually created on top of the block data storage, enables storing data as files. The file system, among other things, takes care of defining logical block numbers for data read and write operations.
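How a file system turns a position inside a file into a logical block number can be sketched as follows. This assumes a hypothetical per-file list `block_list`, where entry k holds the LBA of the file's k-th block, and a 512-byte block size; real file systems keep this information in their own on-disk structures, which the sketch does not model.

```python
BLOCK_SIZE = 512  # bytes per block (assumed)


def file_offset_to_lba(block_list: list[int], offset: int,
                       block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset inside a file to (LBA, offset within that block).

    block_list is a hypothetical per-file table: entry k is the LBA of the
    file's k-th block.
    """
    index, within = divmod(offset, block_size)
    if index >= len(block_list):
        raise ValueError("offset lies beyond the file's allocated blocks")
    return block_list[index], within


blocks_of_file = [100, 101, 250]          # the file is not necessarily contiguous
print(file_offset_to_lba(blocks_of_file, 1100))   # -> (250, 76)
```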
The most common file systems support restoring a deleted file, but in general restoring the entire device back to a previous state is difficult, if not impossible.
Second, let's study data block encryption.
Data blocks are typically encrypted using a block encryption algorithm, such as AES-256, which converts plaintext to ciphertext using an encryption key. The block size of many encryption algorithms is, however, smaller than a typical block size of a block data storage; for example, in AES-256 it is 16 bytes. For this reason, encrypting one block of a data storage requires combining a number of encryption blocks.
IEEE (Institute of Electrical and Electronics Engineers) has published the XTS-AES standard, IEEE P1619 (IEEE Standard for Cryptographic Protection of Data on Block-Oriented Storage Devices, IEEE Std 1619™-2007, 18 Apr. 2008), specifically for encrypting disk or tape storage blocks.
Other methods have also been developed for combining the encryption blocks in data storage block encryption, such as CBC mode, previously the most common one, in which the encryption blocks are chained one after another. Compared with chaining, one advantage of XTS-AES is that the encryption blocks can be processed in parallel.
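The XTS-AES standard presents the following notation for encryption

$$C \leftarrow \text{XTS-AES-Enc}(\mathit{Key}, P, i) \qquad \text{(ii)}$$

and decryption

$$P \leftarrow \text{XTS-AES-Dec}(\mathit{Key}, C, i) \qquad \text{(iii)}$$

where $\mathit{Key}$ is a 256- or 512-bit XTS-AES key, $P$ the plaintext, $C$ the ciphertext, and $i$ a 128-bit tweak value. Details of the algorithm can be found in the standard.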
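The structure of the XTS mode can be sketched generically. Python's standard library has no AES, so this sketch assumes a hypothetical stand-in 128-bit block cipher (`toy_encrypt`/`toy_decrypt`, a 4-round Feistel network over HMAC-SHA256) that is NOT AES and not meant for real use; in XTS-AES it would be replaced by AES, with `key1` and `key2` being the two halves of $\mathit{Key}$ (data key and tweak key). The sector number plays the role of the tweak $i$. Ciphertext stealing for data that is not a multiple of 16 bytes is omitted.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per cipher block (AES also uses 128-bit blocks)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in 128-bit block cipher (4-round Feistel). NOT AES."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt: runs the Feistel rounds backwards."""
    left, right = block[:8], block[8:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def _mul_alpha(t: int) -> int:
    """Multiply the tweak by alpha in GF(2^128), little-endian convention."""
    t <<= 1
    if t >> 128:
        t ^= (1 << 128) | 0x87
    return t


def xts_crypt(key1: bytes, key2: bytes, data: bytes, sector_number: int,
              decrypt: bool = False) -> bytes:
    if len(data) % BLOCK:
        raise ValueError("ciphertext stealing is not implemented in this sketch")
    # Initial tweak: the 128-bit sector number encrypted with the tweak key.
    # The tweak is always *encrypted*, also when decrypting data.
    t = int.from_bytes(toy_encrypt(key2, sector_number.to_bytes(BLOCK, "little")), "little")
    cipher = toy_decrypt if decrypt else toy_encrypt
    out = bytearray()
    for j in range(0, len(data), BLOCK):
        tweak = t.to_bytes(BLOCK, "little")
        out += _xor(cipher(key1, _xor(data[j:j + BLOCK], tweak)), tweak)
        t = _mul_alpha(t)  # tweak for the next 16-byte block
    return bytes(out)


key1, key2 = b"\x11" * 32, b"\x22" * 32   # hypothetical data key and tweak key
plaintext = bytes(512)                     # one all-zero storage block
ciphertext = xts_crypt(key1, key2, plaintext, 5)
print(ciphertext[:16].hex(), ciphertext[16:32].hex())
```

Each 16-byte block depends only on the sector number and its own position $j$ (through $T \cdot \alpha^j$), never on neighbouring ciphertext. This is why XTS blocks can be processed in parallel, unlike CBC encryption. It also explains why identical plaintext blocks, such as the zero blocks above, still encrypt to different ciphertext.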
Despite the strength of the encryption algorithm, one of the weak links in conventional encryption is the user: encryption keys are mostly derived from a password defined by the user. When defining the password, the user may
- form it from a proper or common noun, or
- use the same password in different situations.
If the password is derived from a proper or common noun, it can be guessed using the well-known dictionary attack, in which known words and their variations are tried as passwords one after another.
If the user uses the same password or its derivatives in different situations, finding the password in one situation makes it easier to access the user's other data storages. Changing the password afterwards in many data storages is laborious, so it is rarely done as a precaution.
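A dictionary attack can be sketched as follows. PBKDF2-HMAC-SHA256 is assumed here as the password-to-key derivation (a swapped-in, commonly used method, not one stated above), and the word list and variation rules are hypothetical. The attacker is assumed to have some way to check a guess, for example an encrypted header whose salt is stored alongside the data.

```python
import hashlib
import os
from collections.abc import Iterator
from typing import Optional

ITERATIONS = 10_000  # kept low so the sketch runs quickly; real systems use far more


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def variants(word: str) -> Iterator[str]:
    """Hypothetical variation rules: as-is, capitalized, one trailing digit."""
    yield word
    yield word.capitalize()
    for digit in "0123456789":
        yield word + digit


def dictionary_attack(target_key: bytes, salt: bytes, words: list[str]) -> Optional[str]:
    """Try every variant of every dictionary word until the derived key matches."""
    for word in words:
        for candidate in variants(word):
            if derive_key(candidate, salt) == target_key:
                return candidate
    return None


salt = os.urandom(16)
target_key = derive_key("sunshine7", salt)  # key derived from a weak password
print(dictionary_attack(target_key, salt, ["password", "dragon", "sunshine"]))
```

Salting and iterating only slow each guess down. A password built from a dictionary word is still found after a few dozen attempts.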
Third, let's study information hiding (steganography).
Some of the data storage blocks can be hidden. The essential property of hiding is that the existence of the hidden data cannot be detected without explicit access to it. The existence of the hidden data can then be denied. (One reference on the subject is Anderson, R., Needham, R., and Shamir, A. The steganographic file system. In Information Hiding, Second International Workshop, Portland, Oreg., USA, Apr. 14-17, 1998, Proceedings (1998), D. Aucsmith, Ed., vol. 1525 of Lecture Notes in Computer Science, Springer, pp. 73-82.)
The hidden part of the data storage will hereafter be referred to as a hidden volume.
Even if the content of the hidden volume cannot be found, its existence may easily be exposed. For example, one can try to write to an ordinary block data storage an amount of data equal to the total capacity of its blocks: if the storage does not accept data up to its nominal capacity, one can reasonably assume that some capacity is reserved for other uses, such as a hidden volume.
When there is good reason to suspect that hidden data exists, the holder of the data volume can be compelled to reveal it. For a hidden volume whose existence can be denied, such pressure cannot be justified as easily.
Fourth, let's study SSD storage devices.
SSD storage devices are replacing traditional hard disk drives, especially as laptop storage devices but also in certain server applications. SSDs are likely to eventually replace hard disk drives because of their low power consumption, impact resistance and other mechanical robustness, silent operation and non-existent seek time. Most SSD storage devices are based on Flash technology.
The storage capacity and data transfer rate of hard disk technology have grown rapidly for decades. Yet one limitation of hard disk technology has remained almost unchanged: seek time. In hard disk drives, data is written to and read from the disk surface by a read head mounted on an actuator arm. Moving the actuator arm back and forth typically causes a 4-10 ms delay in disk reads and writes, unless the data is located sequentially on the disk surface, in adjacent tracks.
SSD devices do not have a similar seek-time problem. Data is addressed electronically, and changing the read or write position causes practically no delay. Thus, random access in SSD memory devices is almost as fast as sequential access.
SSD memory devices do, however, have one weakness: the number of write cycles is limited. With Flash technology, each memory cell can typically be written 10,000 to 100,000 times, which shortens the lifetime of the memory device in continuous use. The lifetime can be extended, for example, with methods that distribute writes evenly across the recordable memory cells. One related patent is U.S. Pat. No. 6,850,443, "Wear leveling techniques for flash EEPROM systems".
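The idea of spreading writes across memory cells can be illustrated with a simple dynamic wear-leveling scheme. This is a generic sketch, not the method of the cited patent: the class `WearLevelingMap` is hypothetical, and it models only a logical-to-physical block map with one program/erase cycle counted per write.

```python
class WearLevelingMap:
    """Toy dynamic wear leveling: each write of a logical block is redirected
    to the free physical block with the fewest program/erase cycles."""

    def __init__(self, num_physical: int, num_logical: int) -> None:
        if num_logical >= num_physical:
            raise ValueError("need at least one spare physical block")
        self.erase_counts = [0] * num_physical
        self.mapping: dict[int, int] = {}        # logical block -> physical block
        self.free = set(range(num_physical))     # physical blocks not in use

    def write(self, logical: int) -> int:
        # Pick the least-worn free physical block (lowest index on ties).
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.erase_counts[target] += 1           # one program/erase cycle used
        old = self.mapping.get(logical)
        if old is not None:
            self.free.add(old)                   # the stale copy becomes reusable
        self.mapping[logical] = target
        return target


flash = WearLevelingMap(num_physical=4, num_logical=1)
for _ in range(100):
    flash.write(0)                               # rewrite the same logical block
print(flash.erase_counts)
```

Without remapping, all 100 writes would wear out a single physical block. With it, the wear is shared evenly among the four physical blocks.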
Fifth, let's study data integrity.
In practice, virtually all block data storages contain some additional data that can be used to determine whether the data read from the storage has remained unchanged.
Traditionally, checksums are calculated over blocks of data to ensure data integrity. For example, when each block is saved on a hard disk, a checksum is calculated at hardware level and stored on the disk with the block. When the block is read from the disk, the checksum is read as well. If it does not match the data in the block, this indicates a fault in reading or writing the data. A CRC checksum is commonly used for this purpose.
When a block data storage is encrypted, an encrypted block takes the same space as an unencrypted block. Thus, the blocks have no room for any extra data that could be used to verify that encryption and decryption succeeded.
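The store-and-verify cycle can be sketched with the standard CRC-32 from Python's `zlib` module. Disk hardware uses its own CRC and error-correcting codes, so this is only an illustration of the principle; the 512-byte block size and the function names are assumptions.

```python
import zlib

BLOCK_SIZE = 512  # bytes per storage block (assumed)


def store_block(data: bytes) -> tuple[bytes, int]:
    """Return the block together with the CRC-32 stored beside it."""
    if len(data) != BLOCK_SIZE:
        raise ValueError(f"block must be exactly {BLOCK_SIZE} bytes")
    return data, zlib.crc32(data)


def block_is_intact(data: bytes, stored_crc: int) -> bool:
    """Recompute the CRC on read; a mismatch signals a read or write fault."""
    return zlib.crc32(data) == stored_crc


data, crc = store_block(bytes(range(256)) * 2)
corrupted = b"\xff" + data[1:]           # simulate a fault in the first byte
print(block_is_intact(data, crc), block_is_intact(corrupted, crc))  # prints: True False
```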
Sixth, let's study calculating a digest.
A digest identifies data content with a smaller amount of data than the original content. A good digest has the property that two different data blocks, no matter how similar, are very unlikely to produce the same digest. A good digest also has the property that its values are uniformly distributed over the available number space.
A traditional checksum is calculated linearly with sum and multiply operations, which means that some information about the actual block content can be derived from the checksum. If, for example, the checksum is the sum of the data elements and one data element is missing, that element can be calculated from the remaining data and the checksum. In most cases, such as in database hash tables, this property does not matter, and in some cases, such as error correction, it is even beneficial. But there are also applications where the checksum must not disclose anything about the data from which it has been calculated.
Secure digests (secure hashes) can be generated using non-linear, one-way transformations. The resulting checksum cannot be used to restore the original data. Commonly used methods include SHA-256 and RIPEMD-160, which are generally considered good digests.
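The leak described above is easy to show for a plain sum checksum. This is a minimal sketch; the 32-bit modulus and the data values are illustrative assumptions.

```python
MODULUS = 2**32  # assumed width of the checksum


def sum_checksum(elements: list[int]) -> int:
    """Linear checksum: the sum of the data elements, modulo 2**32."""
    return sum(elements) % MODULUS


def recover_missing(known: list[int], checksum: int) -> int:
    """Because the checksum is linear, a single missing element follows directly."""
    return (checksum - sum(known)) % MODULUS


data = [1200, 57, 8, 31337]
checksum = sum_checksum(data)
print(recover_missing([1200, 57, 31337], checksum))  # prints 8
```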
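By contrast, a secure digest reveals nothing usable about a missing part, and a one-bit change in the input scrambles the whole digest. The sketch below uses SHA-256 from Python's `hashlib`. RIPEMD-160 can be requested with `hashlib.new("ripemd160")` only when the underlying OpenSSL build provides it, so it is left out here.

```python
import hashlib


def sha256_digest(block: bytes) -> bytes:
    """Secure (one-way) digest of a data block."""
    return hashlib.sha256(block).digest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length digests differ."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


block = bytes(512)
tweaked = b"\x01" + bytes(511)            # the same block with a single bit flipped
d1, d2 = sha256_digest(block), sha256_digest(tweaked)
print(d1.hex())
print(differing_bits(d1, d2), "of 256 digest bits differ")
```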
Seventh, let's study the use of a hash table.
A hash table is a commonly known search structure: a data structure that associates keys with values. Good sources on the use of hash tables are Donald E. Knuth: The Art of Computer Programming, Volume 3: Sorting and Searching (2nd edition, Addison-Wesley, 1998, ISBN 978-0201896855) and Cormen, Leiserson, Rivest and Stein: Introduction to Algorithms (MIT Press, 2003, ISBN 978-0262032933).
When a key (for example, a person's name) is given to a hash table, it returns the value (for example, the phone number). Internally, the hash table computes a digest of the key and derives from it an index into a table of values.
In the example of Drawing 1, the digest 632 is calculated from the name "Lasse Lahtinen", 1 from the name "Liisa Lahtinen" and 998 from "Sami Siltanen". These digests are used as hash table indexes. The hash table bucket at the corresponding index holds the name and telephone number. This principle works smoothly as long as the hash table has free buckets.
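Deriving an index from a digest can be sketched as below. SHA-256 and the reduction of its first 8 bytes modulo the table size are one possible choice, assumed for illustration. The digests in Drawing 1 come from an unspecified function, so this sketch will not reproduce the values 632, 1 and 998.

```python
import hashlib


def bucket_index(key: str, table_size: int) -> int:
    """Derive a hash-table index from a SHA-256 digest of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_size


print(bucket_index("Lasse Lahtinen", 1000))
```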
As the hash table fills up, two names will eventually have the same digest. A number of methods have been developed to handle such a collision, one of which is presented next. Drawing 1 illustrates a method based on linear probing, in which, when a collision occurs, the next available record in the hash table is taken into use. It is beneficial for the hash table records to have a field that in one way or another indicates how far the correct record may have to be searched for. In this example, "Suvi Saarinen" has the same digest, 632, as "Lasse Lahtinen", but the next record, 633, is already reserved for "Leevi Lassila". Because record 634 is free, the data of "Suvi Saarinen" is placed there, and the number two is saved in the record of "Lasse Lahtinen", which corresponds to the digest, because two is the maximum search distance (634 minus 632).
The performance of linear probing is known to degrade as the hash table fills up. The problem can be mitigated by making the hash table somewhat larger than the minimum required size, for example by 20%.
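The Drawing 1 scheme can be sketched as a small linear-probing table in which each home bucket keeps the maximum search distance of its keys. This is an illustrative sketch, not the document's implementation: the class name is hypothetical, the digest function is injected so that the digests of Drawing 1 can be reproduced, and Leevi Lassila's digest (633) and all phone numbers are assumptions.

```python
from typing import Callable, Optional


class ProbingHashTable:
    """Linear-probing hash table whose home buckets record how far their
    keys were displaced (the search-distance field of Drawing 1)."""

    def __init__(self, size: int, digest: Callable[[str], int]) -> None:
        self.size = size
        self.digest = digest                        # key -> integer digest
        self.keys: list[Optional[str]] = [None] * size
        self.values: list[Optional[str]] = [None] * size
        self.max_dist = [0] * size                  # search-distance field per bucket

    def insert(self, key: str, value: str) -> int:
        home = self.digest(key) % self.size
        for dist in range(self.size):
            pos = (home + dist) % self.size         # linear probing, wraps around
            if self.keys[pos] is None or self.keys[pos] == key:
                self.keys[pos], self.values[pos] = key, value
                # Record in the home bucket how far this key had to move.
                self.max_dist[home] = max(self.max_dist[home], dist)
                return pos
        raise RuntimeError("hash table is full")

    def lookup(self, key: str) -> Optional[str]:
        home = self.digest(key) % self.size
        # The key can only lie within max_dist[home] buckets of its home bucket.
        for dist in range(self.max_dist[home] + 1):
            pos = (home + dist) % self.size
            if self.keys[pos] == key:
                return self.values[pos]
        return None


# Digests from Drawing 1; Leevi Lassila's digest and the phone numbers are hypothetical.
DIGESTS = {"Lasse Lahtinen": 632, "Liisa Lahtinen": 1, "Sami Siltanen": 998,
           "Leevi Lassila": 633, "Suvi Saarinen": 632}
table = ProbingHashTable(1000, lambda name: DIGESTS.get(name, 0))
for i, name in enumerate(DIGESTS):
    table.insert(name, f"555-010{i}")
print(table.keys[634], table.max_dist[632])   # Suvi Saarinen lands in 634; distance 2
```

The sketch never deletes keys. Supporting deletion would require tombstone markers so that probing chains are not broken.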
Eighth, let's study web servers.
Internet access is now available almost everywhere, but the connection is not necessarily broadband. Secure communication protocols have been developed for data transmission over IP (Internet Protocol) between computers, and open source libraries implementing them are available. For example, the open source OpenSSL library provides support for the SSL/TLS protocols.
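As an illustration of using such a library, Python's standard `ssl` module is typically built on OpenSSL. The sketch below prepares a TLS client context and, in `open_tls_connection`, wraps a TCP connection over IP in TLS. The function names, the TLS 1.2 minimum and the 10-second timeout are assumptions made for the example. Only the context setup runs without network access.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS client context: certificate and hostname verification enabled."""
    context = ssl.create_default_context()            # CERT_REQUIRED + hostname check
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocol versions
    return context


def open_tls_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS (performs network I/O)."""
    raw = socket.create_connection((host, port), timeout=10)
    return make_client_context().wrap_socket(raw, server_hostname=host)


print(ssl.OPENSSL_VERSION)  # the TLS library the ssl module is linked against
```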
Ninth, let's study creating a data storage in an operating system.
For example, in a Linux operating system, a local data storage may be created as a network drive: the well-known Network Block Device (NBD) can be set up so that it listens on a local IP address. The data storage is then located on the same machine, and NBD makes it possible to process the files as data blocks regardless of the file system.
Drawing 2 presents a model, related to Windows operating systems, of how applications (201), such as Microsoft Word, write files to a data storage (208). Broadly speaking, the file system stack (206) processes data as parts of files, while the data storage driver stack (207) processes data as blocks. In Windows operating systems, applications (202) and some of the operating system services (203) run in user mode (204), while most of the drivers run in kernel mode (205).
The data storage may be a physical storage device, such as a disk or tape drive, or a logical volume, such as a file created by the open-source TrueCrypt software. The file created by TrueCrypt appears to the computer as a local drive when it is opened with TrueCrypt. Such a file can, of course, be as large as the physical memory capacity of the device and be the only file on it.
TrueCrypt teaches a person skilled in the art how disk blocks are encrypted, for example using Windows kernel programming. Nowadays TrueCrypt uses the XTS-AES encryption standard. TrueCrypt is an open-source example of driver software placed in the data storage driver stack (207).