The recent expansion of data centers and cloud computing has driven rapid growth in storage area network (SAN) servers, network attached storage (NAS) servers, and unified storage servers, which combine SAN and NAS. The typical consumers of computation resources in a data center are application servers, virtual machines, virtual machine hypervisors (e.g., ESXi, Hyper-V, Xen), etc. These consumers typically access network storage servers over Internet Protocol (IP), for NAS, and/or over Fibre Channel, for SAN. In relation to the network storage servers, they act as network storage clients.
Some of the features that network storage servers implement are deduplication, compression, caching and tiering. Deduplication and compression are directed to saving physical storage space, whereas caching and tiering seek to improve access time to the data. Compression is generally applicable to a single file or a data stream. Compression algorithms reduce the size of the data by finding and encoding repeating patterns. Some examples of compression algorithms used by file servers include the LZ, LZW and LZR lossless compression algorithms. The degree of data compressibility, expressed as the ratio of compressed size to original size, can vary from almost zero (i.e., fully compressible) to almost one (i.e., incompressible).
Examples of incompressible data types include archive files, movies, pictures stored in an already compressed format, encrypted files, etc. Highly compressible data files are those containing large regions of equal bits, such as uncompressed bitmaps. Bitmap files can show compressibility of approximately 0.02, which is equivalent to the compressed file being 50 times smaller than the original bitmap file. The degree to which the data can be compressed depends on the chosen algorithm and on trade-offs, such as compression time versus compression efficiency.
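The compressibility figure discussed above can be illustrated with a minimal sketch. It assumes that compressibility is measured as compressed size divided by original size, which is consistent with the 0.02 and 50 times figures, and it uses Python's zlib (DEFLATE, an LZ77-based scheme) as a stand-in for whatever algorithm a given file server actually uses. The function name `compressibility` and the sample buffers are hypothetical.

```python
import os
import zlib


def compressibility(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size (lower means more compressible)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, level)) / len(data)


# One million zero bytes: a large region of equal bits, like a blank bitmap area.
uniform = bytes(1_000_000)

# Random bytes stand in for encrypted or already-compressed content.
random_like = os.urandom(1_000_000)

ratio_uniform = compressibility(uniform)
ratio_random = compressibility(random_like)

print(f"uniform data:     {ratio_uniform:.4f}")
print(f"random-like data: {ratio_random:.4f}")
```

For random input the ratio may come out marginally above one, because the compressed stream carries a small amount of framing overhead. Raising `level` trades compression time for a smaller output, which is the time versus efficiency trade-off mentioned above.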
Deduplication refers to a set of algorithms that find repeating chunks of data both within a file and across multiple files. In the particular case of a file stored under different names, the deduplication algorithm is able to recognize that these files are identical and use physical storage space for only one copy. Note that a compressible file or a set of compressible files may not be deduplicable (e.g., if all the data is unique), whereas incompressible files, blocks, or other storage objects having the same content can be deduplicable.
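As a rough illustration of the idea, the following sketch deduplicates fixed-size chunks by their SHA-256 digest. The fixed 4096-byte chunk size, the `DedupStore` class and the file names are assumptions made for the example; real storage servers may use variable-size chunking and different indexing. The payload is random, so it is incompressible, yet storing it under two names still consumes the space of a single copy.

```python
import hashlib
import os

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept only once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # digest -> chunk contents
        self.files: dict[str, list[str]] = {}  # file name -> ordered chunk digests

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has not been seen before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = DedupStore()
payload = os.urandom(4 * CHUNK_SIZE)  # incompressible, but identical across copies
store.put("copy_a.bin", payload)
store.put("copy_b.bin", payload)

logical_bytes = 2 * len(payload)
print(logical_bytes, store.physical_bytes())
```

The logical size is twice the payload, while the physical size stays at one payload, because the second `put` adds only a list of digests.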
Caching and tiering refer to the ability of the servers to determine sections of data that are accessed most often and place them in the data store or memory providing the fastest access. These features are used not only in storage servers but also in wide-area network (WAN) optimization. WAN optimization is a collection of techniques for increasing data-transfer efficiencies across wide-area networks. Among the techniques used are compression, deduplication, caching and tiering.
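The selection of frequently accessed data can be sketched as a simple access counter. This is only an illustration under stated assumptions: the `HotBlockTracker` class, the block identifiers and the capacity of two blocks are hypothetical, and production systems typically also age or decay the counts rather than keeping them forever.

```python
from collections import Counter


class HotBlockTracker:
    """Counts accesses per block and reports which blocks belong in the fast tier."""

    def __init__(self, fast_tier_capacity: int) -> None:
        self.capacity = fast_tier_capacity
        self.hits: Counter = Counter()

    def record_access(self, block_id: int) -> None:
        self.hits[block_id] += 1

    def fast_tier(self) -> set:
        # The most frequently accessed blocks, up to the fast tier's capacity.
        return {block for block, _ in self.hits.most_common(self.capacity)}


tracker = HotBlockTracker(fast_tier_capacity=2)
for block_id in [1, 2, 1, 3, 1, 2, 4]:
    tracker.record_access(block_id)

hot_blocks = tracker.fast_tier()
print(sorted(hot_blocks))
```

Here blocks 1 and 2 are accessed most often, so they are the ones reported for placement in the fastest store.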
Implementing the features described above is a complicated task and, therefore, requires extensive validation and benchmarking. Conventional tools used for testing network storage servers, however, typically do not provide data validation. Some of these tools can detect data corrupted in transit between a storage client and a storage server, but cannot identify data erroneously provided by the storage server itself.