The diagram of FIG. 1 shows a conventional input/output (“IO”) operation from an application to a block device (e.g., storage such as a disk). In this example, files are organized within a file system 101 (which could be, e.g., on a server, a storage node or a “host”). Each file (see File A and File B) consists of granular units of data called blocks, which are addressed through logical block addresses. As can be seen, File A has 3 blocks of data, which can be accessed with the logical block addresses (“LBAs”) 100, 105 and 200. A block device 103 (e.g., a disk or disk subsystem) is operatively connected to the file system 101 and receives block address requests from the file system 101. The block device 103 responds to the file system 101 with the requested block of data. Although a file system is given as an example, the same workflow may apply to one or more databases or to any other block access system.
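The read path of FIG. 1 can be sketched as follows. This is a minimal illustration, assuming that a file is simply an ordered list of LBAs and that the block device is an in-memory map from LBA to block data. The class names `FileSystem` and `BlockDevice` and the block contents are hypothetical; only the LBAs 100, 105 and 200 come from the example.

```python
from typing import Dict, List


class BlockDevice:
    """Hypothetical block device: maps logical block addresses (LBAs) to block data."""

    def __init__(self) -> None:
        self._blocks: Dict[int, bytes] = {}
        self.io_count = 0  # number of block read requests served

    def write(self, lba: int, data: bytes) -> None:
        self._blocks[lba] = data

    def read(self, lba: int) -> bytes:
        self.io_count += 1  # every block request counts as one IO
        return self._blocks[lba]


class FileSystem:
    """Hypothetical file system: each file is an ordered list of LBAs."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        self.files: Dict[str, List[int]] = {}

    def read_file(self, name: str) -> bytes:
        # One block request is sent to the device for each LBA in the file.
        return b"".join(self.device.read(lba) for lba in self.files[name])


# Hypothetical block contents stored at File A's LBAs.
device = BlockDevice()
for lba, data in {100: b"a1", 105: b"a2", 200: b"a3"}.items():
    device.write(lba, data)

fs = FileSystem(device)
fs.files["File A"] = [100, 105, 200]
content = fs.read_file("File A")
print(content, device.io_count)  # b'a1a2a3' 3
```

Reading File A in this sketch issues exactly one device request per block, i.e. three IOs for three LBAs.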
In one specific example, the block device 103 is operatively connected to the file system 101 through a Storage Area Network (SAN) or Direct Attached Storage (DAS). (A SAN storage system hosts multiple file systems through multiple nodes and thus has information about the block data for a given application.)
Referring now to FIG. 2, this diagram shows a conventional IO operation similar to that of FIG. 1. In FIG. 2, however, the IO operation is from an application to a de-duped block device (e.g., storage such as a disk). In various examples, a conventional de-dupe algorithm may be implemented as inline processing and/or as post processing. Further, such a conventional de-dupe algorithm may be implemented in one or more of the following locations: (1) a gateway appliance in a backup path (e.g., IBM's ProtecTIER or EMC's Data Domain products); and/or (2) a primary storage path on a Network Attached Storage (NAS)/unified storage device (i.e., a storage device that implements both NAS and block storage access methods in a single device).
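The difference between inline and post-process de-dupe is mainly one of timing, which the following sketch tries to make concrete. It assumes fixed-size blocks keyed by LBA and uses SHA-256 content fingerprints to detect duplicates; the text does not specify a detection method, so the fingerprinting, the `DedupeStore` class and its methods are all hypothetical illustrations rather than any product's design.

```python
import hashlib
from typing import Dict


class DedupeStore:
    """Hypothetical block store contrasting inline and post-process de-dupe.

    Simplifying assumptions: blocks are keyed by LBA, duplicates are detected
    by SHA-256 fingerprint, and there is no reference counting or garbage
    collection of chunks orphaned by overwrites.
    """

    def __init__(self, inline: bool) -> None:
        self.inline = inline
        self.raw: Dict[int, bytes] = {}     # LBA -> block written but not yet de-duped
        self.chunks: Dict[str, bytes] = {}  # fingerprint -> single retained copy
        self.lba_map: Dict[int, str] = {}   # LBA -> fingerprint of its content

    def _dedupe(self, lba: int, data: bytes) -> None:
        fp = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(fp, data)  # keep only the first copy of identical content
        self.lba_map[lba] = fp

    def write(self, lba: int, data: bytes) -> None:
        if self.inline:
            self._dedupe(lba, data)  # inline: de-dupe in the write path
        else:
            self.raw[lba] = data     # post processing: store as-is for now

    def post_process(self) -> None:
        """Background pass that collapses duplicates already written to disk."""
        for lba, data in self.raw.items():
            self._dedupe(lba, data)
        self.raw.clear()

    def read(self, lba: int) -> bytes:
        if lba in self.raw:
            return self.raw[lba]
        return self.chunks[self.lba_map[lba]]

    def stored_blocks(self) -> int:
        return len(self.raw) + len(self.chunks)


# Hypothetical contents for five blocks, two of which are duplicates.
blocks = {100: b"X", 105: b"Y", 200: b"Z", 400: b"Z", 405: b"Y"}

inline_store = DedupeStore(inline=True)
post_store = DedupeStore(inline=False)
for lba, data in blocks.items():
    inline_store.write(lba, data)
    post_store.write(lba, data)

print(inline_store.stored_blocks())  # 3: duplicates never reach the disk
print(post_store.stored_blocks())    # 5: duplicates are written first
post_store.post_process()
print(post_store.stored_blocks())    # 3 after the background pass
```

Both modes end with the same physical footprint in this sketch; inline processing pays the fingerprinting cost on every write, while post processing temporarily needs the full capacity.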
Still referring to FIG. 2, files are organized within a file system 201 (which could be, e.g., on a server, a storage node or a “host”). In this example, two files, File A and File B, are stored. The de-dupe algorithm has found that 2 blocks of data in File A are identical to 2 blocks of data in File B; that is, Block 400 = Block 200 and Block 405 = Block 105.
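One way the duplicate relationships above could be found is sketched below. It is a minimal example, assuming that duplicates are detected by hashing block contents with SHA-256 and that the lowest-numbered LBA holding a given content is kept; the function name `find_duplicates` and the block contents are hypothetical, and a real system might also compare bytes directly to rule out hash collisions.

```python
import hashlib
from typing import Dict


def find_duplicates(blocks: Dict[int, bytes]) -> Dict[int, int]:
    """Map each duplicate LBA to the first (lowest) LBA holding identical content."""
    first_seen: Dict[str, int] = {}  # fingerprint -> LBA that keeps the data
    redirect: Dict[int, int] = {}    # duplicate LBA -> retained LBA
    for lba in sorted(blocks):
        fp = hashlib.sha256(blocks[lba]).hexdigest()
        if fp in first_seen:
            redirect[lba] = first_seen[fp]
        else:
            first_seen[fp] = lba
    return redirect


# Hypothetical contents chosen so that Block 400 == Block 200 and Block 405 == Block 105.
blocks = {100: b"alpha", 105: b"beta", 200: b"gamma", 400: b"gamma", 405: b"beta"}
redirect = find_duplicates(blocks)
print(redirect)  # {400: 200, 405: 105}
```

The resulting redirect table is what allows the de-duped device to keep only three physical blocks.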
As seen, from the point of view of storage capacity, de-duping has reduced the capacity needed from 5 blocks to 3 blocks. However, even though the storage capacity requirement has been reduced, file IO operations still amount to 5 IO requests, since the file system still issues one request for each of the 5 logical blocks. (The example discussed above is a very simplified view of de-dupe; actual de-dupe implementations may be more rigorous in identifying, sizing and placing block data.)
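The capacity-versus-IO arithmetic can be checked with a short sketch. It assumes File B consists of blocks 400 and 405 (which is what makes the total 5 blocks) and that the de-duped device resolves duplicate LBAs through a redirect table; the names `physical_lba` and `read_all_files` are hypothetical.

```python
from typing import Dict, List

# Layout from the FIG. 2 example: File B's blocks 400 and 405
# duplicate File A's blocks 200 and 105.
files: Dict[str, List[int]] = {"File A": [100, 105, 200], "File B": [400, 405]}
redirect: Dict[int, int] = {400: 200, 405: 105}  # duplicate LBA -> retained LBA


def physical_lba(lba: int) -> int:
    """Resolve a logical block address to the block actually kept on disk."""
    return redirect.get(lba, lba)


def read_all_files(files: Dict[str, List[int]]) -> List[int]:
    """Return the physical LBA of every block request issued when each file is read in full."""
    return [physical_lba(lba) for lbas in files.values() for lba in lbas]


stored_blocks = {physical_lba(lba) for lbas in files.values() for lba in lbas}
requests = read_all_files(files)
print(f"blocks stored: {len(stored_blocks)}, IO requests: {len(requests)}")
# blocks stored: 3, IO requests: 5
```

Here the request list is `[100, 105, 200, 200, 105]`: redirection changes which physical block serves each request but not how many requests are made.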