Data deduplication may be characterized as a specialized data compression technique for eliminating duplicate copies of repeating data, thereby reducing the amount of storage needed for a given quantity of data. A current problem faced by entities that store data is the complicated, time-consuming process typically involved in evaluating and comparing vendor designs of the deduplication and compression systems employed in data storage. Another problem currently facing such entities is the need to protect their confidential information, together with their legal duty to protect personally identifiable information (PII), stored in their data storage systems from exposure outside the entity.
The types of systems that such entities may be interested in evaluating are typically data storage systems designed to store large amounts of data. Such data storage systems may be referred to in the industry as block storage devices. Such block storage devices may include, for example, both disk-based storage systems and flash-based storage systems. In a block storage device, each individual block of data may have a particular size, such as 4096 bytes, and each individual block of data stored on that storage device is accessible by a unique address that may be referred to as a logical block address (LBA).
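For illustration only, the block and LBA layout described above, and the kind of duplicate-block counting that deduplication relies on, may be sketched as follows. This is a minimal sketch under stated assumptions and not any vendor's design: the function names `split_into_blocks` and `dedup_summary` are hypothetical, the list index stands in for the LBA, the final partial block is zero-padded, and SHA-256 content hashing is assumed as the means of detecting identical blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes per block, matching the example size given in the text


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split raw bytes into fixed-size blocks.

    The index of each block in the returned list plays the role of its
    logical block address (LBA). Assumption: a short final block is
    padded with zero bytes to the full block size.
    """
    blocks = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks.append(block.ljust(block_size, b"\x00"))
    return blocks


def dedup_summary(blocks: list) -> dict:
    """Count total and unique blocks by content hash.

    Assumption: two blocks are treated as duplicates when their SHA-256
    digests match. The digest-to-LBA map records the first LBA at which
    each distinct block content was seen.
    """
    first_lba_by_digest = {}
    for lba, block in enumerate(blocks):
        digest = hashlib.sha256(block).hexdigest()
        first_lba_by_digest.setdefault(digest, lba)

    total = len(blocks)
    unique = len(first_lba_by_digest)
    return {
        "total_blocks": total,
        "unique_blocks": unique,
        # Ratio of logical blocks to blocks that would actually need storing.
        "dedup_ratio": total / unique if unique else 0.0,
    }


if __name__ == "__main__":
    # Three identical blocks, one distinct block, and one short trailing block.
    sample = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE + b"C" * 100
    print(dedup_summary(split_into_blocks(sample)))
```

In this sketch the sample input occupies five logical blocks but contains only three distinct block contents, so only three blocks would need to be stored.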
Typically, an entity, such as a financial institution, health care organization, or government agency, may not be permitted to take copies of data that contains PII outside of the entity. Therefore, it is typically not possible for such entities to allow copies of their data to be used at a vendor site, or in a laboratory environment that is not controlled by the entity, for evaluation of the vendor's system designs. A traditional approach to this problem has been for an entity to engage with a systems vendor and have the vendor provide its product to the entity for evaluation. The entity may then install the vendor's product in one of the entity's own facilities and perform an evaluation of the vendor's system design under the entity's own control. That traditional evaluation process may typically take six months or more to complete.
There is a present need for methods and systems that enable rapid evaluation of potential vendor designs of deduplication and compression systems and that ensure an apples-to-apples comparison of competing designs. There is presently a further need for methods and systems that assure that the results of tests to evaluate those designs are valid against one another and that the tests do not expose any PII or confidential information stored in the systems of such entities.