As published in USPGazette 20100241619, it is known that digital signatures can be used to uniquely identify files. It is known that two files can be compared to identify their differences. It is known that content management systems endeavor to reduce disk consumption by reducing duplication within an enterprise. It is known that offsite backup of essential files is among best practices for data security. It is known that public/private key pairs are used for asymmetric encryption. When one key of a key pair is used to encrypt a message, the other key from that pair is required to decrypt the message. Conventional backup systems provide services for individuals or corporate customers. However, bandwidth considerations are more limiting than raw disk capacity.
Furthermore, a known method comprises the following processes distributed across the Internet and local to customers of the apparatus and service. A data object is disassembled into shards. A recipe is determined for reassembling the shards. A fingerprint is computed for each shard and compared with stored fingerprints for stored shards. Shards are encrypted for transmission through a wide area network. A shard is not stored, encrypted, or transmitted if it can be determined from its fingerprint that the shard is duplicative of a previously stored shard.
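By way of illustration only, the known method described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any referenced publication: fixed-size shards, SHA-256 as the fingerprint, an in-memory dictionary standing in for the remote shard store, and the names `disassemble`, `fingerprint`, `backup`, and `restore` are all hypothetical. Encryption for transmission through the wide area network is omitted; its place in the flow is marked with a comment.

```python
import hashlib

SHARD_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def disassemble(data: bytes, size: int = SHARD_SIZE) -> list[bytes]:
    """Split a data object into fixed-size shards (an assumed strategy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(shard: bytes) -> str:
    """Compute a fingerprint for a shard (SHA-256 is an assumption)."""
    return hashlib.sha256(shard).hexdigest()


def backup(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Back up a data object; return its recipe and the count of shards sent."""
    recipe: list[str] = []
    sent = 0
    for shard in disassemble(data):
        fp = fingerprint(shard)
        recipe.append(fp)  # the recipe lists fingerprints in reassembly order
        if fp not in store:
            # Only a non-duplicative shard would be encrypted and transmitted.
            store[fp] = shard
            sent += 1
    return recipe, sent


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble a data object from its recipe."""
    return b"".join(store[fp] for fp in recipe)


store: dict[str, bytes] = {}
recipe_a, sent_a = backup(b"abcdabcdefgh", store)  # second "abcd" is a duplicate
recipe_b, sent_b = backup(b"abcdzzzz", store)      # "abcd" is already stored
```

In this sketch the first data object yields three shards but only two are stored, and the second data object contributes one new shard. Either object can be reassembled from its recipe with `restore`.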
Applicants have rapidly and successfully grown a network of backup server appliances that has scaled to petabyte capacity with wide acceptance. A non-linear expansion is required to meet demand for cloud-based backup operations and for box-to-box backup operations. Conventional systems utilize database technology for tracking archives, ownership, and status of shards, and the inventors anticipate that this tracking will become a significant resource requirement in future scaling for higher performance and greater capacity.
Thus it can be appreciated that what is needed is a way to store multiple petabytes with redundancy using far fewer instruction executions and operational steps.