Field
This disclosure is generally related to a content centric network (CCN). More specifically, this disclosure is related to deduplicating portions of a data block when generating a Manifest hierarchy for the data block.
Related Art
The proliferation of the Internet and e-commerce continues to fuel revolutionary changes in the network industry. Today, a significant number of information exchanges, from online movie streaming to daily news delivery, retail sales, and instant messaging, are conducted online. An increasing number of Internet applications are also becoming mobile. However, the current Internet operates on a largely location-based addressing scheme. The most ubiquitous protocol, the Internet Protocol (IP), is based on location-based addresses. That is, a consumer of content can only receive the content by explicitly requesting the content from an address (e.g., an IP address) closely associated with a physical object or location. For example, a request whose URL resolves to an IP address for a specific organization is routed to that organization's servers and not to those of another organization.
Recently, content centric networking (CCN) architectures have been proposed in the industry. CCN brings a new approach to content transport. Instead of having network traffic viewed at the application level as end-to-end connections over which content travels, content is requested or returned based on its unique name, and the network is responsible for routing content from the provider to the consumer.
With content centric networks, an Interest message includes a name for a Content Object, and a client can disseminate the Interest over CCN to obtain the Content Object from any CCN node that hosts the Content Object. The Interest is forwarded toward a CCN node that advertises at least a prefix of the Interest's name. If this CCN node can provide the Content Object, this node can return the Content Object (along the Interest's reverse path) to satisfy the Interest.
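The name-based forwarding described above can be illustrated with a minimal sketch. It assumes hierarchical names made of slash-separated components and a forwarding table keyed by advertised prefixes; the names `parse_name`, `longest_prefix_match`, the `face-N` labels, and the choice of longest-prefix matching are hypothetical choices made for illustration and are not taken from this disclosure.

```python
from typing import Dict, Optional, Tuple

# A name is modeled as a tuple of components (an assumption for this sketch).
Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split a hierarchical name such as '/example/videos/a' into components."""
    return tuple(component for component in uri.split("/") if component)


def longest_prefix_match(table: Dict[Name, str], name: Name) -> Optional[str]:
    """Return the next hop for the longest advertised prefix of `name`, if any."""
    # Try the full name first, then progressively shorter prefixes.
    for length in range(len(name), 0, -1):
        next_hop = table.get(name[:length])
        if next_hop is not None:
            return next_hop
    return None


# Hypothetical forwarding table: advertised prefix -> outgoing interface label.
table = {
    parse_name("/example"): "face-1",
    parse_name("/example/videos"): "face-2",
}

interest_name = parse_name("/example/videos/movie/chunk0")
next_hop = longest_prefix_match(table, interest_name)
```

In this sketch the Interest is sent toward `face-2`, because `/example/videos` is the longest advertised prefix of the Interest's name; a name with no advertised prefix yields `None`.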
Publishers oftentimes want to replicate their content across various host servers. They can partition their content into a set of chunks, and can use one or more Manifests to reference the collection of Content Objects that include the chunks that make up the content. They can store the Manifests and the Content Objects across one or more host servers, allowing consumers to obtain the Manifests and Content Objects from any of the host servers.
Manifests sometimes help reduce the amount of data stored on a server by storing a Content Object once, even when it is referenced multiple times by one or more Manifests. For example, if two or more chunks partitioned from a file include the same data, the repeating data can be stored in a single Content Object that the Manifests reference multiple times. However, a typical algorithm for generating the Manifests breaks up the content into chunks of a predetermined size. The repeating data segments may not always be aligned with the boundaries of these fixed-size chunks, in which case the chunks that contain the repeated data differ from one another, which makes it difficult to deduplicate repeating data from a file.
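The alignment problem can be seen in a small sketch. It assumes chunks are identified by their SHA-256 digest and that a Manifest is simply an ordered list of those digests; the function names, the 8-byte chunk size, and the sample data are hypothetical and chosen only to keep the example short.

```python
import hashlib
from typing import Dict, List, Tuple


def fixed_size_chunks(data: bytes, size: int) -> List[bytes]:
    """Partition data into chunks of a predetermined size (last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_flat_manifest(data: bytes, size: int) -> Tuple[List[str], Dict[str, bytes]]:
    """Return (manifest, store): ordered chunk digests and the unique chunks stored."""
    store: Dict[str, bytes] = {}
    manifest: List[str] = []
    for chunk in fixed_size_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks are stored only once
        manifest.append(digest)          # but are referenced every time they occur
    return manifest, store


segment = b"ABCDEFGH"  # an 8-byte segment that repeats three times in each file

# Repeats fall exactly on 8-byte chunk boundaries.
aligned = segment + segment + segment
# Short unrelated runs between the repeats shift them off the chunk boundaries.
misaligned = segment + b"1" + segment + b"22" + segment

aligned_manifest, aligned_store = build_flat_manifest(aligned, 8)
misaligned_manifest, misaligned_store = build_flat_manifest(misaligned, 8)

print(len(aligned_manifest), len(aligned_store))        # 3 references, 1 stored chunk
print(len(misaligned_manifest), len(misaligned_store))  # 4 references, 4 stored chunks
```

Both inputs contain the same segment three times, yet only the aligned input is deduplicated: in the misaligned input every fixed-size chunk has different contents, so each one is stored separately.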