Object caching techniques have been deployed to save bandwidth and to improve the download time on the World Wide Web. With object caching techniques, the server sends a requested object with certain metadata, such as cache control headers, that indicate whether the object can be stored in a cache by intermediate proxies and for how long. Many websites, however, do not mark objects as cacheable in order to attract traffic and to maintain accurate statistics. Object caching does not work when clients download only partial objects, such as videos downloaded from youtube.com, or when clients download personalized web pages. In addition, object caching techniques rely on the Uniform Resource Locator (URL) to identify a repeated download of the same object. Many popular websites, however, serve the same content with different URLs. For example, some websites assign different URLs to the same object depending on the location of the server from which the object is being served.
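The two limitations described above, cacheability metadata and URL-based identification, can be illustrated with a minimal sketch. It assumes a simplified reading of the HTTP `Cache-Control` header (only `no-store`, `no-cache`, `private`, `s-maxage` and `max-age` are considered) and a proxy cache modeled as a dictionary keyed by URL. The function names and the example URLs are hypothetical and are not taken from any particular proxy implementation.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"')
    return directives


def fresh_lifetime_seconds(headers):
    """Seconds an intermediate proxy may reuse the object without revalidation.

    Simplified assumption: anything not explicitly given a lifetime is
    treated as not cacheable (returns 0).
    """
    directives = parse_cache_control(headers.get("Cache-Control", ""))
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0
    # s-maxage applies to shared caches and takes priority over max-age.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return max(0, int(directives[key]))
            except ValueError:
                return 0
    return 0


def fetch_via_proxy(cache, origin, url):
    """Return (body, cache_hit) for a cache that identifies objects by URL only."""
    if url in cache:
        return cache[url], True
    body = origin[url]  # stands in for a download from the origin server
    cache[url] = body
    return body, False


if __name__ == "__main__":
    # Hypothetical origin: the same object is served under two location-specific URLs.
    origin = {
        "http://us.example.com/logo.png": b"PNGDATA",
        "http://eu.example.com/logo.png": b"PNGDATA",
    }
    cache = {}
    for url in list(origin) + list(origin):
        _, hit = fetch_via_proxy(cache, origin, url)
        print(url, "hit" if hit else "miss")
```

In this sketch the second URL is a cache miss even though its content is byte-for-byte identical to the first, because the cache key is the URL rather than the content.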
Byte caching techniques replace repetitive streams of application data with shorter “signatures” or “tokens” prior to transmission over the network. A byte cache that stores byte sequences is implemented at each end of a network link. Each byte sequence is uniquely identified by a signature. Thus, if a byte sequence has been previously transmitted, only the corresponding signature needs to be transmitted between the end points. Typically, existing byte caching techniques create byte sequences based on the number of bytes and also create a signature for each of these sequences. Conventional byte caching systems therefore typically generate a large number of byte sequences and signatures, which requires a large amount of computing resources.
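The signature substitution described above can be sketched as follows. This is an illustrative model only: it assumes fixed-size byte sequences, SHA-1 digests as signatures, and an in-memory dictionary as the byte cache at each end of the link. The class name `ByteCacheEndpoint` and the token format are hypothetical.

```python
import hashlib


def signature(chunk):
    """Signature of a byte sequence (assumption: a SHA-1 digest)."""
    return hashlib.sha1(chunk).digest()


class ByteCacheEndpoint:
    """One end of a network link holding a byte cache (signature -> bytes)."""

    def __init__(self, chunk_size=8):
        self.chunk_size = chunk_size  # assumed fixed-size byte sequences
        self.cache = {}

    def encode(self, data):
        """Sender side: replace previously sent sequences with their signatures."""
        tokens = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            sig = signature(chunk)
            if sig in self.cache:
                tokens.append(("sig", sig))    # already sent: signature only
            else:
                self.cache[sig] = chunk
                tokens.append(("raw", chunk))  # first time: send the bytes
        return tokens

    def decode(self, tokens):
        """Receiver side: rebuild the data, learning new sequences as they arrive."""
        parts = []
        for kind, payload in tokens:
            if kind == "raw":
                self.cache[signature(payload)] = payload
                parts.append(payload)
            else:
                parts.append(self.cache[payload])
        return b"".join(parts)


if __name__ == "__main__":
    sender = ByteCacheEndpoint(chunk_size=4)
    receiver = ByteCacheEndpoint(chunk_size=4)
    message = b"hello world, hello world!"
    for attempt in (1, 2):
        tokens = sender.encode(message)
        assert receiver.decode(tokens) == message
        raw = sum(1 for kind, _ in tokens if kind == "raw")
        print(f"transfer {attempt}: {raw} raw sequences, {len(tokens) - raw} signatures")
```

On the second transfer every sequence is already present in both caches, so only signatures cross the link. The sketch also shows the cost noted above: one signature is computed for every sequence of every transfer, regardless of whether the sequence turns out to be a repeat.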
The complexity of byte caching is largely determined by a “chunk size” parameter. The chunk size is an important parameter for the efficiency of byte caching systems: a small chunk size improves similarity detection but increases the overhead of generating the chunks. A need therefore exists for improved techniques for determining the chunk size.
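The chunk size trade-off can be made concrete with a small experiment. The sketch below assumes fixed-size chunks and SHA-1 signatures, and compares a 1024-byte object against a copy in which a single byte has been changed. The function name `chunk_stats` and the test data are hypothetical and chosen only for illustration.

```python
import hashlib


def chunk_stats(old, new, chunk_size):
    """Return (signatures_computed, bytes_matched) when `new` is sent after `old`.

    Assumes fixed-size chunks; a chunk of `new` counts as matched when a
    chunk with the same signature was already produced from `old`.
    """
    known = {
        hashlib.sha1(old[i:i + chunk_size]).digest()
        for i in range(0, len(old), chunk_size)
    }
    computed = 0
    matched = 0
    for i in range(0, len(new), chunk_size):
        chunk = new[i:i + chunk_size]
        computed += 1  # one signature per chunk is the overhead
        if hashlib.sha1(chunk).digest() in known:
            matched += len(chunk)
    return computed, matched


if __name__ == "__main__":
    old = bytes(range(256)) * 4  # 1024-byte object
    modified = bytearray(old)
    modified[500] ^= 0xFF        # change a single byte
    new = bytes(modified)
    for size in (16, 256):
        computed, matched = chunk_stats(old, new, size)
        print(f"chunk size {size}: {computed} signatures, {matched} bytes matched")
```

With 16-byte chunks, 64 signatures are computed and 1008 of the 1024 bytes are recognized as repeats. With 256-byte chunks, only 4 signatures are computed, but the single changed byte invalidates a whole 256-byte chunk and only 768 bytes are matched.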