The growth of the World Wide Web has contributed significantly to network traffic on the Internet. Rising bandwidth usage can cause congestion at network exchange points or across links, increasing user-perceived latency and Web server load [1][9]. Web caching has become an important technique for reducing latency and overall bandwidth consumption [4]. Various caching mechanisms have been proposed to improve the retrieval rate of large distributed documents [2][10][14].
The location of Web caching servers, network topology, and traffic flow are important factors in building effective Web caching architectures [13]. Finding an optimal placement of Web proxies in a network can yield successive reductions in network traffic, but Web caching architectures incur a high cost for replicated disk storage. The main purpose of this project is to reduce the cost of replication and the Web traffic in Web caching architectures.
Web documents contain text, images, applets, and streaming audio/video as well as hypertext links. In general, images carry static information, whereas other objects have dynamic content. Caching static information provides large performance gains when multiple Web caches collaborate to serve each other's cached content.
According to Web document characteristics, the most frequently requested Web documents are image files and text files (gif 42.3%, jpeg 12.3%, and html 24.4% of requests). Image files account for the largest share of bytes transferred, about 40% (gif 20.9% and jpeg 18.9%), compared with 25.3% for html. Streaming media such as audio and video is still a relatively small proportion of total traffic volume. The remaining content types account for decreasing numbers of transferred bytes, following a heavy-tailed distribution [16]. Web trace studies indicate that efficient image distribution is a key to accelerating the performance of IP networks.
Most e-Commerce Web sites display many small images, such as logos, icons, text rendered as graphics, and product photos, on a single page. With the exponential growth of e-Commerce Web traffic, Web servers handle an increasing load of small images. Maintaining cache consistency for e-Commerce data is expensive. Time-to-live (TTL) fields, active invalidation protocols such as the Web cache invalidation protocol (WCIP), and client polling are used to reduce the potential for serving stale cached data. Validation, invalidation, or reloading of images burdens not only the user but also the Web server. Data compression and Web caching are increasingly important in providing fast Internet services. However, these technologies have been developed separately. The goal of our project is to improve image distribution and management technologies using a multi-disciplinary approach.
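As an illustration of the TTL-based consistency mechanism mentioned above, the following minimal sketch shows how a cache might decide whether a stored image can be served directly or must first be revalidated with the origin server. It is not the design proposed in this project; the names `CachedImage`, `is_fresh`, and `lookup` are hypothetical, and the sketch assumes a single TTL per object and an in-memory dictionary as the cache.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedImage:
    """Hypothetical cache entry: the object plus its freshness metadata."""
    url: str
    body: bytes
    stored_at: float  # time the object was cached, in seconds
    ttl: float        # freshness lifetime assigned to the object, in seconds


def is_fresh(entry: CachedImage, now: float) -> bool:
    """An entry is fresh while its age is strictly below its TTL."""
    return (now - entry.stored_at) < entry.ttl


def lookup(cache: Dict[str, CachedImage], url: str,
           now: float) -> Tuple[str, Optional[bytes]]:
    """Classify a request as a hit, a revalidation, or a miss.

    "hit"        -> fresh copy, served without contacting the server
    "revalidate" -> expired copy, the server must confirm or replace it
    "miss"       -> nothing cached, the object must be fetched
    """
    entry = cache.get(url)
    if entry is None:
        return ("miss", None)
    if is_fresh(entry, now):
        return ("hit", entry.body)
    return ("revalidate", entry.body)
```

Under these assumptions, every "revalidate" outcome costs a round trip to the Web server even when the image has not changed, which is the burden on both user and server described above for pages made up of many small images.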