It is known that the amount of dynamic content on the Internet or World Wide Web (WWW or the “web”) is increasing at a rapid pace. Most web sites today provide personalized and user-specific information in web pages. Further, the contents of some web sites are highly dynamic. This increase in dynamic content has posed new challenges to the scalability of the World Wide Web for two important reasons:
(1) Generating dynamic content involves executing programs, such as Perl scripts, on the server. Executing these scripts on the server consumes considerable amounts of computation time and resources. Further, generating dynamic web pages often results in one or more accesses to back-end databases. Hence, there is a need for interactions among the web server, the application server and the database server. These factors significantly increase the computational load on the web servers, which not only increases the average user latency, but may also cause the server to drop requests.
(2) Increasing dynamic content has significantly reduced the amount of content on the web that can be cached. This reduction in cacheable content has caused a noticeable increase in the number of requests that reach the origin web servers.
These factors have adversely impacted the average latency experienced by web users. Hence, there is a growing demand for techniques and systems that are capable of efficiently generating and serving dynamic web data.
There has been considerable research towards alleviating this problem. One of the promising research directions in recent years is fragment-based publishing, delivery and caching of web pages. The research on fragment-based publishing and caching has been prompted by the following observations on the nature of the dynamic web pages and web sites serving them:
(1) Web pages seldom have a single theme or functionality. Usually web pages have several pieces of information whose themes and functions are independent of each other.
(2) Generally, dynamic and personalized web pages are not completely dynamic or personalized. Usually, the dynamic and personalized content is embedded in relatively static web pages.
(3) Web pages from the same web site tend to share information among them.
These observations have led to fragment-based publishing, wherein the web content is published and cached at a finer granularity than that of the entire web page. Challenger et al., “System for Efficiently Creating Dynamic Web Content,” Proceedings of IEEE INFOCOM 2000, May 2000, the disclosure of which is incorporated by reference herein, describes an example of the fragment-based publishing technique.
Some advantages of fragment-based publishing and caching include: (i) increasing the cacheable content of a web site by separating the non-personalized content from the personalized content and marking each as such; (ii) reducing the number of data invalidations occurring in caches; and (iii) improving the disk-space utilization at caches.
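By way of illustration only, the following Python sketch shows how a page might be assembled from independently cached fragments. It is not the system described by Challenger et al.; the names Fragment, FragmentCache and assemble_page are hypothetical, and the sketch assumes that a fragment marked as personalized is never cached while every other fragment is cached under its identifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Fragment:
    """Hypothetical unit of page content that is generated on its own."""
    fragment_id: str
    render: Callable[[str], str]  # takes a user name, returns markup
    personalized: bool = False    # assumption: personalized fragments are not cacheable


class FragmentCache:
    """Illustrative cache that stores only non-personalized fragments."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, fragment: Fragment, user: str) -> str:
        if fragment.personalized:
            # User-specific content is regenerated on every request.
            self.misses += 1
            return fragment.render(user)
        if fragment.fragment_id in self._store:
            self.hits += 1
            return self._store[fragment.fragment_id]
        # First request for a shared fragment: generate it once and keep it.
        self.misses += 1
        body = fragment.render(user)
        self._store[fragment.fragment_id] = body
        return body

    def invalidate(self, fragment_id: str) -> None:
        # Only the changed fragment is dropped; other fragments stay cached.
        self._store.pop(fragment_id, None)


def assemble_page(fragments: List[Fragment], cache: FragmentCache, user: str) -> str:
    """Build a page by concatenating its fragments in order."""
    return "\n".join(cache.get(f, user) for f in fragments)


if __name__ == "__main__":
    header = Fragment("header", lambda user: "<h1>Example News</h1>")
    greeting = Fragment("greeting", lambda user: f"<p>Welcome, {user}</p>", personalized=True)
    headlines = Fragment("headlines", lambda user: "<ul><li>Story A</li></ul>")

    cache = FragmentCache()
    layout = [header, greeting, headlines]
    print(assemble_page(layout, cache, "alice"))
    print(assemble_page(layout, cache, "bob"))
    print(cache.hits, cache.misses)
```

In this sketch, the second request reuses the cached header and headlines fragments and regenerates only the greeting, which corresponds to advantage (i) above. A change to the headlines invalidates that one fragment rather than every page that contains it, which corresponds to advantage (ii).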
There have been considerable research efforts on the performance and benefits of the fragment-based publishing and caching of web pages. However, existing approaches for fragment-based web site design assume that the web pages are fragmented manually at their respective web sites. Manual fragmentation of web pages is known to be not only costly, but also error-prone.
Thus, a need exists for techniques which overcome the above-mentioned and other limitations associated with conventional web page fragmentation approaches.