Upstream proxy servers are known in the art and provide an interface between a web client and a content server by making requests on the client's behalf and modifying the received content before it is presented back to the client. Upstream proxy servers enable browsers to make normal requests to the proxy, which then requests the content from the content server. One application in which proxy servers are useful is a real-time web collaboration environment, where multiple clients are viewing the same cached page that must be dynamically updated, such as a page presenting stock quotes.
As is known by those of ordinary skill in the art, upstream proxy servers are to be distinguished from a “transparent” HTTP proxy, which is recognized specifically as a proxy server by the browser, allowing requests to be submitted in a different fashion. The user of a transparent proxy never sees a difference in the page they receive, i.e., the links are not modified.
One issue with upstream proxy servers is that any links that appear on pages must link back to the proxy server, and not to the actual source of the content. To accomplish this, typical proxy servers must parse the web content prior to presenting the content to the requesting users. Parsing typically involves downloading the requested content, parsing the content to find any embedded links, modifying the links to point back to the proxy server rather than the content source, performing any further content transformation necessary, and then forwarding the content to the requesting client.
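By way of illustration only, the link-modification step described above might be sketched in Python as follows. The proxy address `PROXY_PREFIX`, the function name `rewrite_links`, and the use of a regular expression over `href` and `src` attributes are assumptions made for this sketch and are not drawn from any particular product; a practical parser would need to handle many more cases.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint, assumed for illustration only.
PROXY_PREFIX = "http://proxy.example.com/fetch?url="

# Matches href="..." or src="..." written with single or double quotes.
LINK_ATTR = re.compile(r'''\b(href|src)\s*=\s*(["'])(.*?)\2''', re.IGNORECASE)


def rewrite_links(html: str, base_url: str) -> str:
    """Return html with every href/src target pointed back at the proxy."""

    def _rewrite(match: re.Match) -> str:
        attr, quote_char, target = match.group(1), match.group(2), match.group(3)
        # Resolve relative links against the page the content came from.
        absolute = urljoin(base_url, target)
        # Encode the original URL so it can travel as a query parameter.
        proxied = PROXY_PREFIX + quote(absolute, safe="")
        return f"{attr}={quote_char}{proxied}{quote_char}"

    return LINK_ATTR.sub(_rewrite, html)


page = '<a href="/quotes?sym=IBM">IBM</a>'
print(rewrite_links(page, "http://content.example.com/index.html"))
```

In this sketch the original address is carried as an encoded query parameter, so that the proxy can recover the content source when the rewritten link is followed. Links produced by scripts running in the browser are not reached by a static pass of this kind.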
A further challenge to parsing is the increasing use of JavaScript in web pages, which allows pages to be generated dynamically within the receiving client's web browser. Such pages may generate their own links within the browser page, and these links must also be parsed and redirected to the proxy server.
Typically, such parsing routines are hard-coded as procedures provided with a specific product, and are not easily extended or modified.