The present invention relates to a scheme for differentiating cacheable from non-cacheable objects that may be referenced in web pages and the like using port designators within uniform resource locators (URLs) that identify where the objects can be found.
The Internet is a vast and expanding network of networks of computers and other devices linked together by various communications media, enabling all these computers and other devices to exchange and share data. Sites on the Internet provide information about a myriad of corporations and products, as well as educational, research and entertainment information and services.
A computer or resource that is attached to the Internet is often referred to as a "host." Examples of such resources include conventional computer systems that are made up of one or more processors, associated memory (typically volatile and non-volatile) and other storage devices and peripherals that allow for connection to the Internet or other networks (e.g., modems, network interfaces and the like). In most cases, the hosting resource may be embodied as hardware and/or software components of a server or other computer system that includes an interface, which allows for some dialog with users thereof. Generally, such a server will be accessed through the Internet from a client computer or other device (e.g., via client applications and/or Web browsers such as Netscape's Navigator™ and Communicator™ and Microsoft's Internet Explorer™) in the conventional fashion.
Briefly, if an Internet user desires to establish a connection with a host (e.g., to view a Web page located thereat), the user might enter into a Web browser program the URL (or Web address) corresponding to that host. One example of such a URL is "http://www.domain.com:80/webpages/mypage.htm". In this example, the first element of the URL is a transfer protocol (most commonly, "http" standing for hypertext transfer protocol, but others include "mailto" for electronic mail, "ftp" for file transfer protocol, and "nntp" for network news transfer protocol). The remaining elements of this URL (in this case, "www", standing for World Wide Web, the Internet's graphical user interface, and "domain.com") include an alias for the "fully qualified domain name" of the host. The number 80 indicates the port number on which the request is being made and is generally optional. The path to the particular file at the host is then set forth (e.g., webpages/mypage.htm).
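As an illustration only, the elements of the example URL above can be separated with Python's standard `urllib.parse` module. This is a minimal sketch, not part of the described scheme; the variable names are chosen here for illustration.

```python
from urllib.parse import urlsplit

# The example URL from the text above.
url = "http://www.domain.com:80/webpages/mypage.htm"
parts = urlsplit(url)

scheme = parts.scheme    # transfer protocol
host = parts.hostname    # alias for the fully qualified domain name
port = parts.port        # explicit port designator, or None if omitted
path = parts.path        # path to the particular file at the host

print(scheme, host, port, path)

# The port is optional: when the URL omits it, no port is reported.
no_port = urlsplit("http://www.domain.com/webpages/mypage.htm").port
```

When the port is omitted, `urlsplit` reports `None` and the client falls back to the default port for the protocol.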
Each fully qualified domain name, in its most generic form, includes three elements. Taking "computer.host.com" as an example, the three elements are the hostname ("computer"), a domain name ("host") and a top-level domain ("com"). Further, each fully qualified domain name is unique throughout the Internet and corresponds to a numerical Internet protocol (IP) address. IP addresses facilitate communications between hosts and clients in the same way that physical addresses (e.g., 123 Main Street, Anytown, Anycity) facilitate correspondence by mail. Each IP address is made up of four groups of decimal numbers separated by dots. Thus, in the case of the hypothetical host "computer.domain.com", the corresponding IP address might be 123.255.78.91. This format is known as the dotted decimal format. A given host looks up the IP addresses of other hosts on the Internet through a system known as the domain name system (DNS).
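To make the dotted decimal format concrete, the following sketch converts the hypothetical address from the text into the single 32-bit number it represents. The function name `dotted_to_int` is an assumption made for this illustration.

```python
def dotted_to_int(address: str) -> int:
    """Convert a dotted decimal IP address into its 32-bit integer value."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a dotted decimal address: {address!r}")
    value = 0
    for octet in octets:
        # Each group contributes eight bits, most significant group first.
        value = (value << 8) | octet
    return value


example_value = dotted_to_int("123.255.78.91")
print(example_value)
```

Each of the four groups therefore ranges from 0 to 255, since it occupies exactly eight bits of the address.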
Thus, once a URL is entered into a browser, the corresponding IP address is looked up in a process facilitated by a top-level server. In other words, all queries for addresses are routed to certain computers, the so-called top-level servers. The top-level server matches the domain name to an IP address of a domain name server capable of directing the inquiry to the computer hosting the sought after Web page (or other content) by matching an alphanumeric name such as www.domain.com with its numeric IP address.
The client-server communications that take place across the Internet generally utilize a series of "ports" and "sockets" as well as IP addresses to specify communication pathways. A port is a software abstraction of a physical space through which a client and a server can send messages. Ports are known by numbers; for example, port 80 is a well-known port for http communications. Several processes can use the same port at the same time. Sockets are software abstractions that provide communication links between a single server process and a single client process. Several sockets can be created on the same port. Clients and servers use input and output streams to send messages through individual sockets.
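The relationship between ports and sockets can be sketched with Python's standard `socket` module. The example below is illustrative only and assumes a loopback interface is available; it opens one listening port, connects two clients to it, and shows that the server ends up with two distinct sockets that share that one port. The function name and message contents are assumptions made for the sketch.

```python
import socket


def open_two_connections():
    """Connect two clients to one server port and exchange a message on each."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients, accepted = [], []
    try:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(2)
        server.settimeout(2)
        port = server.getsockname()[1]

        for _ in range(2):
            clients.append(socket.create_connection(("127.0.0.1", port), timeout=2))
        for _ in range(2):
            conn, _addr = server.accept()
            conn.settimeout(2)
            accepted.append(conn)

        # Every server-side socket shares the listening port number.
        server_side_ports = [conn.getsockname()[1] for conn in accepted]
        # Each client socket has its own local port.
        client_side_ports = [c.getsockname()[1] for c in clients]

        # Each socket carries its own independent stream of messages.
        for index, client in enumerate(clients):
            client.sendall(f"client-{index}".encode())
        messages = [conn.recv(1024) for conn in accepted]
        return port, server_side_ports, client_side_ports, messages
    finally:
        for sock in clients + accepted + [server]:
            sock.close()


port, server_side_ports, client_side_ports, messages = open_two_connections()
print(port, server_side_ports, client_side_ports, messages)
```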
FIG. 1 illustrates an example of a conventional client-server transaction. One or more clients 10 are connected to Internet 14 through one or more routers 12. Generally, Internet Service Providers (ISPs) deploy these routers 12 at points of presence (POP) close to their respective users. Often associated with the routers 12 are caches 16. The caches act as information storage devices and generally store web pages and the like at locations that are physically and/or logically close to the ISP's users. That way, requests for content that has been previously cached may be serviced from the cache 16, without having to make queries all the way back to an origin server 18 that may be remote from the requesting client. Using caches in this fashion allows requests to be fulfilled more quickly than would be the case if no cache were used and it also helps to reduce congestion within the Internet 14 by reducing the number of requests that must be processed by the origin server 18.
When a piece of content (e.g., a web page or the like) is requested for the first time (or for the first time in a predetermined time period, etc.), no replica of that content will be stored in cache 16. Nevertheless, the router 12 will pass the request from one of the clients 10 to the cache because such routers are generally configured by their operators to pass all requests to one or more associated caches (which may be grouped in a hierarchical fashion) before passing the request to the origin server. Where the content is not found in the cache 16, the cache 16 will fetch the content from the origin server 18.
Upon receiving a reply from the origin server 18, the router 12 will forward a copy of the content (if it is cacheable) to the cache 16 and also to the requesting client 10. This way, the cache 16 is updated so that later requests for the same content can be serviced from the cache 16 without need to query the origin server 18. This stored replica of the content may be updated periodically, depending on the refresh policies of the cache 16 and the stored content.
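The conventional miss-then-fill behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of any particular cache product: the class `SimpleCache`, the `fake_origin` function standing in for the origin server 18, and the `origin_requests` counter are all hypothetical names, and refresh policies are omitted. The sketch also folds the router's forwarding step into the cache object for brevity.

```python
from typing import Callable, Dict


class SimpleCache:
    """Toy model of cache 16: serve stored replicas, otherwise ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # counts queries sent back to the origin

    def get(self, url: str, cacheable: bool = True) -> bytes:
        if url in self._store:
            # Hit: serviced from the cache without querying the origin.
            return self._store[url]
        # Miss: fetch from the origin server.
        self.origin_requests += 1
        content = self._fetch(url)
        if cacheable:
            # Keep a replica so later requests can be serviced locally.
            self._store[url] = content
        return content


def fake_origin(url: str) -> bytes:
    """Hypothetical stand-in for origin server 18."""
    return f"content of {url}".encode()


cache = SimpleCache(fake_origin)
first = cache.get("http://www.domain.com/webpages/mypage.htm")
second = cache.get("http://www.domain.com/webpages/mypage.htm")
print(cache.origin_requests)
```

In this model the second request for the same content does not reach the origin, whereas every request for non-cacheable content does.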
As mentioned above, some content is not (or should not be) cacheable. For example, content that varies depending on user input (e.g., the output of a common gateway interface (cgi) or other script) or a web page that is frequently updated at its origin server should not be cached because users will want to receive the most current version of such content. Thus, in general, dynamic content should not be cached in order to avoid serving up stale information. Nevertheless, requests for such content may still be directed to the cache 16 because such requests are often made on well-known ports that are redirected to a cache as a matter of policy by an ISP. This will result in the user request being serviced more slowly than if the request were passed directly to an origin server.
A computer-implemented process is organized to recognize a request as being for a cacheable object or a non-cacheable object according to information included in a Uniform Resource Locator (URL) associated with the object. For example, the URL may include a port designation for requests for cacheable objects (e.g., images and the like). Thus, a request may be recognized as being for a cacheable or non-cacheable object according to the port on which the request is made. In some cases, requests for non-cacheable objects may be made on port 80. One benefit of this scheme is that by providing a mechanism to differentiate between cacheable and non-cacheable content, caches need not be overloaded with unnecessary traffic requesting non-cacheable content.
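One way such recognition might look in practice is sketched below. The text specifies only that cacheable objects may carry a port designation and that non-cacheable requests may be made on port 80; the specific value 8080 for the cacheable port, and the names `CACHEABLE_PORT` and `is_cacheable_request`, are assumptions made purely for illustration.

```python
from urllib.parse import urlsplit

CACHEABLE_PORT = 8080     # hypothetical port designating cacheable objects
NON_CACHEABLE_PORT = 80   # per the text, non-cacheable requests may use port 80


def is_cacheable_request(url: str) -> bool:
    """Classify a request by the port designator in its URL."""
    port = urlsplit(url).port
    if port is None:
        # No explicit designator: treat as the default http port.
        port = NON_CACHEABLE_PORT
    return port == CACHEABLE_PORT


image_is_cacheable = is_cacheable_request("http://www.domain.com:8080/images/logo.gif")
script_is_cacheable = is_cacheable_request("http://www.domain.com:80/cgi-bin/search")
print(image_is_cacheable, script_is_cacheable)
```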
In another embodiment, a router may be configured to recognize a request as being for a cacheable object or a non-cacheable object according to a port on which the request is received.
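A router-side decision of this kind could be modeled as below. This is a sketch under assumptions, not a description of any actual router configuration: the `Request` type, the `CACHE_REDIRECT_PORTS` set (with the hypothetical value 8080), and the string labels returned by `next_hop` are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    host: str
    port: int  # port on which the request is received
    path: str


# Hypothetical policy: only these ports are redirected to the cache.
CACHE_REDIRECT_PORTS = frozenset({8080})


def next_hop(request: Request) -> str:
    """Return where the router should pass the request, based on its port."""
    if request.port in CACHE_REDIRECT_PORTS:
        return "cache"   # cacheable object: try the cache first
    return "origin"      # non-cacheable object: bypass the cache


image_hop = next_hop(Request("www.domain.com", 8080, "/images/logo.gif"))
script_hop = next_hop(Request("www.domain.com", 80, "/cgi-bin/search"))
print(image_hop, script_hop)
```

Under such a policy, requests on port 80 would go directly to the origin server rather than taking a detour through the cache.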
In still further embodiments, Uniform Resource Locators (URLs) may be configured to identify whether an object associated therewith is to be cached. For example, the URLs may include port designations identifying objects as cacheable.
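As a hedged illustration of configuring a URL in this way, the sketch below rewrites a URL so that it carries a port designation marking its object as cacheable. The function name `mark_cacheable` and the default port 8080 are assumptions; the sketch also ignores user information and IPv6 literals in the host portion for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit


def mark_cacheable(url: str, cacheable_port: int = 8080) -> str:
    """Return the URL with a port designation identifying a cacheable object."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Replace any existing port with the (hypothetical) cacheable port.
    netloc = f"{host}:{cacheable_port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


marked = mark_cacheable("http://www.domain.com/images/logo.gif")
print(marked)
```

A page author or publishing tool could apply such a rewrite to references to images and similar static objects while leaving references to dynamic content untouched.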
Other features and advantages of the present invention will be apparent from the following discussion.