1. The Field of the Invention
The present invention relates to filtering electronic content and, more particularly, to filtering cached content based on embedded URLs.
2. Background and Relevant Art
Computer systems and related technology affect many aspects of society. Indeed, the computer system's ability to process information has transformed the way we live and work. Computer systems now commonly perform a host of tasks (e.g., word processing, scheduling, and database management) that prior to the advent of the computer system were performed manually. More recently, computer systems have been coupled to one another and to other electronic devices to form both wired and wireless computer networks over which the computer systems and other electronic devices can transfer electronic data. As a result, many tasks performed at a computer system (e.g., voice communication, accessing electronic mail, controlling home electronics, web browsing) include electronic communication between a number of computer systems and/or other electronic devices via wired and/or wireless computer networks.
More particularly, Web browsing has become a common mechanism for accessing electronic content. To access electronic content, a user of a Web browser can enter a Uniform Resource Locator (“URL”) into a field or select a presented URL link at the Web browser user-interface. Selecting a link can include, for example, selecting a link from a list of favorites or selecting a link from within currently displayed content. In any event, the URL (typically a string of text) is sent to a Domain Name System (“DNS”) server and resolved into an electronic address. The Web browser then sends a request for electronic content (e.g., a Web page) to the electronic address. Upon receiving the request, a Web server at the electronic address replies by sending the requested electronic content to (e.g., the electronic address of) the Web browser.
HyperText Transfer Protocol (“HTTP”) URLs used to access Web-based content can include a number of different portions. A scheme portion of a URL identifies the URL as corresponding to the HTTP protocol. A host portion of a URL identifies a fully qualified domain name or Internet Protocol (“IP”) address of a network host, such as, for example, a computer system (or group of computer systems). A path portion of a URL identifies a path to a specified resource at the network host, such as, for example, to particular content controlled by the computer system identified in the host portion.
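By way of illustration only, the scheme, host, and path portions described above can be separated with the Python standard library. The URL shown is a hypothetical example and is not taken from any particular embodiment.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to illustrate the three portions.
url = "http://www.example.com/sports/scores.html"

parts = urlsplit(url)
scheme = parts.scheme    # scheme portion: identifies the HTTP protocol
host = parts.hostname    # host portion: fully qualified domain name or IP address
path = parts.path        # path portion: resource at the network host

print(scheme, host, path)
```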
Thus, any user that has a URL can request corresponding electronic content from a Web server. On the Internet, this request/reply mechanism is advantageous, since users are provided efficient access to large amounts of diverse electronic content. Users can easily access electronic content on a variety of topics, such as, for example, sports, technology, medicine, etc. However, access to some forms of electronic content, such as, for example, gambling and adult content, may not be appropriate for some users (e.g., children) and/or in some environments (e.g., in the workplace).
Accordingly, Web filtering mechanisms have been developed to block electronic content, for example, based on a domain or URL associated with the electronic content. Web filtering mechanisms typically place domains and/or URLs into content categories (e.g., sports, legal, technology, news, etc.). An administrator can then assign user access rights to each content category. For example, the administrator can configure a Web filtering product (e.g., a desktop computer, gateway, caching device, or firewall) to permit or block user access to content categories. Access rights to particular content categories can be based on personal or organizational Internet access policies. For example, an organizational policy can require blocking access to gambling and adult content sites, while allowing access to all other sites.
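A category-based filtering mechanism of the kind described above might be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular filtering product: the host names, the category table, the policy, and the function names `category_of` and `is_permitted` are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical category database mapping a host to a content category.
CATEGORY_BY_HOST = {
    "news.example.com": "news",
    "casino.example.net": "gambling",
    "search.example.org": "search engine",
}

# Hypothetical organizational policy: block these categories, allow the rest.
BLOCKED_CATEGORIES = {"gambling", "adult"}


def category_of(url):
    """Return the content category for the host portion of a URL."""
    host = (urlsplit(url).hostname or "").lower()
    return CATEGORY_BY_HOST.get(host, "uncategorized")


def is_permitted(url):
    """Permit access unless the URL's category is blocked by policy."""
    return category_of(url) not in BLOCKED_CATEGORIES
```

Under these assumptions, a request for `http://news.example.com/today` would be permitted, while a request for `http://casino.example.net/play` would be blocked.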
Search engines are utilized extensively in Web browsing to automatically and quickly identify links to relevant content. At times, a search engine may be the only efficient mechanism for finding content related to a particular subject. As a result, filtering products often group search engines in a specific search engine category and typically do not block the search engine category. Thus, users are typically permitted to utilize search engines to search for content.
Search engine results typically include a list of accessible links (representing URLs) to relevant content. To access the relevant content, a user accesses a presented link and the corresponding content is returned. However, accessing a presented link causes filtering mechanisms to check the content category of the domain and/or URL of the link. If access to content in the content category is not permitted, the returning content is blocked. Thus, even if a user performs a search for blocked content and is returned a link, the user is typically prevented from actually accessing the blocked content.
Search engines can maintain cached versions of other Web sites to provide access to older copies of the other Web sites, for example, when the other Web sites are not available or off-line when a search is performed. Thus, the results of a Web search can include links to cached Web pages as well as current Web pages. Accordingly, by maintaining cached content, search engines can provide search results that include links to at least some relevant content, even if the relevant content is not the most recent content.
In some environments, search engines provide search engine functionality and access to cached content at a URL having the same host portion. That is, the host portion of the URL identifies a computer system (or group of computer systems) providing search engine functionality and providing access to cached content. For example, a search engine server and storage server may be identified by the same host portion.
Unfortunately, since filtering mechanisms typically prevent access to content based on domain or URL and since cached content is sent from a URL corresponding to the search engine, it is possible to circumvent filtering mechanisms by accessing cached content. That is, aside from the URL used to access the cached content, there is essentially no way for filtering mechanisms to determine what content is being accessed. Thus, when a search engine caches potentially undesirable content and then provides access to the potentially undesirable content at a search engine URL, it may be difficult, if not impossible, to block the undesirable content without also preventing access to the search engine functionality.
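The circumvention described above can be illustrated with a short sketch. The host names, the category table, and in particular the layout of the cached-content URL (a `cache` path with a `u` query parameter) are assumptions made for illustration; actual search engines may form cached-content URLs differently.

```python
from urllib.parse import urlsplit

# Hypothetical category table and policy for a host-based filter.
CATEGORY_BY_HOST = {
    "search.example.org": "search engine",
    "casino.example.net": "gambling",
}
BLOCKED_CATEGORIES = {"gambling"}


def host_filter_permits(url):
    """Decide access from the host portion of the requested URL only."""
    host = (urlsplit(url).hostname or "").lower()
    category = CATEGORY_BY_HOST.get(host, "uncategorized")
    return category not in BLOCKED_CATEGORIES


# Direct request to the blocked site.
direct_url = "http://casino.example.net/play"

# Assumed form of a request for the search engine's cached copy of that page.
cached_url = "http://search.example.org/cache?u=http://casino.example.net/play"

print(host_filter_permits(direct_url))  # the filter sees the gambling host
print(host_filter_permits(cached_url))  # the filter sees only the search engine host
```

Because the host portion of the second request identifies the search engine, a filter of this kind categorizes the request as search engine content, even though the content returned is a copy of the blocked page.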
In other environments, search engines provide search engine functionality at a first URL having a first host portion and provide access to cached content at a second URL having a second different host portion. That is, the first URL identifies a first computer system (or first group of computer systems) providing search engine functionality and the second URL identifies a second different computer system (or second different group of computer systems) providing access to cached content. For example, a first URL with a first host portion can identify a search engine server and a second URL with a second different host portion can identify a storage server.
In these other environments, it may be possible to block access to undesirable cached content, while still providing access to search engine functionality. That is, a filtering mechanism can allow access to content at a URL providing search engine functionality and deny access to content at a URL providing access to cached content. Unfortunately, preventing access to content at a URL providing access to cached content prevents access to all content at the URL, including content that is not otherwise blocked by filtering mechanisms. For example, blocking all cached content from a URL can cause cached news content to be blocked even when the news content category is not blocked.
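The over-blocking described above can be sketched in the same manner. The host names and URL layouts below are hypothetical, and the filter shown is deliberately simplified to a list of blocked hosts.

```python
from urllib.parse import urlsplit

# Hypothetical policy: block the host that serves cached content,
# while leaving the search engine host reachable.
BLOCKED_HOSTS = {"cache.example.org"}


def permits(url):
    """Permit access unless the host portion of the URL is blocked."""
    return (urlsplit(url).hostname or "").lower() not in BLOCKED_HOSTS


search_url = "http://search.example.org/search?q=headlines"
cached_gambling = "http://cache.example.org/copy?u=http://casino.example.net/play"
cached_news = "http://cache.example.org/copy?u=http://news.example.com/today"

print(permits(search_url))       # search engine functionality remains available
print(permits(cached_gambling))  # undesirable cached content is blocked
print(permits(cached_news))      # cached news content is blocked as well
```

In this sketch the cached news page is denied along with the cached gambling page, since the decision depends only on the host that serves the cached copies.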
Accordingly, administrators are often forced to allow access to undesirable cached content (to allow access to search engine functionality or other desirable cached content), even when filtering mechanisms are configured to otherwise block similar non-cached content. Therefore, systems, methods, and computer program products that facilitate more intelligent filtering of electronic content would be advantageous.