The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Some network resources such as Web sites are configured by malicious or dishonest persons to host harmful computer program code, or to contain forms or applications that seek to collect personal identifying information or financial account information for unauthorized purposes. The persons who control such sites often seek to entrap unsuspecting users into giving up personal financial information by sending electronic mail (e-mail) messages to the users that appear to originate from legitimate entities, and contain hyperlinks to the malicious or dishonest sites. Network security analysts use the term “phishing” to describe such approaches.
Other e-mail senders dispatch to enterprise end users messages containing hyperlinks to Web sites or other network resources that the end users are not allowed to access according to enterprise policy. Such sites may include pornographic material, streaming audio or video content that consumes excessive enterprise network bandwidth, or other material for which the enterprise prefers to control access. The messages may be unsolicited, but need not be.
Hypertext transfer protocol (HTTP) and simple mail transfer protocol (SMTP) are defined in Internet Engineering Task Force (IETF) Request for Comments (RFC) 2616 and RFC 2821, respectively. The reader of this document is presumed to be familiar with RFC 2616, RFC 2821, and the structure of an HTTP request, a URL, a hyperlink, and an HTTP proxy. Generally, an HTTP request is an electronic message that conforms to HTTP and that is sent from a client or server to another server to request a particular electronic document, application, or other server resource. An HTTP request comprises a request line, zero or more headers, and an optional body. A URL identifies a particular electronic document, application or other server resource and may be encapsulated in an HTTP request. A hyperlink is a representation, in an electronic document such as an HTML document, of a URL. Selecting a hyperlink invokes an HTTP element at a client and causes the client to send an HTTP request containing the URL represented in the hyperlink to the HTTP server identified by the domain portion of the URL.
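For illustration only, the structure of an HTTP request described above can be sketched in Python. The function name build_get_request and the example URL are hypothetical and are not part of any approach described herein; the sketch assumes a simple GET request with no body and only a Host header.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    # The request target is the path plus any query string; default to "/".
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",  # request line
        f"Host: {parts.netloc}",   # header naming the server from the domain portion of the URL
        "",                        # blank line terminates the headers
        "",                        # empty (optional) body
    ]
    return "\r\n".join(lines)


# Hypothetical URL used only to exercise the sketch.
request = build_get_request("http://www.example.com/docs/index.html?lang=en")
```

In this sketch the domain portion of the URL determines the Host header, and the remainder of the URL becomes the request target in the request line.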
In conventional operation of HTTP, electronic documents prepared using hypertext markup language (HTML) may contain hyperlinks to other documents or network resources. A user views the HTML documents using a browser, such as Firefox, Netscape Navigator, or Microsoft Internet Explorer. When a user selects a hyperlink for a network resource in an HTML document, the browser issues an HTTP GET or POST request to the HTTP server that hosts the linked resource. Before dispatching the request, the browser places the uniform resource locator (URL) of the current HTML document in a “Referer” header of the request. The receiving HTTP server can use the Referer value to learn how the user reached the HTTP server. If the browser is configured to send HTTP requests to an HTTP proxy server rather than directly to the server that hosts the linked resource, then the HTTP proxy server can examine the contents of the Referer header to determine whether the user should be allowed to obtain the requested network resource.
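As a minimal sketch of the conventional proxy behavior described above, the following Python fragment shows how an HTTP proxy might examine the Referer value of a request. The names BLOCKED_REFERRER_HOSTS, parse_headers, and allow_request, the host names, and the policy itself are assumptions made for illustration; they do not describe any particular proxy product.

```python
from urllib.parse import urlsplit

# Hypothetical policy: requests that arrive by way of pages on these hosts are denied.
BLOCKED_REFERRER_HOSTS = {"untrusted.example"}


def parse_headers(raw_request: str) -> dict:
    """Map lower-cased header names to values for a raw HTTP request."""
    head = raw_request.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def allow_request(raw_request: str) -> bool:
    """Decide whether to forward a request based only on its Referer value."""
    referer = parse_headers(raw_request).get("referer")
    if referer is None:
        # No Referer value is available to evaluate; allowing the
        # request here is an assumption of this sketch.
        return True
    return urlsplit(referer).hostname not in BLOCKED_REFERRER_HOSTS


blocked = allow_request(
    "GET /offer HTTP/1.1\r\nHost: shop.example\r\n"
    "Referer: http://untrusted.example/page.html\r\n\r\n"
)
no_referer = allow_request("GET /offer HTTP/1.1\r\nHost: shop.example\r\n\r\n")
```

A decision of this kind depends entirely on the Referer value being present; when a request carries no such value, the sketch has nothing on which to base a decision.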
A browser may display objects other than HTML documents. Browsers may request URLs for reasons other than in response to a user selecting a hyperlink in an HTML document. A browser does not always need to be explicitly configured to communicate with a proxy; some proxies can be configured transparently.
However, when a hyperlink or URL appears in an e-mail message, which a user has received legitimately or as part of a phishing attack, selecting the hyperlink does not result in placing information in the Referer field. Because the hyperlink or URL is in an e-mail message, which is not an HTML document or Web site, no URL identifying the e-mail message can be formed and placed in the Referer field. As a result, in current practice there is no way to control access to hyperlinks or URLs that appear in an e-mail based on the origin of that e-mail. Consequently, e-mail systems are vulnerable to phishing attacks and other problems.
Email messages that are displayed by browsers in so-called “webmail” clients present a special case of the foregoing general description. In a webmail client, an email message is displayed as part of an HTML document or HTML fragment. When the user selects a hyperlink in a webmail client, the browser sends a Referer header, but the header does not include information about the sender of the email. Instead, the browser typically places in the Referer header a URL that is derived from the implementation of the webmail client.
Based on the foregoing, there is a clear need in the data processing field for a method that permits controlling access to Web sites and other network resources that are referenced within e-mail messages using URLs or hyperlinks.
More generally, electronic mail (email) messages often contain URLs that are presented to users in email client software. When a user clicks on a URL that is displayed in the email client, the email client typically communicates the URL to the user's preferred web browser, and the web browser initiates an HTTP request for the URL and renders the resulting response. It is quite common for email from unknown, disreputable sources to contain URLs that point to web sites that attempt to harm end user computers using spyware, adware, malware downloads or other techniques. There is a need for a technique that one or more network security devices or software systems can use to prevent or control access to such URLs.