The Internet, which in essence comprises a large number of networked computers distributed throughout the world, has become an extremely popular source of virtually all kinds of information. Increasingly sophisticated computers, software, and networking technology have made Internet access relatively straightforward for end users. For example, conventional browser software allows a user to request information such as a web page from a web site on one or more remote computers. To this end, the user provides the address of the web page (e.g., a uniform resource identifier, or URI) to the browser software in some manner, and the browser software transmits the request using a well-known communication protocol such as the Hypertext Transfer Protocol (HTTP). The request is then routed to the destination computer or web site based on the address.
When the request is received, the remote web site evaluates the request and returns an appropriate response, which may include the requested information in a markup language such as HyperText Markup Language (HTML), or in similarly formatted content. The browser software interprets the returned content to render a page or the like on the user's computer display.
As part of handling a returned page that is loaded into a browser instance, the content that makes up that page may be associated with a security zone, such as described in U.S. patent application Ser. No. 09/055,772, assigned to the assignee of the present application and issued as U.S. Pat. No. 6,366,912 on Apr. 2, 2002. The association of the content with a security zone is based on the content's source, such as the web site from which it was obtained, e.g., as determined from its Uniform Resource Locator (URL), or more generally its Uniform Resource Identifier (URI). For example, a URL starting with “http://www” may be associated with an Internet security zone.
A security zone corresponds to a set of security settings for that content; that is, for each security zone there is a set of security settings comprising, for example, locally maintained data specifying rules for possible actions, such as whether any script in the content can be run, whether specified controls (e.g., ActiveX® controls) may be loaded, and so on. Note that security settings can specify more than yes-or-no decisions; for example, a setting may indicate that the user should be prompted for a decision at the time that a decision (e.g., whether to download) needs to be made.
When an action needs to be taken, the security settings corresponding to a page are accessed to determine whether to allow or deny the action, prompt the user, or take some other action (e.g., consult other data to make the determination). In any event, the security zone thus controls what actions received content can and cannot perform. Browser software may ship with default settings for the zones, which the user may then adjust or obtain from some other source. For example, the settings may be such that content downloaded from the Internet is allowed to do less (at least without prompting) than content loaded from an Intranet; e.g., content received from the Intranet zone may be automatically allowed to load an ActiveX® control, while similar content received from the Internet zone may require the user to decide, via a prompt, whether such a control can be loaded.
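The allow/deny/prompt lookup described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the zone names follow the text, but the action names (`run_script`, `load_activex`), the default values in `DEFAULT_SETTINGS`, and the helpers `decide` and `with_override` are hypothetical and do not reflect any actual browser's settings store.

```python
from enum import Enum


class Zone(Enum):
    LOCAL_COMPUTER = "local"
    INTRANET = "intranet"
    TRUSTED = "trusted"
    INTERNET = "internet"
    RESTRICTED = "restricted"


class Policy(Enum):
    ALLOW = "allow"
    PROMPT = "prompt"
    DENY = "deny"


# Hypothetical defaults: action names and values are illustrative only.
DEFAULT_SETTINGS = {
    Zone.LOCAL_COMPUTER: {"run_script": Policy.ALLOW, "load_activex": Policy.ALLOW},
    Zone.INTRANET: {"run_script": Policy.ALLOW, "load_activex": Policy.ALLOW},
    Zone.TRUSTED: {"run_script": Policy.ALLOW, "load_activex": Policy.ALLOW},
    Zone.INTERNET: {"run_script": Policy.ALLOW, "load_activex": Policy.PROMPT},
    Zone.RESTRICTED: {"run_script": Policy.DENY, "load_activex": Policy.DENY},
}


def with_override(settings, zone, action, policy):
    """Return a copy of `settings` with one user adjustment applied."""
    copy = {z: dict(actions) for z, actions in settings.items()}
    copy[zone][action] = policy
    return copy


def decide(zone, action, settings=DEFAULT_SETTINGS, ask_user=None):
    """Return True if content in `zone` may perform `action`."""
    # Unknown actions fall back to DENY (an assumed fail-closed choice).
    policy = settings[zone].get(action, Policy.DENY)
    if policy is Policy.ALLOW:
        return True
    if policy is Policy.PROMPT:
        # Defer to the user; with no prompt handler, refuse.
        return bool(ask_user(zone, action)) if ask_user else False
    return False
```

For instance, `decide(Zone.INTRANET, "load_activex")` returns `True`, while the same action from `Zone.INTERNET` succeeds only if the `ask_user` callback approves it. Treating an unanswered prompt or an unknown action as a denial is an assumption made here; the text leaves such details open.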
In addition to the Internet and Intranet zone distinctions, the user may specify in some manner (e.g., via a browser user interface) that particular URLs corresponding to certain web sites are trusted, thus giving them trusted (less restrictive) security settings. At the other extreme, a user may specifically assign URLs of untrusted sites to a restricted zone having very restrictive security settings. When content is received from such a site, the content is associated with the trusted or restricted security settings as appropriate, rather than with the Internet or Intranet settings. Thus, for example, a reputable site can be specifically identified as trusted, so that it gets trusted security settings instead of falling, by virtue of its URL, into the Internet security zone. Likewise, a web site characterized as restricted will not be associated with the Internet zone settings, for example, but will instead get the more restrictive settings of the restricted security zone. Another security zone corresponds to the local computer: files that do not have a corresponding URL, but rather are present in local storage such as a disk drive, can have other security settings, which are generally very unrestricted. Note that the local computer zone does not apply to cached Internet and Intranet content (which still has URLs), so zone security is not defeated simply because such content happened to be cached and was thus recalled locally instead of remotely. Still other zones are feasible.
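Mapping a URL to a zone, with the user's explicit trusted and restricted assignments taking precedence over the default classification, might look like the sketch below. It is hypothetical: the function name `zone_for`, the rule that a restricted assignment wins over a trusted one, and the heuristic that a dotless host name (e.g., `http://intranetserver/`) denotes an Intranet site are all assumptions made for illustration.

```python
from enum import Enum
from urllib.parse import urlsplit


class Zone(Enum):
    LOCAL_COMPUTER = "local"
    INTRANET = "intranet"
    TRUSTED = "trusted"
    INTERNET = "internet"
    RESTRICTED = "restricted"


def zone_for(url, trusted=frozenset(), restricted=frozenset()):
    """Classify content by the URL it was obtained from (hypothetical rules)."""
    parts = urlsplit(url)
    # No network URL (plain path or file: scheme) -> local computer zone.
    if parts.scheme in ("", "file"):
        return Zone.LOCAL_COMPUTER
    host = parts.hostname or ""  # urlsplit lowercases the host name
    # Explicit user assignments override the default classification;
    # restricted is checked first so it wins if a host is in both sets.
    if host in restricted:
        return Zone.RESTRICTED
    if host in trusted:
        return Zone.TRUSTED
    # Assumed heuristic: single-label host names are Intranet sites.
    if host and "." not in host:
        return Zone.INTRANET
    return Zone.INTERNET
```

Because the zone is derived from the URL from which the content was obtained, a cached copy that is looked up by its original URL still maps to the Internet or Intranet zone rather than the local computer zone.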
While security zones thus provide security in accordance with default settings and/or a user's preferences, a problem arises when a web site points to content stored on that same site and loads that content in a frame. The content loaded in the frame has the same security privileges as the top-level document, since the security zone is set for the top-level document and the domain does not change for the pointed-to content. In one example scenario in which the content contains HTML and script, this causes a problem when the content is merely stored on that site but does not belong to the site owner, since the site owner typically has no way to verify that loading this combination of HTML and script on a user's machine would not pose a security risk.
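Why framed content shares the parent's privileges can be illustrated with a simplified same-origin comparison. The text speaks only of the "domain"; the sketch below assumes the common browser convention of comparing scheme, host, and port, and the names `origin` and `frame_can_script_parent`, as well as the `example.com` URLs, are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return a simplified (scheme, host, port) origin tuple for `url`."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def frame_can_script_parent(parent_url, frame_url):
    """True when a framed document would be treated as same-origin
    with its parent, so no cross-domain restriction separates them."""
    return origin(parent_url) == origin(frame_url)
```

Content that a third party placed on the site, for instance a message served from the site's own host such as `https://mail.example.com/msg?id=42`, passes this check exactly as the site's own pages do.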
By way of example of this problem, consider a site such as Hotmail, a popular electronic mail service that users access via the Internet to receive and send messages. To use Hotmail, the user navigates to a site that (ordinarily) corresponds to the Internet security zone. Because Hotmail requires script to function, execution of script needs to be enabled in the Internet security zone in order to use Hotmail.
In operation, Hotmail collects user credentials and loads an e-mail message in a frame. Since that message is in the same domain as Hotmail's top-level page, no cross-domain security is applied to that frame. As a result, if that frame contains HTML content, any script therein will run and can access the parent document, since both are in the same domain and scripting must be enabled for Hotmail. Thus, if Hotmail allowed it, the HTML inside an e-mail message could include malicious content, such as script that steals the user's credentials when run.
To solve this problem, before Hotmail provides access to an e-mail message, Hotmail parses the message and strips out all style tags, script, and any other interactive content, thereby preventing malicious content from causing harm. As a consequence, however, the end user does not see proper formatting and style, even when reading an HTML-based message, because the Hotmail server has removed such content. In general, although users may trust the site itself, they lose many of the benefits of HTML in order to be protected against its possible misuse by others' content made available through that site.
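The server-side stripping described above can be approximated with Python's standard `html.parser` module. This is a minimal, hypothetical sketch, not Hotmail's actual implementation and not a complete HTML sanitizer: it removes `<script>` and `<style>` elements together with their contents, `style` attributes, event-handler attributes (names beginning with `on`), and `javascript:` URLs, and passes everything else through.

```python
from html import escape
from html.parser import HTMLParser

# Elements dropped together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}


class StripActiveContent(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            # Drop inline styling and event handlers such as onclick.
            if name == "style" or name.startswith("on"):
                continue
            # Drop script-bearing URLs such as href="javascript:...".
            if value and value.strip().lower().startswith("javascript:"):
                continue
            kept.append(f' {name}="{escape(value or "", quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>; a self-closing <script/> is ignored.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_active_content(html_text):
    """Return `html_text` with script, styling, and handlers removed."""
    parser = StripActiveContent()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_active_content('<p onclick="steal()">Hi<script>steal()</script></p>')` yields `<p>Hi</p>`. Applied to a styled message, the output keeps only bare structure and text, which illustrates the loss of formatting noted above.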