The present invention relates to the field of network browsing software and, in particular, to methods and systems for secure viewing and execution of Web pages saved (i.e. downloaded) from the Internet.
In recent years, there has been a tremendous proliferation of computers connected to a global network known as the Internet. A "client" computer connected to the Internet can download digital information from "server" computers connected to the Internet. Client application and operating system software executing on client computers typically accepts commands from a user and obtains data and services by sending requests to server applications running on server computers connected to the Internet. A number of protocols are used to exchange commands and data between computers connected to the Internet. These protocols include the File Transfer Protocol (FTP), the Hyper Text Transfer Protocol (HTTP), the Simple Mail Transfer Protocol (SMTP), and others.
The HTTP protocol is used to access data on the World Wide Web, often referred to as "the Web." The World Wide Web is an information service on the Internet providing documents and links between documents. The World Wide Web is made up of numerous Web sites around the world that maintain and distribute Web documents. A Web site may use one or more Web server computers that are able to store and distribute documents in one of a number of formats including the Hyper Text Markup Language (HTML). An HTML document can contain text, graphics, audio clips, and video clips, as well as metadata or commands providing formatting information. HTML documents also include embedded "links" that reference other data or documents located on the local computer or network server computers.
A Web browser is a client application, software component, or operating system utility that communicates with server computers via FTP, HTTP, and Gopher protocols. Web browsers receive Web documents from the network and present them to a user. Internet Explorer, available from Microsoft Corporation, of Redmond, Wash., is an example of a popular Web browser.
An intranet is a local area network containing Web servers and client computers operating in a manner similar to the World Wide Web described above. Additionally, on an intranet a Web browser can retrieve files from a file system server executing on the same computer as the Web browser, or on a remote computer on the local area network. A Web browser can retrieve files on the local area network using the "FILE" protocol, which comprises file system commands. Typically, all of the computers on an intranet are contained within a company or organization. Many intranets include a "firewall" that functions as a gateway between the intranet and the Internet, and prevents unauthorized people from breaking into the computers of an organization. A "proxy server" is one well-known type of firewall.
In addition to data and metadata, HTML documents can contain embedded software components containing program code that performs a wide variety of operations. As used herein, the term "software components" refers to binary objects or programs that perform specific functions and are designed to operate easily with other components and applications. These software components expand the interactive ability of an HTML document's user interface. The components can also perform other operations, such as manipulating data and playing audio or video clips. Example software components are ActiveX, Java, JavaScript, and VBScript; however, other embedded software components can and do exist. ActiveX is a specification developed by Microsoft Corporation for creating software components that can be embedded into an HTML document. Java is a well-known programming language that can be used to develop components called "applets," which are transmitted with HTML documents from Web servers to client computers. JavaScript and VBScript are scripting languages that are also used to extend the capabilities of HTML; scripts written in these languages are embedded in HTML documents. A browser executes each script or software component when it reaches that item's position while interpreting the HTML document.
Some Web pages on the Internet contain software components that perform operations that are not desired by a user. This may occur either because a component developer intentionally programmed the component to perform a malicious operation, or because a "bug" in the software causes the component to perform an unintended or malicious operation. One way in which browsers have addressed the problem of undesirable operations is the use of security zones. Security zones are similar to visas that some countries issue to travelers. If the country trusts you, they stamp your passport so you can travel anywhere you like during your visit. If for some reason the country does not completely trust you, it strictly limits where you can go and what you can do during your stay.
Security zones work the same way as visas, except that the user plays the role of the country, deciding how much access to allow visitors to the user's computer. Web sites that the user trusts, such as those on the user's intranet or from established companies in which the user has confidence, can be designated as trusted, allowing them to run as much powerful active content on the user's computer as desired. Sites that the user is less sure about can be assigned a different zone classification, under which the user can strictly limit access to the user's computer.
Version 4.0 of Microsoft Corporation's Internet Explorer is an example of a commercially available program that includes the concept of security zones. Under this concept, each Web page belongs to exactly one security zone, and each zone defines a set of permissions for Web pages that reside in that zone. For example, these permissions control whether to run JavaScript and VBScript scripts contained in the Web page. In addition, these permissions enable or disable downloaded software in the form of Java or ActiveX controls. Zone classification is based on the Web page's uniform resource locator (URL). Each security zone therefore grants its own set of permissions to the Web pages classified in it.
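The zone model described above can be sketched in Python as a function that maps a URL to exactly one zone, plus a table of per-zone permissions. This is a minimal illustration under stated assumptions, not Internet Explorer's implementation: the zone names, the `RESTRICTED_HOSTS` list, the rule that dotless host names belong to the intranet, and the two permission flags are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Permissions:
    run_scripts: bool   # e.g. JavaScript and VBScript
    run_controls: bool  # e.g. Java applets and ActiveX controls


# Hypothetical zones and permission sets, for illustration only.
ZONE_PERMISSIONS = {
    "local_machine": Permissions(run_scripts=True, run_controls=True),
    "intranet": Permissions(run_scripts=True, run_controls=True),
    "internet": Permissions(run_scripts=True, run_controls=False),
    "restricted": Permissions(run_scripts=False, run_controls=False),
}

# Hypothetical list of sites the user has marked as untrusted.
RESTRICTED_HOSTS = {"untrusted.example"}


def classify_zone(url: str) -> str:
    """Assign a URL to exactly one zone, using only the URL itself."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return "local_machine"
    host = parts.hostname or ""  # urlparse lower-cases the host name
    if host in RESTRICTED_HOSTS:
        return "restricted"
    if host and "." not in host:  # assumption: dotless names are intranet hosts
        return "intranet"
    return "internet"


def permissions_for(url: str) -> Permissions:
    """Look up the permission set granted to the URL's zone."""
    return ZONE_PERMISSIONS[classify_zone(url)]
```

For example, `permissions_for("http://untrusted.example/page.html").run_scripts` is `False`, while a `file:` URL falls in the local machine zone and receives every permission in the table. Because the decision depends only on the URL, any change of URL can change the permissions.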
Additional security restrictions are imposed by the Web browser to constrain interaction between Web pages that are joined together to form HTML framesets. An HTML frameset consists of a collection of frames that allow creation of multiple document windows within one browser. Each frame acts like a separate browser window, so that multiple information sources can be displayed simultaneously. Within each frame a user can scroll up and down and do anything that a user would normally do within a single browser window. Frames allow HTML programmers to create complex documents that present information in a useful manner. The links in a frame can control what is displayed in other frames or windows. This enables programmers to create indices or quick tabs that allow easier navigation through a single document or group of documents. For example, selecting a link in the index frame could cause a different page to appear within another frame. As used herein, the term "domain" means a group of computers and devices on a network that are administered as a unit with common rules and procedures. Within the Internet, domains are defined by their IP address. All devices sharing a common part of the IP address are said to be in the same domain. In Version 4.0 of Microsoft's Internet Explorer, pages of an HTML frameset may only interact if the domain components of their URLs refer to the same domain. Thus, the security mechanisms of Internet Explorer apply equally to both zones and cross-domain access (i.e. cross-frame or frameset interactions), because both are based on URLs; however, the specific benefits are separate: correct permissions granted in zones and the correct range of access allowed for domains.
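The frameset rule can be sketched as a comparison of the domain components of two frame URLs. In this hypothetical sketch, `domain_of` and `same_domain` are illustrative names, host names are compared case-insensitively, and all `file:` URLs are assumed to share a single local domain. Modern browsers apply more detailed rules; under the same-origin policy, for example, they also compare scheme and port.

```python
from typing import Optional
from urllib.parse import urlparse


def domain_of(url: str) -> Optional[str]:
    """Return the domain component of a URL, or None if it has none."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return "file:"  # assumption: all local files share one domain
    return parts.hostname  # already lower-cased by urlparse


def same_domain(frame_url_a: str, frame_url_b: str) -> bool:
    """Allow cross-frame interaction only when both frames share a domain."""
    domain_a = domain_of(frame_url_a)
    return domain_a is not None and domain_a == domain_of(frame_url_b)
```

Under this sketch, an index frame at `http://www.example.com/toc.html` may script a content frame at `http://www.example.com/body.html`, but not one loaded from `http://ads.example.net/`.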
However, a security risk may be created when a user saves (i.e. downloads) a Web page from the World Wide Web to his or her local hard drive or other local storage media. More particularly, the saved Web page is stored on the user's hard drive and is thus in the user's local machine zone, which typically imposes the fewest security restrictions. Accordingly, the saved Web page enjoys the most liberal set of permissions granted by the security zone system. This exposes the user to a security risk when he or she loads the Web page back into the Web browser from the local storage medium. With the enhanced local machine permissions, the Web page and any HTML framesets it contains would be able to run active content scripts such as JavaScript and VBScript, software controls such as Java or ActiveX controls, and cross-frame scripts between the frames of a frameset, all of which would have been prohibited in the Web page's original security context.
For example, consider a Web page that, based on its URL, is classified in a High Security Zone because the user does not trust the Web site. Because the Web page is classified in the High Security Zone, the Web browser will not run active content scripts, cross-frame scripts, or software controls that could potentially damage the user's computer. The problem arises if the user saves this Web page to his or her local hard drive, inside the user's firewall, or to any location in a less secure zone. The Web page was previously in a High Security Zone; however, since the URL for the Web page has changed, the Web page is now classified as being located in a Low Security Zone. In other words, the original security zone classification is not preserved. Consequently, the Web browser is free to execute potentially harmful active content scripts, cross-frame scripts, and/or software controls the next time the Web page is viewed or executed from the local hard drive. Indeed, the scripts and/or software controls need not display a user interface or message when they execute, so harm could be caused to the user's system without the user even knowing about it.
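The zone transition described above follows directly from classifying pages by URL alone. The sketch below uses a hypothetical `HIGH_SECURITY_HOSTS` list and three illustrative zone labels to show the same page receiving opposite active-content decisions before and after it is saved; it is a simplified illustration, not a browser's actual policy.

```python
from urllib.parse import urlparse

# Hypothetical list of sites the user does not trust.
HIGH_SECURITY_HOSTS = {"untrusted.example"}


def zone_for(url: str) -> str:
    """Classify a page by URL alone, as the zone model does."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return "low"  # local machine: the most liberal permissions
    if parts.hostname in HIGH_SECURITY_HOSTS:
        return "high"
    return "medium"


def active_content_allowed(url: str) -> bool:
    """Scripts, cross-frame scripts and controls are blocked only in 'high'."""
    return zone_for(url) != "high"


original_url = "http://untrusted.example/page.html"
saved_url = "file:///home/user/page.html"  # same bytes, new location

print(zone_for(original_url), active_content_allowed(original_url))  # high False
print(zone_for(saved_url), active_content_allowed(saved_url))        # low True
```

Nothing about the page's content changed between the two calls; only its URL did, and that alone moved it from the most restrictive zone to the most permissive one.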
In addition to acquiring new permissions, the content of the saved Web page would be able to access other content in the domain to which it was saved (cross-domain access). A further problem is that scripts can check the Web page's current URL (i.e. location) and execute harmful attack code only when the page is run from a local hard drive, thereby avoiding security warnings that would have been raised had the Web page been executed at its original Web location in the Internet zone. Thus, saving the Web page to the local hard drive can create potential security risks. In sum, any new URL in a different zone or domain affords the content new permissions or access.
Although the above security risk may be created when a Web page is saved to a local hard drive or other local storage media, the skilled artisan will understand that this risk is not present during traditional Web page viewing, even though the content of the Web page is downloaded from the Internet into the cache of the user's computer. Web pages in the cache do not suffer the security zone transition problem because they are accessed through their original URLs, and hence retain their original security zone classification. Thus, the primary problem solved by the present invention is how to retain a file's original security zone classification whenever the file is moved from one security zone to another.
Accordingly, it is an object of the present invention to provide an improved method and security system that allows secure viewing and execution of Web pages downloaded from the Internet.
The saved Web page security system of the present invention includes a computer-readable medium with computer-executable components. One of the computer-executable components is a storage component, which stores information about an electronic document, such as a Web page, when the electronic document is saved to the computer-readable medium. The other computer-executable component is a security component, which uses the stored information to facilitate secure viewing or execution of the downloaded electronic document. In an alternate embodiment in which the electronic document crosses zone boundaries, the present invention uses the above principles to facilitate secure viewing or execution of the document regardless of whether the document is downloaded or saved to a particular computer-readable medium.
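One way to picture the two components is the following Python sketch. The class and method names, the in-memory `disk` dictionary standing in for the computer-readable medium, the minimal `classify_zone` rule, and the choice to record the original URL in a separate lookup table are all assumptions for illustration; the description above does not prescribe how the information is stored.

```python
from urllib.parse import urlparse

# Hypothetical list of sites the user has marked as untrusted.
RESTRICTED_HOSTS = {"untrusted.example"}


def classify_zone(url: str) -> str:
    """Minimal URL-based zone classifier used by the security component."""
    parts = urlparse(url)
    if parts.scheme == "file":
        return "local_machine"
    return "restricted" if parts.hostname in RESTRICTED_HOSTS else "internet"


class StorageComponent:
    """Saves a document and records where it originally came from."""

    def __init__(self) -> None:
        self.disk: dict[str, bytes] = {}   # stands in for the local medium
        self.origins: dict[str, str] = {}  # local path -> original URL

    def save(self, local_path: str, original_url: str, content: bytes) -> None:
        self.disk[local_path] = content
        self.origins[local_path] = original_url


class SecurityComponent:
    """Chooses a zone for a locally stored document from recorded information."""

    def __init__(self, storage: StorageComponent) -> None:
        self.storage = storage

    def zone_for_local_file(self, local_path: str) -> str:
        # Prefer the recorded original URL; otherwise use the file's own URL.
        url = self.storage.origins.get(local_path, "file://" + local_path)
        return classify_zone(url)
```

After `storage.save("/home/user/page.html", "http://untrusted.example/page.html", b"<html></html>")`, calling `SecurityComponent(storage).zone_for_local_file("/home/user/page.html")` returns `"restricted"` rather than `"local_machine"`. Keeping the record outside the file is simple, but the record is lost if the file is copied elsewhere; storing it inside the saved file avoids that.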
Under the saved Web page security method of the present invention, a Web page is downloaded from the Internet to a computer-readable medium. The Internet address for the Web page is stored on the computer-readable medium. When the Web page is opened from the computer-readable medium, the Internet address is used to identify a security context for the Web page.
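The method can be illustrated with an in-file marker. Internet Explorer records the original address in a comment in the saved HTML file, commonly called the "Mark of the Web", of the form `<!-- saved from url=(0023)http://www.example.com/ -->`, where the parenthesized number is the URL's length written as four decimal digits. The sketch below assumes that format; the function names, the 1,024-character search window, and the two-zone classifier are hypothetical simplifications, not a browser's actual parsing rules.

```python
import re
from urllib.parse import urlparse

# Matches a marker such as: <!-- saved from url=(0023)http://www.example.com/ -->
_SAVED_FROM = re.compile(r"<!-- saved from url=\((\d{4})\)(\S+) -->")


def mark_saved_page(html: str, original_url: str) -> str:
    """Download step: store the page's Internet address inside the saved file."""
    return f"<!-- saved from url=({len(original_url):04d}){original_url} -->\n{html}"


def security_url(html: str, local_url: str) -> str:
    """Open step: return the recorded Internet address, else the local URL."""
    match = _SAVED_FROM.search(html[:1024])  # hypothetical search window
    if match and int(match.group(1)) == len(match.group(2)):
        return match.group(2)
    return local_url


def zone_for(url: str) -> str:
    """Hypothetical two-zone classifier keyed on the URL scheme."""
    return "local_machine" if urlparse(url).scheme == "file" else "internet"
```

A page saved with `mark_saved_page(html, "http://www.example.com/")` and later opened from `file:///home/user/page.html` is classified by `zone_for(security_url(...))` as `"internet"`, its original context, while an unmarked local file still falls in the local machine zone. The length check is a cheap consistency test that rejects a marker whose recorded length does not match its URL.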