1. Technical Field
The present invention relates generally to an improved data processing system and in particular to managing information stored in a data processing system. Still more particularly, the present invention relates to a method and apparatus for managing pages retrieved by a browser.
2. Description of Related Art
Internet, also referred to as an "internetwork", in communications is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network (with packets if necessary). When capitalized, the term "Internet" refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Operating costs may be reduced by providing informational guides and/or searchable databases of public records online.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply "the web". Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the web. In the web environment, servers and clients effect data transaction using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). Information is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify "links" to other web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a "page" or a "web page", is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information by a web "browser". A browser is a program capable of submitting a request for information identified by a URL at the client machine. Retrieval of information on the web is generally accomplished with an HTML-compatible browser, such as, for example, Netscape Communicator, which is available from Netscape Communications Corporation.
When a user desires to retrieve a document, such as a web page, a request is submitted to a server connected to a client computer at which the user is located, and the request may be handled by a series of servers to effect retrieval of the requested information. The selection of a document is typically performed by the user selecting a hypertext link. The hypertext link is typically displayed by the browser on a client as a highlighted word or phrase within the document being viewed with the browser. The browser then issues a Hypertext Transfer Protocol (HTTP) request for the requested document to the server identified by the requested document's URL. The server then returns the requested document to the client browser using HTTP. The information in the document is provided to the client formatted according to HTML. Browsers on personal computers (PCs) and workstations are typically used to access the Internet. The standard HTML syntax of web pages and the standard communication protocol (HTTP) supported by the World Wide Web guarantee that any browser can communicate with any web server.
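The request step described above can be illustrated with a minimal sketch. It only shows how a URL might be split into the server to contact and the text of an HTTP GET request; the function name `build_get_request` and the choice of HTTP/1.0 are assumptions made for illustration, not details taken from this description.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, request_text) for a plain HTTP GET of the given URL.

    Hypothetical helper for illustration only: the host is the server
    identified by the URL, and the request text names the document path.
    """
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the root document
    if parts.query:
        path += "?" + parts.query
    request = f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"
    return parts.hostname, request
```

For example, calling `build_get_request("http://www.example.com/docs/index.html")` yields the host `www.example.com` and a request whose first line asks for `/docs/index.html`.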
Web pages retrieved by a browser are stored or cached by the browser on the data processing system on which the browser is running. In addition, the browser maintains a list of pages received by a user. This list is also referred to as a history list. A subdirectory under the main program directory of the browser is typically set up to cache visited pages. Caching speeds up access to the World Wide Web by storing the pages on a hard disk. By having the page data stored locally, the browser can access the page directly from the computer rather than waiting for it to download from the Internet. As a result, the next time a previously visited page is accessed, the page loads quickly from the hard disk on the computer. The browser also caches any Java applet class files (byte code) that were contained on pages that were visited. In addition, once a browser window displays an encrypted page, the disk cache retains a copy of the page in an unencrypted form. Anyone having access to the disk cache can view the contents of the page.
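The caching behavior described above can be sketched as follows. This is a simplified illustration, not the cache of any particular browser: the class name `DiskCache`, the use of a hash of the URL as the file name, and the caller-supplied `fetch` function are all assumptions.

```python
import hashlib
import os


class DiskCache:
    """Toy page cache that keeps one file per URL in a cache subdirectory."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def _path_for(self, url):
        # Assumption: a hash of the URL serves as a safe, unique file name.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, name)

    def load(self, url, fetch):
        """Return page bytes, reading from disk when the page was seen before.

        `fetch` is a hypothetical callable that downloads the page bytes.
        """
        path = self._path_for(url)
        if os.path.exists(path):
            with open(path, "rb") as f:   # cache hit: no network access
                return f.read()
        data = fetch(url)                 # cache miss: download, then store
        with open(path, "wb") as f:
            f.write(data)
        return data
```

Note that the stored file holds the page content as received by `fetch`, so anyone who can read the cache directory can read the page, which is the exposure discussed in this section.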
The history list in a browser offers a convenient means of redisplaying pages that were previously viewed. Unlike bookmark lists, which store page locations that were designated by a user, history items are saved automatically when a page is displayed by a browser. From the history list, previously viewed pages may be viewed at a later time without being connected to the Internet. On Windows and Unix browsers, the history window displays a page's title, URL, first visited date, last visited date, expiration date, and number of visits.
Thus, anyone who is able to access the cache or history list for a browser will be able to view pages retrieved by a user, including encrypted pages. This ability to view retrieved pages and the history list creates a privacy and security concern for many users who receive or view confidential or encrypted documents that have been retrieved from the Internet. This concern may be partially alleviated through the use of various security mechanisms available to restrict access to the user's computer. However, users will often access different computers when performing various tasks. In addition, in commercial environments, more applications are beginning to use interfaces that involve browsers or browser type applications to make transfers across the Internet. With multiple users having access to the same computer, the concerns of confidentiality and security may not be alleviated as easily by known security mechanisms because a user that is allowed to access the computer may not have the same privileges with respect to the information retrieved by other users of the same computer.
Therefore, it would be advantageous to have an improved method and apparatus for managing information retrieved by browsers.
The present invention provides a method and apparatus in a data processing system for selectively caching web information in a cache for a browser. Web content is first retrieved by the browser during a browsing session. The web content is parsed for an indication of how the web content should be stored. The web content is then stored using the indication. For example, retrieved web content may be parsed for an indication that the web content is to be removed after the browsing session terminates. Responsive to identifying the indication, the web content is cleared from the cache in response to the browsing session terminating.
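The sequence summarized above (retrieve, parse for an indication, store using the indication, clear at session end) can be illustrated with a minimal sketch. The form of the indication is not specified here, so the sketch assumes a hypothetical `<meta name="cache-policy" content="session-only">` tag; the names `find_cache_policy` and `SessionCache`, and the in-memory dictionary standing in for the browser cache, are likewise illustrative assumptions.

```python
from html.parser import HTMLParser


class _PolicyParser(HTMLParser):
    """Looks for a hypothetical <meta name="cache-policy" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.policy = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "cache-policy":
            self.policy = (attr_map.get("content") or "").lower()


def find_cache_policy(html_text):
    """Return the assumed cache-policy indication, or None if absent."""
    parser = _PolicyParser()
    parser.feed(html_text)
    return parser.policy


class SessionCache:
    """In-memory stand-in for a browser cache that honors the indication."""

    def __init__(self):
        self._pages = {}  # url -> (content, remove_at_session_end)

    def store(self, url, html_text):
        # Parse the content for the indication, then store using it.
        session_only = find_cache_policy(html_text) == "session-only"
        self._pages[url] = (html_text, session_only)

    def get(self, url):
        entry = self._pages.get(url)
        return entry[0] if entry else None

    def end_session(self):
        # Clear only the content marked for removal when the session ends.
        self._pages = {u: e for u, e in self._pages.items() if not e[1]}
```

In this sketch, a page carrying the assumed tag remains available from the cache during the session and is dropped when `end_session` is called, while unmarked pages remain cached.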