A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present application relates generally to use of a computer with the Internet and, more particularly, to methods for speeding up the process of browsing Web content in a computer system having an Internet or other on-line browser.
With the ever-increasing popularity of the Internet, particularly the World Wide Web ("Web") portion of the Internet, more and more personal computers (PCs) provide Internet access to vast stores of information through Web "browsers" (e.g., Microsoft Internet Explorer or Netscape Navigator) or other "Internet applications." Browsers and other Internet applications include the ability to access a URL (Uniform Resource Locator) or "Web" site. The URL is used to specify the location of a file held on a remote machine.
Each URL is itself composed of several distinct components. For example, the URL http://host/file.html includes three distinct components. The first component, http, specifies the protocol (here, "HTTP" or HyperText Transfer Protocol) that is to be used to access the target file. A URL can specify other access protocols as well. For example, the URL ftp://ftp.pgp.com/pub/docs/samples specifies access to files via "FTP" (File Transfer Protocol); it specifies a link for accessing the file directory pub/docs/samples on the machine ftp.pgp.com.
The second component, host, indicates the name of the remote machine; this can be expressed as either a symbolic name (e.g., pgp.com) or a numeric IP (Internet Protocol) address such as 123.200.1.1. The final component, file.html, provides the path name of the target file, that is, the file to which the hypertext link is to be made. The file is referenced relative to the base directory in which Web pages are held; the location of this directory is specified by the person who has set up the Web server (i.e., the "Webmaster").
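To make the three-way split concrete, the following is a minimal sketch using Python's standard urllib.parse module (an illustrative choice, not part of the described system). The helper name url_components is hypothetical; it strips the leading slash so that the path reads relative to the server's base directory, as the text describes, and it uses hostname, which drops any port number.

```python
from urllib.parse import urlsplit


def url_components(url: str) -> tuple[str, str, str]:
    """Split a URL into (protocol, host, path) as described above."""
    parts = urlsplit(url)
    protocol = parts.scheme          # e.g. "http" or "ftp"
    host = parts.hostname or ""      # symbolic name or numeric IP address
    path = parts.path.lstrip("/")    # path relative to the base directory
    return protocol, host, path


for example in ("http://host/file.html",
                "ftp://ftp.pgp.com/pub/docs/samples",
                "http://123.200.1.1/file.html"):
    print(url_components(example))
```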
The majority of content available on the Internet is represented in "HTML" documents which, in turn, are read or accessed by Web browsers. In particular, HTML (HyperText Markup Language) is the markup language used to create documents for the World Wide Web. Although most browsers will display any document that is written in plain text, HTML documents afford several advantages: they can include formatting, graphics, and "hypertext links" to other documents.
Markup languages are used to describe the structure of the document. HTML is used to mark various elements in a document, including headings, paragraphs, lists, tables, and the like. To achieve this, an HTML document includes formatting commands or "tags" embedded within the text of the document which serve as commands to a browser. Here, HTML tags mark the elements of a file for browsers. Elements can contain plain text, other elements, or both. The browser reading the document interprets these markup tags or commands to help format the document for subsequent display to a user. The browser thus displays the document with regard to features that the viewer selects either explicitly or implicitly. Factors affecting the layout and presentation include, for instance, the markup tags used, the physical page width available, and the fonts used to display the text.
The design of HTML tags is relatively simple. Individual HTML tags begin with a less-than (<) character and end with a greater-than (>) character, such as <title>, which serves to identify the text that follows as the title of a document. HTML tags are not case-sensitive (with the exception of HTML escape sequences) and are often used in symmetric pairs, with the ending tag indicated by the inclusion of a / (slash) character. For instance, the <title> tag represents a beginning tag which would be paired with a </title> ending tag. These paired commands are thus applied to the text contained between the beginning and ending commands, such as <title>My Sample Title</title>. The <B> tag, on the other hand, informs browsers that the text which follows is to be in bold type. This bolding is turned off by the inverse markup tag </B>. In contrast to these paired or "container" tags, separator tags are used unpaired. For example, the command <br> is employed by itself to insert a line break. Browsers generally ignore extra spaces and new lines between words and markup tags when reading the document. In other words, runs of "white space" characters, such as tabs, spaces, and new line characters, are generally collapsed into a single space in HTML. Leaving a blank line in one's document, for instance, generally does not create a blank line when the document is displayed in a browser, unless one uses the "preformatted" HTML tag (<pre> and </pre>). Finally, not all tags are supported by all Web browsers. If a browser does not support a tag, it (usually) just ignores it.
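The tag rules above (case-insensitive names, paired beginning and ending tags, unpaired separator tags, collapsed white space) can be observed with Python's standard html.parser module, which reports tag names in lower case. This is only an illustrative sketch: the class TagLister is hypothetical, and the SEPARATOR_TAGS set is an assumption that lists just the one separator tag named in the text.

```python
from html.parser import HTMLParser

# Assumption: only the separator tag mentioned in the text; not exhaustive.
SEPARATOR_TAGS = {"br"}


class TagLister(HTMLParser):
    """Record each tag or text run as a (kind, value) pair."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lower-cased the tag name, so <TITLE> == <title>.
        kind = "separator" if tag in SEPARATOR_TAGS else "start"
        self.events.append((kind, tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Collapse runs of white space, mirroring how browsers treat text
        # outside <pre>; whitespace-only runs are dropped.
        text = " ".join(data.split())
        if text:
            self.events.append(("text", text))


parser = TagLister()
parser.feed("<TITLE>My Sample Title</title><B>bold</B><br>next   line")
parser.close()
print(parser.events)
```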
The attraction of the World Wide Web is, of course, the "rich" content which it stores, largely as a collection of these interconnected Web or HTML pages. With each passing day, the information content available on the Web is more and more graphical in nature (e.g., heavy use of bitmaps). Accompanying the explosive growth of the World Wide Web, for instance, is the ever-increasing use of advertising material on practically any content which a user can access. This is particularly problematic since advertising material is often graphically intensive, requiring substantial time and resources for downloading and processing. Apart from advertising, many Web sites employ graphics to such an extreme degree as to render it difficult or impractical to access the Web site in real time unless one has a high-speed Internet connection (e.g., a T1 line). All told, total download times for Web pages are becoming increasingly long.
At the same time, the underlying infrastructure of the Web has not improved to a sufficient degree to offset this increased resource demand. Although advertising on the Web serves as one example, there exists a more general problem of how a user of the Web can exert at least some control over the content which is downloaded into his or her browser. Accordingly, there is great interest in developing techniques which speed up the process of browsing Web content, or "Web surfing," including reducing the background noise (e.g., ancillary graphics) that the user does not want.
An Internet computer system with methods for dynamic filtering of hypertext tags and content is described. The system includes one or more Web clients, each with an Internet connection to one or more Web servers. An exemplary Web client comprises a personal computer or workstation operating a Web browser (e.g., Netscape Navigator or Microsoft Internet Explorer) which communicates with the Internet via a communication layer, such as Microsoft Winsock (Winsock.dll), a Windows implementation of TCP/IP (Transmission Control Protocol/Internet Protocol).
At each client, a Filter module of the present invention is interposed (functionally) between the browser and the communication layer. In this fashion, the Filter module can trap and process all communications between the browser and the communication layer. For a client employing a Winsock 2 communication driver, the Filter module can register itself with the Winsock driver directly and thereby trap and process communications in a manner that has the native support of the driver.
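The interposition idea can be illustrated, by loose analogy only, as a wrapper that exposes the same send/recv surface as the transport beneath it, so the browser side cannot tell it is talking to a filter. This Python sketch is not a Winsock layered provider; the names Transport, FilteringTransport, LoopbackTransport, and log_and_pass are all hypothetical, and the loopback transport exists only so the example runs without a network.

```python
from typing import Callable, Protocol


class Transport(Protocol):
    """Stand-in for the communication layer (a socket-like object)."""

    def send(self, data: bytes) -> None: ...

    def recv(self) -> bytes: ...


class FilteringTransport:
    """Same send/recv surface as the wrapped transport, so every message
    between the browser side and the lower layer passes through a hook."""

    def __init__(self, lower: Transport,
                 on_outbound: Callable[[bytes], bytes],
                 on_inbound: Callable[[bytes], bytes]) -> None:
        self.lower = lower
        self.on_outbound = on_outbound
        self.on_inbound = on_inbound

    def send(self, data: bytes) -> None:
        # Trap the outgoing request before the communication layer sees it.
        self.lower.send(self.on_outbound(data))

    def recv(self) -> bytes:
        # Trap incoming content before it is handed back to the browser.
        return self.on_inbound(self.lower.recv())


class LoopbackTransport:
    """Hypothetical fake transport for the demo: recv() returns what was sent."""

    def __init__(self) -> None:
        self.sent: list[bytes] = []

    def send(self, data: bytes) -> None:
        self.sent.append(data)

    def recv(self) -> bytes:
        return self.sent.pop(0) if self.sent else b""


trapped: list[bytes] = []


def log_and_pass(data: bytes) -> bytes:
    trapped.append(data)  # the filter observes every message...
    return data           # ...and, in this demo, forwards it unchanged


wire = LoopbackTransport()
browser_side = FilteringTransport(wire, log_and_pass, log_and_pass)
request = b"GET /file.html HTTP/1.0\r\n\r\n"
browser_side.send(request)
reply = browser_side.recv()
```

Because the wrapper and the real transport share one interface, the code above the filter needs no changes, which is the property the text attributes to trapping at the communication-layer boundary.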
The Filter module, which implements client-side methodology at each individual Web client for dynamic filtering of hypertext tags and content, includes an output stream, a processing loop, a Filter method, and an input stream. For assisting with user configuration of its operation, the Filter module also includes a graphical user interface (GUI) administration module. The input stream is responsible for getting input; it interfaces directly with the Winsock communication driver. In a corresponding manner, the output stream communicates with the (client) browser; it is responsible for providing output to the browser which is to be ultimately rendered on screen for the user. Accordingly, the output stream represents the data pool right before it is sent to the browser. The Filter method, on the other hand, represents the workhorse method or core logic for performing the filtering.
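The flow from input stream through processing loop and Filter method to output stream can be sketched in Python as follows. This is a hedged illustration, not the module's actual logic: it assumes the input arrives as already-decoded text chunks, uses a deliberately simple regular expression for tags, and represents user preferences as a hypothetical mapping from tag name to handler. Names such as make_filter_method, processing_loop, and drop are illustrative only.

```python
import re
from typing import Callable, Iterable

# Matches an opening or closing tag and captures its name. Simplifying
# assumption: '>' inside quoted attribute values is not handled.
TAG_RE = re.compile(r"<\s*/?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")

TagHandler = Callable[[str], str]


def make_filter_method(handlers: dict[str, TagHandler]) -> Callable[[str], str]:
    """Route each tag to its per-tag handler; unhandled tags pass through."""
    def filter_method(html: str) -> str:
        def dispatch(match: re.Match) -> str:
            handler = handlers.get(match.group(1).lower())
            return handler(match.group(0)) if handler else match.group(0)
        return TAG_RE.sub(dispatch, html)
    return filter_method


def processing_loop(input_stream: Iterable[str],
                    filter_method: Callable[[str], str],
                    output_stream: list[str]) -> None:
    """Pull chunks from the input stream, filter them, push to the output."""
    pending = ""
    for chunk in input_stream:
        pending += chunk
        # A tag may be split across network reads: hold back a trailing,
        # unterminated "<..." so the filter only ever sees complete tags.
        cut = pending.rfind("<")
        if cut != -1 and ">" not in pending[cut:]:
            ready, pending = pending[:cut], pending[cut:]
        else:
            ready, pending = pending, ""
        if ready:
            output_stream.append(filter_method(ready))
    if pending:  # flush whatever remains when the input stream ends
        output_stream.append(filter_method(pending))


def drop(tag: str) -> str:
    return ""  # hypothetical handler: remove the tag entirely


# Hypothetical user preference: suppress image tags.
filter_method = make_filter_method({"img": drop})
output: list[str] = []
processing_loop(["<p>Hello <IMG SRC=", "'ad.gif'> world</p>"], filter_method, output)
print("".join(output))
```

Holding back a partial tag is one simple way to cope with tags split across reads; a production filter would more likely use a proper incremental HTML tokenizer.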
At a high level, the Filter module operates as follows. The Web browser retrieves content by generating requests for content. More particularly, a fetch or GET request or command (e.g., using the HTTP protocol) is issued through the Winsock communication driver, for example, for fetching particular content (e.g., bitmaps) specified by a Web page. The command is, however, first trapped by the Filter module. The "real" request or command is at this point processed by the Filter method. At the level of the Filter method, the system can modify the command, delete the command, synthesize new commands, or pass through unchanged the existing command, thereby impacting how the system renders Web pages. In an exemplary embodiment, the Filter method provides handlers for specific processing of various HTML tags, all operating according to user-configurable filtering preferences.
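One way to express the Filter method's choices for a trapped request is to return a list of requests to forward: an empty list deletes the command, a single unchanged item passes it through, a single altered item modifies it, and a longer list could carry synthesized commands. The Python sketch below assumes that shape; the Request and Preferences types, the blocked-host and host-rewrite rules, and the example.com host names are all hypothetical, and this sketch never generates synthesized commands.

```python
from dataclasses import dataclass, field, replace
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Request:
    method: str
    url: str


@dataclass
class Preferences:
    """Hypothetical user-configurable filtering preferences."""
    blocked_hosts: set[str] = field(default_factory=set)
    host_rewrites: dict[str, str] = field(default_factory=dict)


def filter_request(req: Request, prefs: Preferences) -> list[Request]:
    """Return the request(s) to forward to the communication layer.

    [] deletes the command, [req] passes it through, [changed] modifies it;
    a longer list could carry synthesized commands (not produced here).
    """
    if req.method != "GET":
        return [req]                              # pass through unchanged
    parts = urlsplit(req.url)
    host = parts.hostname or ""
    if host in prefs.blocked_hosts:
        return []                                 # delete: never sent
    if host in prefs.host_rewrites:
        new_url = parts._replace(netloc=prefs.host_rewrites[host]).geturl()
        return [replace(req, url=new_url)]        # modify
    return [req]                                  # pass through unchanged


prefs = Preferences(blocked_hosts={"ads.example.com"},
                    host_rewrites={"old.example.com": "new.example.com"})
print(filter_request(Request("GET", "http://ads.example.com/banner.gif"), prefs))
print(filter_request(Request("GET", "http://old.example.com/a.html"), prefs))
```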