Users of the World-Wide Web (“Web”) have discovered the benefits of simple, low-cost global access to a vast and exponentially growing repository of information, on a huge range of topics. Though the Web is also a delivery medium for interactive computerized applications (such as online airline travel booking systems), a major part of its function is the delivery of information in response to a user's inquiries and ad-hoc exploration—a process known popularly as “surfing the Web.”
The content delivered via the Web is logically and semantically organized as “pages”—autonomous collections of data delivered as a package upon request. Web pages typically use the HTML language as a core syntax, though other delivery syntaxes are available.
Web pages consist of a regular structure, delineated by alphanumeric commands in HTML, plus potentially included media elements (pictures, movies, sound files, Java programs, etc.). Media elements are usually technically difficult or time-consuming to analyze.
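By way of illustration only, the distinction between the regular HTML structure of a page and its included media elements can be sketched with Python's standard `html.parser` module. This is a minimal sketch and not the method of the present application. The `PageSplitter` class, the `split_page` function, the `MEDIA_TAGS` set, and the sample page are all hypothetical names and data chosen for the example.

```python
from html.parser import HTMLParser

# Tags treated as media elements in this sketch (an illustrative assumption).
MEDIA_TAGS = {"img", "video", "audio", "embed", "object", "applet"}


class PageSplitter(HTMLParser):
    """Collects readable text and media references from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []  # text found between tags
        self.media = []       # (tag, source) pairs for media elements

    def handle_starttag(self, tag, attrs):
        # Record media elements by reference only; their content is not analyzed.
        if tag in MEDIA_TAGS:
            attr_map = dict(attrs)
            source = (attr_map.get("src") or attr_map.get("data")
                      or attr_map.get("code"))
            self.media.append((tag, source))

    def handle_data(self, data):
        # Keep non-empty text runs and skip whitespace between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def split_page(html):
    """Return (text_parts, media) for the given HTML string."""
    parser = PageSplitter()
    parser.feed(html)
    parser.close()
    return parser.text_parts, parser.media


# Hypothetical sample page used only for this example.
SAMPLE_PAGE = """<html><head><title>Example</title></head>
<body><h1>Welcome</h1><p>Some text.</p>
<img src="photo.jpg" alt="A photo">
<applet code="Demo.class"></applet></body></html>"""

text, media = split_page(SAMPLE_PAGE)
print(text)   # text recovered from the HTML structure
print(media)  # media elements, listed by reference
```

The text portions are readily available for analysis, whereas each media element is merely listed by reference, reflecting the greater difficulty of analyzing such elements.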
Pages were originally grouped and structured on Web sites for publication; recently, other forms of digital data, such as computer system file directories, have also been made accessible to Web browsing software on both a local and shared basis.
Another discrete organization of information which is analogous to the Web page is an individual email document. The present invention can be applied to analyzing email content as explained later.
The participants in the Web delivery system can be categorized as publishers, who use server software and hardware systems to provide interactive Web pages, and end-users, who use Web-browsing client software to access this information. The Internet, tying together computer systems worldwide via interconnected international data networks, enables a global population of end-users to access information made available by publishers. In the case of information stored on a local computer system, the publisher and end-user may be the same person, but given shared use of computing resources, this is not always so.
The technologies originally developed for the Web are also being increasingly applied to the local context of the personal computer environment, with Web-browsing software capable of viewing and operating on local files. This patent application is primarily focused on the Web-based environment, but also envisions the applicability of many of the petitioners' techniques to information bound to the desktop context.
End-users of the Web can easily access many dozens of pages during a single session. Whether following links from search engines or serendipitously clicking the links that authors typically embed within Web pages, users cannot anticipate what information they will see next.
The data encountered by end-users surfing the Web takes many forms. Many parents are concerned about the risk of their children encountering pornographic material online. Such material is widespread. Other forms of content available over the Web create similar concern, including racist material and hate-mongering, information about terrorism and terrorist techniques, promotion of illicit drugs, and so forth. Some users may not be concerned about protecting their children, but simply wish not to be inadvertently exposed to offensive content themselves. Other persons have managerial or custodial responsibility for the material accessed or retrieved by others, such as employees; liability concerns often arise from such access.