An Internet object is anything that is downloaded or transmitted via the Internet, including but not limited to web pages, images, text documents, email messages, newsgroup postings, chat text, video, and audio. Given the tremendous increase in the size and variety of the Internet community in recent years, classification of Internet objects has become increasingly important, and manual classification is no longer adequate to keep pace.
Internet objects can be classified by the subject matter they contain. One practical application of such classification is the ability to filter Internet objects based on classification. There is particular interest in classifying Internet objects containing adult content because access to such Internet objects can then be restricted to prevent viewing by minors.
Internet filtering products have previously been developed to attempt to filter Internet objects based on adult content. All filtering products require a method by which to classify Internet objects. The prior art methods of classification are detailed below, but can be summarized as taking one of three approaches: (i) filtering based on classification information embedded in an Internet object; (ii) compilation of “blacklists” and/or “whitelists” that filtering products may reference; and (iii) real-time textual analysis.
One Internet filtering product is produced by Netscape as part of the Netscape web browser. The browser includes an adult content filtering feature that classifies web pages based on the PICS labeling system, a voluntary system by which Internet content providers include special codes in their Internet objects (in this case web pages) to describe the content of the object. The PICS labels are the only mechanism used by Netscape in classifying the adult content of web pages. The PICS system is described in greater detail at http://www.w3.org/pics/. The drawback in relying solely on the PICS label to classify Internet objects is that not all sites are classified using the system, as participation in the PICS system is voluntary. In addition, there is no independent verification process, and users rely solely on the judgement of the Internet content providers, which may be biased by self-interest.
Cyber Patrol is another Internet filtering product. Cyber Patrol maintains a “blacklist” of web sites that are considered to contain adult content. The Cyber Patrol web page at http://www.cyberpatrol.com/ discloses that “professional researchers compile” the lists, apparently manually. Given the current growth rate of Internet users and Internet content providers, such manual classification is inadequate.
SurfWatch [http://www1.surfwatch.com/] is another Internet filtering product that works by maintaining blacklists. SurfWatch appears to search web pages and URLs for restricted keywords. Any page or URL containing a restricted keyword is classified as adult content. There is no further initial verification process. This can lead to a site being erroneously classified as adult, as illustrated in the recent incident in which one of the pages on the “Whitehouse for Kids” website was classified as adult content because it was named “couples.html”.
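The failure mode described above can be illustrated with a minimal sketch of unweighted keyword matching. This is not SurfWatch's actual implementation, which is not disclosed here; the keyword list, the function name `is_adult_by_keyword`, and the example URLs are all hypothetical and chosen only to reproduce the kind of misclassification reported for the “couples.html” page.

```python
# Hypothetical restricted-keyword list; the list used by any real product is not known.
RESTRICTED_KEYWORDS = {"couples", "adult"}


def is_adult_by_keyword(url: str, page_text: str = "") -> bool:
    """Classify an object as adult if ANY restricted keyword appears
    anywhere in its URL or text. No weighting and no verification step."""
    haystack = (url + " " + page_text).lower()
    return any(keyword in haystack for keyword in RESTRICTED_KEYWORDS)


# Hypothetical URLs standing in for pages on a children's web site.
flagged = is_adult_by_keyword(
    "http://kids.example.gov/couples.html",
    "Presidents and their spouses",
)
passed = is_adult_by_keyword(
    "http://kids.example.gov/pets.html",
    "The family dog",
)
print(flagged, passed)
```

Because a single substring match decides the outcome, the first page is flagged purely on its file name, regardless of what it actually contains.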
CYBERsitter is yet another Internet filtering product that attempts to classify web sites by looking at the text of the page and URLs. The product removes profane English words from the text of web pages, but does not filter out pornographic images from web pages that do not contain text, and does not filter out profane words in foreign languages.
NetNanny [http://www.netnanny.net/] is still another Internet filtering product that uses a blacklist of domain names of web sites not suitable for children. The NetNanny web site discloses that the NetNanny site list is compiled from the work of NetNanny staff, the suggestions of customers, and third-party children's advocacy groups. The ability of the product to filter out undesirable sites is limited by the comprehensiveness of the blacklist, which is compiled manually.
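The dependence of blacklist filtering on the completeness of the list can be sketched as follows. This is an illustrative assumption about how a domain blacklist lookup might work, not a description of NetNanny's or Cyber Patrol's software; the domain names and the function name `is_blocked` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical, manually compiled blacklist of domain names.
BLACKLISTED_DOMAINS = {"adult-site.example"}


def is_blocked(url: str) -> bool:
    """Return True if the URL's host is a blacklisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in BLACKLISTED_DOMAINS
    )


# A listed site is blocked; a site not yet added to the list passes through,
# whatever its content.
listed = is_blocked("http://www.adult-site.example/index.html")
unlisted = is_blocked("http://new-adult-site.example/")
print(listed, unlisted)
```

The lookup itself is cheap, but any site that the compilers have not yet reviewed is never blocked, which is why the approach cannot keep up with rapidly growing content.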
In sum, given the rapid proliferation of Internet objects, manual classification of Internet objects is an inadequate method of classification. Similarly, the use of unweighted or unverified text filtering alone results in inadequate and often inaccurate classification of Internet objects. Given the growing availability of adult content on computer-readable media, there is also a need for a method and device that can more accurately and efficiently identify adult content on computer-readable media, and either filter or deny access to such adult content.
The present invention can also be used to classify Internet objects and/or objects stored on computer-readable media based on criteria other than adult content.