1. Field of the Invention
The present invention relates to the field of data processing. More specifically, the present invention relates to services for identifying an attributed category for a data object, for use in applications such as rating and filtering services.
2. Background Information
The World Wide Web (WWW) is an expanding collection of diverse textual and non-textual materials, which are available for access from any location, at any time, by any person. Because individual beliefs and standards differ, it is not unusual for some users to find some of this content objectionable and to want to be shielded from it. For example, parents often wish to shield their children from exposure to sexually explicit material, hate speech and drug information. Similarly, companies may wish to prevent their employees from accessing sites that provide or support gambling.
Notwithstanding the significant civil-liberty implications of these concerns, a number of groups and companies have brought forward systems and techniques that assist WWW users in blocking access to undesirable content. For example, SafeSurf, offered by SafeSurf of Newbury, Calif., and NetNanny, offered by Net Nanny Software International of Toronto, Ontario, Canada, are two products/services available on the market that provide such blocking. Both operate in accordance with a list of “undesirable” sites predetermined and supplied by the authors of the products/services. Access to any page denoted by a URL associated with a listed site is blocked.
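The list-based approach can be illustrated with a minimal sketch. It assumes the vendor-supplied list is a set of host names and that a URL is "associated with a listed site" when its host is a listed name or a subdomain of one; both the site names and this matching rule are hypothetical and are not taken from SafeSurf or NetNanny.

```python
from urllib.parse import urlsplit

# Hypothetical vendor-supplied list of "undesirable" sites (illustrative names only).
BLOCKED_SITES = {"gambling.example", "badsite.example"}


def is_blocked(url: str, blocked_sites=BLOCKED_SITES) -> bool:
    """Return True if the URL's host is a listed site or a subdomain of one.

    Assumption: matching is done on whole domain labels, so
    "www.gambling.example" matches "gambling.example" but
    "notgambling.example" does not.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try every suffix of the host: a.b.site.example, b.site.example, site.example, ...
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked_sites:
            return True
    return False
```

For example, `is_blocked("http://www.gambling.example/poker")` returns `True`, while `is_blocked("https://news.example/today")` returns `False`. The decision depends only on the URL and the fixed list, never on what the page actually contains.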
Another example of such a system is disclosed in “Selective downloading of the types contained in hypertext documents transmitted in a computer controlled network”, U.S. Pat. No. 6,098,102, issued to Neilsen et al. Neilsen's system uses the file extension in the URL to determine whether downloading of a particular file is allowed.
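Extension-based filtering of the kind described above can be sketched as follows. This is a minimal illustration, not the method of the cited patent; the set of disallowed extensions is a hypothetical policy, and the extension is assumed to be taken from the path component of the URL (ignoring any query string).

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical policy: file extensions whose download is not allowed.
DISALLOWED_EXTENSIONS = {".exe", ".mpg", ".avi"}


def download_allowed(url: str, disallowed=DISALLOWED_EXTENSIONS) -> bool:
    """Allow or refuse a download based solely on the URL's file extension."""
    path = urlsplit(url).path          # drops scheme, host, query and fragment
    _, ext = posixpath.splitext(path)  # e.g. "/files/setup.EXE" -> ".EXE"
    return ext.lower() not in disallowed
```

Here `download_allowed("http://x.example/files/setup.EXE")` returns `False` and `download_allowed("http://x.example/page.html")` returns `True`. As with the site list, the check inspects only the URL string.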
Still another method for controlling access to sites is typified by the work of the Internet Content Rating Association (ICRA), which uses the Platform for Internet Content Selection (PICS) specification to allow voluntary (and, potentially, in the future, mandatory) rating of page content by the content author. Filtering can then be performed using these rating tags. The method may also be augmented by blocking all unrated pages outright.
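Filtering on author-supplied rating tags, with the optional blocking of unrated pages, might look like the sketch below. It does not parse actual PICS label syntax; it assumes the labels have already been extracted into a mapping from a category name to a numeric level, and the category names and the treatment of unlabeled categories as level 0 are hypothetical.

```python
from typing import Mapping, Optional


def allow_page(labels: Optional[Mapping[str, int]],
               limits: Mapping[str, int],
               block_unrated: bool = True) -> bool:
    """Decide whether to show a page from its author-supplied rating labels.

    labels: category -> level assigned by the content author, or None if the
            page carries no rating tag.
    limits: category -> highest level the user's policy accepts.
    """
    if labels is None:
        # Unrated page: the optional augmentation blocks it outright.
        return not block_unrated
    for category, max_level in limits.items():
        # Assumption: a category the author did not label counts as level 0.
        if labels.get(category, 0) > max_level:
            return False
    return True
```

With `limits = {"nudity": 0, "violence": 1}`, a page labeled `{"violence": 2}` is refused, an unrated page is refused by default, and the same unrated page is shown when `block_unrated=False`. The outcome rests entirely on the labels the author chose to supply.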
A number of specific and general problems with these approaches have been noted. Most importantly, the WWW is constantly growing and changing. As a result, a site's content may change from time to time, and even from one access to the next. Many web sites generate a user-specific page at every access, so the basic URL is often an inadequate indicator of the page's content. Further, content providers are often not the best, or even an appropriate, agent for content rating; duplicitous content providers may deliberately misrate their content.
Filtering systems that rely on downloading the page to the user's machine and then processing it through rating or filtering software can be slow, due to the limited bandwidth of the user's connection to the Internet. Filtering systems that rely on access to a list of objectionable sites may be slow due to delays in accessing the list. The list of objectionable sites may also go out of date, given the dynamic nature of the WWW.
Thus, what is desired is a system that responds quickly to user requests while also tracking the dynamic nature of the WWW's content.