The astronomical growth of the World Wide Web in the last decade has put a wide variety of information at the fingertips of anyone with access to a computer connected to the internet. In particular, parents and teachers have found the internet to be a rich educational tool for children, allowing them to conduct research that would in the past have been impossible or taken far too long to be feasible. Along with valuable information, however, children also have access to offensive or inappropriate material, including violence, pornography, and hate-motivated speech. Because the World Wide Web is inherently a forum for unrestricted content from any source, censoring material that some find objectionable is an unacceptable solution.

A common alternative is to filter content at the point of display. The browser determines whether or not to display a document by applying a set of user-specified criteria. For example, the browser may have access to a list of excluded or included sites, provided by a commercial service or by a parent or educator. Users can also choose to receive documents only through a Web proxy server, which compares each requested document with an exclusion or inclusion list before sending it to the client computer. Another method, developed by the Recreational Software Advisory Council (RSAC), provides a detailed rating system: ratings are stored by the author or content provider in a specific format within a document's meta-information. Current Web browsers can extract the ratings and compare them with user-specified content levels to determine whether or not to display the document. The user can also set the browser not to display pages that lack a rating.
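The rating-based approach described above can be sketched as follows. This is a minimal illustration, not any particular browser's implementation: it assumes an RSACi-style PICS label embedded in a `meta` tag, a 0-4 rating scale, and hypothetical user-configured limits; the category letters, limit values, and function names are all illustrative assumptions.

```python
import re
from html.parser import HTMLParser

# Hypothetical user-configured limits per RSACi category (scale 0-4;
# l=language, n=nudity, s=sex, v=violence). Values are illustrative.
USER_LIMITS = {"l": 2, "n": 0, "s": 0, "v": 1}

class RatingExtractor(HTMLParser):
    """Collects the content of a PICS-Label meta tag, if one is present."""
    def __init__(self):
        super().__init__()
        self.label = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("http-equiv", "").lower() == "pics-label":
            self.label = a.get("content", "")

def allowed(html, limits=USER_LIMITS, block_unrated=True):
    """Return True if the document's rating is within the user's limits."""
    parser = RatingExtractor()
    parser.feed(html)
    if parser.label is None:
        # User option from the text: refuse to display unrated pages.
        return not block_unrated
    # Pull (category, value) pairs such as "l 2" out of the label string.
    ratings = {c: int(v)
               for c, v in re.findall(r"\b([lnsv])\s+(\d)", parser.label)}
    return all(ratings.get(c, 0) <= limit for c, limit in limits.items())
```

A proxy-server variant would apply the same comparison on the server side, before the document ever reaches the client.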
One problem with all of the existing solutions is that they cannot keep up with the rate at which Web pages are added or modified. Both rating sites and compiling exclusion lists require human labor that is subjective and time-consuming. Even if it were possible to evaluate every new site, there is still no way to apply present methods to dynamically created documents, such as search result pages. Search engines receive a user query, search an index to find applicable documents, and create a search result page listing a number of the located documents. The search result page typically includes a title and a short abstract or extract, along with the Uniform Resource Locator (URL), for each retrieved document. The search result page itself might contain objectionable content in the document summary information, or it might contain hyperlinks to sites with objectionable content. One way to address this problem is for browsers not to display search result pages at all. Without search engines, though, internet research is significantly limited.
AltaVista.TM., a well-known search engine, has developed a Family Filter.TM. in cooperation with SurfWatch.TM., a company that pioneered the concept of filtering objectionable content on the Web. The filter can be applied either to audio, images, and video only, or to all content, including text. When the filter is turned on, documents that were previously classified as objectionable are prevented from appearing in the search results. Objectionable content falls into one of five categories: Drugs/Alcohol/Tobacco, Gambling, Hate Speech, Sexually Explicit, and Violence. A significant drawback of the Family Filter.TM. is its lack of flexibility and user input in blocking access to sites. For example, some parents may want to protect their children from exposure to hate speech, but not from other types of offensive material. Different users also have different standards for objectionable content within each category. The Family Filter.TM. has no means of accommodating this broad variety of user requirements.
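The flexibility the Family Filter.TM. lacks can be illustrated with a per-category policy that each user sets independently. This is a hypothetical sketch, not the Family Filter.TM.'s actual mechanism: the category keys follow the five SurfWatch.TM. groupings named above, and the policy values, result structure, and function name are all illustrative assumptions.

```python
# Hypothetical per-user policy over the five categories named in the text.
# True = block documents flagged in that category. Values are illustrative:
# this user blocks hate speech and sexually explicit material but tolerates
# the other three categories.
user_policy = {
    "drugs_alcohol_tobacco": False,
    "gambling": False,
    "hate_speech": True,
    "sexually_explicit": True,
    "violence": False,
}

def filter_results(results, policy):
    """Drop search results whose flagged categories intersect the set of
    categories this particular user has chosen to block."""
    blocked = {cat for cat, block in policy.items() if block}
    return [r for r in results
            if not (set(r.get("categories", ())) & blocked)]
```

Under a fixed all-or-nothing filter, a result flagged only for violence would be hidden from every user; under a per-category policy, the user above would still see it.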
There is a need, therefore, for a method for rating automatically-generated documents that allows for user flexibility in the definition of objectionable content.