The Internet holds vast amounts of information distributed over a multitude of computers, giving users access to material on a wide variety of topics. The same is true of a number of other communication networks, such as intranets and extranets. Although large amounts of information may be available on a network, finding the desired information may be difficult.
Search engines have been developed to address the problem of finding desired information on a network. Typically, a user who has an idea of the type of information desired enters one or more search terms into a search engine. The search engine then provides a search result, which includes a list of network locations (e.g., uniform resource locators (URLs)) that the search engine has determined to include electronic documents relating to the user-specified search terms. Alternatively, a user may browse through information located on the network, for example, when the user is not sure what information is wanted. Some search engines provide categories of information, and subcategories within those categories, from which the user may select in order to focus on an area of interest.
One problem associated with providing a search result to a user is that the provided search result may point to an electronic document that presents undesirable content to the user. For example, depending on a particular search term, the search result may direct the user to an electronic document relating to sex education, mature content, pornography, gambling, hate speech, alcohol, drugs, tobacco, bomb-making, weapons, etc. To avoid presenting an electronic document having such undesirable content to a user, presently available search engines collect electronic documents from a communications network and classify these electronic documents according to one or more categories. When such search engines generate a search result in response to a search term submitted by a user, they check the classifications of the collected electronic documents indexed in a memory area. If the classification of a particular electronic document suggests that the electronic document provides undesirable content, existing search engines exclude the electronic document from the search result. That is, such search engines filter the search result based on the classifications of the collected electronic documents. These search engines then present the filtered search result to the user.
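By way of illustration only, the classification-based filtering described above may be sketched as follows. The names `IndexedDocument`, `UNDESIRABLE_CATEGORIES`, and `filter_by_classification`, the example URLs, and the particular category labels are hypothetical and are not drawn from any particular search engine; the sketch assumes that each indexed electronic document already carries a set of category labels assigned during offline analysis.

```python
from dataclasses import dataclass

# Hypothetical set of categories treated as undesirable for this sketch.
UNDESIRABLE_CATEGORIES = frozenset({"pornography", "gambling", "hate speech"})


@dataclass(frozen=True)
class IndexedDocument:
    """An electronic document as stored in the index (illustrative only)."""
    url: str
    title: str
    # Categories assigned offline; empty if the classifier assigned none.
    categories: frozenset = frozenset()


def filter_by_classification(results, blocked=UNDESIRABLE_CATEGORIES):
    """Exclude every document whose stored categories overlap `blocked`."""
    return [doc for doc in results if not (doc.categories & blocked)]


# A hypothetical unfiltered search result.
search_result = [
    IndexedDocument("http://example.com/news", "Daily news",
                    frozenset({"news"})),
    IndexedDocument("http://example.com/casino", "Online casino",
                    frozenset({"gambling"})),
    # A document the offline classifier failed to label.
    IndexedDocument("http://example.com/unlabeled", "Unlabeled page"),
]

filtered_result = filter_by_classification(search_result)
for doc in filtered_result:
    print(doc.url)
```

In this sketch the document labeled "gambling" is excluded, while the unlabeled document remains in the filtered search result regardless of its actual content, because the filter consults only the stored classifications.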
Filtering a search result based solely on the classifications of electronic documents may be neither effective nor efficient. First, offline analysis of an electronic document for categorization is computationally expensive. For example, a presently available search engine may crawl and index fifty electronic documents per second if the search engine does not analyze the electronic documents for categorization. But if the search engine analyzes the electronic documents to determine their classifications, the rate of crawling and indexing may be reduced to twenty-five electronic documents per second. Additionally, because of the vast number of electronic documents that are collected from a communications network, existing search engines sometimes fail to accurately categorize some of the electronic documents. Thus, such search engines may fail to effectively exclude an electronic document having undesirable content from a search result. Consequently, currently available search engines may still inadvertently present to a user a search result that includes an undesirable electronic document.
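The effect of the reduced crawl rate may be illustrated with a short calculation. The rates of fifty and twenty-five electronic documents per second are the example figures given above; the corpus size of one billion electronic documents and the function name `crawl_days` are assumptions introduced solely for illustration.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(num_documents, docs_per_second):
    """Return the days needed to crawl and index the given number of documents."""
    return num_documents / docs_per_second / SECONDS_PER_DAY


# Hypothetical corpus size, chosen only to make the comparison concrete.
CORPUS_SIZE = 1_000_000_000

days_without_analysis = crawl_days(CORPUS_SIZE, 50)  # no categorization
days_with_analysis = crawl_days(CORPUS_SIZE, 25)     # with categorization
slowdown = days_with_analysis / days_without_analysis

print(f"without categorization: {days_without_analysis:.1f} days")
print(f"with categorization:    {days_with_analysis:.1f} days")
print(f"slowdown factor:        {slowdown:.1f}x")
```

Under these assumptions, halving the crawl rate doubles the time required to cover the same corpus, whatever corpus size is chosen.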
Moreover, presentation data (e.g., a title, description, URL, etc.) that describes an electronic document and is presented in a search result may itself include undesirable content such as offensive language. Presently available search engines fail to provide a mechanism to prevent presenting a search result that includes undesirable presentation data to a user.
Accordingly, a solution that more effectively provides a search result without presenting undesirable content to a user is desired.