Today, various content filtering mechanisms are available to entities to manage and/or control user access to the Internet via facilities provided by the entities. For example, a company typically implements some form of content filtering mechanism to control the use of the company's resources (e.g., employee work hours, computers, and/or servers) to access the Internet. Access to content within certain predetermined categories using the company's resources may not be allowed during some predetermined periods of time.
A conventional content filtering system includes a database of content ratings. A rating is a classification of a web page based on some predetermined criteria. For example, www.cnn.com may be classified in the news category, www.amazon.com may be classified in the shopping category, etc. Depending on the content filtering criteria, one may classify web pages into different numbers of categories. The number of categories may range from two (e.g., sports and non-sports) to a large number (e.g., 50, 100, etc.) to provide a more elaborate classification.
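As a minimal sketch of how such a ratings database might be consulted, the following Python fragment maps the two sites named above to their example categories and checks a request against a set of blocked categories. The names RATINGS, BLOCKED_CATEGORIES, rate, and is_allowed are hypothetical and chosen only for illustration, as is the choice of shopping as a blocked category; the sketch does not describe any particular product.

```python
from urllib.parse import urlsplit

# Hypothetical ratings database: host name -> category.
# The two entries mirror the examples given in the text.
RATINGS = {
    "www.cnn.com": "news",
    "www.amazon.com": "shopping",
}

# Hypothetical policy: categories that may not be accessed.
BLOCKED_CATEGORIES = {"shopping"}


def rate(url):
    """Return the category for a URL, or "unrated" if none is stored."""
    host = urlsplit(url).hostname
    return RATINGS.get(host, "unrated")


def is_allowed(url):
    """Return True if the URL's category is not blocked by the policy."""
    return rate(url) not in BLOCKED_CATEGORIES
```

Under this assumed policy, a request for http://www.cnn.com/ would be allowed and a request for http://www.amazon.com/ would be refused.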
Currently, one way to handle content ratings of web pages is to assign a rating to each individual web page on the Internet. However, given the huge number of web pages available on the Internet, this approach is very inefficient because it generates a large volume of content rating information. It is also costly to store and/or to deliver such a large volume of data.
Another existing approach to handling content ratings of web pages is to assign only domain level rating information. That is, a rating assigned to the main page of a domain is also assigned to the entire domain. As a result, only domain level rating information is stored in the databases and transmitted to content filtering clients. Although this approach reduces the amount of content rating information to be stored, the domain level rating information is typically inadequate for accurately rating an individual web page because many domains include a wide variety of content in their sub-directories.
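The shortcoming of domain level rating can be illustrated with a short Python sketch. The domain example.com, its pages, and their categories are invented for illustration, and the names DOMAIN_RATINGS, ACTUAL_CATEGORIES, and domain_rating are hypothetical. The sketch assumes that each page has a known correct category against which the domain level rating can be compared.

```python
from urllib.parse import urlsplit

# Hypothetical domain level ratings: one category per domain,
# taken from the domain's main page.
DOMAIN_RATINGS = {"example.com": "news"}

# Hypothetical correct categories of individual pages, used here
# only to show where the domain level rating is wrong.
ACTUAL_CATEGORIES = {
    "http://example.com/": "news",
    "http://example.com/shop/cart.html": "shopping",
}


def domain_rating(url):
    """Return the rating of the URL's domain, ignoring the path."""
    host = urlsplit(url).hostname or ""
    return DOMAIN_RATINGS.get(host, "unrated")


# Pages whose domain level rating differs from their actual category.
misrated = [
    url
    for url, actual in ACTUAL_CATEGORIES.items()
    if domain_rating(url) != actual
]
```

In this example only one entry is stored for the whole domain, but the page under the shop sub-directory inherits the news rating of the main page and is therefore rated incorrectly.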