1. Field of the Invention
Embodiments of the present invention generally relate to controlling Internet activity. In particular, embodiments of the present invention relate to a method and apparatus for automatically classifying an unknown web site to improve Internet browsing control.
2. Description of the Related Art
Internet browsing control software applications (e.g., parental control software applications) monitor the Internet activity of one or more computer users. Under certain conditions (e.g., inappropriateness, security risk and/or the like), such Internet browsing control software applications may block access to a particular web site. The Internet browsing control software applications are used for various purposes, such as assessing employee productivity/trustworthiness, preventing children from viewing inappropriate web content, preventing disclosure of sensitive information and/or the like. For example, parents may use parental control software applications to monitor a child's Internet activity in order to make sure the child does not visit any pornographic web site.
Although the parental control software applications monitor the child's web browsing and give parents some authority over which web sites the child can visit, there is a grey area between “bad” web sites and “good” web sites for various age groups. For example, www.cnn.com is considered a good, informative web site for most age groups and www.playboy.com is clearly an inappropriate, bad web site for all age groups, but an unknown web site is neither clearly inappropriate nor clearly good because the unknown web site has no preexisting reputation. Consequently, unknown web sites cannot be classified without an intense analysis of the web content. There is an increasing number of web sites that are unclassified or classified as unknown because such web sites do not appear in any web site category of the classification information.
The parental control software applications may use two methods in order to determine whether a web site is suitable to be viewed by children: a) analyze the web content while the child visits the web site (e.g., examine text for profanities, images for sexual content and/or the like) and b) compare the web site to classification information that includes categories for most web sites (e.g., www.disney.com is included in a “Kids” category). Any web site comprising inappropriate web content for a particular child's age group may be blocked by the parental control software application. Unfortunately, new web sites are unknown and thus not listed or screened. The number of unknown web sites is large because thousands of new web sites are created every day.
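The two methods described above can be illustrated with a minimal Python sketch. It is not taken from any particular parental control product; the names CLASSIFICATION_INFO, ALLOWED_CATEGORIES, FLAGGED_TERMS, lookup_category, content_is_inappropriate and decide, as well as the sample categories and placeholder terms, are assumptions made purely for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical classification information: host name -> category.
CLASSIFICATION_INFO = {
    "www.disney.com": "Kids",
    "www.cnn.com": "News",
}

# Hypothetical categories permitted for each age group.
ALLOWED_CATEGORIES = {"child": {"Kids", "News"}}

# Placeholder terms standing in for a real profanity word list.
FLAGGED_TERMS = {"badword1", "badword2"}


def lookup_category(url):
    """Method (b): compare the web site to the classification information."""
    host = urlsplit(url).hostname
    if host is None:
        return "unknown"
    return CLASSIFICATION_INFO.get(host, "unknown")


def content_is_inappropriate(page_text):
    """Method (a): examine the page text for flagged terms."""
    words = set(re.findall(r"[a-z0-9']+", page_text.lower()))
    return not words.isdisjoint(FLAGGED_TERMS)


def decide(url, page_text, age_group):
    """Return "block", "allow" or "unknown" for one page visit."""
    if content_is_inappropriate(page_text):
        return "block"
    category = lookup_category(url)
    if category == "unknown":
        # Neither method yields a classification for this web site.
        return "unknown"
    if category in ALLOWED_CATEGORIES.get(age_group, set()):
        return "allow"
    return "block"
```

In this sketch, decide("http://www.disney.com/", "a friendly mouse", "child") returns "allow", whereas a newly created site that is absent from the table and contains no flagged term returns "unknown", which is the case the remainder of this section is concerned with.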
Existing parental control software applications are limited to the classification information when identifying “good” and “bad” web sites. As such, the existing parental control software applications do not provide a procedure for automatically classifying an unknown web site. Often, third party organizations crawl the Internet and classify or rate “unknown” web sites. Such third party organizations license or sell such classification information, as well as periodic updates, to owners or vendors of parental control software applications. If a web site is “unknown”, then the third party organization has not provided the parental control software applications with a classification of the web site. Consequently, the existing parental control software applications are unable to effectively control a child's browsing activity with respect to the unknown web site since the third party has not yet classified the unknown web site as “good” or “bad”.
Occasionally, the parental control software applications blindly block or allow unknown web sites depending on the settings specified for the parental control software applications. As a result, some children may be allowed to visit inappropriate web sites and some children may be blocked from viewing suitable web sites. For example, the unknown web site may be a legitimate news web site that the parental control software blocks the child from viewing. Conversely, the unknown web site may be a phishing site, through which sensitive information might be acquired, or an adult site that the parental control software allows the child to view, much to the displeasure of the parents. Either situation occurs simply because the unknown web site has not yet been classified as “good” or “bad”.
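The blind handling of unknown web sites described above can be sketched as follows. This is only an illustration under stated assumptions: the UnknownSitePolicy setting, the apply_setting function and the “good”/“bad”/“unknown” labels are hypothetical names, not part of any existing application.

```python
from enum import Enum


class UnknownSitePolicy(Enum):
    """Hypothetical setting that governs every unclassified web site."""
    BLOCK = "block"
    ALLOW = "allow"


def apply_setting(classification, policy):
    """Decide on a visit using only the stored classification.

    classification is "good", "bad" or "unknown" (assumed labels).
    The actual content of an unknown site is never consulted, so the
    outcome for such a site depends on the setting alone.
    """
    if classification == "good":
        return "allow"
    if classification == "bad":
        return "block"
    return policy.value


# A legitimate but unclassified news site under a blocking policy:
news_outcome = apply_setting("unknown", UnknownSitePolicy.BLOCK)

# An unclassified phishing or adult site under a permissive policy:
phishing_outcome = apply_setting("unknown", UnknownSitePolicy.ALLOW)
```

Here news_outcome is "block" and phishing_outcome is "allow": both results are wrong for the child, and neither setting can correct the other's error, because the decision never takes the nature of the unknown web site into account.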
Therefore, there is a need in the art for a method and apparatus for automatically classifying an unknown web site for various age groups in order to improve Internet browsing control software applications.