A massive volume of digital content of many kinds is now widely accessible on web pages through the Internet. Consequently, software and hardware for web content filtering have recently been developed and brought to market. For one thing, minors can be automatically prevented from accessing pornographic or violent content on web pages. For another, a company can restrict its employees from using the Internet for matters unrelated to work. To meet such demands, hardware devices such as the NetPure 5000 from Allot have been developed. In both Taiwan and the United States, products of this kind have become indispensable.
Under current techniques, there are several ways to determine whether an accessed web page contains content that is supposed to be forbidden:
1. Build a database of the Uniform Resource Identifiers (URIs) of predetermined forbidden web pages, and match the URI of each web page request against the database. If a match is found, access to the web page is forbidden. Most current products use this method to filter web content. However, this method requires maintaining an enormous database, which is often undesirable.
2. Use keyword or key-phrase matching to check whether certain keywords, chosen as the selected features, exist within the content of the web page. However, this method has a high probability of mis-filtering. For example, when "sex" is used as a keyword, many web pages that merely discuss gender will be filtered out. The filtering quality is therefore disappointing.
3. Apply self-learning methods to a set of sample web pages and then classify web pages automatically. Although the decision precision of this approach is better, the classification process requires scanning the whole content of each web page, which reduces efficiency.
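
The first two approaches above can be illustrated with a minimal sketch. It is not taken from any particular product: the blacklist entries, the keyword set, and the function names `normalize`, `blocked_by_uri`, and `blocked_by_keyword` are all hypothetical, and an actual filter would hold a far larger database. The sketch only shows how a URI lookup works and why naive keyword matching tends to mis-filter.

```python
from urllib.parse import urlsplit

# Hypothetical data for illustration only; a deployed filter would need
# a very large, continuously maintained database of forbidden URIs.
FORBIDDEN_URIS = {"badsite.example/adult", "violent.example/"}
FORBIDDEN_KEYWORDS = {"sex"}


def normalize(uri: str) -> str:
    """Reduce a URI to 'host/path' so that scheme and host case are ignored."""
    parts = urlsplit(uri)
    return (parts.hostname or "") + (parts.path or "/")


def blocked_by_uri(uri: str) -> bool:
    """Approach 1: forbid access when the requested URI is in the database."""
    return normalize(uri) in FORBIDDEN_URIS


def blocked_by_keyword(page_text: str) -> bool:
    """Approach 2: forbid access when any forbidden keyword occurs in the page."""
    words = (w.strip(".,;:!?\"'()") for w in page_text.lower().split())
    return any(w in FORBIDDEN_KEYWORDS for w in words)


if __name__ == "__main__":
    print(blocked_by_uri("http://BadSite.example/adult"))
    print(blocked_by_uri("http://news.example/"))
    # A harmless page about gender studies is still caught by the keyword rule.
    print(blocked_by_keyword("A university course on sex and gender in society."))
```

In this sketch the URI lookup is fast but only as good as the database behind it, while the keyword rule blocks the gender-studies page even though its content is not objectionable, which is the mis-filtering problem described in item 2.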