This section provides background information related to the present disclosure which is not necessarily prior art.
With the rapid development of the Internet, identifying the type of a World Wide Web (WWW) webpage has become an important task. At present, there are two main kinds of methods for identifying webpage type. The first classifies webpages manually, relying on the professional knowledge of the person performing the classification. This method is effective for webpages within a known field, where it is both highly accurate and fast. Its scalability is limited by manpower, however: with too few people available, it cannot process the huge number of webpages from various fields. The second is based on text classification, e.g., Naive Bayes, Support Vector Machine (SVM), and so on. This method is based on statistics and samples, requires less manual intervention, and provides a certain degree of accuracy and good coverage of various fields. However, it requires a large amount of computation and is time-consuming, and thus cannot meet the requirements of real-time webpage type identification. Therefore, each of the two methods has its own deficiencies, and neither meets the requirements for real-time webpage type identification.