With the rapid development of microblogging, some microblog users develop a microblog APP (application) for the purpose of advertisement or campaign promotion. Such an APP posts article(s) that entice other users to click, and then posts and forwards the article(s) automatically, which produces a great number of template articles in similar formats within a short time and results in large amounts of garbage template articles on the microblog platform. Such garbage template articles are typically repetitive, or have some of their words modified randomly according to some rule or according to personal information of the forwarding person; they contain little information but have a huge data volume. According to statistics, garbage template articles account for about 10% of all microblog articles. If these garbage template articles are not identified and filtered, search engine resources will be wasted, and the large number of repetitive templates will also seriously degrade user experience.
Garbage template articles of the same kind share certain common features. At present, whether a microblog article is a garbage template article is determined mainly by manually analyzing the semantics of the article.
Manual identification is slow and inefficient and cannot cope with the huge data volume of the microblog platform; it is impossible to identify each and every microblog article manually.