With the development of the Internet, the volume of information transferred over it has grown continuously. The openness of the Internet also allows large amounts of harmful information to spread. There is therefore a general need to monitor and filter information on the Internet.
Content filtering techniques can filter harmful information on the Internet and thereby provide a safe network environment. Information on the Internet takes multiple forms, of which text is one of the most common. Text filtering refers to a process of finding specific text within large volumes of textual information. Current text filtering methods are generally based on basic keyword matching: a system searches the input text for multiple preset keywords relating to harmful information. If the input text contains content matching any of the keywords, that content or the entire input text is filtered out or replaced.
Such text filtering methods can only filter text that exactly matches the keywords; they cannot determine the position or attitude of the author as reflected in the text. For example, an e-commerce website may define “detectaphone” as a filtering keyword. In this example, however, the current text filtering method would likely treat a legitimate text such as “prohibition to sell detectaphone” as harmful information and filter it. Thus current text filtering methods based on basic keyword matching have a low identification accuracy and cannot meet the practical requirements of information filtering.
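
The limitation of basic keyword matching can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the implementation of any particular system: the function name `keyword_filter`, the list `KEYWORDS`, the masking character, and the sample offer text are all hypothetical, and case-insensitive matching is assumed for convenience.

```python
import re


def keyword_filter(text, keywords, mask="*"):
    """Mask every occurrence of any preset keyword in the input text.

    Returns the filtered text and the list of keywords that matched.
    Hypothetical helper: matching is purely literal, so no account is
    taken of the author's position or attitude.
    """
    filtered = text
    hits = []
    for keyword in keywords:
        # Escape the keyword so it is matched literally, ignoring case.
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        if pattern.search(filtered):
            hits.append(keyword)
            # Replace each match with mask characters of the same length.
            filtered = pattern.sub(lambda m: mask * len(m.group(0)), filtered)
    return filtered, hits


# Hypothetical keyword list for an e-commerce website.
KEYWORDS = ["detectaphone"]

offer = "detectaphone for sale"              # text the filter is meant to catch
notice = "prohibition to sell detectaphone"  # legitimate text from the example

for sample in (offer, notice):
    filtered_text, matched = keyword_filter(sample, KEYWORDS)
    print(repr(sample), "->", repr(filtered_text), "matched:", matched)
```

Both sample texts produce a match on the same keyword, so the sketch cannot tell the sales offer apart from the prohibition notice. This is the low identification accuracy described above.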