As the Internet grows, online literature has developed rapidly owing to features such as easy readability and a vast volume of content. However, the rapid popularization of online literature and the fast growth of its content have been accompanied by many sensitive texts, such as those involving pornography, violence, or political views, which are harmful to the physical and mental health of teenagers. These sensitive texts adversely affect readers. Therefore, how to detect sensitive texts and isolate them in time has become a critical issue in creating a good reading environment for readers.
In the existing technology, a keyword list containing multiple keywords is configured in advance. When a new text is to be detected, a keyword is first selected from the keyword list, the entire text is scanned, and the number of occurrences of that keyword in the text is counted. These operations are repeated until every keyword in the keyword list has been traversed, yielding the frequency of occurrence of each keyword of the keyword list in the text. Finally, whether the text is a sensitive text is determined according to these frequencies: if the frequency of occurrence of at least one keyword in the text is greater than a preset threshold, the text is determined to be a sensitive text.
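The existing-technology procedure described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes that "frequency of occurrence" means the raw count of non-overlapping matches and that every keyword is a non-empty string. The function name `is_sensitive`, the sample keywords, and the threshold value are hypothetical. The `full_scans` counter makes explicit that the whole text is scanned once for every keyword in the list.

```python
from typing import Dict, List, Tuple


def is_sensitive(text: str, keyword_list: List[str], threshold: int) -> Tuple[bool, Dict[str, int], int]:
    """Hypothetical sketch of keyword-list detection: one full scan of the text per keyword.

    Assumes non-empty keywords and counts non-overlapping matches (str.count semantics).
    Returns (is_sensitive, per-keyword frequencies, number of full scans performed).
    """
    frequencies: Dict[str, int] = {}
    full_scans = 0
    for keyword in keyword_list:
        # Each str.count call walks the entire text, so K keywords mean K full scans.
        frequencies[keyword] = text.count(keyword)
        full_scans += 1
    # The text is sensitive if at least one keyword occurs more often than the preset threshold.
    sensitive = any(freq > threshold for freq in frequencies.values())
    return sensitive, frequencies, full_scans


result = is_sensitive("violence and more violence", ["violence", "gore", "riot"], threshold=1)
print(result)  # (True, {'violence': 2, 'gore': 0, 'riot': 0}, 3)
```

With three keywords the text is scanned three times even though only one keyword appears; the number of scans grows linearly with the size of the keyword list regardless of the text's content.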
In the process of implementing the present disclosure, the inventor found that the existing technology has at least the following problem:
When a text is detected according to the keywords in a keyword list, the entire text must be scanned once for each keyword, so detecting a single text requires as many full scans as there are keywords in the list. This is time-consuming and makes text detection inefficient.