Stream data may be regarded as a dynamic data set that grows without bound as time passes. Data filtering, also referred to as data filtration, aims to identify, according to a rule set in advance, the data that meets the rule and to intercept or discard that data. Data filtering is an important operation in stream data processing. For example, on some Internet video-sharing websites, the videos continually submitted by users form large-scale video stream data. Within an extremely short time, the system must perform operations such as analysis, filtering, indexing, and storage on this video stream data, and must filter out non-compliant data. As another example, for email, laboratory data indicates that in 2012 junk mail accounted for 72.1% of all email on average. Therefore, to ensure service quality, an Internet email service provider needs to filter junk mail out of its email stream.
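The definition of data filtering above can be illustrated as a lazy filter over an unbounded stream. The following is a minimal sketch, assuming the preset rule can be expressed as a predicate that returns True for data to be discarded; the names `filter_stream` and `should_discard` are hypothetical and do not come from the text.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def filter_stream(stream: Iterable[T], should_discard: Callable[[T], bool]) -> Iterator[T]:
    """Pass items through one at a time, dropping those the preset rule flags.

    Because this is a generator, it never materializes the whole stream,
    so it also works on a stream that grows without bound.
    """
    for item in stream:
        if should_discard(item):
            continue  # intercept/discard data that meets the (hypothetical) rule
        yield item


# Usage: an infinite stream of integers, with a hypothetical rule
# "discard multiples of 3"; take only the first five surviving items.
first_kept = list(islice(filter_stream(count(), lambda n: n % 3 == 0), 5))
print(first_kept)  # [1, 2, 4, 5, 7]
```

The same behavior is available from the standard library as `itertools.filterfalse(should_discard, stream)`; the explicit generator is shown only to make the intercept-or-pass decision visible.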
As for how to perform data filtering, the prior art generally presets multiple pieces of detection data. When data needs to be checked, the similarity between the to-be-detected data and the detection data is determined in a manner similar to string matching; if the similarity between the to-be-detected data and one or more pieces of the detection data is relatively high, the to-be-detected data is determined to require removal.
A disadvantage of this method is that similarity between data can be determined only in a manner similar to string matching, so data with complex semantics cannot be processed: content that conveys the same meaning in different wording, for instance, may not be recognized as similar.
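The prior-art approach and its limitation can be sketched as follows. This is a minimal illustration under stated assumptions: Python's `difflib.SequenceMatcher` stands in for "a manner similar to string matching", and the detection strings and the 0.8 threshold are hypothetical, since the text specifies neither.

```python
from difflib import SequenceMatcher

# Hypothetical preset detection data (e.g., known junk-mail phrases).
DETECTION_DATA = ["win a free prize now", "cheap meds online"]

# Hypothetical threshold for "relatively high" similarity.
THRESHOLD = 0.8


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def should_remove(data: str) -> bool:
    """Flag data whose similarity to any detection entry reaches the threshold."""
    return any(similarity(data, d) >= THRESHOLD for d in DETECTION_DATA)


# A near-verbatim copy of a detection entry is caught...
print(should_remove("Win a FREE prize now!"))  # True
# ...but a paraphrase with the same meaning and different wording is not.
print(should_remove("You have been selected for a complimentary reward"))  # False
```

The second call shows the weakness described above: because the comparison works on characters rather than meaning, a message with the same intent but different wording scores well below the threshold and passes through the filter.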