The development of digital technologies has resulted in a tremendous amount of data being generated everywhere. The large daily volume of transaction data at a bank is but one example. Processing data to obtain useful information has been a rewarding experience. The rapid development of computer technology has opened up a variety of possibilities for data processing and has promoted great leaps in the development of database technology. Faced with ever-increasing data volumes, however, people are becoming increasingly dissatisfied with the query function of a database. One basic question has emerged: can we obtain from the data real information or knowledge that is actually useful for decision making? Conventional database technology, which is only good at data management, is powerless in the face of this basic question, and so is conventional statistical technology, which faces great challenges of its own. Therefore, a new method for processing massive amounts of data is urgently needed.
At the same time, the methods by which users promulgate information through the Internet have become more and more effective and comprehensive. Examples include using instant messaging tools, sending various kinds of information by email, and posting information on a network forum. However, part of this circulated information may be undesirable to a user, or may be illegally promulgated, and therefore needs to be filtered. Existing methods for filtering user information are based on direct keyword determination: if a relevant keyword appears in user information, the associated user is determined to be a target user.
However, the existing technical scheme only matches information using keywords and does not analyze the information or user characteristics from other aspects, and therefore may incur a high false-alarm rate. For example, if “win a prize” is used as a keyword to filter fake prize-winning advertisements, and something like “I won a prize today” appears in a user's message, a system may falsely conclude that the user is sending a fake prize-winning advertisement and filter out the message, so that the user fails to perform normal operations such as chatting and leaving comments.
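The false alarm described above can be illustrated with a minimal sketch of direct keyword determination. This is not taken from any particular existing system; the function name `is_flagged`, the keyword tuple, and the case-insensitive substring match are all assumptions made for illustration. In particular, the sketch assumes that the keyword list also carries the inflected form “won a prize”, since a plain substring match on “win a prize” alone would not match the past tense.

```python
# Hypothetical keyword list; the inflected variant is an assumption added so
# that a simple substring match also covers the past tense.
KEYWORDS = ("win a prize", "won a prize")


def is_flagged(message: str, keywords: tuple[str, ...] = KEYWORDS) -> bool:
    """Return True if any keyword occurs in the message (case-insensitive)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


if __name__ == "__main__":
    advertisement = "Congratulations! Click here to win a prize!"
    ordinary_chat = "I won a prize today"
    unrelated = "See you at lunch"

    # Both the advertisement and the ordinary chat message are flagged,
    # because the filter looks at nothing other than the keyword itself.
    for message in (advertisement, ordinary_chat, unrelated):
        print(f"{message!r}: flagged={is_flagged(message)}")
```

Because the decision rests on the keyword alone, the ordinary chat message is treated exactly like the advertisement, which is the source of the high false-alarm rate.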