1. Field of the Invention
The present invention relates to a system and method for filtering messages and, more particularly, to filtering messages based on user reports.
2. Description of the Related Art
Any Internet user is familiar with SPAM. Because of its low cost, SPAM is often the first choice of unscrupulous advertisers. Recent studies have shown that SPAM accounts for up to 80% of all mail traffic on the Internet. Generally, SPAM is a mass mailing of commercial or other advertisements to recipients who do not wish to receive them. Originally, SPAM was mainly associated with email. Currently, however, SPAM is also sent via instant messengers, web sites, social networks, blogs, and web forums, as well as in SMS and MMS messages.
Consequently, SPAM has become a serious technical and economic problem. Large volumes of SPAM increase the load on data channels and generate Internet traffic that users have to pay for. People also waste productive work time sorting through SPAM. Furthermore, SPAM is becoming less commercial and, owing to the anonymity of SPAM messages, is often used in Internet fraud schemes. SPAM can also be used to deliver malware.
SPAM is often used in financial schemes (for example, “Nigerian letters”) that attempt to obtain users' credit card numbers or passwords to online banking systems. Phishing and malware delivery are other examples of how SPAM can be used. Therefore, means of protection from SPAM are needed, and a number of anti-SPAM methods exist. For example, the black list approach blocks messages that come from addresses listed in a black list. While this method blocks 100% of the messages coming from black-listed addresses, it can result in many false positives, because some legitimate addresses also end up on the black list.
Another anti-SPAM method uses technology that detects identical (or almost identical) messages in a mass mail stream. An effective mass mail analyzer requires very large volumes of mail, so this method can be used only by very large mail providers. An obvious shortcoming of this method is that many legitimate services (for example, news and update subscriptions) also use mass mailings and can be mistaken for sources of SPAM.
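The mass mail analysis described above can be illustrated with a minimal sketch. It assumes a simplified notion of "almost identical": message bodies are lowercased, stripped of digits, and whitespace-collapsed before hashing, so copies that differ only in numbers or spacing share one fingerprint. Production analyzers typically use fuzzier similarity measures; the names `fingerprint`, `MassMailDetector`, and `threshold` are hypothetical and used only for illustration.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body: str) -> str:
    # Normalize the body (lowercase, drop digits, collapse whitespace) and hash it,
    # so trivially varied copies of one mailing map to the same key.
    normalized = re.sub(r"\d+", "", body.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class MassMailDetector:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical: copies needed to call it mass mail
        self.counts = Counter()     # fingerprint -> number of copies seen so far

    def observe(self, body: str) -> bool:
        # Record one message; return True once its fingerprint has been seen
        # at least `threshold` times in the stream.
        key = fingerprint(body)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

The sketch also exposes the shortcoming noted above: a legitimate newsletter sent to many subscribers produces exactly the same repeated fingerprint as a SPAM campaign, and the detector only becomes useful when it sees a large share of the mail stream.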
Yet another anti-SPAM method checks message headers. It blocks messages whose headers contain typical mistakes that indicate robot-generated SPAM. A shortcoming of this method is that its effectiveness decreases as SPAM-generating robots improve and make fewer mistakes in the headers.
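A header check of this kind might look like the following sketch, which uses Python's standard `email` package. The particular "mistakes" tested here (a missing `From`, `Date`, or `Message-ID` header, an unparseable `Date`, a `Message-ID` not of the form `<...@...>`) are assumptions chosen for illustration, not a list taken from any deployed filter; `header_mistakes` is a hypothetical name.

```python
import email
from email.utils import parsedate_to_datetime
from typing import List


def header_mistakes(raw: str) -> List[str]:
    # Return a list of header problems; an empty list means no mistakes were found.
    msg = email.message_from_string(raw)
    problems: List[str] = []
    for name in ("From", "Date", "Message-ID"):
        if msg.get(name) is None:
            problems.append(f"missing {name}")
    date = msg.get("Date")
    if date is not None:
        try:
            parsedate_to_datetime(date)
        except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
            problems.append("unparseable Date")
    mid = msg.get("Message-ID")
    if mid is not None:
        mid = mid.strip()
        if not (mid.startswith("<") and mid.endswith(">") and "@" in mid):
            problems.append("malformed Message-ID")
    return problems
```

A filter built on this idea would block a message when the list is non-empty; as noted above, a robot that emits well-formed headers passes every such check.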
Another anti-SPAM method is grey listing. Each incoming message is initially rejected with a special error code. A SPAM-sending robot, unlike a standard mail server, typically does not attempt to send the same message again, and this is used as the criterion for identifying legitimate messages: if a sender retries within a certain time period, the message is accepted and the sender is placed on a white list. However, this solution is not acceptable for many users, because it delays delivery of messages from every sender that is not yet on the white list.
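The grey list logic described above can be sketched as follows, assuming that attempts are keyed by the (sender, recipient) pair and that the "special error code" is represented by a hypothetical `"TEMPFAIL"` result (in SMTP this would be a temporary 4xx reply). Real greylisting implementations usually also require a minimum delay before a retry is accepted; that refinement is omitted here to stay close to the description.

```python
import time
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, str]  # (sender address, recipient address)


class GreyList:
    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds           # hypothetical retry window
        self.first_seen: Dict[Key, float] = {}  # time of the first rejected attempt
        self.whitelist: Set[str] = set()

    def check(self, sender: str, recipient: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if sender in self.whitelist:
            return "ACCEPT"
        key = (sender, recipient)
        seen = self.first_seen.get(key)
        if seen is not None and now - seen <= self.window:
            # The sender retried in time, as a standard mail server would.
            self.whitelist.add(sender)
            del self.first_seen[key]
            return "ACCEPT"
        # First attempt (or a retry that came too late): remember it and reject temporarily.
        self.first_seen[key] = now
        return "TEMPFAIL"
```

Every first message from an unknown sender receives `"TEMPFAIL"`, which is exactly the delivery delay that makes this approach unacceptable to many users.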
Yet another anti-SPAM method is content filtering, which uses special SPAM filters that analyze all parts of incoming messages (including graphical ones). The analysis makes it possible to form a SPAM lexical vector or to calculate the SPAM weight of a message, and a SPAM or not-SPAM verdict is made based on these parameters. Such an anti-SPAM method is disclosed in U.S. Pat. No. 7,836,061, entitled “Method and system for classifying electronic text messages and spam messages.”
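As a rough illustration of a lexical vector and a SPAM weight, the sketch below counts occurrences of a fixed vocabulary of terms and takes a weighted sum compared against a threshold. The vocabulary, weights, and threshold are hypothetical, only the text part of a message is considered, and this is a generic weighted-term scheme, not the method of U.S. Pat. No. 7,836,061.

```python
import re
from typing import List

# Hypothetical vocabulary and per-term weights of the kind an anti-SPAM lab might maintain.
VOCABULARY: List[str] = ["free", "winner", "viagra", "urgent", "click"]
WEIGHTS: List[float] = [1.0, 2.0, 3.0, 1.5, 0.5]


def lexical_vector(text: str, vocabulary: List[str]) -> List[int]:
    # Count how often each vocabulary term occurs as a whole word in the text.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(term) for term in vocabulary]


def spam_weight(vector: List[int], weights: List[float]) -> float:
    # Weighted sum of the term counts (dot product of vector and weights).
    return sum(count * weight for count, weight in zip(vector, weights))


def verdict(text: str, threshold: float = 3.0) -> str:
    weight = spam_weight(lexical_vector(text, VOCABULARY), WEIGHTS)
    return "SPAM" if weight >= threshold else "NOT SPAM"
```

For example, "You are a WINNER! Click for FREE stuff" yields the vector `[1, 1, 0, 0, 1]` and a weight of 3.5, so it is classified as SPAM. Keeping `VOCABULARY` and `WEIGHTS` as data rather than code reflects the point that follows: filter effectiveness hinges on how quickly these rules are updated.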
The SPAM filters are configured in anti-SPAM labs that create and refine the filtering rules. Since SPAM senders constantly attempt to overcome the protections created by SPAM filters, the process of modifying and refining the filtering rules is continuous as well. The effectiveness of SPAM filters depends on timely updates of the filtering rules.
As discussed above, conventional anti-SPAM methods do not allow all SPAM messages to be blocked with 100% effectiveness. Accordingly, it is desirable to have an effective anti-SPAM solution that not only updates the filtering rules automatically, but also updates them based on statistics produced by a large number of SPAM recipients.
US Patent Publication No. 2010/0153394 discloses updating filtering rules based on user reports. Messages are checked by a SPAM filter located on a mail server and delivered to users. Each user can send the server a report about SPAM messages. The SPAM filtering rules are changed based on the user reports, so that the reported messages are blocked the next time they are detected. In some implementations, a database of user reputations is used for changing the filtering rules: before changing the rules based on a user report, the system determines that user's reputation. The reputation can be increased or decreased depending on the accuracy and reliability of the user's SPAM reports.
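A reputation-gated update of filtering rules can be sketched as below. It assumes that a report changes the rules only if the reporter's reputation reaches a minimum, that each report is later confirmed or refuted by some other means (for example, by an analyst), and that reputation moves by a fixed step clamped to [0, 1]. The class, its fields, and the use of a set of blocked message signatures as a stand-in for filtering rules are all hypothetical simplifications, not the scheme of the cited publication.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReportProcessor:
    min_reputation: float = 0.5  # hypothetical: reputation needed before a report changes the rules
    step: float = 0.25           # hypothetical: reputation change per confirmed or refuted report
    reputation: Dict[str, float] = field(default_factory=dict)
    blocked: Set[str] = field(default_factory=set)  # stand-in for rules: blocked message signatures

    def report(self, user: str, signature: str) -> bool:
        # Only users whose reputation is high enough may change the filtering rules.
        if self.reputation.get(user, 0.0) >= self.min_reputation:
            self.blocked.add(signature)
            return True
        return False

    def feedback(self, user: str, was_correct: bool) -> None:
        # Raise reputation for an accurate report, lower it otherwise; clamp to [0, 1].
        current = self.reputation.get(user, 0.0)
        updated = current + self.step if was_correct else current - self.step
        self.reputation[user] = min(1.0, max(0.0, updated))
```

Note that in this sketch a new user starts at zero reputation and cannot influence the rules until several reports have been confirmed, regardless of how knowledgeable that user actually is.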
U.S. Pat. No. 7,373,385 discloses a method for SPAM identification based on user reports. All email users are connected to a common anti-SPAM system, and when users receive SPAM, they can report it to the system. Each email is then assigned a rating based on the number of reports and the reliability coefficient of each reporting user. The rating is compared to a threshold value, and a SPAM verdict is produced. A user's reliability coefficient is calculated from the statistics of that user's reports. If a user sends unreliable reports, his reliability coefficient is reduced and his reports are not taken into account when the email rating is calculated.
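One plausible way to combine the number of reports with per-user reliability is sketched below: each user's reliability coefficient is taken to be the share of that user's past reports that were confirmed, the message rating is the sum of the coefficients of the distinct reporting users whose coefficient meets a cutoff, and the rating is compared with a threshold. The formula, the cutoff, and the function names are assumptions for illustration; the patent's actual calculation may differ.

```python
from typing import Dict, Iterable, Tuple

Stats = Dict[str, Tuple[int, int]]  # user -> (confirmed reports, total reports)


def reliability(confirmed: int, total: int) -> float:
    # Share of the user's past reports that were confirmed; 0 when there is no history.
    return confirmed / total if total else 0.0


def message_rating(reporters: Iterable[str], stats: Stats, min_reliability: float) -> float:
    rating = 0.0
    for user in sorted(set(reporters)):  # count each reporting user at most once
        k = reliability(*stats.get(user, (0, 0)))
        if k >= min_reliability:  # reports from unreliable users are ignored
            rating += k
    return rating


def is_spam(reporters: Iterable[str], stats: Stats, threshold: float, min_reliability: float) -> bool:
    return message_rating(reporters, stats, min_reliability) >= threshold
```

For instance, with reliabilities of 0.75, 0.5, and 0.25 and a cutoff of 0.5, the third user's report is discarded and the rating is 1.25. A user with no report history contributes nothing, which is the cold-start problem discussed below.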
U.S. Pat. No. 7,937,468 is intended to reduce the time needed to reach a SPAM verdict based on user reports. The system determines whether a message contains SPAM by statistical analysis of the earliest user reports and of estimated overall reports derived from those earliest reports. The verdict is made based on this estimate and on user reputation.
US Patent Publication 2004/0177110 discloses a method for training SPAM filters using user reports. A user determines whether a message contains SPAM and reports it, and the SPAM filtering rules are modified based on the user reports. The invention includes cross-reference checks of users and eliminates users who do not pass the check.
The conventional systems increase filtering effectiveness by taking into account the opinion of each individual user, weighted by that user's reputation, which is calculated from statistics on the reliability of the user's reports. While this approach is effective, it has certain shortcomings. First, users who report SPAM for the first time have a low reputation regardless of their actual knowledge and expertise. Second, estimating a user's real level of expertise requires collecting reliability statistics for that user. This requires a large number of user reports, which takes a long time, especially when the user deals with relatively “clean” messages that have already passed through initial automated filtering.
Therefore, all of the conventional systems share one major shortcoming: their way of differentiating users does not allow for true judgments or accurate estimates of users' actual knowledge and expertise. Accordingly, there is a need in the art for a system and method that allow for in-depth, comprehensive estimation of user knowledge and thereby provide for more efficient message filtering.