This disclosure relates generally to methods and systems for detecting phishing attempts in incoming electronic messages, and more particularly to a system and method for utilizing normalization and comparison techniques to identify a possible phishing attempt and to notify the electronic message recipient.
Phishing is a form of Internet scam that usually involves the mass sending of email messages which appear to be from a legitimate organization such as a bank or other financial institution or organization. These email messages often direct the recipient to a fraudulent website or form where the recipient is tricked into divulging personal or financial information. An alternative phishing scam may not ask for such information but, when the recipient enters the URL (Internet address assigned to a resource or document by which it can be accessed by a web browser), executes a download of a keystroke-logging program that lets the phisher harvest information from the recipient's machine. The information can then be used for identity theft and fraud.
In forging an address, a commonly used technique is to include in the address some homographic characters (characters which are graphically very similar to one another). For example, if the address appears clearly in the content, an address that is visually as similar as possible to the original address could be used (e.g., instead of ZZZ@amazone.com, ZZZ@amaz0ne.com may be used, in which the character ‘o’ has been replaced by a zero). Alternatively, if it is a web site URL, it may be hidden in a graphical web page where the URL written on the screen is not associated with the real URL that will be used (e.g., a URL address embedded in an image saying “follow the Change Profile link to modify your Amazone profile”, which is actually associated with the URL http://www.amaz0ne.com/profiles.asp, which uses the same character replacement as the previous example).
Another approach is a technique which consists of replacing standard ASCII characters with the Unicode code points of other visually similar characters used by languages that are not based on Roman characters. For example, a recent case concerned the Paypal website. In this case the forged URL was http://www.p&#1072;ypal.com/, which appeared on the address bar of the Internet browser as a valid and official Paypal URL (http://www.paypal.com). However, the first “a” was not the standard English form but rather a character from another character set (e.g., Cyrillic). This forgery appeared visually identical to the official address (as displayed in the address field of the browser due to its Unicode interpretation capabilities), but it actually corresponded to a different URL.
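The substitution described above can be made visible with a short Python sketch. It is offered only as an illustration and is not part of the disclosed method; the function names non_ascii_characters and ascii_form are hypothetical. It relies on Python's standard unicodedata module and the built-in "idna" codec, which converts a host name to the ASCII-compatible form used for name resolution.

```python
import unicodedata


def non_ascii_characters(hostname):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in hostname
        if ord(ch) > 127
    ]


def ascii_form(hostname):
    """Return the ASCII-compatible (Punycode) form of an internationalized host name."""
    return hostname.encode("idna").decode("ascii")


# The forged host from the example: the first "a" is U+0430 (decimal 1072).
forged = "www.p\u0430ypal.com"

print(non_ascii_characters(forged))
print(ascii_form(forged))
```

Although the forged host is rendered like www.paypal.com, its ASCII-compatible form carries an "xn--" label, so the two names do not resolve to the same address.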
Phishing attacks can be costly and can drain a company's resources: attacks are run against target companies in large volumes, and the billions of phishing messages that pass through filtering systems can slow down email delivery, consume valuable server processing time, and result in the loss of important financial data. Several solutions are known that attempt to address this problem, generally employing sender or recipient authentication. Examples include use of a directory of legitimate transmitting computer addresses, verification of digital signatures, use of a personalized image to authenticate the identity of a transmitter, and identification cards. However, these approaches do not provide the capability for automatically detecting phishing attempts and identifying such attempts to the email recipient.
All U.S. patents and published U.S. patent applications cited herein are fully incorporated by reference. The following patents or publications are noted:
U.S. Patent Application Publication No. 2005/0144451 to Voice et al. (“Method and Apparatus for Providing Electronic Message Authentication”) describes electronic message authentication employing an article, such as a card or sticker that includes sender authentication information and location information such as row and column headings. For example, a recipient is issued an article that embodies unique sender authentication information that is identifiable by corresponding location information. When the sender of an electronic message wants to send a message to the recipient, the sender transmits the electronic message and both location information and corresponding desired sender authentication information located at the coordinate identified by the location coordinate information. If the sender authentication information matches the authentication information found on the article, the sender of the message is trusted.
U.S. Patent Application Publication No. 2006/0015722 to Rowan et al. (“Security Systems and Services to Provide Identity and Uniform Resource Identifier Verification”) describes a service that allows a user to perform a search on any one or multiple Uniform Resource Identifiers (URIs) and/or other protocol addresses accessible via a public or private network to establish a report in a summary or detailed format on the trustworthiness of an address.
U.S. Patent Application Publication No. 2006/0031319 to Nelson et al. (“Hierarchically Verifying the Identity of the Sender of an E-mail Message”) teaches verification of the identity of the sender of an email message by performing a number of tests on DNS information. The DNS information is based on a client IP address or a sender address. Each test performed has a corresponding intrinsic confidence value representing the degree of confidence the test provides of the sender identity relationship. If multiple tests are successful, the test result with the highest confidence value is used.
U.S. Patent Application Publication No. 2006/0075028 to Zager et al. (“User Interface and Anti-Phishing Functions for an Anti-Spam Micropayments System”) teaches a protocol for protected email transmission using micropayments and a segregated inbox in which protected emails are displayed. The protocol also involves authentication of the sender to defeat phishers and an opt out protocol which can be used to block protected emails from sources from which the user no longer wishes to receive emails even if the source has made a micropayment. A white list is maintained on the protected email server (along with the opt out black list) so that recipients can designate specific senders who may send email to that recipient without paying a micropayment and still have the protected email displayed in the segregated inbox.
U.S. Patent Application Publication No. 2006/0080735 to Brinson et al. (“Methods and Systems for Phishing Detection and Notification”) teaches a machine-implemented method for detecting phishing over a computer network. A web page can be accessed and information associated with the web page can be processed. One or more conditions can be set in response to the processing and the conditions can be compared to a set of conditions indicative of a phishing attack. A user can then be informed of a potential phishing attack corresponding to the conditions through the display of an alert window or an icon. The actions can also be performed in response to a user's selection of a link appearing in an email message.
The disclosed embodiments provide examples of improved solutions to the problems noted in the above Background discussion and the art cited therein. There is shown in these examples an improved method for detection of phishing attempts in received electronic mail messages. The method includes retrieving the source code, displayed text, and a list of all specified addresses contained within the source code for an electronic message. Visual character normalization is applied to each specified address to develop all possible address combinations and to form a normalized address list. The specified addresses are removed from the normalized address list to create a revised address list, upon which comparison tests are performed to determine if each address in the revised address list is from a valid source. The recipient of the electronic message is informed of any message found to be from an invalid source.
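The normalization, removal, and comparison steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the HOMOGLYPHS table is a hypothetical and deliberately tiny substitution list, the function names are invented for the example, and the comparison test is assumed to be a simple lookup in a set of trusted addresses, since the summary does not specify the tests.

```python
from itertools import product

# Hypothetical table: each look-alike character maps to the characters it can
# be mistaken for. A practical table would be far larger.
HOMOGLYPHS = {
    "0": ["o"],
    "1": ["l", "i"],
    "\u0430": ["a"],  # Cyrillic small letter a
}


def normalized_variants(address):
    """Develop every combination obtained by swapping look-alike characters.

    The result includes the specified address itself.
    """
    choices = [[ch] + HOMOGLYPHS.get(ch, []) for ch in address]
    return {"".join(combo) for combo in product(*choices)}


def suspicious_lookalikes(specified, trusted):
    """Return trusted addresses that the specified address imitates.

    The specified address is removed from its normalized list to form the
    revised list; any remaining variant that equals a trusted address means
    the specified address is a look-alike (assumed comparison test).
    """
    revised = normalized_variants(specified) - {specified}
    return sorted(revised & trusted)


trusted = {"www.amazone.com", "www.paypal.com"}
print(suspicious_lookalikes("www.amaz0ne.com", trusted))
print(suspicious_lookalikes("www.amazone.com", trusted))
```

In this sketch a non-empty result would lead to the recipient being informed. The number of variants grows multiplicatively with each substitutable character, so a practical system would likely need to bound or prune the combinations.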
In an alternate embodiment there is disclosed a system for detection of phishing attempts in received electronic mail messages in a networked environment, which includes a plurality of personal computers and an electronic mail server. When an electronic mail message is received, the source code, displayed text, and the addresses specified in the source code are retrieved. Visual character normalization is applied to each of the specified addresses to develop all possible address combinations and to form a normalized address list combined with the specified addresses. A revised address list is created through removal of the specified addresses from the normalized address list, and comparison tests are performed on each address in the revised address list to determine whether an address is valid. The recipient of the electronic message is informed if an electronic message is not from a valid source.
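The retrieval step, in which the displayed text and the addresses specified in the source code are obtained from a received message, might look like the following Python sketch for an HTML message body. It is an assumption-laden illustration only: the class LinkCollector and the function extract_links are hypothetical names, only anchor elements are examined, and the standard html.parser module is used for parsing.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (specified address, displayed text) pairs from anchor elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # address of the anchor currently open, if any
        self._text = []     # displayed text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_source):
    """Return every (href, displayed text) pair found in the message source."""
    parser = LinkCollector()
    parser.feed(html_source)
    parser.close()
    return parser.links


message = (
    '<p>Follow <a href="http://www.amaz0ne.com/profiles.asp">'
    "http://www.amazone.com/profiles.asp</a> to modify your profile.</p>"
)
for href, shown in extract_links(message):
    print(href, "|", shown)
```

Each retrieved address can then be passed to the normalization and comparison steps, and a mismatch between the displayed text and the specified address is itself a useful signal.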
In yet another embodiment there is disclosed a non-transitory computer-readable storage medium having computer readable program code embodied in the medium which, when the program code is executed by a computer, causes the computer to perform method steps for detection of phishing attempts in received electronic mail messages. The method includes retrieving the source code, displayed text, and a list of all specified addresses contained within the source code for an electronic message. Visual character normalization is applied to each specified address to develop all possible address combinations and to form a normalized address list. The specified addresses are removed from the normalized address list to create a revised address list, upon which comparison tests are performed to determine if each address in the revised address list is from a valid source. The recipient of the electronic message is informed of any message found to be from an invalid source.