1. Field of the Invention
The invention relates to email fraud detection, prevention, and mitigation.
2. Background of the Related Art
Email is a widely used communication channel for companies (“domain owners”) of any size or type, and there is very limited technology in place to verify the authenticity of an email message, its sender, and its origin. It is easy to counterfeit an email: an illegitimate originator can make an email message purport to come from a domain owner's domain when it in fact comes from a different source. The result is rampant phishing, financial scams, malware, etc. being delivered via email that fraudulently appears to originate from a trusted source. Phishing emails may contain links to websites that are infected with malware. Phishing is typically carried out by email spoofing or instant messaging, which often directs a user to a fake website whose look and feel are almost identical to the legitimate one.
There are universally available email data exchange mechanisms in place that report information about email flows to “domain owners”. These exchange mechanisms provide varying levels of insight into fraudulent email streams.
(1) One of the most widely used exchange mechanisms is the email “Feedback Loop”. Most email mailbox providers (e.g., Yahoo!, AOL, Gmail) offer a way for email mailbox owners to flag a received email as unwanted. This is usually called the “Spam” button. When an email recipient determines that a received email is unwanted they simply click the “Spam” or equivalent button. The email mailbox provider then creates a message called a “complaint” containing a small report and a copy of the unwanted message. Normally the complaint is sent to the “domain owner” of the unwanted email as defined by the “return-path” header within the email message, RFC 3834. Return Path Inc. hosts “Feedback Loops” on behalf of various mailbox providers. The data generated by the “Feedback Loops” is referred to as “Complaint” data. The specifications for this data are defined in RFC 6449 for Complaint Feedback Loop Operational Recommendations, and RFC 6650 for Creation and Use of Email Feedback Reports: An Applicability Statement for the Abuse Reporting Format (ARF).
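As an illustration of the complaint structure just described, the following is a minimal sketch of reading an ARF-style complaint with Python's standard `email` package. The sample message, its addresses, and the function name `parse_complaint` are hypothetical, and real feedback loop messages may carry additional fields or deviate from this layout.

```python
import email
from email.utils import parseaddr

# Hypothetical complaint: a short human-readable part, a machine-readable
# feedback report, and a copy of the unwanted message.
SAMPLE_COMPLAINT = """\
From: fbl@mailbox.example
To: abuse@sender.example
Subject: FW: Unwanted message
MIME-Version: 1.0
Content-Type: multipart/report; report-type=feedback-report; boundary="b1"

--b1
Content-Type: text/plain

This is an email abuse report.
--b1
Content-Type: message/feedback-report

Feedback-Type: abuse
User-Agent: ExampleFBL/1.0
Version: 1

--b1
Content-Type: message/rfc822

Return-Path: <bounce@sender.example>
From: news@sender.example
Subject: Sale

Body of the unwanted message.
--b1--
"""


def parse_complaint(raw: str) -> dict:
    """Extract the feedback type and the complained-about sender details."""
    msg = email.message_from_string(raw)
    report = original = None
    for part in msg.get_payload():
        ctype = part.get_content_type()
        if ctype == "message/feedback-report":
            report = part.get_payload(0)    # headers of the small report
        elif ctype == "message/rfc822":
            original = part.get_payload(0)  # copy of the unwanted message
    return_path = parseaddr(original["Return-Path"])[1]
    return {
        "feedback_type": report["Feedback-Type"],
        "return_path_domain": return_path.rpartition("@")[2],
        "from": original["From"],
    }


complaint = parse_complaint(SAMPLE_COMPLAINT)
```

The return-path domain extracted here is the value a mailbox provider would use to route the complaint to the domain owner.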
(2) A second type of email data exchange mechanism is DMARC (Domain-based Message Authentication, Reporting and Conformance). DMARC is an open specification draft for the purpose of reducing email fraud through the enforcement of the established email authentication mechanisms DKIM (DomainKeys Identified Mail) and SPF (Sender Policy Framework). DMARC combines the authentication checks provided by DKIM and SPF with an identifier alignment check to calculate a DMARC pass or fail status for an email. The identifier alignment check compares the FROM DOMAIN (RFC5322) with the MAIL FROM DOMAIN and the DKIM DOMAIN. Mailbox providers or Email Receivers are encouraged to use the DMARC pass or fail status to determine delivery to an email recipient. Domain Owners enable DMARC checks for their mail streams by publishing a TXT record in DNS per the DMARC specification. The DMARC TXT record contains configuration parameters related to DMARC's reporting and policy enforcement capabilities. Policy will be discussed below in the Policy section.
DMARC provides two different types of data. The primary data type is an aggregated report of all email received from a specific “From domain” for a given time period, usually one 24 hour period. The aggregate report provides a summary of all email traffic broken down by SENDING IP ADDRESS, with a count of emails received and the status of the respective DMARC checks: DKIM, SPF and identifier alignment. This report is referred to as DMARC Aggregate data. The next data type is called a Forensic Report. A Forensic Report is generated by an email Receiver when an email fails the DMARC test. A Forensic Report contains a brief machine-readable report and a full or redacted copy of the offending email message. The format of the Forensic Report is currently defined by RFC 6591. This report type is referred to as DMARC Forensic data. The relevant RFCs are RFC 6376-DomainKeys Identified Mail (DKIM) Signatures; RFC 4408-Sender Policy Framework (SPF) for Authorizing Use of Domains in E-Mail, Version 1; dmarc_draft-base-00-03-Domain-based Message Authentication, Reporting and Conformance (DMARC); RFC 6591-Authentication Failure Reporting Using the Abuse Reporting Format; and RFC 5322-Internet Message Format.
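The DMARC pass or fail calculation and the TXT record described above can be sketched roughly as follows. This is an illustrative simplification, not the specification's algorithm: the names `parse_dmarc_record`, `org_domain`, `aligned`, `AuthResult`, and `dmarc_pass` are hypothetical, and relaxed alignment is approximated by comparing the last two labels of each domain, whereas a real receiver would determine the organizational domain from a public suffix list.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def parse_dmarc_record(txt: str) -> Dict[str, str]:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for item in txt.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def org_domain(domain: str) -> str:
    # ASSUMPTION: naive organizational domain = last two labels.
    # A real implementation would consult a public suffix list.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(from_domain: str, other: str, strict: bool = False) -> bool:
    """Identifier alignment between the From domain and another identifier."""
    if strict:
        return from_domain.lower() == other.lower()
    return org_domain(from_domain) == org_domain(other)


@dataclass
class AuthResult:
    from_domain: str             # FROM DOMAIN (RFC 5322 header)
    mail_from_domain: str        # MAIL FROM DOMAIN checked by SPF
    spf_pass: bool
    dkim_domain: Optional[str]   # DKIM DOMAIN (signing domain), if signed
    dkim_pass: bool


def dmarc_pass(r: AuthResult, strict: bool = False) -> bool:
    """DMARC passes if SPF or DKIM both passes and aligns with the From domain."""
    spf_ok = r.spf_pass and aligned(r.from_domain, r.mail_from_domain, strict)
    dkim_ok = (
        r.dkim_pass
        and r.dkim_domain is not None
        and aligned(r.from_domain, r.dkim_domain, strict)
    )
    return spf_ok or dkim_ok


record = parse_dmarc_record("v=DMARC1; p=reject; rua=mailto:dmarc@example.com")
legit = AuthResult("example.com", "bounce.example.com", True, None, False)
spoof = AuthResult("example.com", "attacker.test", True, "attacker.test", True)
```

In this sketch the spoofed message fails even though its own SPF and DKIM checks pass, because neither authenticated domain aligns with the From domain.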
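To make the shape of DMARC Aggregate data concrete, the following is a minimal sketch that tallies messages and DMARC failures per sending IP address from an aggregate report. The sample report is invented and heavily trimmed, the function name `summarize_aggregate` is hypothetical, and the sketch assumes the report XML carries no namespace prefix.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, trimmed aggregate report with two sending IP addresses.
SAMPLE_REPORT = """\
<feedback>
  <record>
    <row>
      <source_ip>192.0.2.10</source_ip>
      <count>120</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>pass</dkim><spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
  <record>
    <row>
      <source_ip>203.0.113.5</source_ip>
      <count>7</count>
      <policy_evaluated>
        <disposition>none</disposition><dkim>fail</dkim><spf>fail</spf>
      </policy_evaluated>
    </row>
    <identifiers><header_from>example.com</header_from></identifiers>
  </record>
</feedback>
"""


def summarize_aggregate(xml_text: str) -> dict:
    """Return per-IP totals of messages seen and messages failing DMARC."""
    totals = defaultdict(lambda: {"messages": 0, "dmarc_fail": 0})
    for record in ET.fromstring(xml_text).iter("record"):
        row = record.find("row")
        ip = row.findtext("source_ip")
        count = int(row.findtext("count"))
        dkim = row.findtext("policy_evaluated/dkim")
        spf = row.findtext("policy_evaluated/spf")
        totals[ip]["messages"] += count
        # The policy_evaluated results already account for alignment, so
        # a row fails DMARC only when neither DKIM nor SPF passed.
        if dkim != "pass" and spf != "pass":
            totals[ip]["dmarc_fail"] += count
    return dict(totals)


summary = summarize_aggregate(SAMPLE_REPORT)
```

A sending IP address whose traffic fails entirely, like the second one here, is a candidate source of fraudulent mail or a legitimate sender that is not yet authenticating correctly.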
(3) A third type of email data exchange mechanism that exists is not available to the general public. Domain Owners are usually only given access to the data, often indirectly, via business agreements with E-mail/service providers, security companies, or related businesses. This non-public email data exchange information can be used to diagnose fraudulent email and includes spam trap feed, private aggregate data, private message level data and subscriber data.
A Spam trap network is a collection of user email accounts that are designed to catch spam or fraudulent email by way of not publishing or providing the destination mailbox's email address. Any messages received at these unpublished mailboxes are typically fraudulent in nature. A Spam Trap feed includes data generated by these spam trap networks, and can include full email message samples for all mail sent into the Spam Trap network.
Mailbox providers can provide data feeds that contain data for specific mail senders. There are two types of these feeds. The first is private aggregate authentication data. This type is similar to the DMARC aggregate feed mentioned above, except that the format is non-standard and customized. The second type of feed is a private authentication failure message feed, called private message level data throughout, which is similar to the DMARC forensic type. Many of these feeds pre-date DMARC Forensic and consequently use a nonstandard authentication failure report format. Some newer private authentication failure message feeds incorporate the Authentication Failure format used by DMARC Forensic reports.
(4) A fourth type of data feed is provided by email account plug-ins such as Other Inbox, which provide email account owners with a mechanism to organize their inbox, for example by automatically categorizing the email messages they receive. As part of this mechanism, anonymous user mailbox data can be generated that includes redacted fraudulent email message samples that can be used to perform forensic analysis of the source of the spam and fraud. This includes attributes of the message such as SENDING IP ADDRESS, MAIL FROM, FROM Address, Subject line, and body content including URIs. Additionally, this feed includes metadata for each message describing email user interactions, specifically whether the email reached the user's inbox, which is referred to as inbox placement. This data is referred to as Subscriber Data, such as discussed in U.S. Patent Publication No. 2013-0282477 and U.S. Patent Publication No. 2014-0280624, titled System And Method For Providing Actionable Recommendations To Improve Electronic Mail Inbox Placement And Engagement being filed herewith, the entire contents of which are hereby incorporated by reference.
Of the reporting mechanisms listed above, DMARC and private data feeds enable a domain owner to publish a policy to receivers of email that asserts that a domain owner properly authenticates all of its domain's email messages and if a receiver receives an email message that fails authentication verification, the policy can instruct the receiver to quarantine or reject the email message. Thus prior to publishing a policy, the domain owner needs to verify that it does in fact properly authenticate all email messages sent from a given domain. See “Proactively Block Phishing and Spoofing Attempts” Return-Path-Domain-Protect-FS-9.12—2.pdf, and dmarc_draft-base-00-03-Domain-based Message Authentication, Reporting and Conformance (DMARC).
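The receiver-side effect of a published policy can be sketched as below. This is only an illustration under stated assumptions: the function name `receiver_disposition` and the "deliver" label are hypothetical, and actual receivers may apply local overrides or additional signals before acting on a policy.

```python
def receiver_disposition(dmarc_passed: bool, published_policy: str) -> str:
    """Map an authentication outcome and the domain owner's policy to an action."""
    if dmarc_passed:
        return "deliver"
    policy = published_policy.strip().lower()
    if policy in ("quarantine", "reject"):
        return policy
    # A "none" (or unrecognized) policy requests no enforcement action.
    return "deliver"
```

Because a failing message is quarantined or rejected once an enforcing policy is published, any legitimate mail stream that does not yet authenticate correctly would be affected, which is why verification must come first.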
However, all of the data provided by the listed data exchange formats are very difficult to analyze, normalize and interpret, and not every type of data exists across all mechanisms.