The Internet is a well-known computing network with a worldwide scope. Originally, the Internet was used primarily for research and education. In recent years, the Internet has seen rapid growth. As of 2007, most households in the United States and many users worldwide have access to the Internet at home. Additionally, the Internet has evolved to facilitate a broad range of uses, including commerce, access to government services, personal communication and entertainment.
The Internet allows a wide variety of computing systems and other technological devices to communicate with each other. A computing system or other technical device connected to the Internet is known as a host. The Internet connects numerous distinct networks according to a standardized communication protocol known as the Internet Protocol. While the individual networks themselves are heterogeneous, employing a common protocol allows them to interconnect. In this manner, computing systems and other technological devices can communicate with each other even if they are located on different networks. Currently, a version of the Internet Protocol known as “IPv4” is predominantly used on the Internet. However, a newer version known as “IPv6” is becoming increasingly widely used.
A host is identified by an Internet Protocol address, commonly known as an IP address. In IPv4, an IP address is simply a 32-bit number. An IP address is customarily divided into four eight-bit segments, with the segments arranged so that the most significant segment is to the left. Each segment is expressed in base 10, and the four segments are separated by periods. For example, “129.42.58.212” is an IP address expressed in the customary format. In IPv6, an IP address is a 128-bit number.
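The customary format described above can be illustrated with a minimal sketch. The function names are hypothetical and chosen only for this illustration; the sketch assumes IPv4 and does not handle IPv6.

```python
def to_dotted_decimal(address: int) -> str:
    """Format a 32-bit IPv4 address as four base-10 segments."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("an IPv4 address must fit in 32 bits")
    # Extract the four eight-bit segments, most significant first.
    segments = [(address >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(segment) for segment in segments)


def from_dotted_decimal(text: str) -> int:
    """Parse the customary four-segment format back into a 32-bit number."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four segments separated by periods")
    value = 0
    for part in parts:
        segment = int(part)
        if not 0 <= segment <= 255:
            raise ValueError("each segment must fit in eight bits")
        # Shift the accumulated value left and append the next segment.
        value = (value << 8) | segment
    return value
```

For the example address in the text, `from_dotted_decimal("129.42.58.212")` yields the number whose hexadecimal form is `0x812A3AD4`, one byte per segment.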
Typically, a contiguous range of IP addresses is assigned to an organization. Generally, this is achieved by assigning to the organization all IP addresses which begin with a specific prefix. The prefix defines both the number of bits which must match and the values which those bits must contain in order to match. This method improves routing efficiency because a message to a host whose IP address begins with a specific prefix can be routed to the organization to which that prefix is assigned. The organization is then responsible for routing the message to the appropriate host. Thus, the message sender does not require any knowledge of the internal workings of the organization's network. For example, the set of IP addresses whose first 8 bits have the decimal value 9 is associated with International Business Machines Corporation. Therefore, a message addressed to any IP address whose first 8 bits have the decimal value 9 is routed to International Business Machines Corporation.
In most cases, a human user may access a device by directly entering an IP address. However, IP addresses are difficult for a human being to memorize. Therefore, hosts may also be accessed by a domain name. Domain names are human-readable names which consist of one or more elements separated by periods. Each element is a string which may generally contain only letters, digits and hyphens. An example of a domain name is “ibm.com”, which is associated with International Business Machines Corporation.
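As a brief illustration of the prefix-based assignment described above, the following sketch uses Python's standard `ipaddress` module. The table `PREFIX_TABLE` and the function `organization_for` are hypothetical names introduced only for this example; the single entry mirrors the 8-bit prefix with decimal value 9 mentioned in the text, and a real routing table would be far larger.

```python
import ipaddress
from typing import Optional

# Hypothetical table mapping address prefixes to organizations.
# "9.0.0.0/8" means: the first 8 bits must match, and their value is 9.
PREFIX_TABLE = {
    ipaddress.ip_network("9.0.0.0/8"): "International Business Machines Corporation",
}


def organization_for(address: str) -> Optional[str]:
    """Return the organization whose prefix matches the address, if any."""
    ip = ipaddress.ip_address(address)
    matches = [(net, org) for net, org in PREFIX_TABLE.items() if ip in net]
    if not matches:
        return None
    # If several prefixes match, prefer the most specific (longest) one.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

Under these assumptions, `organization_for("9.1.2.3")` matches the table entry, while an address outside every listed prefix yields `None`.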
The domain name system is hierarchically structured. The rightmost element of a domain name, known as a top-level domain (TLD), generally denotes either a type of organization or a country. For example, commercial entities typically end in “.com”, educational institutions in “.edu”, not-for-profit organizations in “.org” and entities in Japan in “.jp”. Virtually any domain may include a subdomain. A subdomain expresses a more specific level of structure than is expressed by its parent domain. The domain name of a subdomain is the domain name of the parent domain with one additional element added to the left. For example, “ibm.com” is a subdomain of the “com” TLD. Most organizations have a domain name which is either a subdomain of a TLD or a subdomain of a subdomain of a TLD. Within a domain name representing a specific organization, additional levels of subdomains may indicate specific systems belonging to the organization or specific subdivisions of the organization. For example, “www.ibm.com”, a domain name which is a subdomain of “ibm.com”, is the server of the main website of International Business Machines Corporation. Additionally, “ch.ibm.com”, a domain name which is another subdomain of “ibm.com”, represents IBM Schweiz, the Swiss division of International Business Machines Corporation.
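The hierarchy described above can be sketched by walking a domain name from left to right, dropping one element at a time. The function names below are hypothetical and serve only to illustrate the parent and subdomain relationship.

```python
from typing import List


def parent_chain(domain_name: str) -> List[str]:
    """List a domain name followed by each of its parent domains, ending at the TLD."""
    elements = domain_name.lower().split(".")
    # Dropping the leftmost element yields the parent domain.
    return [".".join(elements[i:]) for i in range(len(elements))]


def is_subdomain(child: str, parent: str) -> bool:
    """Return True if child lies strictly below parent in the hierarchy."""
    return child.lower().endswith("." + parent.lower())
```

For example, `parent_chain("www.ibm.com")` returns `["www.ibm.com", "ibm.com", "com"]`, and both “www.ibm.com” and “ch.ibm.com” are reported as subdomains of “ibm.com”.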
To access an Internet host according to its domain name, the domain name is first converted to an IP address. This is generally achieved by querying one or more Domain Name Servers, special servers configured to convert domain names to IP addresses. The host is then accessed according to its IP address. For example, to access a service at “www.ibm.com”, the domain name “www.ibm.com” would first be translated to the IP address “129.42.58.212”.
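The two-step access described above, name resolution followed by access by IP address, can be sketched as follows. The function `resolve_target` is a hypothetical helper. By default it delegates to the standard library call `socket.gethostbyname`, which queries the resolver configured on the local system; the address returned for a real domain name today may differ from the one given in the text.

```python
import socket
from typing import Callable, Tuple


def resolve_target(
    domain_name: str,
    port: int,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[str, int]:
    """Convert a domain name to an IP address and return the (address, port) pair."""
    # Step 1: convert the domain name to an IP address.
    ip_address = resolver(domain_name)
    # Step 2: the host is then accessed according to its IP address.
    return (ip_address, port)
```

Passing a stub resolver, as in `resolve_target("www.ibm.com", 80, resolver=lambda name: "129.42.58.212")`, reproduces the example from the text without performing a network query.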
Multiple domain names may be associated with the same IP address. This fact beneficially allows a method known in the art as virtual hosting. In virtual hosting, a single server hosts a plurality of services. Each service typically executes under a separate domain name. The server may be identified by a single IP address, in which case all domain names hosted at the server are associated with the same IP address. Virtual hosting is advantageous for small businesses which do not require a dedicated server because it costs less than dedicated hosting. Nonetheless, virtual hosting means that a specific IP address is not necessarily associated with a unique domain name. Other Internet hosts must therefore account for this possibility.
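A minimal sketch of this arrangement follows. All names and the address are hypothetical: the address lies in a range reserved for documentation, and the “.example” TLD is likewise reserved. The sketch assumes the client states which domain name it is requesting, which is how the server tells the services apart.

```python
from typing import Dict, List, Optional

# Hypothetical address of the single shared server.
SERVER_IP = "192.0.2.10"

# Several domain names are associated with the same IP address.
DNS_RECORDS: Dict[str, str] = {
    "shop.example": SERVER_IP,
    "blog.example": SERVER_IP,
}

# The server selects a service according to the requested domain name.
SERVICES: Dict[str, str] = {
    "shop.example": "online store",
    "blog.example": "weblog",
}


def select_service(requested_domain: str) -> Optional[str]:
    """Return the service hosted under the requested domain name, if any."""
    return SERVICES.get(requested_domain.lower())


def domains_at(ip_address: str) -> List[str]:
    """Return every domain name associated with the given IP address."""
    return sorted(name for name, addr in DNS_RECORDS.items() if addr == ip_address)
```

Here `domains_at(SERVER_IP)` returns two names, which illustrates why an IP address alone does not identify a unique domain name.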
The Internet offers a variety of types of services. One particularly well-known service is the World Wide Web, commonly known as the “web”. The World Wide Web is a method of accessing information. The World Wide Web is subdivided into individual websites, each of which has one or more pages. Each page is accessed via a Uniform Resource Locator (URL). A URL generally specifies a domain name of a host at which a website is located and optionally specifies path information for a specific page within that website. For example, the About Us page of the International Business Machines Corporation's primary website is accessed via the URL “http://www.ibm.com/ibm/us/”. “www.ibm.com” identifies a specific host, and “/ibm/us/” identifies a specific page located at that host.
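The decomposition of the example URL can be reproduced with the standard library function `urllib.parse.urlsplit`. The wrapper `host_and_path` is a hypothetical helper introduced only for this sketch.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def host_and_path(url: str) -> Tuple[Optional[str], str]:
    """Split a URL into the host it names and the path of the page at that host."""
    parts = urlsplit(url)
    # hostname is None when the URL does not name a host.
    return (parts.hostname, parts.path)
```

Applied to “http://www.ibm.com/ibm/us/”, the helper returns the host “www.ibm.com” and the path “/ibm/us/”.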
World Wide Web pages are commonly coded in Hypertext Markup Language (HTML). HTML includes a feature known as forms which allows data to be transmitted to a website. This functionality allows transactions, such as purchasing items from an online store, to be completed via the World Wide Web. Additionally, technologies have been developed to allow a World Wide Web page to cause program code to execute at a client. Examples of such technologies include JavaScript™, Java™ and ActiveX®. Java and JavaScript are trademarks of Sun Microsystems, Inc., Santa Clara, Calif., United States in the United States or other countries. ActiveX is a registered trademark of Microsoft Corporation, Redmond, Wash., United States in the United States and other countries.
Another well-known service is electronic mail, commonly known as e-mail. E-mail allows a user to send a message which the recipient may read at his or her convenience. E-mail messages are generally textual in nature, although most e-mail systems allow attaching other types of data to an e-mail message. The sender and recipients of e-mail are identified by e-mail addresses. An e-mail address is generally expressed in the form <user_name>@<domain_name>, where <domain_name> is a domain name and <user_name> identifies a specific user at that domain name. It is noted that the domain name of the sender's e-mail address is not inherently required to match the domain name of the host actually used to send the message. This is advantageous because a user may not actually be located at the organization with which he or she is associated. For example, an employee of International Business Machines Corporation should ideally be able to send messages from an “@ibm.com” address even if he or she is waiting for a flight at Newark Liberty International Airport. Nonetheless, this fact increases the risk that the identity of the sender of an e-mail message can be forged.
While the Internet has provided many benefits to individuals, businesses and society, it unfortunately also provides drawbacks and security risks. One common type of security risk is the computer virus. A computer virus is a computer program which generally has harmful effects and which attempts to transmit itself to other computer systems. For example, a computer virus may attempt to infect as many other hosts as possible and to deliberately waste processor time on systems it has already infected. Another type of security risk is malware. Malware is a broad term for computer software which is deliberately configured to have harmful effects. For example, a computer program may purport to be a computer game but may actually destroy data once executed. While not a security risk per se, spam is one of the most common nuisances on the Internet. Spam is a broad term for e-mail messages of a commercial nature which are sent to a large number of users who had not requested the messages. Spam is disadvantageous because it consumes a significant quantity of computing resources. Spam must also frequently be manually deleted, which costs people a significant amount of time.
Phishing is an Internet security threat which is particularly dangerous and which is increasing in prevalence. Phishing is any of a class of attacks which attempt to deceive Internet users into revealing sensitive information such as login names, passwords and credit card numbers to an attacker. One common form of phishing attack attempts to steal a user's online identity by sending fraudulent or “spoofed” e-mail messages which purport to originate from a legitimate company. The e-mail messages contain hyperlinks to a counterfeit website which is designed to resemble the website of a legitimate business. In many cases, the forgery looks identical or nearly identical to the legitimate website. Thus, it is difficult for most users to identify the website as counterfeit. The counterfeit website prompts the user to enter sensitive information. Because many users fail to identify the website as counterfeit, they are likely to enter the requested information. After receiving the sensitive information, the counterfeit website may then redirect the user to the actual website of the legitimate business to conceal the attack. The attacker may then fraudulently use the sensitive information to commit offenses such as wire fraud, credit card fraud and identity theft. It is noted that even if only a low percentage of intended victims actually provide the sensitive information, the attacker can obtain a large quantity of sensitive information. For example, suppose an attacker sends 10,000 e-mail messages advising that a credit card number must be supplied within 48 hours for identification purposes or the recipient's bank account will be closed. Even if only 5% of recipients fall for the scam, the attacker still obtains 500 credit card numbers.
Phishing is a widespread and increasing problem on the Internet. According to “Phishing Activity Trends: Report for the Month of February, 2007” by the Anti-Phishing Working Group, there were 23,610 unique phishing reports involving 16,463 unique phishing sites in February 2007. <Apr. 27, 2007: http://www.antiphishing.org/reports/apwg_report_february_2007.pdf> By contrast, only 17,163 unique phishing reports involving 9,103 unique phishing sites were reported in February 2006. In February 2007, 25.4% of observed phishing attempts contained some form of target name in their URL, and 17% contained only an IP address with no hostname. Id. This data implies that even users who check the properties of the URL can be deceived by phishing attacks. Furthermore, many users are unaware of how to check the properties of the URL or are unaware that it is desirable to do so. Such users are particularly at risk of being victimized by phishers.
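The two URL properties mentioned above can be checked mechanically, as the following sketch suggests. It is offered only as an illustration of what checking the properties of a URL might involve, not as a countermeasure described in this document. Both function names are hypothetical, and the counterfeit URL in the usage note uses a reserved example domain.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True if the URL names its host by IP address rather than by hostname."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not a valid IP address, so the host is a domain name.
        return False
    return True


def host_belongs_to(url: str, domain_name: str) -> bool:
    """Return True if the URL's host is the given domain or one of its subdomains."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    domain_name = domain_name.lower()
    return host == domain_name or host.endswith("." + domain_name)
```

A URL such as “http://www.ibm.com.example.net/login” contains the target name “ibm.com”, yet `host_belongs_to` reports that its host is not within “ibm.com”. A reader glancing at the URL could easily miss that distinction.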
Historically, phishing attacks have been predominantly directed at major corporations and financial institutions. In response, these companies have developed complex countermeasures. One notable countermeasure is secondary verification. In secondary verification, a mechanism in addition to the industry-standard username and password is required in order to authenticate. For example, legitimate users may be given a technological device which generates a numeric token which changes at regular intervals. The user must supply the correct token in order to successfully authenticate. Furthermore, the token will be valid only for a very short period of time. While this approach increases the security of authentication in general, it is limited in its ability to combat phishing. This is because a phishing website can simply submit the token along with the username and password to the legitimate website to confirm its accuracy.
Another countermeasure commonly used in the art is user education. Many businesses are counteracting phishing by educating their customers never to click a hyperlink but instead to always directly enter the address of the target website. Unfortunately, most Internet users' first reaction to an e-mail containing a hyperlink is to click on the hyperlink. Upon doing so, the user is unknowingly directed to the counterfeit website. While this advice is useful to many users, it encourages users to take an overt action which they are not used to taking. As a result, many users are tricked into accessing the counterfeit website and providing sensitive information despite this advice.
Customers are also educated not to believe “urgent” e-mail messages which prompt for security information. Unfortunately, such messages frequently provoke an emotional response in users which causes them to panic and to follow the directions in the message despite having been educated not to do so.
Customers are also educated to frequently examine their financial accounts to check for discrepancies. Unfortunately, discrepancies resulting from financial crime or identity theft are only evident once the act has already occurred. Therefore, this advice is clearly of limited use in proactively preventing phishing attacks.
Although these countermeasures have significant limitations, they nonetheless are somewhat effective in counteracting phishing. As a result, in response to these countermeasures, phishing attacks have been increasingly targeting smaller businesses. This is problematic because smaller businesses may not have the resources to effectively counteract phishing attacks. Furthermore, phishing attempts have generally been increasing in sophistication over time, making them inherently more difficult to counteract.
More generally, these countermeasures must be implemented independently by each organization. This drawback is exacerbated by the fact that these countermeasures are expensive to implement. Clearly, it is inefficient for each organization to independently implement expensive phishing countermeasures.