1. Field of the Invention
This disclosure relates to the field of network security; specifically, the protection of computer systems and networks from electronic attack by detecting, classifying, and rating security threats presented by a given network connection in real time.
2. Description of the Related Art
Every computer connected to the Internet is connected in some fashion to every other computer connected to the Internet, and all of these computers are capable of communicating with each other through various layers of network communications protocols. These protocols differ wildly at the physical level, with some protocols communicating through changes in voltage across copper wires, others utilizing pulses of light across fiber optic cable, and still others using radio and microwave signals broadcast through the air.
However, the key to the Internet's success is the Internet Protocol—a routing and addressing protocol layered on top of the physical protocols and ignorant of the actual physical medium used. The Internet Protocol allows any one computer to find any other computer on the Internet by knowing only one thing about the remote computer: the Internet Protocol network address associated with that remote computer. This “IP Address” is a thirty-two bit binary number, commonly represented visually in “dotted-decimal” format for improved human-readability, such as: 150.50.10.34.
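By way of illustration only, the relationship between the thirty-two bit number and its dotted-decimal representation can be sketched as follows. The function names `dotted_to_int` and `int_to_dotted` are hypothetical and are not part of this disclosure; the sketch simply treats each of the four decimal fields as one eight-bit octet.

```python
def dotted_to_int(dotted: str) -> int:
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a valid dotted-decimal address: {dotted!r}")
    value = 0
    for octet in octets:
        # Shift the accumulated bits left by one octet and append the next one.
        value = (value << 8) | octet
    return value


def int_to_dotted(value: int) -> str:
    """Unpack a 32-bit integer into dotted-decimal form."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


address = dotted_to_int("150.50.10.34")
print(address)                  # 2519861794
print(format(address, "032b"))  # the same address as 32 binary digits
print(int_to_dotted(address))   # 150.50.10.34
```

The dotted-decimal form carries no information beyond the underlying number; it is only a more readable rendering of the same thirty-two bits.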
Each computer on the Internet generally must have a unique IP Address. When data packets are sent onto the Internet identifying the IP Address of the intended recipient, devices with knowledge of network topology determine where to send the packets. When the destination machine receives the data packets, it verifies that each packet is intended for it by examining the destination IP Address stored in the Internet Protocol packet header, and disregards packets that are not intended for it. Further, the packets must also identify the IP Address of the sending computer so that the destination device knows where to send responses. IP Addresses are so fundamental to the Internet that even novice Internet users generally are aware of them.
Additional protocols are built on top of IP to improve the reliability of network communications, such as the Transmission Control Protocol (“TCP”). TCP handles the “session” between each endpoint of network communications, breaking large chunks of data into small datagrams and sending each datagram separately, reassembling the datagrams in proper order when they are received by the destination computer, and re-transmitting lost datagrams. TCP and IP are the workhorses of the Internet and, due to their complementary functions, are often referenced in concert as “TCP/IP,” though they are technically distinct protocols.
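The segmentation, reordering, and retransmission behavior described above can be sketched in greatly simplified form as follows. This is an illustrative model only and is not an implementation of TCP: the helper names `segment`, `missing_offsets`, and `reassemble` are hypothetical, byte offsets stand in for sequence numbers, and the sketch omits acknowledgments, windows, and checksums.

```python
import random


def segment(data: bytes, size: int) -> list:
    """Break data into (offset, chunk) pairs; the offset plays the role of a sequence number."""
    return [(offset, data[offset:offset + size]) for offset in range(0, len(data), size)]


def missing_offsets(received: list, total_len: int, size: int) -> list:
    """List the offsets that have not arrived and therefore need re-transmission."""
    have = {offset for offset, _ in received}
    return [offset for offset in range(0, total_len, size) if offset not in have]


def reassemble(received: list) -> bytes:
    """Restore the original byte order, ignoring duplicate segments."""
    unique = dict(received)
    return b"".join(unique[offset] for offset in sorted(unique))


message = b"GET /index.html HTTP/1.1"
sent = segment(message, 5)

# Simulate the network: one segment is lost and the rest arrive out of order.
arrived = sent[:2] + sent[3:]
random.shuffle(arrived)

# The receiver detects the gap and the sender re-transmits only what is missing.
lost = missing_offsets(arrived, len(message), 5)
arrived += [pair for pair in sent if pair[0] in lost]

restored = reassemble(arrived)
```

In this model the receiver cannot reconstruct the complete message until every segment has arrived, so the full content only becomes available for inspection after all of the data has been accepted.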
Application protocols are then layered on top of TCP/IP to enable specific types of Internet communications. For example, the HyperText Transfer Protocol (“HTTP”) is used to exchange web pages between web sites and web browsers. Other such protocols include the Simple Mail Transfer Protocol (“SMTP”), which is used to transfer one-to-one messages between Internet users, now commonly known as “e-mail.” Other venerable open protocols include FTP, IRC, ICMP, and SNMP. Newer protocols include peer-to-peer protocols and closed protocols, some of them layered on top of these and other open protocols. Network protocols are sometimes referred to as a “protocol stack” because each higher-level protocol is generally independent of the protocols “beneath” it.
When these fundamental building blocks of the Internet were engineered, virtually all computers, people, and institutions with access to the Internet could be trusted to behave themselves. In its infancy, the Internet was used almost exclusively by academics at major research institutions, the government, a handful of private corporations, and a very small number of individual users with benign intentions. The engineering goal of the Internet was physical security, not data security, and the system was designed to survive disruptions caused by damage to physical components, such as by acts of warfare or terrorism. As such, little attention was paid to data security threats originating within the network, and the protocols do not capture much information that can be used to identify nefarious individuals or malicious data. For example, TCP/IP captures little identifying information beyond the source and destination IP Addresses. Similarly, the designers of protocols such as SMTP, FTP, and IRC had little reason to include any form of source verification. For example, SMTP simply assumes that the sender of an e-mail is who the sender claims to be. The designers of newer protocols, notably peer-to-peer file sharing protocols, specifically engineered them to protect anonymity and frustrate attempts to identify the user.
When the Internet was commercialized during the tech bubble of the 1990s, the open nature of the Internet's basic protocols was abused to flood the Internet with unwanted traffic. For example, the scourge of the 1990s was unsolicited junk e-mail known as “spam,” which was blasted through open relays on the Internet that blindly and obediently forwarded SMTP traffic as they always had, creating a substantial industry in highly sophisticated spam-detection and spam-blocking software solutions. Because the authenticity of the sender is nearly impossible to validate, spam solutions generally examine the content of the e-mail to determine whether to categorize it as spam.
While spam is annoying, compared to modern threats to data security, spam now appears in hindsight like the quaint troubles of a bygone era of naiveté. An enormous amount of money now is exchanged across the world in on-line financial transactions, ranging from ordinary consumer purchases, to sales of securities, to interbank and intergovernmental transfers. Individuals also exchange private, personal information such as social security numbers, dates of birth, photos of their families, addresses and phone numbers, insurance information, credit card numbers, and bank information. Lawyers and doctors send their clients confidential and privileged information; corporate board members, government agencies, and military personnel exchange messages and documents regarding strategies and secret new projects. All of this activity takes place on top of the open TCP/IP protocols, protected only by additional security layered on top of these basic building blocks.
The opportunity for malefactors to interject themselves into the stream of on-line activity and create havoc is manifest, and the modern threats to data and network security are myriad and include: fraud, theft, corporate and sovereign espionage, hacking, virus distribution, smuggling, child pornography, drug sales, conspiracy, organized crime, terrorism, and other behaviors injurious to nations, firms, and individuals. The threat is exacerbated by the fact that sophisticated malefactors manipulate the open structure of the Internet to hide their activities. After all, law enforcement and cybersecurity personnel have only an IP Address at their disposal to identify the source of malicious data.
However, even that limited amount of information—an IP Address—assumes that the malefactor is carrying out an attack or fraudulent transaction from his own computer. In the modern day, sophisticated security threats also come in the form of “bots”—intelligent software planted on otherwise innocuous networked computers and commandeered by the malefactor without the knowledge of the infected computer's operator. The wrongdoer plants these bots on a remote machine using “Trojan horse” techniques—sneaking the malicious software past technological security, such as by taking advantage of unpatched security flaws in operating systems, and past human vigilance, such as disguising the program as a legitimate download or burying malicious code in a funny video. The infected computer then becomes a “zombie” under the wrongdoer's control, and the wrongdoer directs the zombie to carry out attacks or fraudulent transactions, thus removing the true source of the attack from the apparent source of the attack by another degree of separation and further frustrating attempts to identify and stop the malefactor.
In addition to obscuring the true source of the malicious behavior, bots also allow malefactors to carry out attacks not otherwise possible on the shoestring budget of a cybercriminal. For example, governments and large corporations usually have substantial bandwidth available to handle Internet traffic and use sophisticated load balancers to route incoming traffic to idle resources which can promptly service the connection. No one individual computer on commodity hardware has the horsepower to take down this kind of corporate network. However, the wrongdoer can utilize a “bot herder” program to organize millions of zombies into a “botnet” and coordinate a simultaneous distributed attack on a single system. The botnet floods the victim network with traffic that appears innocent but quickly brings the system to its knees, causing legitimate users to receive a “timeout” message stating that the web site is too busy to serve them. This type of attack is known as a Distributed Denial of Service (“DDoS”) attack.
While a DDoS attack is frustrating to the business, major DDoS attacks are easy to spot once they begin and the victim corporation simply issues a press release informing the public of why the website is not available. Antivirus solutions for the infected zombie computers are usually developed quickly and enough of the bots are disabled to reduce the DDoS traffic to a manageable volume. Consequently, even a highly sophisticated DDoS attack is rarely successful for more than a few days, and often no more than a few hours, resulting in some interruption of normal business operations with only modest financial damage.
However, zombies and botnets can also be leveraged to carry out more nefarious activities carrying a higher price tag for the individual user than merely not being able to reach a favorite web site. The bot software residing on the infected computer may collect personally identifying information, such as by monitoring the keystrokes of the user and recognizing common patterns of potentially useful information such as social security numbers, phone numbers, credit card numbers, bank account numbers, addresses, dates of birth, and passwords. The zombie forwards this information to the bot herder, which redistributes the information to other bots to carry out fraudulent transactions. The zombies can work in concert to defraud a single user, but are usually more effective if each zombie acts individually by emulating an individual, specific person. Using the gathered personally identifying information, the zombies connect to commercial websites, such as banks and retailers, to withdraw or transfer money, or purchase goods or services.
The amount of damage a sophisticated botnet can inflict increases with the price-performance of commodity hardware. A graphics card in a high-end gaming computer today has more processing power than an entire server farm only a decade ago and costs only a few hundred dollars. The wide variety of methods, techniques, and sources for malware attacks creates the need to develop and deploy equally flexible, adaptive, and sophisticated countermeasures. Unfortunately, the breadth and depth of these threats are such that countermeasures tend to be complex, cumbersome, expensive, and intrusive upon legitimate use, pushing onto innocent users too much of the burden of dealing with bad actors.
Further, modern countermeasures are, at best, only partially successful. Because the only identifying information typically available for any given packet of network information is the IP Address, countermeasures focus on examining the payload to determine the threat profile it presents, if any. However, because the actual data transmitted over TCP is broken into separate datagrams which may arrive out of sequence, the payload often cannot be examined and analyzed until it has been received, at which point it already presents a threat. This means that a requested transaction or connection from a client must be accepted, and the data transmitted from that client must be accepted, before the threat can be identified and countermeasures can be marshaled. By then, it may be too late.
One way around this is to maintain “blacklists” of IP Addresses known to be malicious. This technique has been used on peer-to-peer (“P2P”) networks to identify “polluters” who intentionally distribute bad data to frustrate the efficacy of P2P networks. Known polluters are identified in a blacklist, and P2P clients are programmed to check new connections against the blacklist and ignore connections from known polluters.
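A blacklist check of this kind can be sketched in a few lines. The sketch below is illustrative only: the `BLACKLIST` contents and the function name `accept_connection` are hypothetical, and the addresses shown are taken from ranges reserved for documentation rather than from any real polluter list.

```python
# Hypothetical blacklist; these addresses come from ranges reserved for documentation.
BLACKLIST = {"203.0.113.7", "198.51.100.23"}


def accept_connection(remote_ip: str) -> bool:
    """Return False for a known-bad address so the connection is ignored before any payload is read."""
    return remote_ip not in BLACKLIST


for ip in ("203.0.113.7", "192.0.2.15"):
    print(ip, "accepted" if accept_connection(ip) else "ignored")
```

The decision is made from the source address alone, before any payload has been accepted, which is the principal appeal of the blacklist approach.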
However, IP Addresses are no longer static. Innovations such as the Dynamic Host Configuration Protocol (“DHCP”) allow routers to autonomously assign IP Addresses to computers within that router's subnet, effectively creating self-configuring sub-networks that require little maintenance or attention. As mobile device use becomes more prevalent, the one-to-one relationship between an IP Address and a particular device is being eroded. For example, when an iPhone is within range of a wireless network and joins it, the device receives a new IP Address on the subnet. When the iPhone's owner wanders away, that IP Address is recycled and assigned to another device, and when the iPhone user stops at another location, the iPhone will receive a new IP Address on another subnet. Even the WAN IP Address for a home cable Internet connection changes over time and, with it, the apparent source address of all computers on the private home network. Thus, it is not enough to blacklist a given IP Address; countermeasures must also be able to remove IP Addresses from the blacklist as they are recycled and assigned to new users presenting no threat or risk. Blacklists are also insufficient because an IP Address that presents a risk for one type of transaction may present no risk at all for another type of transaction.
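One way to picture the two shortcomings just described is a blacklist whose entries expire and are keyed by transaction type as well as by address. The following is a minimal sketch under stated assumptions and is not the method of this disclosure: the class name `ExpiringBlacklist`, the fixed time-to-live, and the transaction-type labels are all hypothetical.

```python
import time


class ExpiringBlacklist:
    """Blacklist keyed by (IP address, transaction type) whose entries lapse after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._listed_at = {}  # (ip, transaction_type) -> time the entry was added

    def add(self, ip: str, transaction_type: str, now: float = None) -> None:
        self._listed_at[(ip, transaction_type)] = time.time() if now is None else now

    def is_blocked(self, ip: str, transaction_type: str, now: float = None) -> bool:
        now = time.time() if now is None else now
        key = (ip, transaction_type)
        listed_at = self._listed_at.get(key)
        if listed_at is None:
            return False
        if now - listed_at > self.ttl:
            # The address may have been recycled to an innocent user; drop the stale entry.
            del self._listed_at[key]
            return False
        return True


blacklist = ExpiringBlacklist(ttl_seconds=3600)
blacklist.add("203.0.113.7", "funds_transfer", now=0)

blocked_now = blacklist.is_blocked("203.0.113.7", "funds_transfer", now=60)
blocked_other_type = blacklist.is_blocked("203.0.113.7", "page_view", now=60)
blocked_later = blacklist.is_blocked("203.0.113.7", "funds_transfer", now=7200)
```

A fixed time-to-live is only a crude proxy for address recycling: it may release a still-malicious address too early, or continue to penalize an innocent user who has inherited a listed address.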
Tracking the ebb and flow of IP Addresses is made even more difficult because of the size of the addressing space. As mentioned, an IP Address is a thirty-two bit binary number, meaning there are theoretically more than four billion possible IP Addresses. About three billion are assigned at any given time. The use of home routers to create private networks also hides additional computers behind a single address, meaning that a single IP Address assigned to a cable subscriber may actually represent transactions from multiple computers accessing the Internet through a shared gateway, some of which may be malicious, and others of which may not. This presents a serious processing bottleneck. Further, the four billion possible IP Addresses pertain to version four of the Internet Protocol, but in version six, the IP Address is a 128-bit number allowing for not only trillions of IP Addresses, but trillions of trillions of trillions.
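The sizes of the two address spaces follow directly from the number of bits in each address, as the following short calculation illustrates. The variable names are chosen for illustration only.

```python
ipv4_total = 2 ** 32    # every possible thirty-two bit address
ipv6_total = 2 ** 128   # every possible 128-bit address

print(f"{ipv4_total:,}")    # 4,294,967,296
print(f"{ipv6_total:.3e}")  # 3.403e+38

# "Trillions of trillions of trillions" is 10**36; the version six space exceeds it.
trillion = 10 ** 12
print(ipv6_total > trillion ** 3)  # True
```

The version six space is the version four space raised to the fourth power, so any approach that depends on enumerating or individually tracking every possible address becomes impractical.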
Consequently, current threat mitigation systems do not focus on identifying malicious IP Addresses, but instead narrowly define potential threat factors based on the payload sent. That is, existing systems do not determine whether a particular IP Address presents a threat, but instead whether the particular payload or transaction for that IP Address is malicious. Examining a payload can sometimes provide a proxy for detecting a criminal, and if the payload cannot be delivered, the criminal activity cannot be carried out. However, the sophistication and signature of attacks change rapidly, and firms providing malware protection services struggle to keep up with the speed and flexibility of these programs. Further exacerbating the situation, it can be difficult to anticipate the new ways in which payloads can be hidden or disguised, and existing solutions to malware are thus generally reactive, rather than proactive.