1. Field of the Invention
This invention relates to the field of visualization tools. More specifically, the invention comprises a method for visually depicting events in a way that allows one or more users operating in conjunction with software agents to quickly discern significant patterns from a much larger amount of information, and for these users to interact with software agents and the visual depiction in order to further refine the visual depiction.
2. Description of the Related Art
The present invention is useful in visually depicting a large number of events in a way that allows a user or users to intuitively focus on a subset of those events which is of interest. The invention has many applications—including the fields of Internet communication, telecommunications, physical traffic flow (such as motor vehicle congestion patterns or package deliveries), financial transactions, tracking the spread of human pathogens, disaster management, and other complex phenomena. A specific embodiment of the invention is a tool for depicting Internet traffic. As many of the examples described in the following disclosure pertain to this particular embodiment, some background information concerning Internet data traffic will aid the reader's understanding.
The monitoring and analysis of Internet traffic is an area of increasing interest. This field is useful for purposes of anti-terrorism, anti-crime, and counterintelligence activities—among others. Communication across the Internet must originate at a specific network location and it must terminate at a specific network location or locations. Network addresses are currently set according to Internet Protocol Version 4 (“IPv4”), which is a standard promulgated by the Internet Engineering Task Force (“IETF”). The reader should be aware that IETF is currently developing replacement standard(s) for IPv4. However, as IPv4 is currently the standard it will be used throughout this document. The methods disclosed are equally applicable to successor standards (such as IPv6) and the use of IPv4 in the explanations given should not be viewed as limiting.
IPv4 specifies a standard format for an Internet address. A dot-decimal notation is used having the format “n.n.n.n” where each “n” represents a number between 0 and 255. The following are exemplary network addresses using this notation scheme:
192.0.120.24
255.124.124.6
12.122.132.204
Each of these numerical sequences defines a unique network location within the Internet. Each individual number “n” is often referred to as an “octet” since it encompasses eight bits of the 32-bit address. The IPv4 format provides a total of approximately 4.3 billion possible individual addresses (2^32, or 4,294,967,296).
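The relationship between the dot-decimal notation and the underlying 32-bit address can be illustrated with a short sketch. The function names below are hypothetical and chosen only for this illustration; they are not part of the disclosed method.

```python
def ipv4_to_int(address: str) -> int:
    """Pack a dot-decimal IPv4 address into its 32-bit integer value."""
    octets = address.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected four octets in {address!r}")
    value = 0
    for text in octets:
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"octet {text!r} is not a decimal number")
        octet = int(text)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet {octet} is outside the range 0 to 255")
        # Each octet contributes eight bits, most significant octet first.
        value = (value << 8) | octet
    return value


def int_to_ipv4(value: int) -> str:
    """Unpack a 32-bit integer into dot-decimal notation."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value does not fit in 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Four octets of eight bits each give 2**32 distinct addresses.
TOTAL_ADDRESSES = 2 ** 32

if __name__ == "__main__":
    for example in ("192.0.120.24", "255.124.124.6", "12.122.132.204"):
        print(example, "->", ipv4_to_int(example))
    print("total addresses:", TOTAL_ADDRESSES)
```

Treating an address as a single integer is convenient when addresses must be sorted, grouped into ranges, or mapped onto an axis of a visual depiction.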
Data being exchanged over the Internet originates at an IPv4 address and terminates at an IPv4 address. Thus—using the exemplary addresses from above—a stream of data could originate at 192.0.120.24 and terminate at 255.124.124.6. The authenticity of the source address is not currently verified in most data exchange protocols. This can be an issue—as will be explained in the description of the present invention.
Systems exist for monitoring traffic on the Internet. These systems provide information such as the total communication event flow rate, the communication event rate as a function of specific ports, and so forth. A common protocol for monitoring traffic is the “NetFlow” protocol developed by Cisco Systems, Inc. of California (CISCO® is a trademark commonly used by Cisco Systems, Inc.). NetFlow has become a de facto industry standard that is supported by platforms other than Cisco's IOS and NX-OS. It is anticipated that the NetFlow protocol will shortly be superseded by the IP Flow Information Export (“IPFIX”) protocol. However, the principles to be disclosed herein are equally applicable to any successor protocol and NetFlow serves as an appropriate example.
The NetFlow traffic monitoring protocol typically provides the following properties for each exchange of information between two computers (sometimes referred to as a “communication event” or “message”) occurring on the Internet:
(1) Date and time;
(2) Duration of the information exchange;
(3) Source Internet Protocol (“IP”) address (such as 192.0.120.24);
(4) Destination IP address (such as 255.124.124.6);
(5) IP protocol (such as UDP, TCP, BGP, and ICMP);
(6) Source port for UDP or TCP protocols (a “0” is used for other protocols);
(7) Destination port for UDP or TCP, type and code for ICMP (a “0” is again used for other protocols);
(8) Number of bytes of data transferred;
(9) Number of packets the data was divided into for transfer; and
(10) IP type of service.
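The ten properties listed above can be pictured as the fields of a single record. The following is a minimal sketch only: the class name, field names, and sample values are assumptions made for illustration and do not reproduce the actual NetFlow export format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FlowRecord:
    """One communication event, with one field per listed property."""
    start: datetime          # (1) date and time
    duration: timedelta      # (2) duration of the information exchange
    src_addr: str            # (3) source IP address
    dst_addr: str            # (4) destination IP address
    protocol: str            # (5) IP protocol, e.g. "TCP" or "UDP"
    src_port: int            # (6) source port (0 when not UDP or TCP)
    dst_port: int            # (7) destination port (0 when not applicable)
    byte_count: int          # (8) number of bytes transferred
    packet_count: int        # (9) number of packets used for the transfer
    type_of_service: int     # (10) IP type of service

    def mean_packet_size(self) -> float:
        """Average bytes per packet, or 0.0 for an empty flow."""
        if self.packet_count == 0:
            return 0.0
        return self.byte_count / self.packet_count


# Illustrative values only; the addresses are the exemplary ones used above.
example = FlowRecord(
    start=datetime(2010, 1, 1, 12, 0, 0),
    duration=timedelta(seconds=2),
    src_addr="192.0.120.24",
    dst_addr="255.124.124.6",
    protocol="TCP",
    src_port=49152,
    dst_port=80,
    byte_count=4200,
    packet_count=6,
    type_of_service=0,
)

if __name__ == "__main__":
    print(example)
    print("mean packet size:", example.mean_packet_size())
```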
Although the terms used in this list of NetFlow properties are familiar to those skilled in the art, some explanation may be helpful to the reader. The term “source port” refers to the port used by the originating computer (more explanation on the meaning of the term “port” is given subsequently). The “destination port” is the port used by the destination computer. The “IP protocol” (an internally redundant phrase) refers to the type of protocol used in the communication event (such as UDP or TCP). The “IP type of service” can refer to different things but has traditionally referred to a request by the sender as to how the data packet should be handled (such as a preference for speed over reliability).
The two most common Internet data transfer protocols are UDP and TCP. “TCP” stands for Transmission Control Protocol. “UDP” stands for User Datagram Protocol. TCP establishes a source-to-destination connection that remains intact throughout the data transfer. In contrast, UDP sends messages without establishing a source-to-destination connection.
Under either protocol, the destination computer receives information using a “port.” Each IP address has many ports. Port numbers are 16-bit values, so each IP address has 65,536 possible port numbers (0 through 65,535) for each of TCP and UDP, with port 0 reserved. Data is sent to a specific IP address and a specific port on that IP address.
Ports are roughly analogous to channels on a radio or television. They are significant in the present context because certain ports are associated by convention with certain applications. Some examples may be helpful. As those skilled in the art will know, a Web server is a computer running an application which allows other computers to connect to it and retrieve information (typically though not always Web pages) stored on the Web server. In order for the Web server to accept remote connections, it must bind the particular Web server application to a local port. The server will then use this local port to “listen” for and accept connections from remote computers.
By convention, Web servers typically bind Web applications to TCP Port 80. This port is the default setting under the hypertext transfer protocol (“http”). Thus, the Web server will typically “listen” on TCP Port 80 since that is the port used by external computers seeking to access Web pages.
The process is different from the perspective of the remote computer seeking to access the Web server. Access is usually made via an application (such as a Web browser or a Web app) running on the remote computer. The Web browser picks a random TCP port from a defined range of port numbers and attempts to connect to TCP Port 80 on the IP address of the Web server. The Web browser will then send a request for a particular Web page.
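The bind, listen, and connect sequence described above can be sketched with the standard socket interface. This is an illustration under stated assumptions: it runs entirely on the loopback address, and it asks the operating system for any free port (port 0) rather than binding to TCP Port 80, which usually requires elevated privileges. The function name is hypothetical.

```python
import socket


def demo_bind_and_connect() -> tuple[int, int]:
    """Open one loopback TCP connection; return (server_port, client_port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # The server binds its application to a local port and listens on it.
        server.bind(("127.0.0.1", 0))  # 0 lets the operating system choose
        server.listen(1)
        server.settimeout(5)
        server_port = server.getsockname()[1]

        # The client is given its own local port, then connects to the
        # server's address and listening port.
        with socket.create_connection(("127.0.0.1", server_port), timeout=5) as client:
            client_port = client.getsockname()[1]
            connection, peer = server.accept()
            with connection:
                # The server sees the client's address and source port.
                if peer[1] != client_port:
                    raise RuntimeError("unexpected peer port")
    return server_port, client_port


if __name__ == "__main__":
    listening, source = demo_bind_and_connect()
    print("server listened on port", listening)
    print("client connected from port", source)
```

The two port numbers returned correspond to the destination port and source port that a flow record would report for this connection.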
Another example is a File Transfer Protocol (“FTP”) server, which is a server configured to send files to and receive files from remote computers (note that a single computer could simultaneously act as a Web server and an FTP server). By convention, FTP servers use TCP Port 20 and Port 21. Thus, when the FTP application starts it will be bound to Port 20 or 21. It will not interfere with the Web server application bound to Port 80 (other than by diminishing the available data transmission capacity).
The specific port assignments are generally set by the IANA Registry (a registry managed by the Internet Assigned Numbers Authority). Software developers register the ports their applications use with IANA. This convention greatly reduces the chance of a port conflict.
In the present context, the IANA Registry allows network communication events to be categorized in useful ways. For example, if one wishes to observe “request events” directed to Web server applications, one would naturally want to look at messages bound to TCP Port 80. This type of information is readily available in the NetFlow protocol.
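Categorizing communication events by destination port can be sketched as follows. The port table contains only the conventions mentioned in this description (Ports 20, 21, and 80); the sample flows, the function name, and the tuple layout are assumptions made for illustration.

```python
from collections import Counter

# Conventional TCP port assignments mentioned in the text.
WELL_KNOWN_TCP_PORTS = {20: "ftp-data", 21: "ftp", 80: "http"}


def classify(protocol: str, dst_port: int) -> str:
    """Label an event by the application conventionally using its port."""
    if protocol == "TCP":
        return WELL_KNOWN_TCP_PORTS.get(dst_port, "other")
    return "other"


# Hypothetical events, each reduced to (protocol, destination port).
flows = [
    ("TCP", 80),
    ("TCP", 80),
    ("TCP", 21),
    ("UDP", 53),
    ("ICMP", 0),
]

# Request events directed to Web server applications.
web_requests = [flow for flow in flows if flow == ("TCP", 80)]

# Event counts per category.
counts = Counter(classify(protocol, port) for protocol, port in flows)

if __name__ == "__main__":
    print("web requests:", len(web_requests))
    print("counts:", dict(counts))
```

A port number is only a convention, so a label produced this way indicates the likely application rather than a verified one.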
All Internet traffic is directed to its destination through a series of special-purpose computers called routers, such as those manufactured by Cisco Systems, Inc. The owner of a router can configure it to produce NetFlow records about the network traffic that flows through that router. Each NetFlow record contains properties that describe a single communication event. These NetFlow records can be transmitted to other computers to provide a live view of the traffic currently being handled by that router. Collections of NetFlow or similar data aggregated from multiple routers worldwide are publicly available from sources such as CAIDA (Cooperative Association for Internet Data Analysis). By analyzing the NetFlow data, a picture of traffic flow and volume in a network can be obtained. It is in theory possible to obtain an overall picture for the entire Internet. However, the volume of data existing at any point in time can be overwhelming. Conventional techniques for displaying such data make it very difficult for a user to obtain the “big picture.”
One existing visual depiction that has been applied to network flow events is a “parallel coordinates graph.” This type of visualization consists of a two-dimensional plot of events—often flowing from left to right. A parallel coordinates graph may be presented for data flow through a network host. External senders are plotted vertically on the left side of the graph, internal hosts are plotted vertically in the center, and external receivers are plotted vertically on the right. When data is sent, a line is plotted between the sender, the host, and the receiver. A parallel coordinates graph shows many such lines as data is sent.
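The geometry of such a graph can be sketched without any plotting library. In the sketch below, each event becomes a three-point line running from the sender axis (x = 0) through the internal host axis (x = 1) to the receiver axis (x = 2). The function name, the row-assignment rule (order of first appearance), and the sample events are assumptions made for illustration.

```python
def parallel_coordinate_lines(events):
    """Turn (sender, host, receiver) events into three-point lines.

    Returns (lines, axes), where each line is a list of (x, y) points
    and axes holds one label-to-row mapping per vertical axis.
    """
    axes = [{}, {}, {}]  # senders, internal hosts, receivers
    lines = []
    for event in events:
        points = []
        for x, (rows, label) in enumerate(zip(axes, event)):
            # A label seen for the first time takes the next free row.
            y = rows.setdefault(label, len(rows))
            points.append((x, y))
        lines.append(points)
    return lines, axes


# Hypothetical events: (external sender, internal host, external receiver).
events = [
    ("192.0.120.24", "10.0.0.1", "12.122.132.204"),
    ("255.124.124.6", "10.0.0.1", "12.122.132.204"),
    ("192.0.120.24", "10.0.0.2", "255.124.124.6"),
]

lines, axes = parallel_coordinate_lines(events)

if __name__ == "__main__":
    for line in lines:
        print(line)
```

Every event adds one line, so the number of lines grows with the number of events regardless of how many rows are available on each axis.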
Such plots are easy to use for a small, low-volume network. They are impractical for large networks, however, and certainly impractical for a depiction of the Internet as a whole. Even with large or multiple screens, the overlapping connection lines in larger networks often clutter the graph until it becomes unreadable.
On the other hand, one of the defining characteristics of the human mind is its ability to intuitively discern patterns and changes in patterns—even for very complex events. This capability exists despite the inability to rationally define the steps in a pattern or process. The present invention seeks to take advantage of this innate human capability by graphically depicting events (such as communication events on the Internet) in a way that makes pattern spotting and evaluation possible. Software agents are used to aggregate, correlate, and analyze data and patterns of data in ways that emphasize events that may be of interest. The data are then visually presented to a human operator who is given tools to alter both the depiction itself and the activities of the software agents in order to focus on areas of particular interest.