Modern computer networks operate by sending small blocks of bytes generically called “packets”. Common Ethernet packets can range in size from 64 bytes up to several thousand bytes, but other link layer protocols may have different size restrictions. Packets are sent by computer programs and/or hardware devices connected to the network to communicate with other computer programs and/or hardware devices. A computer program and/or hardware device that communicates with others on the network may be referred to herein as a “node”. Each node has a unique address within the network. The OSI seven-layer model defines a framework for encoding network packets using multiple protocol layers. In order for two nodes at the Data Link Layer to communicate on a computer network, it is common for each node to have a unique address on a link or segment. The node may also have an address that is unique to the entire network, which may be defined at the Network Layer. Packets sent across the network usually have a source address and a destination address at one or more protocol layers. The source address is the address of the node that originates the communication, and the destination address is the address of the node to which the packet is to be sent. A pair of ‘source address, destination address’ may be referred to as a “connection” or “flow”. A flow may also include additional information such as a ‘protocol’ identifier and/or port numbers.
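By way of illustration only, a flow as described above could be represented as a small immutable record. The following is a minimal sketch; the name `FlowKey`, its field names, and the example addresses are assumptions made for this illustration and are not taken from any particular implementation.

```python
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical record identifying a flow (illustrative names only)."""
    src_addr: str                    # address of the originating node
    dst_addr: str                    # address of the receiving node
    protocol: Optional[int] = None   # optional protocol identifier
    src_port: Optional[int] = None   # optional source port number
    dst_port: Optional[int] = None   # optional destination port number


# A flow defined only by its address pair.
basic_flow = FlowKey("10.0.0.1", "10.0.0.2")

# The same address pair narrowed by protocol identifier and port numbers.
detailed_flow = FlowKey("10.0.0.1", "10.0.0.2",
                        protocol=6, src_port=49152, dst_port=80)

# Tuples are hashable, so a flow can serve as a dictionary key,
# for example to count the packets observed for each flow.
packet_counts = {basic_flow: 0, detailed_flow: 0}
packet_counts[detailed_flow] += 1
```

Note that in this sketch the direction matters: swapping the source and destination addresses yields a different key. Whether both directions belong to the same flow is a design choice that the text above leaves open.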
A typical data packet may include one or more headers associated with one or more protocols of the Data Link Layer, or Layer 2 of the OSI model, followed by one or more headers associated with one or more protocols of the Network Layer, or Layer 3 of the OSI model, and one or more headers associated with one or more protocols of the Transport Layer, or Layer 4 of the OSI model. By way of example, an Ethernet packet may include an Ethernet, or MAC, header, followed by an IP header, followed by a TCP or UDP header. However, a plurality of other network protocols, including proprietary protocols of various network equipment vendors, can also be included in the protocol stack of a particular packet, each adding its own header. Numerous protocols have been developed, and many of them are quite complex. Therefore, the network troubleshooting industry develops, markets, and sells devices that can help network engineers troubleshoot network and/or protocol functions and errors. One approach is to store all the network packets seen at a specific node in the network, and later to mine the store of packets to provide a user with a specifically requested flow. The user can then examine that flow in detail using packet analysis tools without having to deal with massive amounts of extraneous and irrelevant data.
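The layering described above can be illustrated with a short parsing sketch. It assumes the simplest case only: an untagged Ethernet II frame carrying IPv4, followed by TCP or UDP. The function names `parse_headers` and `build_sample_frame`, the dictionary keys, and the sample addresses are assumptions made for this illustration; VLAN tags, IPv6, and vendor-specific headers are deliberately not handled.

```python
import socket
import struct

ETH_HEADER_LEN = 14      # 6-byte destination MAC, 6-byte source MAC, 2-byte EtherType
ETHERTYPE_IPV4 = 0x0800
PROTO_TCP, PROTO_UDP = 6, 17


def format_mac(raw: bytes) -> str:
    """Render a 6-byte MAC address as colon-separated hex."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_headers(frame: bytes) -> dict:
    """Extract Layer 2, 3 and 4 addressing fields from a frame (sketch only)."""
    # Layer 2: Ethernet (MAC) header.
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:ETH_HEADER_LEN])
    info = {"src_mac": format_mac(src_mac),
            "dst_mac": format_mac(dst_mac),
            "ethertype": ethertype}
    if ethertype != ETHERTYPE_IPV4:
        return info  # other protocols are out of scope for this sketch

    # Layer 3: IPv4 header. The low nibble of the first byte gives the
    # header length in 32-bit words.
    ip = ETH_HEADER_LEN
    ip_header_len = (frame[ip] & 0x0F) * 4
    info["protocol"] = frame[ip + 9]
    info["src_ip"] = socket.inet_ntoa(frame[ip + 12:ip + 16])
    info["dst_ip"] = socket.inet_ntoa(frame[ip + 16:ip + 20])

    # Layer 4: TCP and UDP both begin with source and destination ports.
    if info["protocol"] in (PROTO_TCP, PROTO_UDP):
        l4 = ip + ip_header_len
        info["src_port"], info["dst_port"] = struct.unpack("!HH", frame[l4:l4 + 4])
    return info


def build_sample_frame() -> bytes:
    """Build a synthetic Ethernet/IPv4/TCP frame (checksums left as zero)."""
    eth = (bytes.fromhex("020000000002") + bytes.fromhex("020000000001")
           + struct.pack("!H", ETHERTYPE_IPV4))
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, PROTO_TCP, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
    tcp = struct.pack("!HHIIHHHH", 49152, 80, 0, 0, 0x5000, 0, 0, 0)
    return eth + ip + tcp


fields = parse_headers(build_sample_frame())
```

The fields recovered in this way (addresses, protocol identifier, port numbers) are the same ones that may be used to identify the flow to which a packet belongs.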
Currently, the state of the art in data mining typically requires that, after all the packets at the node have been stored and the user has requested a specific flow, all of the stored packets be searched for the packets that belong to that flow. This is because the current state of the art simply stores all packets in the order in which they were originally received at the node. This approach is inefficient for two reasons. First, there is no a priori way for the user to verify the existence of a specific flow that may be of interest. Instead, the information store must be searched to determine whether the flow exists, which can take hours. Second, assuming that the flow does exist in the store of packets, each packet in the store must be individually evaluated to determine whether it belongs to the flow of interest. This can also take many hours.
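The cost of the arrival-order approach described above can be seen in a minimal sketch. The names `StoredPacket` and `find_flow`, and the sample data, are assumptions made for this illustration; an actual packet store would hold far more packets and would reside on disk rather than in memory.

```python
from typing import List, NamedTuple, Tuple


class StoredPacket(NamedTuple):
    """Hypothetical stored packet, reduced to its flow addresses and payload."""
    src_addr: str
    dst_addr: str
    payload: bytes


def find_flow(store: List[StoredPacket], src: str,
              dst: str) -> Tuple[List[StoredPacket], int]:
    """Linearly scan an arrival-ordered store for the packets of one flow.

    Returns the matching packets and the number of packets examined.
    """
    matches = []
    examined = 0
    for packet in store:  # every stored packet must be evaluated
        examined += 1
        if packet.src_addr == src and packet.dst_addr == dst:
            matches.append(packet)
    return matches, examined


# Packets kept strictly in the order they were received.
store = [
    StoredPacket("10.0.0.1", "10.0.0.2", b"a"),
    StoredPacket("10.0.0.3", "10.0.0.4", b"b"),
    StoredPacket("10.0.0.1", "10.0.0.2", b"c"),
    StoredPacket("10.0.0.5", "10.0.0.6", b"d"),
    StoredPacket("10.0.0.3", "10.0.0.4", b"e"),
]

# Retrieving an existing flow examines the entire store.
found, examined_for_hit = find_flow(store, "10.0.0.1", "10.0.0.2")

# Learning that a flow does not exist costs exactly the same full scan.
missing, examined_for_miss = find_flow(store, "10.0.0.9", "10.0.0.2")
```

In both calls the number of packets examined equals the size of the store, which reflects the two inefficiencies noted above: confirming that a flow exists and retrieving its packets each require a complete pass over everything that was stored.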
It is therefore an object of the present invention to provide a method and apparatus for storing data packets in a network storage that overcome at least some of the disadvantages and limitations of the prior art and enable efficient flow-based and/or address-based packet retrieval.