Field of the Invention
The present invention is generally directed to reassembly-free scanning of files in a peer-to-peer network. More specifically, the present invention scans file data without reassembling a file, even when parts of the file are received out of order.
Description of the Related Art
Data communicated over computer networks today passes through various layers in a computer system architecture. Typically, data is received at a network interface of a computer at a link layer. The link layer is a layer in the architecture of a computer that includes physical hardware. The link layer connects the computer to other computers in a computer network. Link layers are also used to transmit data from one computer to another over a computer network.
Other layers above the link layer in computer system architectures commonly include a network layer, a transport layer, and an application layer. The network layer receives data packets from and provides data packets to the link layer. The network layer may also receive data in segments from the transport layer and send data in segments to the transport layer. Commonly when the network layer receives a segment of data from the transport layer it will generate a packet or an internet protocol (IP) datagram for transmission to another computer. This process may include encapsulating the segmented data received from the transport layer and adding a header that includes a destination IP address when generating an IP packet. In certain instances more than one IP packet may be associated with a data segment. The network layer may also receive IP packets from the link layer and may pass segmented data to the transport layer.
When a series of IP packets is used to transport data to a computer, those packets may be received out of order at the network layer. When this occurs, the transport layer may re-order the data segments from a plurality of packets before sending the re-ordered data to the application layer. Conventionally, data received at an application layer must be received in order (i.e., sequentially). For example, in a client-server environment, file data received at the application layer of a client or a server must be in order before it can be processed. This is because the client-server environment expects received data to be in order. While communications transferred over a computer network according to the Transmission Control Protocol (TCP) are re-ordered at the transport layer, communications over other transport layer protocols, such as the User Datagram Protocol (UDP), are not.
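By way of illustration only, the re-ordering performed at a transport layer can be sketched as follows. This is a simplified model and not an implementation of TCP: the function name deliver_in_order is hypothetical, and each segment is assumed to carry a small integer sequence number starting at zero, whereas TCP numbers individual bytes.

```python
def deliver_in_order(segments):
    """Return payloads in sequence order, given (sequence_number, payload)
    pairs in the order they arrived.

    Assumption for illustration: sequence numbers are consecutive integers
    starting at 0. A segment that arrives early is held back until every
    earlier segment has been delivered.
    """
    pending = {}      # early segments waiting for a gap to be filled
    next_seq = 0      # next sequence number the application expects
    delivered = []
    for seq, payload in segments:
        pending[seq] = payload
        # Release every segment that is now contiguous with what was delivered.
        while next_seq in pending:
            delivered.append(pending.pop(next_seq))
            next_seq += 1
    return delivered


# Segments arrive out of order, yet the application sees them in order.
arrived = [(1, b"lo "), (0, b"hel"), (2, b"world")]
in_order = deliver_in_order(arrived)
```

A protocol such as UDP performs no comparable step, so the payloads would reach the application layer in the order in which they arrived.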
Typically, in a client-server environment, a server will send a data set or a file sequentially from the application layer to the transport layer, and the transport layer may then send that data to the network layer. The network layer then packetizes the data and sends a plurality of packets to a client. Even though the packetized data may arrive out of order, data contained in the packets will be re-ordered before that data is received at the application layer at the client. Because of this, application layers at a client or a server in a client-server environment may never receive file data that is out of order.
Peer-to-peer (P2P) networks, however, do not operate in the same way as a client-server environment. For example, a computer in a P2P network may receive data at an application layer that is out of order. This is because P2P networks have a fundamentally different type of architecture as compared to a client-server environment.
In a P2P network a computer accessing file data may receive parts of data from a file from a plurality of computers. A P2P network is capable of transmitting file data in pieces where each piece of data may be transmitted from a different computer. Because of this a first piece of data received from a first computer may be out of order as compared to a second piece of data received from a second computer. When this occurs the network layer and the transport layer at a receiving computer will not be aware that the first data piece and the second data piece have been received out of order. This is true even when packetized data sent from the first computer (or the second computer) to the receiving computer have been re-ordered. This is because the network layer and the transport layer at the receiving computer do not check whether application data received from different peer computers are received in order. Conventionally, the network layer and the transport layer are only capable of re-ordering packetized data that has been transmitted from a single source computer to a destination computer.
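By way of illustration only, the following sketch models a receiving computer that obtains pieces of one file from two peers. The peer names, the piece indices, and the round-robin arrival pattern are assumptions chosen for the example; real arrival order depends on network conditions.

```python
from itertools import zip_longest


def arrival_order(pieces_by_peer):
    """Interleave piece indices from several peers, one piece per peer per round.

    Assumption for illustration: every peer delivers its own pieces in order,
    and the receiver takes one piece from each peer in turn.
    """
    order = []
    for round_of_pieces in zip_longest(*pieces_by_peer.values()):
        order.extend(i for i in round_of_pieces if i is not None)
    return order


def is_in_order(indices):
    """True when the piece indices are strictly increasing."""
    return all(a < b for a, b in zip(indices, indices[1:]))


# Hypothetical assignment of piece indices to two peers.
pieces_by_peer = {"peer_a": [0, 1, 4], "peer_b": [2, 3, 5]}
received = arrival_order(pieces_by_peer)
```

In this sketch, each peer's own pieces are in order, which is all that the transport layer on each connection can guarantee, while the combined sequence seen at the application layer is not.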
P2P networks may also break a file into a number of pieces where each piece may include a pre-determined or specific number of blocks. Information relating to a number of pieces that a data file is broken into may be included in metadata (or a metadata file) that is associated with the data file. Once a number of pieces are identified, a file size divided by the number of pieces will correspond to a number of blocks that the file may be broken into in the P2P network.
Limitations in the network layers and in the transport layers of computers today mean that a computer cannot easily scan file data received at its application layer for malicious content (such as computer worms, viruses, or other attacking software). Conventionally, the scanning of data for malicious content at the application layer in a P2P network either cannot be done reliably or must be done in an inefficient manner. For example, if data from a file is scanned out of order, the scan can miss a virus contained within the data, because malicious content is characterized by a sequentially ordered series of characters, not an out of order series of characters. In another example, when the application layer re-orders received data before scanning it, data from the out of order pieces must be stored until the data pieces can be re-ordered and scanned. Thus, the approach of the first example is unreliable and the approach of the second example is inefficient.
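By way of illustration only, the two conventional approaches can be sketched as follows. The signature b"EVIL", the piece contents, and the function names are hypothetical, and a simple substring search stands in for whatever matching a real scanner performs.

```python
SIGNATURE = b"EVIL"  # hypothetical signature, for illustration only


def scan_in_arrival_order(arrivals):
    """Scan pieces exactly as they arrive, ignoring their position in the file."""
    return SIGNATURE in b"".join(data for _, data in arrivals)


def scan_after_reordering(arrivals):
    """Store every piece, re-order by piece index, then scan.

    Returns (found, peak_buffered_bytes) so the storage cost is visible.
    """
    buffered = {}
    peak = 0
    for index, data in arrivals:
        buffered[index] = data
        peak = max(peak, sum(len(d) for d in buffered.values()))
    reassembled = b"".join(buffered[i] for i in sorted(buffered))
    return SIGNATURE in reassembled, peak


# The file b"xxEVILyy" split into two pieces; piece 1 arrives before piece 0.
arrivals = [(1, b"ILyy"), (0, b"xxEV")]
missed = scan_in_arrival_order(arrivals)
found, peak_bytes = scan_after_reordering(arrivals)
```

In this sketch, the scan in arrival order fails to find a signature that spans the piece boundary, while the re-ordering scan finds it only after holding the entire file in storage.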
Application data that includes interleaved, out of order data received at an application layer of a computer system may, if scanned in the order received, result in malicious content contained within the interleaved data being missed. Furthermore, data received out of order may also result in scanning software falsely detecting malicious content. For example, suppose the character sequence “car” is associated with a virus and two pieces of data are received out of order. If a later piece of data ending with the character “c” is scanned before an earlier piece of data that begins with “ar,” the scanning software will falsely identify these pieces of data as including the virus when they do not.
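By way of illustration only, the false detection described above can be reproduced with the “car” sequence. The piece contents and function names below are hypothetical, and a substring search again stands in for a real scanner.

```python
SIGNATURE = b"car"  # the character sequence associated with a virus in the example


def scan_as_received(arrivals):
    """Scan the pieces concatenated in the order they arrived."""
    return SIGNATURE in b"".join(data for _, data in arrivals)


def scan_in_file_order(arrivals):
    """Scan the pieces concatenated in their true order within the file."""
    return SIGNATURE in b"".join(data for _, data in sorted(arrivals))


# The later piece (index 1) ends with "c" and arrives first; the earlier
# piece (index 0) begins with "ar" and arrives second.
arrivals = [(1, b"magic"), (0, b"arena ")]
false_positive = scan_as_received(arrivals)    # scans b"magicarena "
true_result = scan_in_file_order(arrivals)     # scans b"arena magic"
```

Scanning in arrival order joins the trailing “c” to the leading “ar” and reports a match, although the file in its true order does not contain the sequence.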
What is needed to increase the reliability and efficiency of P2P networks are systems and methods that scan pieces of data received out of order at an application layer without storing and re-ordering data pieces that have been received out of order. What is also needed are systems and methods that scan interleaved data reliably at an application layer. The reliable scanning of received data at an application layer increases the reliability of detecting malicious content while reducing the likelihood that malicious content scanning software will falsely associate received data with malicious content.