1. Field of the Invention
The present invention relates to the transmission of network data. More specifically, the present invention relates to a method and device of identifying the payload of a data packet in a TCP stream.
2. Description of the Related Art
P2P (Peer-to-Peer) traffic has grown dramatically in recent years. According to a 2006 study report by CacheLogic, up to 70% of ISP (Internet Service Provider) traffic was P2P traffic.
In brief, P2P is a technology for exchanging data or services directly between different computer users without a relay device, which allows an Internet user to utilize files held by another party. Each user may connect directly to the computer of another user for a file exchange, rather than connecting to a server for browsing and downloading. In a P2P operating mode, each client terminal acts as both a client and a server. This leads to a “flat” network model.
A P2P computer network uses diverse connectivity between participants in a network, and it leverages the cumulative bandwidth of network participants rather than conventional centralized resources, where a relatively small number of servers provide the core content of a service or application. P2P networks are typically used for connecting nodes in an ad hoc manner. Such networks are useful for many applications. Common examples include sharing files containing audio, video, data or any other content in digital format, and transferring real-time data, such as telephony media. In addition, P2P is used in deep search, distributed computing, cooperative work, and other areas.
While P2P users enjoy convenient services, ISPs bear the costs of P2P. For ISPs, P2P technology poses the following major problems.
First, the extensive use of P2P places heavy bandwidth demands on network operators, resulting in high backbone transit tolls and in network congestion caused by a huge P2P traffic load during peak hours. In order to ensure transmission quality, most P2P tools create a large number of connections over which no data is transmitted. These connections consume precious network resources to no purpose. Furthermore, because P2P traffic exerts a dramatic impact on the telecommunication-level services of carrier networks such as NGN (Next Generation Network) and 3G (3rd Generation), the quality of network service is degraded, and telecommunication-level service can no longer be guaranteed.
Second, the widespread penetration of P2P applications into networks also puts network security at risk, for example through the propagation of malicious software via P2P applications and the leakage of sensitive information.
Third, the extensive use of P2P means potential exposure to litigation, such as intellectual property disputes.
In view of the foregoing drawbacks, operators are compelled to manage and control P2P service.
The “traffic limiting + connection limiting” technique, the most common management and control technique, limits the bandwidth and the number of connections. Limiting the bandwidth decreases users' P2P download speed, and limiting the number of connections decreases the number of connected P2P users. Both measures thus serve the purpose of limiting P2P traffic.
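The two limits described above can be illustrated with a minimal sketch. The source does not say how the limits are enforced; the token bucket used here for bandwidth limiting is one common choice and is an assumption, as are the class names `TokenBucket` and `ConnectionLimiter` and all parameter values.

```python
class TokenBucket:
    """Bandwidth limiter (assumed technique): tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s      # sustained rate allowed for P2P traffic
        self.capacity = burst_bytes       # largest burst that may pass at once
        self.tokens = burst_bytes         # start with a full bucket
        self.last = 0.0                   # time of the previous decision, in seconds

    def allow(self, nbytes, now):
        """Return True if a packet of nbytes may be forwarded at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False                      # over the limit: drop or delay the packet


class ConnectionLimiter:
    """Connection limiter: caps the number of simultaneous P2P connections."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.active = set()

    def open(self, conn_id):
        """Return True if the connection is admitted."""
        if conn_id in self.active:
            return True
        if len(self.active) >= self.max_connections:
            return False                  # cap reached: refuse the new connection
        self.active.add(conn_id)
        return True

    def close(self, conn_id):
        self.active.discard(conn_id)
```

In this sketch, a lower `rate_bytes_per_s` corresponds to slower P2P downloads and a lower `max_connections` corresponds to fewer simultaneous P2P connections, which mirrors the two effects described in the text.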
It is, however, impossible to fully solve the problem simply by blocking P2P; a more reasonable and effective measure is to guide P2P service toward reasonable use. The P2P caching technique has emerged in response to this need.
Largest Thai ISP Caches P2P Traffic (http://www.slyck.com/story1185_largest_thai_isp_caches_p2p_traffic), released by Thomas Mennecke on May 13, 2006, describes P2P caching in detail, the disclosure of which is incorporated by reference.
P2P caching enables ISPs to better carry P2P service on their networks by effectively managing the peaks and valleys associated with P2P usage. P2P caching frees up network bandwidth, reducing the need for ISPs to purchase more bandwidth to meet increasing demand and reducing the need to limit P2P usage through byte caps, policies or traffic shaping. Therefore, P2P caching provides an improved experience for all users. Specifically, P2P users obtain better file sharing performance through P2P caching, and non-P2P users experience better network performance because they are relieved of the congestion caused by P2P traffic.
It is estimated that 4 out of 5 files requested via P2P can be served by P2P caching. This ratio is significantly higher than that of HTTP/Web caching. Hence, the utilization efficiency of P2P caching is also much higher than that of HTTP/Web caching, and in turn, deploying P2P caching can generate more revenue than deploying HTTP/Web caching.
P2P caching involves creating a cache or temporary storage space for P2P data by using specialized communications hardware, disk storage and associated software. The created cache is placed in the ISP's network, either co-located with the Internet transit links or placed at key aggregation points or at each cable head-end.
FIG. 1 is a schematic block diagram of a P2P caching system in the prior art. The P2P caching system 100 shown in FIG. 1 employs existing inspection techniques such as DPI (Deep Packet Inspection), and routers 101-103, clients 104-106, and a P2P cache 107 are all known in the prior art.
Take clients 104 and 105 for example. As described above, when P2P cache 107 is not established, client 104 may be directly connected to client 105 for a file exchange.
Once P2P cache 107 is established, the network will transparently direct P2P traffic to P2P cache 107. P2P cache 107 either provides file services to client 104 directly or passes the request on to a remote P2P user (for example, client 105) and simultaneously caches that data for the next user (for example, client 106).
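The behavior of P2P cache 107 described above (serve directly on a hit; otherwise pass the request on and keep a copy for the next user) can be sketched as follows. This is an illustration only: the class name `P2PCache`, the `content_id` key, and the `fetch_from_peer` callable are hypothetical and stand in for whatever protocol-specific mechanisms a real cache would use.

```python
class P2PCache:
    """Toy model of a transparent P2P cache (illustrative, not the prior-art design)."""

    def __init__(self, fetch_from_peer):
        # Hypothetical callable: content_id -> bytes, retrieved from a remote P2P user.
        self.fetch_from_peer = fetch_from_peer
        self.store = {}      # content_id -> cached data
        self.hits = 0
        self.misses = 0

    def request(self, content_id):
        """Serve a client request, from the cache if possible."""
        if content_id in self.store:
            self.hits += 1                       # served directly by the cache
            return self.store[content_id]
        self.misses += 1
        data = self.fetch_from_peer(content_id)  # pass the request on to a remote peer
        self.store[content_id] = data            # keep a copy for the next user
        return data
```

For example, if client 104 requests a file that is not yet cached, the request reaches the remote peer once; a later request for the same file from client 106 is then served from `store` without touching the remote peer.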
As described above, P2P cache 107 usually works in conjunction with the network traffic inspection and control technique called DPI. ISPs utilize DPI to learn what traffic is running across their networks and to separate and treat that traffic for the most efficient delivery.
DPI technology is based on traffic inspection and control at the application layer. When an IP (Internet Protocol) data packet or a TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) data stream flows through a DPI-based bandwidth management system, the system reassembles the application layer information of the OSI (Open System Interconnection) 7-layer model by deep-reading the content of the IP packet payload, obtains the content of the entire file, and then shapes traffic according to a management policy defined in the system. The web page http://en.wikipedia.org/wiki/Deep_packet_inspection provides a detailed description of DPI, the disclosure of which is incorporated here by reference.
Because DPI needs to deep-read the content of the IP packet payload and obtain the content of the entire file, it is a high-cost operation, and P2P streams cannot be monitored and identified effectively by current P2P caching solutions with DPI. In addition, because the inspection information of the entire file needs to be stored, DPI requires a huge storage capacity.
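As a rough illustration of payload inspection, the sketch below reassembles the payload of TCP segments in sequence-number order and compares the start of the stream with a known application-layer signature. It is a simplified sketch under stated assumptions, not a description of any particular DPI product: the function names `reassemble` and `classify` and the `SIGNATURES` table are hypothetical, only the opening bytes of the stream are examined rather than an entire file, and sequence-number wraparound is ignored. The one signature shown is the start of the BitTorrent handshake (a length byte of 19 followed by the string "BitTorrent protocol").

```python
# Illustrative signature table: protocol name -> bytes expected at the stream start.
SIGNATURES = {
    "bittorrent": b"\x13BitTorrent protocol",
}


def reassemble(segments):
    """Rebuild contiguous payload from (seq, payload) pairs of one TCP direction.

    Segments may arrive out of order or be retransmitted. Reassembly starts at
    the lowest sequence number and stops at the first gap. Wraparound of the
    32-bit sequence number is not handled in this sketch.
    """
    ordered = sorted(segments, key=lambda s: s[0])
    if not ordered:
        return b""
    stream = bytearray()
    next_seq = ordered[0][0]
    for seq, payload in ordered:
        if seq > next_seq:
            break                          # gap: later bytes cannot be placed yet
        offset = next_seq - seq            # bytes of this segment already seen
        if offset < len(payload):
            stream += payload[offset:]
            next_seq = seq + len(payload)
    return bytes(stream)


def classify(segments):
    """Return the name of the first matching signature, or "unknown"."""
    stream = reassemble(segments)
    for name, signature in SIGNATURES.items():
        if stream.startswith(signature):
            return name
    return "unknown"
```

Even this reduced form has to buffer and reorder payload per stream before it can decide anything, which gives a sense of why inspecting the content of an entire file is costly in both processing and storage.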
Therefore, there is a need for a more effective and lower cost solution to monitor and identify P2P streams.