1. Technical Field
The present invention relates generally to network communications over TCP/IP and more particularly to connecting low bandwidth services between local area networks (LANs) and ameliorating packet fragmentation.
2. Description of Related Art
It is known that virtual private networks (VPNs) allow remote employees to access an enterprise's information systems. VPNs are also used to connect remote offices to headquarters for time-critical enterprise resource management operations.
The communication network typically comprises a public network (e.g., the Internet). The connections to the communication network from the branch office and the central office typically create a bandwidth bottleneck for data exchanged over the communication network. In the aggregate, the exchange of data between the branch office and the central office will usually be limited to the bandwidth of the slowest link in the communication network, and is further slowed by the latency that VPN encryption and decryption impose.
For example, suppose one router connects to the communication network by a T1 line, which provides a bandwidth of approximately 1.544 Megabits/second (Mbps), and another router (the router 170) connects to the communication network by a T3 line, which provides a bandwidth of approximately 45 Mbps. Even though the communication network may provide an internal bandwidth greater than 1.544 Mbps or 45 Mbps, the available bandwidth between the branch office and the central office is limited to 1.544 Mbps (i.e., the T1 connection).
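The bottleneck arithmetic in the preceding paragraph can be sketched as follows. This is a minimal illustration only: the function names and the 1000 Mbps figure used for the internal bandwidth of the communication network are assumptions introduced here, not values from the description.

```python
# Link speeds taken from the example above, in megabits per second.
T1_MBPS = 1.544
T3_MBPS = 45.0
# Hypothetical internal bandwidth of the communication network (assumption).
CORE_MBPS = 1000.0


def bottleneck_mbps(*link_mbps):
    """Return the end-to-end bandwidth, which is that of the slowest link."""
    return min(link_mbps)


def transfer_seconds(size_megabytes, *link_mbps):
    """Estimate the time to move a payload, ignoring latency and protocol overhead."""
    megabits = size_megabytes * 8  # 1 megabyte = 8 megabits (decimal units)
    return megabits / bottleneck_mbps(*link_mbps)


if __name__ == "__main__":
    print(bottleneck_mbps(T1_MBPS, CORE_MBPS, T3_MBPS))
    print(transfer_seconds(100, T1_MBPS, CORE_MBPS, T3_MBPS))
```

Under these assumptions, adding capacity anywhere other than the T1 link leaves the estimate unchanged.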
Moreover, many applications do not perform well over the communication network due to the limited available bandwidth. Developers generally optimize the applications for performance over a local area network (LAN), which typically provides a bandwidth ranging from 10 Mbps to Gigabit/second (Gbps) speeds. The developers of the applications assume small latency and high bandwidth across the LAN between the applications and the data. However, the latency across the communication network typically will be 100 times that across the LAN, and the bandwidth of the communication network will be 1/100th that of the LAN.
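A simple model shows how the 100-fold ratios compound for an application written with LAN conditions in mind. The model (elapsed time = round trips x round-trip latency + payload / bandwidth) and all numeric values below are assumptions chosen for illustration; they are not measurements.

```python
def elapsed_seconds(round_trips, rtt_seconds, payload_bits, bandwidth_bps):
    """Rough elapsed time: latency cost of each round trip plus serialization time."""
    return round_trips * rtt_seconds + payload_bits / bandwidth_bps


# Hypothetical workload: 50 request/response exchanges moving 1 megabyte in total.
ROUND_TRIPS = 50
PAYLOAD_BITS = 8_000_000

# Assumed LAN: 1 ms round trip, 100 Mbps.
lan = elapsed_seconds(ROUND_TRIPS, 0.001, PAYLOAD_BITS, 100_000_000)
# Assumed WAN: 100 times the latency, 1/100th of the bandwidth.
wan = elapsed_seconds(ROUND_TRIPS, 0.1, PAYLOAD_BITS, 1_000_000)

if __name__ == "__main__":
    print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")
```

In this sketch both terms grow by the same factor, so the same workload takes about 100 times longer over the communication network than over the LAN.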
Connecting a branch office to headquarters is likely to involve tying two local area networks to routers that are connected by a wide area network. This requires traversing a number of gateways controlled by different parties. The IP protocol was designed for use on a wide variety of transmission links. Although the maximum length of an IP datagram is 64K (65,535 octets), most transmission links enforce a smaller maximum packet size, called the MTU (Maximum Transmission Unit). The MTU and the default packet size depend on the type of the transmission link. For Ethernet (LAN), the maximum packet size is 1500 octets. For token ring and FDDI, it is larger, approximately 4096 octets or more depending on the variant.
The design of IP accommodates MTU differences by allowing routers to fragment IP datagrams as necessary. The receiving station is responsible for reassembling the fragments back into the original full-size IP datagram. Because IP packets are routed independently of each other, different packets between the same end hosts could take different routes with varying MTU sizes. Since no node has end-to-end knowledge of the path, intermediate routers can readily receive packets that are too large for the next link and must nevertheless forward them.
The IP protocol provides a convenient solution: IP fragmentation, a mechanism by which a single inbound IP datagram is split into two or more outbound IP datagrams. The worst impact of IP fragmentation is on router-to-router communication. If a router-to-router IP packet is fragmented somewhere along the path, the receiving router has to reassemble the original packet, which significantly reduces switching performance.
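The splitting described above can be sketched as follows. The sketch assumes IPv4 with a 20-octet header and no options, and relies on the IPv4 rule that fragment offsets are expressed in units of 8 octets; the function name and the 576-octet example MTU are illustrative choices, not part of the description.

```python
def fragment(total_length, mtu, header_len=20):
    """Split one IPv4 datagram into fragments that fit the given MTU.

    Returns a list of (offset_in_8_octet_units, payload_octets, more_fragments).
    Assumes a fixed header of header_len octets on every fragment.
    """
    payload = total_length - header_len
    if total_length <= mtu:
        return [(0, payload, False)]  # fits as-is, no fragmentation needed

    # Every fragment except the last must carry a multiple of 8 payload octets.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


if __name__ == "__main__":
    # A full-size Ethernet datagram crossing a link with a 576-octet MTU.
    for frag in fragment(1500, 576):
        print(frag)
```

In this example one 1500-octet datagram becomes three datagrams, each with its own header, and the receiver must hold the early fragments until the last one arrives before it can reassemble the original.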
An additional problem with the deployment of VPNs is the latency introduced by the encryption and decryption of transmissions. Moreover, because the traffic is encrypted, the same file transmitted twice will generally not appear as the same sequence of bytes on the network, which defeats conventional caching strategies.
For example, in a centralized server implementation having multiple branches, computers in each of the multiple branch offices make requests over the VPN to central servers for the organization's data. The data transmitted by the central servers in response to the requests quickly saturates the available bandwidth of the central office's connection to the communication network, further decreasing application performance and data access at the multiple branch offices. This is particularly troublesome for entities that span multiple time zones, as congestion can dominate the work day.
It is also known that mechanisms for caching improve application performance and data access. A cache is generally used to reduce the latency of the communication network forming the VPN (because requests are satisfied from the local cache) and to reduce network traffic over the VPN (because responses are served locally, less bandwidth is used).
Web caching, for example, is the caching of web documents (e.g., HTML pages and images) in order to reduce web site access times and bandwidth usage. Web caching typically stores local copies of the requested web documents. The web cache satisfies subsequent requests for the web documents if the requests meet certain predetermined conditions.
One problem with web caching is that the Time to Live parameter is generally not easily changed. Management of a web cache is therefore difficult, and the cache is not conveniently purged or updated. Every browser can hold a slightly different version of a document. Another problem is that the web cache stores entire objects (such as documents), so cache hits are binary: either a perfect match or a miss. Even where only small changes are made to a document, the web cache cannot use the cached copy of the document to reduce network traffic.
It is also known that randomly chosen polynomials are used to “fingerprint” bit-strings. This method, first published by Michael O. Rabin (Center for Research in Computing Technology, Harvard University, Report TR-15-81, 1981), is applied to produce a very simple string matching algorithm and a procedure for securing files against unauthorized changes. The method is provably efficient and highly reliable. However, it is also known that the Rabin fingerprinting scheme is not as secure as more expensive cryptographic hash functions.
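A minimal sketch of polynomial fingerprinting follows. It treats the input as a polynomial over GF(2) and keeps the remainder modulo a fixed polynomial. Rabin's method chooses an irreducible polynomial at random, usually of much higher degree; the fixed degree-8 polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and the function name used here are assumptions made only to keep the example small.

```python
def rabin_fingerprint(data: bytes, poly: int = 0x11B) -> int:
    """Remainder of the bit-string `data`, read as a GF(2) polynomial, modulo `poly`.

    `poly` is assumed to be irreducible; 0x11B is a small illustrative choice,
    not a recommended production parameter.
    """
    degree = poly.bit_length() - 1
    fp = 0
    for byte in data:
        for i in range(7, -1, -1):  # most significant bit first
            fp = (fp << 1) | ((byte >> i) & 1)  # multiply by x, add next coefficient
            if (fp >> degree) & 1:
                fp ^= poly  # subtract (XOR) the modulus to reduce the degree
    return fp


if __name__ == "__main__":
    print(hex(rabin_fingerprint(b"\x01\x00")))  # x^8 mod poly
    print(hex(rabin_fingerprint(b"branch office data")))
```

Because the remainder is linear over GF(2), the fingerprint of the XOR of two equal-length strings equals the XOR of their fingerprints. That structure makes the scheme cheap to compute, and it is also one reason it offers weaker guarantees than a cryptographic hash.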
It is known that the Rabin-Karp algorithm is a string searching algorithm created by Michael O. Rabin and Richard M. Karp in 1987 that uses hashing to find a substring in a text. It is used for multiple pattern matching rather than single pattern matching. Its running-time performance is considered a reason that it is not widely used. However, it has the advantage of being able to find any one of k or fewer strings in a predictable time regardless of the magnitude of k.
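The multiple-pattern use of Rabin-Karp can be sketched as follows. The sketch assumes that all patterns have the same length and uses a conventional rolling hash (base 256, modulo the prime 1,000,000,007); these parameters and the function name are illustrative assumptions rather than details taken from the description.

```python
def rabin_karp_multi(text, patterns, base=256, mod=1_000_000_007):
    """Find every occurrence of any pattern in `text`.

    Assumes all patterns share one length. Returns (index, pattern) pairs
    in order of position.
    """
    if not patterns:
        return []
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch requires patterns of equal length")
    if m == 0 or len(text) < m:
        return []

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each pattern hash to the patterns that produce it.
    targets = {}
    for p in patterns:
        targets.setdefault(full_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    h = full_hash(text[:m])
    matches = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # A hash hit is only a candidate; compare strings to rule out collisions.
        if h in targets and window in targets[h]:
            matches.append((i, window))
        if i + m < len(text):
            # Roll the window: drop text[i], append text[i + m].
            h = ((h - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp_multi("abracadabra", ["abr", "cad"]))
```

Each window costs one hash update and one dictionary lookup however many patterns are registered, which is the sense in which the search time does not depend on k.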