1. Field of the Invention
The present invention relates generally to methods and apparatuses for reducing overhead on a proxied connection. More specifically, the invention relates to a cut through proxy that makes two separate connections and then modifies packets from one connection so that they may be transferred to another connection without the need to keep a TCP Transmission Control Block (TCB) for each connection. The cut through proxy can perform stateful inspection of the packets as they are transferred from one connection to the other.
2. Description of the Related Art
Proxies
In many network applications, it is desirable or necessary to prevent a user from making a connection to a first machine at one IP address that has information that the user needs and instead to service the information request with a second machine at a different IP address. For example, it is often desired from a security standpoint not to allow a connection to a machine that stores sensitive information. Instead, it may be required that a connection first be made to a proxy which has various security features such as user authentication and possibly encryption. The user requests the information from the proxy and the proxy establishes a connection with the machine that is being protected and obtains the information. If the proxy determines that the user is authorized to receive the information, the proxy can then relay the information to the user that requested it. The proxy thus stands in for the machine that stores the sensitive information. The user is prevented from making a direct connection to the protected machine. Instead, the user must first request the information from the proxy and only the proxy connects with the protected machine. The protected machine is insulated from potentially dangerous outside contact.
In a proxy arrangement that is used for security, the proxy generally first identifies and authenticates the user who is requesting information from a machine at a target IP address. In the discussion that follows, the user requesting information will be referred to as the client and the protected machine that is providing information will be referred to as the server. It should be noted that in certain situations the client and server designations may be reversed. The machine that is protected (in the example above, the server) is referred to as the proxied machine at the proxied address. The proxied machine is also referred to as the target machine at the target address because it is the machine that the client or user actually intends to access and obtain data or some other service. The target machine is distinguished from the proxy because the user does not generally desire to retrieve information from or contact the proxy other than for the purpose of authenticating himself or otherwise preparing for the desired connection with the target machine. The machine that acts as a proxy is called the proxy machine at the proxy address. The user making the connection is referred to as the user or the client. When a proxy is used, the user connects to the proxy machine at the proxy IP address and never actually makes a connection to the proxied machine at the proxied IP address.
Another example of a situation in which a proxy may be desirable is a web cache. A web cache is not necessarily implemented for the purpose of protecting another machine. It may be desirable to store certain information that is available from a primary web site at a first IP address at a web cache located at another IP address. In this situation, the user is directed to the IP address of the web cache for the information, and, if the information requested is not found in the cache, then the web cache connects to the IP address of the first web site, obtains the information, and then transfers it to the user.
Conventional Proxy Overhead
A conventional proxy terminates two separate connections: one with the client on one side and one with the server on the other side. Once information is received through the TCP stack on one side, the proxy application checks it to determine whether it is acceptable for sending to the other side. If the information is acceptable, then the proxy sends the information through the TCP stack on the other side and receives any responses via that TCP connection.
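The relay step described above can be illustrated with a minimal sketch. It assumes two already-established stream sockets, one facing the client and one facing the server; the function names `is_acceptable` and `relay_once` and the policy check itself are hypothetical and are not taken from any particular proxy implementation.

```python
import socket


def is_acceptable(data: bytes) -> bool:
    # Hypothetical application-layer policy: reject data containing a marker.
    return b"FORBIDDEN" not in data


def relay_once(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> int:
    """Read one chunk from src, check it, and forward it to dst.

    Returns the number of bytes forwarded (0 if nothing was forwarded).
    """
    # The data has already passed through the TCP stack on the receiving
    # side; recv() copies it from the kernel into this process.
    data = src.recv(bufsize)
    if not data or not is_acceptable(data):
        return 0
    # sendall() hands the data to the TCP stack of the second connection.
    dst.sendall(data)
    return len(data)
```

In this sketch the proxy application only sees data after the TCP stack on one side has fully processed it, and it must pass the data through a second TCP stack to reach the other side.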
Because a full TCP connection is terminated between the client and the proxy as well as between the server and the proxy, the proxy must devote a significant amount of processing power and memory to maintaining the connection with the client and to storing information sent by the client so that it may be passed along to the server. Memory overhead associated with a terminated TCP connection includes storage space for each packet or datagram that is sent as well as storage for incoming packets. It should be noted that in the following description the terms datagram and packet are used interchangeably to refer to messages or portions of messages sent to or from a network device. Each packet that is sent must be stored so that it may be resent if an error occurs or if it is not timely acknowledged. Likewise, each packet that is received must be acknowledged and stored for reassembly into a message once the other packets in the message have been received.
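The storage burden for sent packets can be sketched as follows. This is a simplified model under stated assumptions: the class name `SendState` and its methods are hypothetical, acknowledgments are treated as cumulative byte counts, and timers and sequence-number wraparound are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SendState:
    """Hypothetical per-connection record of sent but unacknowledged data."""
    next_seq: int
    unacked: dict = field(default_factory=dict)  # sequence number -> payload

    def send(self, payload: bytes) -> int:
        # Keep a copy of every sent payload so that it can be resent later.
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += len(payload)
        return seq

    def acknowledge(self, ack: int) -> None:
        # Cumulative acknowledgment: every byte before `ack` was received,
        # so fully covered payloads no longer need to be stored.
        done = [s for s, p in self.unacked.items() if s + len(p) <= ack]
        for seq in done:
            del self.unacked[seq]

    def buffered_bytes(self) -> int:
        return sum(len(p) for p in self.unacked.values())
```

A conventional proxy would hold state of this kind for each of its two connections, which is part of the memory overhead described above.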
It is also necessary to keep track of and update sequencing information. Packets are not always received in the order in which they were sent, and so the TCP protocol provides for sequence information to be included in each packet header so that received packets may be assembled in the proper order into a message.
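The use of sequence information to restore order can be sketched as below. This is an illustrative simplification, not TCP's actual reassembly logic: the function name `reassemble` is hypothetical, each segment is assumed to be a (sequence number, payload) pair that does not partially overlap another, and sequence-number wraparound is ignored.

```python
def reassemble(segments, initial_seq):
    """Assemble payloads into order using their sequence numbers.

    segments: iterable of (seq, payload) pairs in arrival order.
    Returns (contiguous bytes starting at initial_seq, next expected seq).
    """
    pending = {}            # out-of-order segments held until the gap fills
    next_seq = initial_seq
    out = bytearray()
    for seq, payload in segments:
        # Ignore segments that were already delivered or already buffered.
        if seq >= next_seq and seq not in pending:
            pending[seq] = payload
        # Deliver every segment that is now contiguous.
        while next_seq in pending:
            chunk = pending.pop(next_seq)
            out += chunk
            next_seq += len(chunk)
    return bytes(out), next_seq
```

For example, `reassemble([(105, b"world"), (100, b"hello")], 100)` holds the first segment until the second arrives and then delivers both in order.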
The need for a proxy to terminate a connection with both the client and the server thus imposes substantial processing overhead as well as memory requirements. Terminating the two connections further requires a large number of data copies, which slow down the connections. Information is first copied from the physical layer to the IP layer, which resides in the system (kernel), and then from system memory to user memory so that it is accessible to the proxy application. Data is then copied from user memory back to system memory so that the proxy can relay the packets to the target or server. All of these copies take a considerable amount of processing time and may cause the client to experience a slow connection.
The proxy must have a large memory and a large amount of processing capability in order to task-switch among the various connections that it supports at any given time. Supporting the connections requires considerable overhead because each packet that is received by the proxy must be acknowledged. Each packet that is sent by the proxy must be stored and its state must be tracked so that the acknowledgment from the receiver of the packet, whether the client or the server, can be noted or a request from the client or the server to resend the packet can be noted and fulfilled. In addition, the proxy must calculate and store a checksum for each packet that is received to ensure the integrity of the data contained in the packet. Thus, the state of both the connections must be tracked and the actual data sent in packets must be repeatedly copied and stored in a conventional proxy.
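The per-packet checksum work mentioned above can be illustrated with the 16-bit one's-complement Internet checksum described in RFC 1071. The sketch below is a simplification: the function name `internet_checksum` is chosen for illustration, and the pseudo-header that TCP also includes in its checksum is omitted.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

A receiver can verify integrity by running the same computation over the data with the transmitted checksum appended; an undamaged packet yields zero.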
The operation of a conventional proxy can be further understood by referring to the network layers implemented on the proxy. The basic problem of networking a set of devices has been divided into layers. The bottom layer is the physical layer. It handles the actual physical connections between devices. The second layer is the data link layer. It describes how data is formatted on the physical medium that connects the devices. The third layer is the network layer. It handles cases where there is more than one connection per machine. The fourth layer is the transport layer. It ensures that all of the messages from a source reach the destination reliably and without duplication.
The second layer is subdivided into a Logical Link Control (“LLC”) layer and a Media Access Control (“MAC”) layer. A MAC address is required in this layer. In the TCP/IP suite of protocols employed on the Internet, the third layer or network layer is the IP layer. This layer requires a globally unique IP address in order to route packets to the right physical machine. Also, in TCP/IP, the fourth layer or transport layer is the TCP layer. The TCP layer additionally requires a machine port number so that the packet is sent to the correct port of a specific machine. The application layer sits on top of these layers, handling messages that have been assembled from packets by TCP. The orderly receipt and accuracy of the packets are also managed by TCP.
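The addressing used at each layer can be made concrete with a short sketch that extracts the MAC addresses, IP addresses, and port numbers from a raw frame. It assumes an Ethernet II frame carrying IPv4 and TCP with no VLAN tag; the function name `parse_addresses` and the returned dictionary keys are hypothetical.

```python
import socket
import struct


def format_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def parse_addresses(frame: bytes) -> dict:
    """Extract layer 2, 3, and 4 addressing from an Ethernet/IPv4/TCP frame."""
    # Data link layer: destination MAC, source MAC, EtherType.
    dst_mac, src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = frame[14:]
    # Network layer: header length is the low nibble, in 32-bit words.
    ihl = (ip[0] & 0x0F) * 4
    if ip[9] != 6:
        raise ValueError("not a TCP packet")
    src_ip = socket.inet_ntoa(ip[12:16])
    dst_ip = socket.inet_ntoa(ip[16:20])
    # Transport layer: source and destination ports open the TCP header.
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    return {
        "src_mac": format_mac(src_mac), "dst_mac": format_mac(dst_mac),
        "src_ip": src_ip, "dst_ip": dst_ip,
        "src_port": src_port, "dst_port": dst_port,
    }
```

Each layer contributes its own address: the MAC address identifies the interface on the local link, the IP address identifies the machine, and the port number identifies the endpoint on that machine.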
In a conventional proxy, security is implemented at the application layer. Thus, the problem with a standard proxy may be stated in terms of the need to pass data received at the physical layer from Ethernet or some other standard physical network implementation up through the IP layer and the TCP layer to the application layer. Processing data according to the protocol implemented at each layer requires a significant amount of memory as well as processing overhead for numerous data copies.
What is needed is a way to implement a proxy while reducing the overhead required to pass data among the various layers.