1. Technical Field
The invention relates to the field of communication. More specifically, the invention relates to communication networks.
2. Background
Hypertext transfer protocol (HTTP) is a resource access protocol. A resource access protocol is a defined set of rules for retrieval of resources from the Internet. A resource can be an image, a hypertext markup language (HTML) page, a Java applet, a program, etc. HTTP is considered to reside at the presentation layer and/or the application layer of the OSI reference model. HTTP provides guidelines for exchanges between clients and resource hosts, including request and response messages. A typical HTTP exchange includes a client requesting a resource and a resource host responding with the resource. In certain scenarios, the resource host transmits a response that redirects the client to a resource different from the one it originally requested. For example, a resource host that does not find a requesting client's cookie in its database may, as a result, send the client a response that redirects it to a login page.
Since the exchanges between clients and resource hosts often include sensitive information, security measures are applied to certain exchanges. For security, HTTP is coupled with the Secure Sockets Layer (SSL) or its successor, Transport Layer Security (TLS). From the perspective of the OSI reference model, HTTP sits over SSL. This coupling is referred to as HTTPS. After HTTP has generated a message, HTTP passes the message to SSL, which performs security operations (e.g., encryption, hashing, etc.) on the message.
HTTP uses a uniform resource locator (URL) for retrieval of a resource. A URL is an address of a resource accessible on the Internet. A URL includes a resource access protocol identifier, a resource host identifier, a path identifier, and a resource identifier. In the URL “http://www.host.com/folder/main.html,” the resource access protocol identifier is “http”; the resource host identifier is “www.host.com”; the path identifier is “folder”; and the resource identifier is “main.html.”
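The decomposition above can be reproduced with Python's standard urllib.parse module. This is a minimal sketch: the function name decompose_url and its dictionary keys are illustrative labels for the four identifiers, not part of any standard. urlsplit separates the scheme, the network location, and the full path; splitting that path into a path identifier and a resource identifier is done here with posixpath.split, assuming the last path segment names the resource.

```python
import posixpath
from urllib.parse import urlsplit


def decompose_url(url: str) -> dict:
    """Split a URL into protocol, host, path, and resource identifiers (illustrative)."""
    parts = urlsplit(url)
    # parts.path is e.g. "/folder/main.html"; the last segment is taken as the resource.
    directory, resource = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,        # resource access protocol identifier
        "host": parts.netloc,            # resource host identifier (may include ":port")
        "path": directory.strip("/"),    # path identifier
        "resource": resource,            # resource identifier
    }


print(decompose_url("http://www.host.com/folder/main.html"))
# {'protocol': 'http', 'host': 'www.host.com', 'path': 'folder', 'resource': 'main.html'}
```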
The resource access protocol identifier “http” identifies HTTP as the resource access protocol to be used to retrieve the identified resource.
The resource host identifier “www.host.com” identifies the resource host, or server. Although the resource host identifier used above is a domain name, a resource host identifier may be a network address, such as an Internet Protocol (IP) address. A resource host identifier may identify a port in addition to a resource host. For example, the following two URLs identify the same resource, but the second indicates a port:
1) http://www.host.com/main.html
2) http://www.host.com:80/main.html
The indicated port is the appropriate port for communication with the identified resource host in accordance with the identified resource access protocol. The default port for HTTP is port 80 (the default port for HTTPS is port 443), so an HTTP message with either of the above example URLs will be communicated to the resource host identified as “www.host.com” on port 80.
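Default-port selection can be sketched with urllib.parse as well. The DEFAULT_PORTS table holds only the two defaults stated above, and the function name effective_port is hypothetical; an explicit port in the URL takes precedence over the scheme default.

```python
from urllib.parse import urlsplit

# Scheme defaults as stated above: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:          # explicit port, e.g. "www.host.com:80"
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # otherwise fall back to the scheme default


for url in ("http://www.host.com/main.html",
            "http://www.host.com:80/main.html",
            "https://www.host.com/main.html"):
    print(url, "->", effective_port(url))
```

Both example URLs resolve to port 80, which is why they reach the same listener on the same host.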
A URL does not necessarily have to include a path identifier or a resource identifier because the resource may be in a default path and have a default name. Using the previous examples, the URL “http://www.host.com/” identifies the same resource as the previous example URLs, assuming that “folder” is the default path and that “main.html” is the default resource.
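How a resource host might fill in a missing path and resource can be sketched as follows. This assumes, as in the example above, that “folder” is the default path and “main.html” is the default resource; both constants and the function resolve_resource are hypothetical, and real servers configure such defaults in their own ways.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical server configuration, matching the example above.
DEFAULT_PATH = "folder"
DEFAULT_RESOURCE = "main.html"


def resolve_resource(url: str) -> str:
    """Map a URL to the server-side resource it names, filling in defaults."""
    directory, resource = posixpath.split(urlsplit(url).path)
    directory = directory.strip("/") or DEFAULT_PATH   # no path given: use default path
    resource = resource or DEFAULT_RESOURCE            # no resource given: use default
    return f"{directory}/{resource}"


for url in ("http://www.host.com/folder/main.html",
            "http://www.host.com/main.html",
            "http://www.host.com/"):
    print(url, "->", resolve_resource(url))
```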
The HTTP and HTTPS protocols were designed such that the response (including a redirect) to a request uses the same protocol as the request. Thus, if the request used HTTP, then the URLs of the response will use HTTP; conversely, if the request used HTTPS, then the URLs of the response will use HTTPS. While this works in many situations, it creates problems in certain environments.
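One place where this behavior is visible to a client is the resolution of a redirect target. A redirect that carries a relative location is resolved against the URL of the request, so it inherits the request's scheme; a redirect that carries an absolute URL uses whatever scheme that URL names. The sketch below uses the standard urllib.parse.urljoin to show both cases; redirect_target is a hypothetical helper name.

```python
from urllib.parse import urljoin


def redirect_target(request_url: str, location: str) -> str:
    """Resolve a redirect's location against the URL that was requested."""
    return urljoin(request_url, location)


# Relative location: the scheme of the original request is inherited.
print(redirect_target("https://www.host.com/res1.htm", "/res2.htm"))
# Absolute location: its own scheme wins, even if it downgrades HTTPS to HTTP.
print(redirect_target("https://www.host.com/res1.htm", "http://www.host.com/res2.htm"))
```

The second case is the one that matters below: a server that emits an absolute “http://” URL sends the client away from HTTPS regardless of how the original request arrived.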
FIG. 1 (Prior Art) is a diagram of an example redirect response retransmission. In FIG. 1, the first, second, and third columns respectively indicate resource messages transmitted and received by a client 101, a content switch 103 that performs HTTP proxy, and a server 105. A hyperlink with the URL “https://www.host.com/res1.htm” is activated at the client 101, typically a personal computer. The resource host identifier is resolved to an Internet Protocol (IP) address, and an HTTPS session is opened with the content switch 103. After the HTTPS session is opened, a request message 107 with “GET res1.htm” is transmitted from the client 101 to the content switch 103. The content switch 103 receives the request 107 on port 443, decrypts the request 107, and forwards the decrypted request 107 over an HTTP session to the server 105.
The content switch 103 that performs HTTP proxy and the server 105 are typically network elements in the same local area network (LAN), which is separate from the client 101. The client 101 communicates with the LAN over a public network (e.g., the Internet). The server 105 is one of many servers in a server farm. The server 105 and the other servers in the server farm are not burdened with security measures, since the owner of the server farm and the content switch 103 relies on the content switch 103 for security. The content switch 103 is exposed to the outside world and protects the server farm by performing HTTP proxy. The owner has dedicated the resources of the servers in the server farm, including the server 105, to serving requests instead of performing security operations. The content switch 103 performs HTTP proxy for the servers in the server farm and determines the appropriate server for a received request. In FIG. 1, the server 105 is the appropriate server for the request 107.
The server 105 transmits a response 111 with redirect URL “http://www.host.com/res2.htm” to the content switch 103. The content switch 103 encrypts the response 111 and transmits the encrypted response 111 back over the HTTPS session to the client 101. The client 101 receives the HTTPS response 111, decrypts the response 111, and closes the HTTPS session. Assuming the redirect URL is selected, the client 101 resolves the host name and opens an HTTP session with the content switch 103 in accordance with the resource access protocol indicated by the redirect URL. The client 101 transmits a request message 113 with “GET res2.htm” to the content switch 103.
The content switch 103 receives the request message 113 on port 80 because it runs a network service that listens for traffic on port 80, and traffic received on port 80 is redirected. In response to the request 113, the content switch 103 generates a response message 119 that indicates a redirect URL “https://www.host.com/res2.htm”. The content switch 103 transmits the response 119 back to the client 101 over the HTTP session initially opened by the client 101.
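The port-80 listener's behavior can be sketched as a function that reads a raw HTTP/1.1 request and returns a redirect to the same host and path under “https”. This is a minimal sketch under stated assumptions: the request carries a Host header, the request target is a path such as “/res2.htm”, and the 302 status code and header set are illustrative rather than taken from FIG. 1. The function name handle_port_80 is hypothetical.

```python
def handle_port_80(raw_request: bytes) -> bytes:
    """Answer a cleartext request with a redirect to the HTTPS form of the same URL."""
    # Only the request line and headers are needed; any body is ignored.
    head = raw_request.decode("ascii").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ")

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Drop an explicit ":80" from the Host value (IPv6 literals are not handled).
    host = headers["host"].split(":", 1)[0]
    location = f"https://{host}{target}"   # default HTTPS port 443 is implied
    response_lines = [
        "HTTP/1.1 302 Found",
        f"Location: {location}",
        "Content-Length: 0",
        "",  # empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(response_lines).encode("ascii")


request_113 = b"GET /res2.htm HTTP/1.1\r\nHost: www.host.com\r\n\r\n"
print(handle_port_80(request_113).decode("ascii"))
```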
The client 101 closes the HTTP session and opens an HTTPS session with the content switch 103. The client 101 generates a request message 121, encrypts the request message 121, and transmits the encrypted request message 121 to the content switch 103.
This redirect retransmission punches a hole in the security provided by HTTPS. Since the client switches to HTTP, the data it transmits is unencrypted. The client is assumed to be transmitting sensitive information (e.g., credit card numbers, passwords, bank account numbers, residential addresses, phone numbers, etc.), since HTTPS is typically invoked to protect communications that will most likely include sensitive information. Due to the redirect retransmission, the client transmits sensitive data without encryption, where it can easily be captured and used.
In addition, the exchanges between the client 101 and the content switch 103 illustrated in FIG. 1 include unnecessary “extra” exchanges whose only purpose is to force the client to transmit an acceptable request, which can substantially degrade the content switch's performance in the real world. Multiplied across thousands of sessions for thousands of clients, these extra exchanges degrade the performance of the content switch and add latency to each client's wait for a response. Both the client machine and the content switch must process an additional request and response per redirect. The additional redirect may also agitate users and deter them from returning to a website.
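The FIG. 1 sequence can be simulated to make the extra exchange and the cleartext hop visible. This is a toy model, not a network implementation: server, content_switch, and client_fetch are hypothetical stand-ins, no TLS is performed, and a request counts as encrypted simply when its URL uses the “https” scheme.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def server(path: str) -> Tuple[int, Optional[str]]:
    """Stand-in for server 105: res1.htm redirects to an http:// URL."""
    if path == "/res1.htm":
        return 302, "http://www.host.com/res2.htm"   # response 111
    return 200, None


def content_switch(url: str) -> Tuple[int, Optional[str]]:
    """Stand-in for content switch 103 listening on ports 80 and 443."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        # Port-80 listener: bounce the client to the https:// form (response 119).
        return 302, urlunsplit(parts._replace(scheme="https"))
    # Port 443: decrypt, then proxy over HTTP to the server.
    return server(parts.path)


def client_fetch(url: str, max_redirects: int = 5) -> List[Tuple[str, bool, int]]:
    """Follow redirects, logging (url, sent_over_https, status) per exchange."""
    log = []
    for _ in range(max_redirects + 1):
        status, location = content_switch(url)
        log.append((url, urlsplit(url).scheme == "https", status))
        if status != 302:
            return log
        url = location
    raise RuntimeError("too many redirects")


log = client_fetch("https://www.host.com/res1.htm")
for url, over_https, status in log:
    print("HTTPS" if over_https else "HTTP ", url, status)
```

The run logs three client-to-switch exchanges. The middle one (request 113 and response 119) is sent in cleartext and exists only to push the client back to HTTPS; had the redirect pointed at an “https://” URL, two exchanges would have sufficed.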
FIG. 2 (Prior Art) is an example of a client in a protected network receiving a response message with a URL that indicates HTTPS. In FIG. 2, four columns respectively indicate a client 201, an intrusion detection system (IDS) 202, a firewall 203 with a proxy, and a server 205. The firewall 203 connects the client 201 to external networks. The IDS 202 sniffs traffic transmitted and received by the client 201. The server 205 transmits a response 207 with a URL “https://www.host.com/res1.htm” to the firewall 203. The firewall 203 receives the response 207 and forwards it to the client 201 (assuming the client 201 was the requesting client). If the firewall 203 has an HTTPS session open with the server 205, then the firewall 203 decrypts the response 207 before forwarding it to the client 201. The IDS 202 analyzes the response 207. Assuming the URL in the response 207 is selected, the client 201 resolves the host name of the URL and opens an HTTPS session with the firewall 203 in accordance with the URL. If the client 201 does not support SSL, then the client 201 cannot request the resource (corporate entities that wish to snoop traffic with an IDS sometimes do not enable SSL on their client machines so that the IDS can inspect their traffic). Otherwise, the client 201 generates a request message 209, encrypts the request 209, and transmits the request 209 to the firewall 203. Since the request 209 is encrypted, the IDS 202 must either decrypt the request 209 to analyze it, allow it to pass without analysis, or hold it. If the IDS 202 allows the request to pass, then the firewall 203 opens an HTTPS session with the server 205 and transmits the request 209 to the server 205. Hence, the request 209 has bypassed the IDS 202.
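The alternatives open to the IDS 202 can be written as a small decision function. The Verdict names and the two policy flags (can_decrypt, hold_opaque) are hypothetical; the sketch only encodes the three alternatives described above and makes explicit that the pass-through branch is the bypass.

```python
from enum import Enum


class Verdict(Enum):
    INSPECTED = "inspected"
    PASSED_UNINSPECTED = "passed without analysis"
    HELD = "held"


def ids_handle(request_encrypted: bool, can_decrypt: bool, hold_opaque: bool) -> Verdict:
    """Decide what the IDS can do with a request it observes."""
    if not request_encrypted or can_decrypt:
        return Verdict.INSPECTED            # plaintext, or the IDS can decrypt it
    if hold_opaque:
        return Verdict.HELD                 # refuse to release traffic it cannot read
    return Verdict.PASSED_UNINSPECTED       # the bypass taken by request 209


print(ids_handle(request_encrypted=True, can_decrypt=False, hold_opaque=False))
```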
This security architecture is typically employed in a corporate environment. A corporate entity needs to protect its systems from infection and/or prevent access to its systems by external and/or internal malicious elements while still enabling its employees to access resources beyond its local area network. The corporate entity also needs to control the types of resources or material that enter its network at the request of its employees. Therefore, a corporate entity employs both a firewall with proxy support and an intrusion detection system to protect its network from external hacking and internal violations of its computer use policy. Unfortunately, as shown in FIG. 2, an IDS can be bypassed with encryption.
Service providers have also used another mechanism, which likewise has a security flaw, to accommodate users. To avoid agitating users with error messages and increased latency, service providers allowed unencrypted messages to pass through. For this mechanism, a content switch is configured to listen for traffic on both port 80 and port 443. Traffic received on port 80 is forwarded to the corresponding servers, while traffic received on port 443 is decrypted. Hence, users are not inconvenienced with error messages and increased latency, but they may be transmitting sensitive information without encryption.
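The dual-listener configuration can be sketched as a port-based dispatch. Here an XOR transform is a toy stand-in for TLS decryption, used only so the example runs; it provides no security. The names dual_listener and toy_tls are hypothetical. The point is the port-80 branch: the bytes forwarded to the server are exactly the bytes that crossed the public network.

```python
from typing import Callable, Tuple

KEY = 0x5A  # toy key standing in for a negotiated TLS session


def toy_tls(data: bytes) -> bytes:
    """Toy symmetric transform standing in for TLS (NOT real encryption)."""
    return bytes(b ^ KEY for b in data)


def dual_listener(port: int, wire_bytes: bytes,
                  decrypt: Callable[[bytes], bytes]) -> Tuple[bytes, bool]:
    """Return (request forwarded to the server, whether it was protected in transit)."""
    if port == 443:
        return decrypt(wire_bytes), True    # TLS terminated at the content switch
    if port == 80:
        return wire_bytes, False            # passed through untouched: readable on the wire
    raise ValueError(f"no listener on port {port}")


request = b"POST /checkout HTTP/1.1\r\nHost: www.host.com\r\n\r\ncard=0000"
print(dual_listener(443, toy_tls(request), toy_tls))
print(dual_listener(80, request, toy_tls))
```

Both calls hand the server the same plaintext, which is why users see no errors; only the port-443 path kept the request unreadable while it crossed the network.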