The Internet is by far the largest and most extensive publicly available network of interconnected computer networks, which transmit data by packet switching using the standardized Internet Protocol (IP) and many other protocols. The Internet has become an extremely popular source of virtually all kinds of information. Increasingly sophisticated computers, software, and networking technology have made Internet access relatively straightforward for end users. Applications such as electronic mail, online chat, and web clients allow users to access and exchange information almost instantaneously.
The World Wide Web (WWW) is one of the most popular means of retrieving information over the Internet. The WWW can cope with many types of data which may be stored on computers, and is used with an Internet connection and a Web client. The WWW is made up of millions of interconnected pages or documents which can be displayed on a computer or other interface. Each page may have connections to other pages, which may be stored on any computer connected to the Internet. A Uniform Resource Identifier (URI) identifies a resource on the WWW and typically consists of three parts: the transfer format (also known as the protocol type), the host name of the machine which holds the file (which may also be referred to as the web server name), and the path name to the file. URIs of this form are also referred to as Uniform Resource Locators (URLs). The transfer format for standard web pages is Hypertext Transfer Protocol (HTTP). Hypertext Markup Language (HTML) is a method of encoding the information so it can be displayed on a variety of devices.
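By way of illustration only, the three-part structure described above can be sketched with Python's standard `urllib.parse` module. The function name `split_uri` and the example address are assumptions chosen for this sketch and are not part of the described system.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (transfer format, host name, path name) for a URI.

    Hypothetical helper for illustration; it ignores ports, queries,
    and fragments, which a real URI may also carry.
    """
    parts = urlsplit(uri)
    return parts.scheme, parts.hostname, parts.path


if __name__ == "__main__":
    # Example address is a placeholder, not a real resource.
    print(split_uri("http://www.example.com/docs/index.html"))
```

Here the transfer format is `http`, the host name is `www.example.com`, and the path name is `/docs/index.html`.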
Web applications are engines that create Web pages from application logic, stored data, and user input. Web applications often preserve user state across sessions. Web applications do not require software to be installed in the client environment. Web applications make use of standard Web browser components to view pages built on the server side. Web applications can also deliver services through programmatic interfaces such as Software Development Kits (SDKs).
HTTP is the underlying transactional protocol for transferring files (text, graphic images, sound, video, and other multimedia files) between web clients and servers. HTTP defines how messages are formatted and transmitted, and what actions web servers and web client browsers should take in response to various commands. A web browser, acting as an HTTP client, typically initiates a request by establishing a TCP/IP connection to a particular port on a remote host. An HTTP server monitoring that port waits for the client to send a request string. Upon receiving the request string (and message, if any), the server may complete the protocol by sending back a response string and a message of its own, in the form of the requested file, an error message, or any other information. The HTTP server can take the form of a Web server with gateway components to process requests. A gateway is a custom web server module or plug-in created to process requests, and generally is the first point of contact for a web application. The term “gateway” is intended to include any gateways known to a person skilled in the art, for example, CGI; ISAPI for the Microsoft Internet Information Services (IIS) web server; an Apache web server module; or a Java servlet.
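As a minimal sketch of the request string and response string described above, the following Python code formats an HTTP/1.1 GET request and reads the status line of a response. No network connection is made. The helper names `build_get_request` and `parse_status_line`, and the example host and path, are assumptions for illustration only.

```python
def build_get_request(host, path):
    """Format a bare HTTP/1.1 GET request string (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # Host header, required by HTTP/1.1
        "Connection: close",
        "",                       # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response_text):
    """Split the first line of a response into (version, code, reason)."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


if __name__ == "__main__":
    print(repr(build_get_request("www.example.com", "/index.html")))
    print(parse_status_line("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"))
```

In a real exchange, the request string would be written to the TCP connection and the response read back from it; this sketch covers only the message formatting.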
Web pages regularly reference pages on other servers, whose selection will elicit additional transfer requests. When the browser user enters a file request, either by “opening” a Web file by typing in a Uniform Resource Locator (URL) or by clicking on a hypertext link, the browser builds an HTTP request. In actual applications, Web clients may need to be distinguished and authenticated, or a session which holds state across a plurality of HTTP requests may need to be maintained by using a piece of “state” called a cookie.
Web applications incur a security risk by accepting user input in their application logic. To reduce this risk, security filters can be installed at the entry point of Web applications. Security filters typically examine incoming requests, apply generic security rules, and reject requests that fail to comply with these rules. A security rule can, for example, reject HTTP GET requests containing any of the following characters: the less-than sign (<), the single quote ('), or the greater-than sign (>). Security filters are not tied to a specific Web application. If a corporate policy requires a security filter to be in place, it becomes the Web application's responsibility to function in conjunction with the security filter rules. It can be difficult for complex and refined Web applications to meet the security filter rules without significant re-architecture. Typical installations will have many Web applications guarded by one security filter.
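The example rule above can be sketched as follows. This is an illustrative assumption of how such a generic rule might be written, not the behavior of any particular security filter product. The name `passes_filter`, the choice to percent-decode the request target before checking it, and the use of the plain ASCII single quote are all assumptions of this sketch.

```python
from urllib.parse import unquote_plus

# Characters named in the example rule (ASCII forms assumed).
FORBIDDEN_CHARS = {"<", ">", "'"}


def passes_filter(method, request_target):
    """Return True if the request would be allowed by the example rule.

    Hypothetical generic rule: reject GET requests whose target contains
    a forbidden character, either literally or percent-encoded.
    """
    if method.upper() != "GET":
        return True  # this illustrative rule only covers GET requests
    decoded = unquote_plus(request_target)
    return not any(ch in FORBIDDEN_CHARS for ch in decoded)


if __name__ == "__main__":
    print(passes_filter("GET", "/search?q=hello"))
    print(passes_filter("GET", "/search?q=%3Cscript%3E"))
```

Because the rule knows nothing about the application behind it, a legitimate request such as `/search?name=O'Brien` is rejected along with malicious ones.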
Therefore, there is a need for a method and apparatus that allows complex Web applications to function normally in an environment where a security filter enforces generic security rules. Because the security filter can have a global scope and be required by policy, the method and apparatus cannot modify the security filter's behavior. To be of value, the solution has to minimize the number of changes to the Web application. Furthermore, the method and apparatus should respect the spirit of the security filter requirement policy by not simply offering a total bypass of the security filter.
A common approach to modifying data exchanged between components is to insert a proxy between them that monitors communications in both directions. The proxy can examine the data flow and modify it according to its own logic. Proxies can operate at the HTTP protocol level or as add-ons to applications. For example, the Java Servlet framework allows requests and responses to be sent through a Servlet proxy using the built-in chaining mechanism; at the HTTP level, proxies can listen on a port and redirect traffic to a different port.
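The proxy idea can be sketched in-process, without any networking, as a wrapper that sits between a caller and a backend and may rewrite traffic in both directions. The names `make_proxy`, `transform_request`, and `transform_response` are hypothetical, and requests and responses are modeled as plain strings for simplicity. This is not the Java Servlet chaining API.

```python
def make_proxy(backend, transform_request=None, transform_response=None):
    """Wrap a backend callable with optional two-way transformations.

    Hypothetical sketch: 'backend' takes a request string and returns a
    response string; the proxy may modify either according to its own logic.
    """
    def proxy(request):
        if transform_request is not None:
            request = transform_request(request)    # inbound direction
        response = backend(request)
        if transform_response is not None:
            response = transform_response(response)  # outbound direction
        return response
    return proxy


if __name__ == "__main__":
    def echo_backend(request):
        return "echo:" + request

    proxy = make_proxy(echo_backend, str.strip, str.upper)
    print(proxy("  hello "))
```

Neither the caller nor the backend needs to change for the proxy to be inserted, which is the property that keeps the architectural impact small.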
If a Web application creates requests with data that would trigger a generic security filter rule, it either has to stop sending that data or transform it so as not to trigger rejection. Not sending the data can require major re-architecture of the Web application. Therefore, there is a need for transformation of the data through a proxy, because it can be accomplished with minimal architectural impact. A transformation that wraps the Web application data so as to preserve the original information but not trigger rejection can be referred to as a cloak operation. A real-world analogy would be a person (the data) wearing an enveloping cloak to pass unrecognized in front of a guard (the security filter). Security filters are not commonly able to detect such subterfuge.
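One possible cloak operation, shown here purely as an assumption since no specific encoding is prescribed above, is a reversible encoding whose output alphabet contains none of the characters a generic rule rejects. The sketch uses URL-safe Base64 from Python's standard library. The function names `cloak` and `uncloak` are hypothetical.

```python
import base64


def cloak(value: str) -> str:
    """Wrap data in URL-safe Base64 (letters, digits, '-', '_', '=')."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def uncloak(cloaked: str) -> str:
    """Recover the original data from its cloaked form."""
    return base64.urlsafe_b64decode(cloaked.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    original = "name='<b>O'Brien</b>'"
    wrapped = cloak(original)
    # The cloaked form contains no '<', '>', or single quote characters.
    print(any(ch in wrapped for ch in "<>'"))
    print(uncloak(wrapped) == original)
```

A proxy on the client side of the security filter would apply `cloak`, and a matching proxy on the application side would apply `uncloak`, so the Web application itself continues to see the original data.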