The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
The Internet is by far the largest, most extensive publicly available network of interconnected computer networks that transmit data by packet switching using a standardized Internet Protocol (IP) and many other protocols. The Internet has become an extremely popular source of virtually all kinds of information. Increasingly sophisticated computers, software, and networking technology have made Internet access relatively straightforward for end users. Applications such as electronic mail, online chat, and Web clients allow users to access and exchange information almost instantaneously.
The World Wide Web (WWW) is one of the most popular means used for retrieving information over the Internet. The WWW can cope with many types of data which may be stored on computers, and is used with an Internet connection and a Web client. The WWW is made up of millions of interconnected pages or documents which can be displayed on a computer or other interface. Each page may have connections to other pages which may be stored on any computer connected to the Internet. A Uniform Resource Identifier (URI) identifies a resource in the WWW, and typically consists of three parts: the transfer format (also known as the protocol type), the host name of the machine which holds the file (which may also be referred to as the Web server name), and the path name to the file. URIs are also referred to as Uniform Resource Locators (URLs). The transfer format for standard Web pages is Hypertext Transfer Protocol (HTTP). Hypertext Markup Language (HTML) is a method of encoding the information so it can be displayed on a variety of devices.
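As an illustration only, the three parts of a URL described above can be separated with Python's standard `urllib.parse` module. The function name `split_url` and the example address are hypothetical and are not part of the present disclosure.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Return the three parts named in the text: protocol, host, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # transfer format, e.g. "http"
        "host": parts.hostname,    # Web server name
        "path": parts.path,        # path name to the file
    }


# Hypothetical example address.
print(split_url("http://www.example.com/docs/index.html"))
# {'protocol': 'http', 'host': 'www.example.com', 'path': '/docs/index.html'}
```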
Web applications are engines that create Web pages from application logic, stored data, and user input. Web applications often preserve user state across sessions. Web applications do not require software to be installed in the client environment. Web applications make use of standard Web browser components to view pages built on the server side. Web applications can also deliver services through programmatic interfaces such as Software Development Kits (SDKs).
HTTP is generally the underlying transactional protocol for transferring files (text, graphic images, sound, video, and other multimedia files) between Web clients and servers. HTTP defines how messages are formatted and transmitted, and what actions Web servers and Web client browsers should take in response to various commands. A Web browser, as an HTTP client, typically initiates a request by establishing a TCP/IP connection to a particular port on a remote host. An HTTP server monitoring that port waits for the client to send a request string. Upon receiving the request string (and message, if any), the server may complete the protocol by sending back a response string and a message of its own, in the form of the requested file, an error message, or any other information. The HTTP server can take the form of a Web server with gateway components to process requests. A gateway is a custom Web server module or plug-in created to process requests, and generally is the first point of contact for a Web application. The term “gateway” is intended to include any gateways known to a person skilled in the art, for example, CGI, ISAPI for the Microsoft Internet Information Services (IIS) Web server, an Apache Web server module, or a Java servlet.
Web pages regularly refer to pages on other servers, whose selection will elicit additional transfer requests. When the browser user requests a file, either by “opening” a Web file by typing in a Uniform Resource Locator (URL) or by clicking on a hypertext link, the browser builds an HTTP request. In actual applications, Web clients may need to be distinguished and authenticated, or a session which holds state across a plurality of HTTP requests may need to be maintained by using a piece of “state” called a cookie.
Web applications incur a security risk by accepting user input in their application logic. A common strategy for protecting Web applications against malicious data is for Web applications to verify the data they receive prior to processing it. The act of checking data entering a Web application for processing is called input validation. A Web application entry point, for example a Web application firewall, typically examines incoming requests, applies generic security rules, and rejects requests that fail to comply with these rules. Input validation includes accepting only data deemed acceptable to a Web application, or rejecting data that could be offensive to the Web application. So as not to reject legitimate data, the input validation process requires a great deal of knowledge about the Web application's behavior. Failure to account for that behavior may impair the Web application's functionality. Further, when an entry point is shared by multiple Web applications, the validation logic implementation is required to account for applications having different validation logic for data in the same context. A similar requirement exists for an application composed of multiple components.
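A minimal sketch of input validation at an entry point follows, assuming rules expressed as regular expressions keyed by parameter name. The rule set `RULES`, the parameter names, and the function `validate` are hypothetical and serve only to illustrate accepting data deemed acceptable and rejecting everything else.

```python
import re

# Hypothetical rule set: parameter name -> pattern the whole value must match.
RULES = {
    "page": re.compile(r"[0-9]{1,6}"),
    "sort": re.compile(r"(asc|desc)"),
}


def validate(params: dict) -> list:
    """Return a list of error strings; an empty list means the request is accepted."""
    errors = []
    for name, value in params.items():
        rule = RULES.get(name)
        if rule is None:
            # Parameters without a rule are rejected rather than passed through.
            errors.append(f"unexpected parameter: {name}")
        elif rule.fullmatch(value) is None:
            errors.append(f"invalid value for: {name}")
    return errors


print(validate({"page": "12", "sort": "asc"}))  # []
print(validate({"page": "1; DROP TABLE"}))      # ['invalid value for: page']
```

A rule set of this kind must reflect what the application legitimately accepts; a pattern that is too strict would reject valid requests, which is the functional risk noted above.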
A method and system to build rich and yet simple to define rules applied by a validation engine has been described in U.S. application Ser. No. 11/187,268, titled “Rich Web Application Input Validation”, the entirety of which is hereby incorporated by reference. The capabilities of the rules allow tight validation of complex Web application data without the need for customized validation code. The syntax of the rules is adapted for human handling, either by using human readable rule definitions, or by manipulating a tool. The syntax of the rules helps to write, to verify correctness, to ensure completeness, and to facilitate updates of the rules.
Validation rules may number in the thousands for a large business Web application. One approach to simplifying the rule set of a large Web application is the dynamic generation of rules. For example, a Web application constructing a page with integer parameters can specify that the values for these parameters should be of type integer. The disadvantage of this approach is that the size of the validation rules may become an issue for applications with memory constraints. In other words, in extreme cases it is not practical for the entry point to have knowledge of all the validation rules. Another limitation for the management of dynamically generated rules is the distributed nature of many applications. To handle a large load of requests, entry points can be distributed onto several hosts. Maintaining the list of all validation rules on each entry point of a distributed system may not be optimal.
It is therefore desirable to provide stateless validation rules. Stateless is intended to indicate that the validation rules are sent to the client in a response and then back to the server in a request, i.e., in a round trip, instead of being stored server-side, for example at an entry point. The validation rule for data that is part of a request is itself part of the request being validated. Impromptu components added to installed Web applications can also benefit from stateless validation. With a proper framework in place, Web applications can have an entry point validate their data without registering their rules with the entry point.
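The round trip described above can be sketched as follows, assuming the rule is serialized as JSON and carried in a form field. The field names `rule` and `value`, the rule format, and the helper functions are hypothetical. The sketch deliberately omits any protection of the rule itself, so a client could alter the rule before returning it.

```python
import base64
import json
import re


def encode_rule(rule: dict) -> str:
    """Serialize a rule so it can travel in a response, e.g. as a hidden form field."""
    return base64.urlsafe_b64encode(json.dumps(rule).encode("utf-8")).decode("ascii")


def decode_rule(token: str) -> dict:
    """Recover the rule from the token returned by the client."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def validate_request(form: dict) -> bool:
    """Validate form['value'] against the rule carried in form['rule']."""
    rule = decode_rule(form["rule"])
    return re.fullmatch(rule["pattern"], form["value"]) is not None


# The server builds the response: the rule is sent out rather than stored.
token = encode_rule({"pattern": "[0-9]+"})
# The client sends the rule back alongside its data.
print(validate_request({"rule": token, "value": "42"}))   # True
print(validate_request({"rule": token, "value": "4x2"}))  # False
```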
US Application 20030037236 teaches a technology for the automated generation of input validation filters, allowing a user external to the Web application to easily define validation filters.
US Application 20030037236 does not teach the broadening of the validation capabilities of the input engine to perform validation based on the validation rules in the request. In addition, the relations used in defining assumptions on parameters follow the traditional input validation model as described by the list of validation types in the STRUTS framework. The inclusion of conjunctions and disjunctions is not sufficient to create the validation rules. Capabilities to ease manual writing of rules are introduced as manual writing of rules is undesirable. US Application 20030037236 does not give rule writers who have intimate knowledge of the Web application, and who seek to achieve the most secure validation, the capabilities to address the complex validation requirements encountered in Business Intelligence Web applications.
US Application 20040189708 teaches a system and method for validating entry of data into a structured data file in real time. The system and method also describe a real-time validation tool that enables a developer to create custom validation rules. These custom validation rules can include preset validation rules. The system and method validate data so that it can be safely stored in hierarchical structures, thus easing the user experience by not generating misleading errors. However, US Application 20040189708 does not introduce new validation capabilities to validate input data against malicious users trying to exploit security vulnerabilities; it only provides a list of preset validation rules matching a subset of the STRUTS framework list. These preset validation rules and the custom rules fail to address the validation requirements of complex Web applications such as Business Intelligence Web applications. More specifically, US Application 20040189708 does not validate input data against malicious users based on the validation rules embedded in the requests.
One of the benefits of embedding the validation rules in the response and the subsequent requests is the flexibility the Web applications have to validate a request. An application firewall can be used to process the embedded rules but a component can choose to bypass the application firewall and invoke the validation itself. As long as the data is validated before being processed, security is not compromised. Because data can go through transformations while being dispatched within an application, it may be easier to implement validation rules for data before being processed by a component of the application, because data ready to be processed is often in a simpler form.
Therefore, there is a need for a method and system that provide stateless validation of the request. As the validation rules in a stateless validation are sent to an untrusted client, and used by the Web application upon return, the method and system need to ensure the authenticity and the integrity of the validation rules. The term untrusted client is intended to include a client who may submit malicious requests to exploit an application security vulnerability. The authenticity check will enforce that the received validation rules come from a trusted server. The integrity check will verify that the validation rules have not been modified by the untrusted client.
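One common way to obtain both checks is a keyed message authentication code (HMAC) computed over the rule with a secret known only to the server. The following is a sketch under that assumption and does not represent the mechanism of the present disclosure; the key `SECRET_KEY`, the rule format, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical server-side secret; it is never sent to the client.
SECRET_KEY = b"server-side-secret"


def _tag(payload: str) -> str:
    """Compute an HMAC-SHA256 tag over the serialized rule."""
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_rule(rule: dict) -> dict:
    """Attach a tag to a rule before it is sent to the client."""
    payload = json.dumps(rule, sort_keys=True)
    return {"payload": payload, "tag": _tag(payload)}


def verify_rule(signed: dict):
    """Return the rule if the tag matches, otherwise None."""
    expected = _tag(signed["payload"])
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), signed["tag"].encode("utf-8")):
        return None
    return json.loads(signed["payload"])


signed = sign_rule({"type": "integer"})
print(verify_rule(signed))  # {'type': 'integer'}

# A client that rewrites the rule cannot produce a matching tag without the key.
tampered = dict(signed, payload=json.dumps({"type": "string"}))
print(verify_rule(tampered))  # None
```

Because only the holder of the key can produce a valid tag, a matching tag indicates that the rule originated from the server (authenticity) and was not altered in transit (integrity).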