The Internet is by far the largest, most extensive publicly available network of interconnected computer networks that transmit data by packet switching using a standardized Internet Protocol (IP) and many other protocols. The Internet has become an extremely popular source of virtually all kinds of information. Increasingly sophisticated computers, software, and networking technology have made Internet access relatively straightforward for end users. Applications such as electronic mail, online chat, and web clients allow users to access and exchange information almost instantaneously.
The World Wide Web (WWW) is one of the most popular means of retrieving information over the Internet. The WWW can cope with many types of data which may be stored on computers, and is used with an Internet connection and a Web client. The WWW is made up of millions of interconnected pages or documents which can be displayed on a computer or other interface. Each page may have connections to other pages which may be stored on any computer connected to the Internet. The Uniform Resource Identifier (URI) is the identifying system of the WWW, and a URI typically consists of three parts: the transfer format (also known as the protocol type), the host name of the machine which holds the file (which may also be referred to as the web server name), and the path name to the file. URIs are also referred to as Uniform Resource Locators (URLs). The transfer format for standard web pages is Hypertext Transfer Protocol (HTTP). Hypertext Markup Language (HTML) is a method of encoding the information so that it can be displayed on a variety of devices.
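The three-part structure of a URI described above can be illustrated with a minimal sketch using the Python standard library. The address shown is a hypothetical example and does not come from any particular application.

```python
from urllib.parse import urlsplit

# Hypothetical address, used only for illustration.
uri = "http://www.example.com/docs/index.html"

parts = urlsplit(uri)
transfer_format = parts.scheme  # the protocol type, here "http"
host_name = parts.netloc        # the web server name
path_name = parts.path          # the path name to the file

print(transfer_format, host_name, path_name)
```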
Web applications are engines that create Web pages from application logic, stored data, and user input. Web applications often preserve user session state. Web applications make use of standard Web browser components to view pages built on the server side. Web applications can also deliver services through programmatic interfaces such as Software Development Kits (SDKs).
HTTP is the underlying transactional protocol for transferring files (text, graphic images, sound, video, and other multimedia files) between web clients and servers. HTTP defines how messages are formatted and transmitted, and what actions web servers and web client browsers should take in response to various commands. A web browser, acting as an HTTP client, typically initiates a request by establishing a TCP/IP connection to a particular port on a remote host. An HTTP server monitoring that port waits for the client to send a request string. Upon receiving the request string (and message, if any), the server may complete the protocol by sending back a response string and a message of its own, in the form of the requested file, an error message, or any other information. Web pages regularly reference pages on other servers, the selection of which elicits additional transfer requests. When the browser user enters a file request, either by “opening” a web file by typing in a Uniform Resource Locator (URL) or by clicking on a hypertext link, the browser builds an HTTP request. In actual applications, web clients may need to be distinguished and authenticated, or a session which holds state across a plurality of HTTP requests may need to be maintained by using a piece of state called a cookie.
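The request string, the response string, and the cookie mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the host, path, and cookie name are hypothetical, and the helper functions `build_request` and `session_cookie_from` are invented for this sketch rather than taken from any browser or server implementation.

```python
from http.cookies import SimpleCookie


def build_request(host, path, cookie=None):
    """Build a minimal HTTP/1.1 GET request string (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookie:
        # Returning the cookie lets the server associate this request
        # with an earlier one, which is how session state is carried.
        lines.append(f"Cookie: {cookie}")
    return "\r\n".join(lines) + "\r\n\r\n"


def session_cookie_from(response_headers):
    """Return 'name=value' for the first cookie the server sets, or None."""
    for line in response_headers.split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "set-cookie":
            jar = SimpleCookie()
            jar.load(value.strip())
            for key, morsel in jar.items():
                return f"{key}={morsel.value}"
    return None


# Hypothetical response headers from a server that starts a session.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Set-Cookie: SESSIONID=abc123; Path=/\r\n"
    "Content-Type: text/html\r\n\r\n"
)

cookie = session_cookie_from(response)
next_request = build_request("www.example.com", "/index.html", cookie)
```

The second request carries the cookie back to the server, so two otherwise independent HTTP exchanges can be treated as part of one session.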
Web applications process HTTP requests from users. The processing of HTTP requests by a Web application involves handling user data within the Web application and performing operations on it. Because of the nature of computer systems, processing user data within a Web application can result in a break of the normal behavior of the computer system. Some of these breaks can be exploited to trigger functionality outside of the Web application, or to make the Web application perform operations that a user is not entitled to perform. A well-known computer system break, often exploited by malicious users, is the buffer overflow. A buffer overflow occurs when input data exceeds the memory buffer allocated for it and overwrites adjacent memory, which can allow a user to run computer instructions that are outside the scope of the application being used. Buffer overflows can give malicious users control of the computer system on which a Web application is running. A buffer overflow attack can be prevented if the incoming data is examined to ensure that it does not exceed a given size. On the other hand, a Web application that fails to examine incoming data can become an attack vector for malicious users. More information on these types of attacks may be found in the following articles from CERT®: “Understanding Malicious Content Mitigation for Web Developers”, CERT Coordination Center, February 2000, http://www.cert.org/tech_tips/malicious_code_mitigation.html and http://www.cert.org/tech_tips/malicious_code_FAQ.html; and “Malicious HTML Tags Embedded in Client Web Requests”, CERT Coordination Center, February 2000, http://www.cert.org/advisories/CA-2000-02.html. Both documents are hereby incorporated by reference in their entirety.
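The size examination described above can be sketched as a simple check applied before any further processing. The limit of 64 bytes and the function name `within_size_limit` are assumptions chosen for illustration. Python itself manages memory for strings, so the sketch shows only the form of the check, not an actual overflow.

```python
MAX_FIELD_BYTES = 64  # assumed limit, chosen only for this illustration


def within_size_limit(value: str, limit: int = MAX_FIELD_BYTES) -> bool:
    """Return True when the encoded value fits in the assumed buffer size."""
    # Measure the encoded length, since a downstream buffer holds bytes
    # and a single character may occupy more than one byte.
    return len(value.encode("utf-8")) <= limit
```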
There are various network level firewall technologies available, such as intrusion detection systems, to protect computer systems against malicious data. These firewalls use state tables and data patterns to filter network input. Because they are independent of Web applications and their validation capabilities are specialized for the network layer, they are generally inadequate to address the custom validation needs of complex Web applications.
A common strategy for protecting Web applications against malicious data is for Web applications to verify the data they receive prior to processing it. The act of checking data entering a Web application for processing is called input validation. Input validation consists of accepting only data deemed acceptable to a Web application, or rejecting data that could be offensive to the Web application. So as to not reject legitimate data, the input validation process requires a great deal of knowledge about the application behavior. However, software developers tend to be focused on producing functional code rather than input verification code. The result may be inconsistency in performing input validation tasks in various applications.
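The two forms of input validation described above, accepting only acceptable data or rejecting offending data, can be contrasted in a minimal sketch. The pattern, the list of rejected fragments, and both function names are hypothetical and would depend on the behavior of the actual application.

```python
import re

# Assumed definition of acceptable data: 1 to 32 letters, digits, or underscores.
ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")

# Assumed, deliberately incomplete list of fragments considered offending.
REJECTED_FRAGMENTS = ("<script", "--", "\x00")


def accept_known_good(value: str) -> bool:
    """Accept only values that match the definition of acceptable data."""
    return ALLOWED.fullmatch(value) is not None


def reject_known_bad(value: str) -> bool:
    """Accept any value that contains none of the listed fragments."""
    lowered = value.lower()
    return not any(fragment in lowered for fragment in REJECTED_FRAGMENTS)
```

In this sketch a value such as `<img src=x onerror=y>` passes `reject_known_bad`, because it contains none of the listed fragments, but fails `accept_known_good`. This illustrates why the accepting form requires knowledge of the application's legitimate data, while the rejecting form requires knowledge of every attack.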
In addition, application software developers may not be well positioned to write their code so that it filters incoming data to ensure that such data is valid and legal. It may be unrealistic to expect the developers to know every possible form of attack. Furthermore, new attacks lead to new requirements for input validation. Therefore, it is prudent practice to have a mechanism for performing validation checks in addition to the internal checks.
Some Web application frameworks offer input validation capabilities. One example is the Apache STRUTS Web application validation framework, which uses a list of input validation rules.
Web applications can perform input validation themselves, either in a centralized location or where the data is used. In this scenario, the input validation rules are embedded within the Web application. Examples of the validation capabilities offered in prior art validation engines are summarized by the STRUTS validation documentation at http://struts.apache.org/userGuide/dev_validator.html, which is incorporated by reference in its entirety. Examples for STRUTS are: “required”, used for mandatory field validation; “requiredif”, a field-dependent validator; “validwhen”, a validator for checking one field against another; “minlength”, used to validate that input data is not shorter than a specified minimum length; “maxlength”, used to validate that input data does not exceed a specified maximum length; “mask”, used to validate format according to a regular expression; “byte”, used to validate that a field can be converted to a Byte; “short”, used to validate that a field can be converted to a Short; “integer”, used to validate that a field can be converted to an Integer; “long”, used to validate that a field can be converted to a Long; “float”, used to validate that a field can be converted to a Float; “double”, used to validate that a field can be converted to a Double; “date”, used to validate that a field can be converted to a Date; “intRange”, used to validate that an integer field is within a specified range; “floatRange”, used to validate that a float field is within a specified range; “creditCard”, used to validate credit card number format; “email”, used to validate email address format; and “url”, used to validate URL format.
Although not exhaustive, the above list reflects the validation capabilities available in validation engines. Custom validation code needs to be written if the validation needed is not provided by existing capabilities.
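The idea of a validation engine driven by a list of rules can be sketched as follows. This is not the STRUTS API: the rule names are borrowed from the STRUTS examples listed earlier, while the `CHECKS` table, the `RULES` table, the parameter names, and the `validate` function are hypothetical and exist only for this illustration.

```python
import re


def _is_int(value):
    """Return True when the value can be converted to an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


# Each rule type maps to a check taking the value and one rule argument.
CHECKS = {
    "required": lambda v, _: v != "",
    "minlength": lambda v, n: len(v) >= n,
    "maxlength": lambda v, n: len(v) <= n,
    "mask": lambda v, pattern: re.fullmatch(pattern, v) is not None,
    "integer": lambda v, _: _is_int(v),
    "intRange": lambda v, bounds: _is_int(v) and bounds[0] <= int(v) <= bounds[1],
}

# Hypothetical rules for two parameters of an imagined Web application.
RULES = {
    "username": [("required", None), ("minlength", 3),
                 ("maxlength", 16), ("mask", r"[a-z0-9_]+")],
    "page": [("integer", None), ("intRange", (1, 500))],
}


def validate(params, rules=RULES):
    """Return a list of (parameter, failed rule) pairs; empty means valid."""
    failures = []
    for field, field_rules in rules.items():
        value = params.get(field, "")
        for name, arg in field_rules:
            if not CHECKS[name](value, arg):
                failures.append((field, name))
                break  # report only the first failing rule per parameter
    return failures
```

Under these assumptions, `validate({"username": "alice_01", "page": "12"})` returns an empty list, while `validate({"username": "al", "page": "900"})` reports a `minlength` failure and an `intRange` failure. Adding a parameter requires only a new entry in the rules table, whereas a value that none of the rule types can describe would require new code.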
The advantage of using the built-in validation capabilities of the validation engine, instead of custom validation for parameters, is the efficiency with which a large set of rules can be built. Conversely, using custom validation would require duplicated validation logic for the Web application parameters to be validated, which may number in the thousands for a typical business Web application. In addition, Web application data values can change over the life of a Web application.
Therefore, a common difficulty encountered when writing validation rules for a Web application is that complex application data values often do not fall within the constraints of the current type-based or regular-expression rules, for example, those listed in the STRUTS framework. New security requirements also call for unusual validation outside the scope of traditional rule capabilities. To overcome this problem, custom code (for example in JavaScript, C++, or Java) may be needed to validate values which cannot be handled by the existing validation engine capabilities. Writing and maintaining custom validation code is not efficient. Since Web application data validation logic is repetitive, the advantage of pre-defined rule types may be lost. Custom validation also requires a greater level of expertise from the rules writer, namely knowledge of programming.
Another disadvantage of code driven rules is that once an application is deployed in an environment, policies will often prevent modifications to the installed code.
US Application 20030037236 teaches a technology for the automated generation of input validation filters, which allows a user external to the Web application to easily define validation filters.
US Application 20030037236 does not teach the broadening of the validation capabilities of the input engine to perform additional validation. In addition, the relations used in defining assumptions on parameters follow the traditional input validation model as described by the list of validation types in the STRUTS framework. The inclusion of conjunctions and disjunctions is not sufficient to create the validation rules. Capabilities to ease the manual writing of rules are introduced because manual writing of rules is undesirable. US Application 20030037236 does not give rule writers who have intimate knowledge of the Web application, and who seek to achieve the most secure validation, the capabilities to address the validation requirements of complex Web applications, such as those encountered in Business Intelligence Web applications.
US Application 20040189708 teaches a system and method for validating entry of data into a structured data file in real time. The system and method also describe a real-time validation tool that enables a developer to create custom validation rules. These custom validation rules can include preset validation rules. The system and method validate data so that it can be safely stored in hierarchical structures, thus easing the user experience by not generating misleading errors. However, US Application 20040189708 does not introduce new validation capabilities to validate input data against malicious users trying to exploit security vulnerabilities; it only provides a list of preset validation rules matching a subset of the STRUTS framework list. These preset validation rules and the custom rules fail to address the validation requirements of complex Web applications such as business intelligence Web applications. Furthermore, an objective of US Application 20040189708 is to report details about validation failures to the user, and such details would be useful to a malicious user.
Therefore, there is a need for richer, yet simple to define, rules applied by a validation engine. The rule capabilities should allow tight validation of complex Web application data without the need for customized validation code. There is a need for the rules syntax to be adapted for human handling, either through human-readable rule definitions or through manipulation with a tool. There is a need for the rules syntax to help in writing the rules, verifying their correctness, ensuring their completeness, and facilitating their updates. There is a need for a prompt fix when a security vulnerability is newly discovered; a rules upgrade is preferable to a code upgrade because an update of validation rules is flexible and quick to implement.