1. Technical Field of the Invention
The present invention relates in general to Web application security. The invention provides a means to protect a Web application from hacker attacks. Thus the present invention is a Web Application Firewall (WAF). The invention makes use of several Artificial Intelligence (AI) techniques.
2. Description of the Related Art
HyperText Transfer Protocol
HyperText Transfer Protocol (HTTP) is the primary method used to convey information on the World Wide Web (WWW). The original purpose was to provide a way to publish and receive HyperText Markup Language (HTML) pages. HTML is a markup language designed for the creation of web pages and other information viewable in a browser.
Development of HTTP was coordinated by the World Wide Web Consortium and working groups of the Internet Engineering Task Force, culminating in the publication of a series of RFCs, most notably RFC 2616, which defines HTTP/1.1, the version of HTTP in common use today.
Like most network protocols, HTTP uses the client-server model: an HTTP client, such as a web browser, typically initiates a request by establishing a TCP connection and sending a request message to a particular port on a remote server. The server then returns a response message, usually containing the resource that was requested. After delivering the response, the server traditionally closes the connection; HTTP/1.1 may keep the connection open for further requests, but each request is still handled independently of the previous ones. HTTP is therefore a stateless protocol, i.e. it does not maintain any information between transactions. In this respect, HTTP differs from other TCP-based protocols such as FTP. This design makes HTTP ideal for the World Wide Web, where pages regularly link to pages on other servers. It can occasionally pose problems, as the absence of state in the protocol necessitates alternative methods of maintaining users' “state”. Many of these methods involve the use of “cookies”, but this is often not sufficient from a security point of view.
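The request/response exchange described above can be illustrated with a minimal sketch. It assumes Python's standard library, a throwaway server bound to the local loopback interface, and an HTTP/1.0 request so that the server closes the connection after one response. The names HelloHandler and fetch_once are hypothetical and serve only for illustration.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path="/"):
    """Start a local server, send one request over TCP, return the raw response."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        with socket.create_connection((host, port), timeout=5) as conn:
            request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
            conn.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:  # the server closed the connection
                    break
                chunks.append(data)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()
```

Each call to fetch_once opens a new connection and receives one complete response; nothing carries over from one call to the next, which is the statelessness discussed above.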
HTTP Cookie
An HTTP cookie (usually called simply a cookie) is a packet of information sent by a server to a WWW browser and then sent back by the browser each time it accesses that server. Cookies can contain any arbitrary information the server chooses and are used to maintain state between otherwise stateless HTTP transactions. Typically this is used to authenticate or identify a registered user of a web site as part of their first login process or initial site registration without requiring them to sign in every time they access that site.
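As an illustration only, the following sketch shows how a server might issue a cookie and later read back what the browser returns. It assumes Python's standard http.cookies module; the cookie name "session", the helper names and the attribute choices are hypothetical and not taken from any particular Web application.

```python
from http.cookies import SimpleCookie


def make_set_cookie(session_id):
    """Return the value of a Set-Cookie response header for a session cookie."""
    cookie = SimpleCookie()
    cookie["session"] = session_id      # hypothetical cookie name
    cookie["session"]["path"] = "/"     # send it back for every path on the site
    cookie["session"]["httponly"] = True
    return cookie["session"].OutputString()


def read_cookie(header_value):
    """Parse the value of a Cookie request header into a plain dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

The server would send the first value in a Set-Cookie header, and the browser would return the name-value pair in a Cookie header on each subsequent request to that server.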
Structure of HTTP Transactions
The format of the request and the format of response messages are similar, and English-oriented. Both kinds of messages consist of:
    An initial line (different for request vs. response);
    Zero or more header lines;
    A blank line (i.e. a CRLF by itself);
    An optional message body (e.g. a file, or query data, or query output).
Initial Request Line
The initial line is different for the request than for the response. A request line has three parts, separated by spaces: a method name, the local path of the requested resource, and the version of HTTP being used. A typical request line is:
    GET /path/to/file/index.html HTTP/1.1
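The three-part structure of the request line can be sketched as follows. This is a minimal illustration, assuming a single space between the parts; the names RequestLine and parse_request_line are hypothetical.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    path: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    """Split a request line into method, path and HTTP version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three space-separated parts, got {len(parts)}")
    method, path, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected version field: {version!r}")
    return RequestLine(method, path, version)
```

A line that lacks one of the separating spaces is rejected rather than guessed at, which is the kind of strictness a filtering component would typically want.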
The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general). The most common HTTP request methods are:
    GET is by far the most common HTTP method, for statically requesting a resource by specifying a URL. It says “give me this resource”;
    POST is similar to GET, except that a message body, typically containing key-value pairs from an HTML form submission, is included in the request;
    PUT is used for uploading files to a specified URI on a web-server;
    HEAD is identical to GET, except that the page content is not returned; just the headers are. Useful for retrieving meta-information.
Initial Response Line
The initial response line, called the status line, also has three parts separated by spaces: the HTTP version, a response status code that gives the result of the request, and an English reason phrase describing the status code. Typical status lines are:
    HTTP/1.1 200 OK
or
    HTTP/1.1 404 Not Found
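A status line can be split in a similar way, as the following sketch suggests. The reason phrase may itself contain spaces, so only the first two spaces are treated as separators. The function names are hypothetical; the mapping from the first digit of the status code to a response class follows the classes defined for HTTP/1.1.

```python
# Response classes keyed by the first digit of the status code.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Split a status line into (version, status code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)  # reason phrase may contain spaces
    if len(parts) < 2:
        raise ValueError("status line needs at least a version and a code")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    return version, int(code), reason


def category(status):
    """Return the general class of a three-digit status code."""
    return CATEGORIES.get(status // 100, "unknown")
```

Only the numeric code is used for classification; the reason phrase is kept for display purposes.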
The status code is meant to be computer-readable; the reason phrase is meant to be human-readable, and may vary. The status code is a three-digit integer, and the first digit identifies the general category of response. The most common status codes are:
    200 OK The request succeeded, and the resulting resource (e.g. file or script output) is returned in the message body;
    404 Not Found The requested resource doesn't exist;
    302 Moved Temporarily redirects the client to another URL;
    500 Server Error An unexpected server error. The most common cause is a server-side script that has bad syntax, fails, or otherwise cannot run correctly.
Header Lines
Header lines provide information about the request or response, or about the object sent in the message body.
The header lines are in the usual text header format, which is: one line per header, of the form “Header-Name: value”, ending with CRLF. The format is defined in RFC 822, section 3 (same format as for email and news postings). HTTP 1.0 defines 16 headers, though none are required. HTTP 1.1 defines 46 headers, and one (Host:) is required in requests. For Net-politeness, the following headers are often included in requests:
    From This header gives the email address of whoever's making the request, or running the program doing so (user-configurable, for privacy concerns);
    User-Agent This header identifies the program that is making the request, in the form “Program-name/x.xx”, where x.xx is the (mostly) alphanumeric version of the program;
    Referer This header contains the URL of the document from which the request originated.
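The header format described above can be illustrated by assembling a small HTTP/1.1 request by hand. This is a sketch under stated assumptions: the function name build_request, the host www.example.com and the User-Agent value are hypothetical.

```python
def build_request(method, path, host, extra_headers=None):
    """Assemble a bodiless HTTP/1.1 request as text.

    The Host header is always emitted because HTTP/1.1 requires it.
    """
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")  # one "Header-Name: value" per line
    # Every line ends with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For example, build_request("GET", "/index.html", "www.example.com", {"User-Agent": "ExampleClient/1.0"}) yields a request line, two header lines and the terminating blank line.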
The following headers are often included in responses:
    Server This header is analogous to the User-Agent: header: it identifies the server software in the form “Program-name/x.xx”. For example, one beta version of Apache's server returns “Server: Apache/1.3b3-dev”;
    Last-Modified This header gives the modification date of the resource that's being returned. Used in caching and other bandwidth-saving activities.
The Message Body
An HTTP message may have a body of data sent after the header lines. In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error. In a request, this is where user-entered data or uploaded files are sent to the server.
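To show how the initial line, the header lines, the blank line and the body fit together, the following sketch splits a raw message into its parts. It assumes the whole message is already in memory and that the body length, when given, is announced by a Content-Length header carrying the number of bytes in the body; the function name split_message is hypothetical.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into (initial line, headers, body)."""
    head, separator, body = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line separating headers from body")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    length = headers.get("content-length")
    if length is not None:
        body = body[: int(length)]  # ignore any bytes beyond the announced length
    return initial_line, headers, body
```

The same function applies to requests and responses, since both share the structure described above and differ only in the initial line.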
If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular:
    Content-Type This header gives the MIME-type of the data in the body, such as text/html or image/gif;
    Content-Length This header gives the number of bytes in the body.
Secure HTTP
HTTPS is the secure version of HTTP, using SSL/TLS to protect the traffic. The protocol normally uses TCP port 443. SSL, originally created to protect HTTP, is especially suited for HTTP since it can provide (some) protection even if only one side to the communication, the server, is authenticated.
Man in the Middle Attack
A man in the middle attack (MITM) is an attack in which an attacker is able to read, insert and modify messages between two parties at will, without either party knowing that the link between them has been compromised. Even with the use of HTTPS, an attacker may be able to observe and intercept messages going between the two victims. In particular, this will be the case if the attacker is able to fool the client (e.g. the victim's browser) into connecting to the attacker rather than to the requested server. The attacker then connects to the server on behalf of the victim, and effectively sits between the communicating parties, passing messages back and forth. The attacker plays the role of the server on one side, and of the client on the other.
Phishing Attack
Phishing is the act of attempting to fraudulently acquire sensitive information (e.g. credit card numbers, account user-names, passwords, social security numbers) by masquerading as a trustworthy person or company. Phishing attacks use both social engineering and technical subterfuge. Social-engineering schemes use spoofed e-mails to lead consumers to counterfeit websites designed to trick recipients into divulging sensitive information (i.e. the victim believes they are connected to a trustworthy server). By hijacking the brand names of banks, e-retailers and credit card companies, phishers often convince recipients to connect to their counterfeit websites. The following techniques are often used to hijack original brand names: Use of the “@” symbol in a URL, for example http://www.mybank.com@members.attacker.com/. Even though the first part of the link looks legitimate, this address will attempt to connect as the user www.mybank.com to the server members.attacker.com. Misspelled URLs and sub-domains work similarly, for example http://www.mybank.com.attacker.net, which looks like it belongs to mybank.com but actually names a host in the domain attacker.net.
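The two URL tricks above can be made visible by parsing the URL rather than reading it by eye. The following sketch assumes Python's standard urllib.parse module; the helper names real_host and belongs_to are hypothetical, and the suffix comparison is a simplification, not a complete anti-phishing check.

```python
from urllib.parse import urlsplit


def real_host(url):
    """Return the host a client would actually connect to for this URL."""
    return urlsplit(url).hostname


def belongs_to(url, trusted_domain):
    """Check whether the URL's host is the trusted domain or one of its sub-domains."""
    host = urlsplit(url).hostname or ""
    return host == trusted_domain or host.endswith("." + trusted_domain)
```

For http://www.mybank.com@members.attacker.com/, the text before the “@” is parsed as user information, so real_host returns members.attacker.com; for http://www.mybank.com.attacker.net, belongs_to(url, "mybank.com") is false because the host ends in attacker.net.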
Technical subterfuge schemes typically use DNS spoofing to misdirect users to fraudulent sites or proxy servers.