1. Technical Field
This disclosure relates generally to web application security and in particular to a method and system for providing runtime content sanitization.
2. Background of the Related Art
Ensuring that modern software systems are free of security vulnerabilities is a daunting task. Such systems often comprise large amounts of code, including third-party and remote components. Moreover, the measures that need to be taken to prevent potential attacks are, in most cases, far from straightforward, as they depend on the state of the application, the exact content of the (potentially malicious) data being processed, and the use(s) the application is about to make of that data. The problem is aggravated when it comes to web applications, which by design often feed on untrusted data in the form of user input. Web applications also often access security-sensitive resources, such as databases, file systems or sockets. The problem of securing web applications against malicious attacks therefore has received significant attention.
Cross-Site Scripting (XSS) is a web application vulnerability that allows malicious users to inject code into pages that are viewed by other users. In many classifications, it is recognized as a top web application vulnerability class. The most severe consequences of XSS issues are that an attacker is able to make a legitimate user's browser perform operations that change application state on behalf of that user, or make that user's browser disclose private data.
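By way of illustration only, the following minimal Python sketch shows how injected code can reach other users when an application echoes stored input without escaping. The names `comments`, `post_comment`, `render_page` and the script function `steal` are hypothetical and are not part of any actual application.

```python
# Hypothetical in-memory store shared by all visitors of a page.
comments = []

def post_comment(text: str) -> None:
    """Store user input exactly as received (no sanitization)."""
    comments.append(text)

def render_page() -> str:
    """Build the page by concatenating stored input into HTML markup."""
    items = "".join(f"<li>{c}</li>" for c in comments)
    return f"<ul>{items}</ul>"

post_comment("Nice article!")
# An attacker submits markup instead of plain text; steal() is a made-up name.
post_comment("<script>steal(document.cookie)</script>")

# Every later visitor receives the attacker's script as part of the page.
page = render_page()
print(page)
```

Because the script element arrives as part of a page served by the legitimate site, a visitor's browser would run it with that site's privileges.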
There are several known methods to protect against an XSS attack. One approach is referred to as input filtering. This approach involves checking web application input for malicious data and rejecting or filtering it as needed. The input filtering method, however, cannot guarantee full protection, and it may be overly aggressive (to the point of being useless) if input data is used by the web application in multiple contexts (e.g., HTML and JavaScript). An alternative approach is to use client-side protection, whereby users equip their browsers with extensions that automatically detect attack attempts. The client-side approach, however, does not work properly with some types of XSS attacks, especially persistent XSS, where injected code is not passed through input parameters.
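The limitation of input filtering across multiple contexts may be sketched as follows. This is an illustrative Python example under simplifying assumptions; the character sets and the helper `strip_chars` are hypothetical and do not represent any particular filtering product.

```python
# Characters a filter might remove when input is destined for an HTML body only.
HTML_META = set("<>")
# Characters it would also have to remove if the same input may reach
# a JavaScript string literal (quotes, backslash) or an attribute value.
HTML_AND_JS_META = set("<>'\"\\&")

def strip_chars(value: str, banned: set) -> str:
    """Drop every banned character from the input."""
    return "".join(ch for ch in value if ch not in banned)

js_payload = "'; alert(1); //"
legitimate = "O'Brien <3 R&D"

# A filter tuned for HTML leaves a JavaScript string breakout untouched.
print(strip_chars(js_payload, HTML_META))

# A filter tuned for both contexts mangles ordinary, legitimate input.
print(strip_chars(legitimate, HTML_AND_JS_META))
```

In this sketch, the narrow filter passes the JavaScript payload through unchanged, while the broad filter corrupts a legitimate name, which is the trade-off described above.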
Yet another approach, and one which is the best known solution, is referred to as output escaping. XSS attacks happen when the application fails to escape its output and an attacker places HTML and/or JavaScript on the site, which code then runs in the site visitor's web browser. Output escaping prevents this by making sure that the application never sends commands (HTML) when it intends to send only plaintext. In particular, in this approach, guarding against XSS attacks is done by escaping characters, i.e., representing characters such that they are treated as data rather than as metadata to be consumed by an interpreter's parser. Escape rules for XSS are sensitive to the HTML context in which the (often untrusted) input is to be embedded, and these rules typically distinguish among the various components of the page (viz., HTML body, typical attributes, JavaScript event handlers, and links). This approach is designed to ensure that content rendered by the application contains no executable code (even if code is supplied as input). To be implemented successfully, however, this solution requires significant attention from developers and an active approach from test teams, and it is difficult to implement if the application is a composite created with software from different vendors. Output escaping mechanisms also are difficult to maintain and automate.
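The context sensitivity of output escaping may be illustrated with a minimal Python sketch that uses only standard library routines (`html.escape`, `json.dumps`, `urllib.parse.quote`). The four function names are hypothetical, and the rules shown are a simplified assumption rather than a complete or authoritative set of escaping rules.

```python
import html
import json
from urllib.parse import quote

def escape_html_body(value: str) -> str:
    """HTML body text: neutralize &, < and > so no tag can be opened."""
    return html.escape(value, quote=False)

def escape_html_attribute(value: str) -> str:
    """Quoted attribute value: additionally neutralize both quote characters."""
    return html.escape(value, quote=True)

def escape_js_string(value: str) -> str:
    """JavaScript string literal inside a script block.

    json.dumps produces a quoted literal with quotes, backslashes and control
    characters escaped; <, > and & are then rewritten as \\uXXXX escapes so the
    text cannot terminate the enclosing script element.
    """
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

def escape_url_component(value: str) -> str:
    """Link parameter: percent-encode everything outside the unreserved set."""
    return quote(value, safe="")

untrusted = '</script><a href="x" onclick=\'evil()\'>a b&c</a>'
for escaper in (escape_html_body, escape_html_attribute,
                escape_js_string, escape_url_component):
    print(escaper.__name__, "->", escaper(untrusted))
```

Each routine produces a different representation of the same input, which suggests why a developer must know the destination context before choosing an escaping method.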
An additional problem that output escaping introduces occurs when dynamic content is included by the web application in multiple different contexts (e.g., using HTML, JavaScript, etc.) in a single document. Each inclusion context typically requires different sanitization using a distinct escaping method, and these different escaping methods are often incompatible and cannot be used together. Solving this problem poses additional issues. If sanitization is performed at the moment of including the dynamic content into the resulting document, it may be difficult to identify the outer context (of the included portion) and thus what escaping should be used for the output. The part of the code that renders the particular dynamic content may not be aware of the context in which that content will be placed. If, however, escaping is not performed during dynamic content inclusion but rather is delayed until the application constructs the complete document, it is easy to identify the context of every element. At that point, however, the application is not able to distinguish which parts of the document are legitimate and which are XSS-injected.
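The incompatibility between escaping methods, and the difficulty with delayed escaping, may be sketched in Python as follows. This is an illustration under stated assumptions: `html.unescape` is used here only to approximate the decoding of character references that a browser applies to an attribute value before handing it to the script engine, and the payloads and variable names are hypothetical.

```python
import html
import json

js_payload = "'); alert(1); //"
html_payload = "<img src=x onerror=alert(1)>"

# Case 1: HTML escaping applied to a value that ends up in a JavaScript
# string inside an event-handler attribute, e.g. onclick="greet('VALUE')".
html_escaped = html.escape(js_payload, quote=True)
# Approximation of the browser decoding the attribute before running it.
seen_by_script_engine = html.unescape(html_escaped)
print(seen_by_script_engine)  # the quote that ends the string is back

# Case 2: JavaScript string escaping applied to a value that ends up in
# the HTML body. Angle brackets are not special in a JavaScript string,
# so they survive and the markup remains active.
js_escaped = json.dumps(html_payload)
print(js_escaped)

# Case 3: escaping delayed until the whole document is assembled. The
# application's own markup and the injected markup look alike, so a
# blanket escape disables both.
document = "<p>Welcome, " + html_payload + "</p>"
print(html.escape(document, quote=False))
```

In the first two cases an escaping method that is correct for one context leaves the payload effective in another; in the third, the legitimate paragraph element is escaped along with the injected element because nothing in the assembled text marks which part was untrusted.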
The techniques disclosed herein address these and other deficiencies of the known prior art.