Extensible Markup Language (“XML”) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. XML is a textual data format that supports the languages of the world through Unicode. XML can be used to represent arbitrary data structures and documents. Various application programming interfaces (“APIs”) exist to assist software developers with processing XML data. Further, various schema systems exist to assist developers in defining XML-based languages. Document formats that use XML syntax include RSS, Atom, SOAP, and XHTML, among others. XML is also used in communication protocols, such as the Extensible Messaging and Presence Protocol (“XMPP”).
Documents to be processed by applications can be stored and/or exchanged using XML. Applications that use XML documents can choose among different XML parsers to retrieve data from an XML document for processing within the application. Each such parser can produce a slightly different output. Additionally, definitions can be provided that specify how a parser should work internally while processing an XML document, where the definitions can include different attributes, features, or calling methods that trigger various parser settings.
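By way of a non-limiting illustration, the following sketch shows how a single parser feature can change the output that an application receives. It assumes the SAX parser of the Python standard library; the class name NameCollector, the function name element_names, and the sample document are hypothetical and are introduced only for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Hypothetical handler that records every element name it is given."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called when namespace processing is switched off.
        self.names.append(name)

    def startElementNS(self, name, qname, attrs):
        # Called when namespace processing is switched on;
        # `name` is a (namespace URI, local name) tuple.
        self.names.append(name)


def element_names(xml_bytes, namespaces):
    """Parse xml_bytes with the namespace feature set as requested."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, namespaces)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


SAMPLE = b'<p:item xmlns:p="urn:example"/>'
print(element_names(SAMPLE, namespaces=False))
print(element_names(SAMPLE, namespaces=True))
```

With the feature disabled, the handler receives the raw qualified name; with the feature enabled, the same element is reported as a namespace URI paired with a local name. Other parsers may expose comparable settings under different names.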
XML documents can be created in various ways, which can make the applications, and/or the server(s) on which they are running, vulnerable to attacks through parsers. Such attacks are relatively easy to create and are being used increasingly by attackers. Such attacks often rely on a Document Type Definition (“DTD”), which may be specified within an XML document. The DTD can be used to declare which elements and references may appear in the document, where they may appear, and of which type they are. The DTD also allows External Entities to be specified that reference other uniform resource identifiers (“URIs”). An XML attack can lead to: a Denial of Service (“DoS”), in which a high consumption of resources (e.g., memory, CPU usage) is deliberately caused on the server for a long period of time in order to block other services from executing; a disclosure of data, in which data that normally would not be accessible is retrieved; a remote system access, in which connections to remote systems are opened, possibly also from a server; a breaking of application logic; and/or any other attacks. These attacks are often referred to as XML External Entity (“XXE”) attacks, XML bombs, and/or XML injection.
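The resource consumption described above can be estimated with simple arithmetic. The following sketch, offered only as an illustration, builds a nested-entity document of the kind commonly called an XML bomb and computes the size to which it would expand, without ever passing the document to a parser. The function names, the entity names e0 through e9, and the parameter values are assumptions chosen for this example.

```python
def build_entity_bomb(levels=9, fanout=10, seed="lol"):
    """Return the text of a nested-entity document. Do not parse the result."""
    declarations = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        # Each entity refers `fanout` times to the entity one level below it.
        refs = f"&e{i - 1};" * fanout
        declarations.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(declarations)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE root [\n  {dtd}\n]>\n"
        f"<root>&e{levels};</root>"
    )


def expanded_size(levels, fanout, seed_len):
    """Characters produced if the top-level entity were fully expanded."""
    return seed_len * fanout ** levels


document = build_entity_bomb()
print(len(document))            # size of the document as transmitted
print(expanded_size(9, 10, 3))  # size after full entity expansion
```

Under these assumed parameters the transmitted document stays under one kilobyte, while full expansion of the top-level entity would yield three billion characters, which is why a parser that expands entities without limits can exhaust server memory.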
An XML parser by default follows the XML specifications, but does not by itself provide any means to prevent such attacks. Thus, an application that uses an XML parser must configure the parser in such a way that using the XML services does not pose a threat to the system. Protection is especially required if XML from untrusted sources is to be processed. However, a trusted source also cannot guarantee safety, because the trusted source itself can be attacked and/or manipulated. Thus, application developers must configure the parser to use only the functionality that is absolutely necessary and to forbid potentially dangerous functionality. As such, conventional systems, which do not perform configuration of parsers, might not be able to protect the parsers and/or applications from attacks.
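One possible way to restrict a parser to only the necessary functionality is to reject any document that carries a DTD, since entity declarations and external entity references both depend on it. The following minimal sketch assumes the expat bindings and ElementTree of the Python standard library; the names ForbiddenDTDError and parse_untrusted are hypothetical, and the sketch is not intended as a complete defense.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat as expat


class ForbiddenDTDError(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> ET.Element:
    """Build an element tree from data, refusing any document with a DTD."""
    builder = ET.TreeBuilder()
    parser = expat.ParserCreate()

    def reject_doctype(name, sysid, pubid, has_internal_subset):
        # Abort as soon as the DOCTYPE is seen, before any entity it
        # declares could be expanded or fetched.
        raise ForbiddenDTDError(f"DOCTYPE declaration {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    # Only the callbacks needed to build a plain element tree are enabled.
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(data, True)
    return builder.close()


root = parse_untrusted(b"<order><id>42</id></order>")
print(root.tag, root[0].text)
```

A document without a DTD is parsed normally, whereas a document that begins with a DOCTYPE declaration causes ForbiddenDTDError to be raised before its content is processed. An application that legitimately needs DTDs would require a different and more selective configuration.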