The Extensible Markup Language (“XML”) is a markup language used to describe data. The hallmark of XML is that its tags are not predefined. XML has become a common tool for data manipulation and data transmission. The World Wide Web Consortium (“W3C”) maintains authoritative XML specifications.
XML may be used as a data exchange format. For example, data may be stored in a database in any native format, and the database may exchange that data with one or more client applications. Data from the database may be converted to XML, sent to a client application, and finally converted into a third format for use by the client. Conversely, XML may also serve as an intermediate format when the database is updated.
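The round trip described above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `customers` table, the `rows`/`row` element names, and the choice of JSON as the client's third format are all hypothetical.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Server side: a hypothetical "customers" table in an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")


def rows_to_xml(cursor):
    """Convert query results to an XML document (the intermediate format)."""
    columns = [d[0] for d in cursor.description]
    root = ET.Element("rows")
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            # Every value is carried as XML text, whatever its database type.
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_json(text):
    """Client side: parse the XML and convert it into a third format (JSON)."""
    root = ET.fromstring(text)
    records = [{child.tag: child.text for child in row} for row in root]
    return json.dumps(records)


wire_xml = rows_to_xml(db.execute("SELECT id, name, city FROM customers"))
client_json = xml_to_json(wire_xml)
print(wire_xml)
print(client_json)
```

Because every value travels as XML text, the client receives `"1"` rather than the integer `1`; restoring types depends on whatever schema or convention the two sides agree on.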
With the rise in data exchange over networks, applications increasingly integrate features that automatically retrieve necessary data from a network, rather than requiring a user to manually launch a browser, locate the information on the network, and then apply it as needed with local applications. Such automated retrieval of data from a network by applications is commonly referred to as web service technology.
Applications running on a single device typically communicate using object-based Remote Procedure Call (“RPC”) mechanisms such as DCOM and CORBA. RPC poses compatibility and security problems, however, when applications communicate across multiple devices on a network, especially a Hypertext Transfer Protocol (“HTTP”)-based network such as the internet, because firewalls and proxy servers commonly block RPC traffic.
Alternative inter-application data exchange protocols may take advantage of XML as a data exchange format. One such protocol is the Simple Object Access Protocol (“SOAP”). SOAP provides a format for sending messages and was designed for communication over the internet. Its advantages are that it is platform-independent, language-independent, simple, extensible, and firewall-compatible. SOAP is based on XML, and, as with XML, the authoritative SOAP specifications are maintained by the W3C.
XML and SOAP technologies thus allow modern applications to incorporate features that automatically send SOAP requests for data, then parse, normalize, and use the XML data returned in SOAP responses. Conversely, applications such as databases can receive SOAP requests, formulate SOAP responses containing XML data, and then serialize and transmit those responses to the requesting client application.
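A minimal sketch of the client side of such an exchange, using Python's standard `xml.etree.ElementTree` module, follows. It assumes SOAP 1.2 (envelope namespace `http://www.w3.org/2003/05/soap-envelope`); the `GetCustomer` operation, its `urn:example:customers` namespace, and the canned response string standing in for a server reply are all hypothetical, and HTTP transport is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:customers"  # hypothetical application namespace

# Prefixes used when serializing; they do not affect the parsed meaning.
ET.register_namespace("env", SOAP_ENV)
ET.register_namespace("c", APP_NS)


def build_request(customer_id):
    """Wrap a hypothetical GetCustomer call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetCustomer")
    ET.SubElement(call, f"{{{APP_NS}}}id").text = str(customer_id)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Pull the payload fields out of the SOAP Body as a plain dict."""
    envelope = ET.fromstring(xml_text)
    result = envelope.find(f"{{{SOAP_ENV}}}Body/{{{APP_NS}}}GetCustomerResponse")
    if result is None:
        raise ValueError("no GetCustomerResponse in SOAP Body")
    # Strip the "{namespace}" prefix from each child tag.
    return {child.tag.split("}", 1)[1]: child.text for child in result}


# A response as a hypothetical server might return it.
response = (
    f'<env:Envelope xmlns:env="{SOAP_ENV}" xmlns:c="{APP_NS}">'
    "<env:Body><c:GetCustomerResponse>"
    "<c:name>Ada</c:name><c:city>London</c:city>"
    "</c:GetCustomerResponse></env:Body></env:Envelope>"
)

print(build_request(1))
print(parse_response(response))
```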
XML was designed to be human-readable. As a result, some features of XML are directed to enhancing readability rather than to preserving data integrity during data exchange. For example, to aid readability, the XML specification describes the concept of “ignorable white space” and defines mandatory white space normalization rules for standards-based XML parsers, so that an XML document can be formatted in an easily readable way without changing its meaning. Unfortunately, when XML is used simply to transport data to or from data sources in which white space is significant to the meaning of the data, XML's treatment of white space can alter the white space characters in that data and thus corrupt it.
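The effect can be reproduced with Python's standard `xml.etree.ElementTree` parser, which is backed by Expat. The sketch assumes a hypothetical record whose tab, newline, and CR LF characters are meaningful and are written into the markup as raw characters. Attribute-value normalization and end-of-line handling, both required of conforming parsers by the XML specification, change those characters during parsing.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in which white space is significant.
original_note = "col1\tcol2\nrow2"   # tab- and newline-delimited values
original_body = "line1\r\nline2"     # Windows-style CR LF line ending

# Embed the raw characters directly in the markup, as a naive serializer might.
document = f'<record note="{original_note}">{original_body}</record>'

record = ET.fromstring(document)
parsed_note = record.get("note")
parsed_body = record.text

# Attribute-value normalization replaces each tab and newline with a space.
print(repr(parsed_note))  # 'col1 col2 row2'
# End-of-line handling turns CR LF into a single LF.
print(repr(parsed_body))  # 'line1\nline2'
print(parsed_note == original_note, parsed_body == original_body)  # False False
```

No error is raised in either case: the document is well-formed, and the parser is behaving as specified, which is why this kind of corruption can pass unnoticed.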
With the rise in the exchange of XML data, there is a need in the industry to prevent corruption of XML data as it is passed between different computing environments.