Extensible Markup Language (XML) is a human-readable, machine-understandable, general syntax for describing data, including hierarchical data. XML is an open standard developed under the auspices of the World Wide Web Consortium (W3C), and it is a subset of the Standard Generalized Markup Language (SGML) defined in ISO standard 8879:1986. It is a formal language that can be used to pass information about the component parts of a document from one computer system to another. In general, XML is a method for putting structured data in a text file: it consists of a set of rules and guidelines for designing text formats for such data in a way that produces files that are easily read by a data processing system such as a computer.
XML can be used to describe the components of a document (e.g. forms, address books, spreadsheets, financial transactions, technical drawings, etc.), and it is based on the concept of documents composed of a series of elements, entities, and attributes. Elements describe the meaning of the text they contain. Entities are normally external objects, such as graphics files, that are intended to be included in a document. Entities can also be internal and can serve various other purposes, such as representing reserved characters and serving user-defined purposes. XML also provides a formal syntax for describing the relationships between the elements, attributes, and entities that make up an XML document. Such a syntax can be used to recognize the component parts of each document, as explained in more detail below.
To allow a computer to check the structure of an XML document, users generally associate the document with a document type definition (DTD). A DTD is a set of rules that specifies how an XML document of a given type may be structured. The DTD declares each of the permitted elements and attributes, defines entities, and specifies the relationships between them. XML gains its extensibility by allowing users to define the elements, attributes, and entities. By defining the role and attributes of each element of text in a formal model, i.e., the DTD, users of XML can check the validity of each component of the document. An XML DTD allows computers to check, for example, that users do not accidentally enter a third-level heading without first having entered a second-level heading, something that cannot be checked using the HyperText Markup Language (HTML) used to code documents that form part of the World Wide Web (WWW) of documents accessible through the Internet. However, XML does not require users to use DTDs.
To use a set of elements that have, for example, been defined by a trade association or similar body, users need to know how the elements are delimited from normal text and in which order the various elements should be used. Systems that understand XML can provide users with lists of the elements that are valid at each point in the document and will automatically add the required delimiters to the name to delineate the element. Where the data capture system does not understand XML, users can enter the XML elements manually for later validation. Elements and their attributes are entered between matched pairs of angle brackets (< . . . >) while entity references start with an ampersand and end with a semicolon (& . . . ;).
Because XML elements are based on the logical structure of the document, they are somewhat easier to understand than the physically based markup schemes typically provided by word processors. As an example, a memorandum document coded in XML might look as follows:
<memo>
  <to>All staff</to>
  <from>R. Michael</from>
  <date>April 1, 2001</date>
  <subject>Bring Manuals</subject>
  <para>Please bring your todo list with you to today's meeting.</para>
</memo>
As shown in the example above, the start and end of each logical element of the file have been clearly identified by entry of a start-tag (e.g. <to>) and an end-tag (e.g. </to>). This formatting is easy for a computer to follow, and is therefore well suited to data processing.
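As a minimal sketch of how a program can follow the start-tags and end-tags described above, the following Python fragment parses a memorandum like the one shown and lists each logical element with its text. It uses the standard-library xml.etree.ElementTree module. The names MEMO_XML and element_pairs are illustrative assumptions, not part of any system described here.

```python
import xml.etree.ElementTree as ET

# Illustrative copy of a memorandum; the variable name is an assumption.
MEMO_XML = (
    "<memo><to>All staff</to><from>R. Michael</from>"
    "<date>April 1, 2001</date><subject>Bring Manuals</subject>"
    "<para>Please bring your todo list with you to today's meeting.</para>"
    "</memo>"
)


def element_pairs(xml_text):
    """Return the root tag and a (tag, text) pair for each child, in document order."""
    root = ET.fromstring(xml_text)
    return root.tag, [(child.tag, child.text) for child in root]


root_tag, fields = element_pairs(MEMO_XML)
for tag, text in fields:
    # Each pair corresponds to one start-tag/end-tag delimited element.
    print(f"{tag}: {text}")
```

The function does not depend on the particular element names, so the same walk applies to any document whose root contains a flat sequence of child elements.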
A DTD associated with the XML example above could take the form:
<!DOCTYPE memo [
<!ELEMENT memo (to, from, date, subject?, para+)>
<!ELEMENT para (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
]>
This model indicates that a memorandum consists of a sequence of header elements, <to>, <from>, <date> and, optionally, <subject> (the ? immediately after subject indicates that it may be omitted), which must be followed by the contents of the memorandum. The content of the memo defined in this example consists of at least one paragraph (this is indicated by the + immediately after para). In this simplified example a paragraph has been defined as a leaf node of the memo element and can contain parsed character data (#PCDATA), i.e. text that the parser has checked to ensure that it contains no unrecognized markup strings.
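The content model described above can be illustrated with a small hand-written check. The sketch below is not a DTD validator. It assumes only the single rule memo (to, from, date, subject?, para+) and expresses it as a regular expression over the sequence of child element names. The names MEMO_MODEL and matches_memo_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The content model (to, from, date, subject?, para+) written as a regular
# expression over child tag names, each name followed by one space.
# This is a hand-rolled stand-in for a real DTD validator.
MEMO_MODEL = re.compile(r"to from date (subject )?(para )+")


def matches_memo_model(xml_text):
    """Return True if the root is <memo> and its children follow the model."""
    root = ET.fromstring(xml_text)
    if root.tag != "memo":
        return False
    # Leaf elements declared as (#PCDATA) may not contain child elements.
    if any(len(child) > 0 for child in root):
        return False
    child_tags = "".join(child.tag + " " for child in root)
    return MEMO_MODEL.fullmatch(child_tags) is not None


valid_memo = "<memo><to>a</to><from>b</from><date>c</date><para>d</para></memo>"
missing_para = "<memo><to>a</to><from>b</from><date>c</date></memo>"
print(matches_memo_model(valid_memo))    # subject omitted, one para present
print(matches_memo_model(missing_para))  # para+ requires at least one para
```

A validating parser performs this kind of check for every declared element, driven by the DTD itself rather than by rules written into the program.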
XML-coded files are suitable for communicating data to be stored in databases. Because XML files are both object-oriented and hierarchical in nature, they can be adapted to many types of databases. A standardized interface to XML data is defined through W3C's Document Object Model (DOM), whose interfaces are specified in the interface definition language (IDL) of the Common Object Request Broker Architecture (CORBA) and which provides a common interface between applications exchanging XML data.
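As an illustration of the DOM interface mentioned above, the following sketch uses xml.dom.minidom from the Python standard library, which is one implementation of the DOM. It reads a value from a parsed document and appends a new element through DOM calls. The sample document and variable names are assumptions chosen for illustration.

```python
from xml.dom.minidom import parseString

# Parse a small sample document into a DOM tree.
doc = parseString("<memo><to>All staff</to><from>R. Michael</from></memo>")

# Read the text of the first <to> element through the DOM interface.
to_node = doc.getElementsByTagName("to")[0]
recipient = to_node.firstChild.data

# Build a new <date> element and attach it to the root element.
date_node = doc.createElement("date")
date_node.appendChild(doc.createTextNode("April 1, 2001"))
doc.documentElement.appendChild(date_node)

# Serialize the modified tree back to XML text.
serialized = doc.documentElement.toxml()
print(recipient)
print(serialized)
```

Because the same interface names (getElementsByTagName, createElement, appendChild) are defined by the DOM itself, code in other languages with a DOM binding would follow the same steps.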
An XML document is said to be “well formed” when it complies with the well-formedness rules of the XML specification, for example that every start-tag has a matching end-tag and that elements are properly nested. If a well-formed XML document is associated with and conforms to a DTD, the document is said to be “valid”. Well-formedness and validity can be checked using XML processors, which are commonly referred to as XML parsers. A validating XML parser checks whether an XML document is valid by checking that all required components are present and that the document instance conforms to the rules defined in the DTD.
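The well-formedness check described above can be sketched with the Python standard library. The parser in xml.etree.ElementTree is non-validating: it reports documents that are not well formed but does not check validity against a DTD. The helper name is_well_formed is an assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # Raised for problems such as a missing or mismatched end-tag.
        return False
    return True


print(is_well_formed("<memo><to>All staff</to></memo>"))  # matched tags
print(is_well_formed("<memo><to>All staff</memo>"))       # <to> never closed
```

Checking validity in addition would require a validating parser, which the standard library does not provide.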
Most applications that process XML have an engine known as an XML parser that accepts XML documents as input data. These XML documents must be well-formed to be accepted by the XML parser, and, if the documents are associated with a DTD, they must be valid. Additionally, XML uses a number of “reserved” characters such as “<” and “>”. To appear as character data, these characters must be written in accordance with specific XML rules (for example, as the entity references &lt; and &gt;). Otherwise, the XML parser will reject the XML document or possibly misinterpret it. Other languages have their own requirements for format and syntax. Some languages and parsers are more forgiving than others, but violation of such requirements generally causes an error, whether by rejection or by misinterpretation. Therefore, the more data a document contains (i.e. the more verbose the document), the higher the likelihood that the document will contain errors. In the case of an XML document, this means a higher likelihood that the document will not be well-formed and/or will be invalid. Thus, when an XML document is too verbose, it becomes prone to errors during parsing. To avoid the inclusion and recurrence of errors, in many instances experts are employed to write the XML document and associated DTD.
Additionally, to accommodate a verbose XML document, the set of elements and attributes (i.e. the grammar) supported by the parser would have to be large. One of the proposed advantages of XML is its extensibility. However, if extensions of an XML document's grammar were desired, the parser would have to be recompiled to support the extended grammar. Thus, the parser would have to be very complex to accommodate a large variety of elements and attributes. Therefore, there is a need to maintain a fixed or reduced complexity of the parser while allowing extensibility of the grammar available to XML document authors.
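The handling of reserved characters mentioned above can be sketched as follows. The example assumes Python's standard-library helper xml.sax.saxutils.escape, which replaces &, < and > with the corresponding entity references. The sample text and variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Character data containing reserved characters (illustrative text).
raw = "Budget < 100 & margin > 5"

# Replace &, < and > with the entity references &amp;, &lt; and &gt;.
escaped = escape(raw)
document = f"<para>{escaped}</para>"

# The parser converts the entity references back into the original characters.
parsed_text = ET.fromstring(document).text

# Inserting the raw text without escaping yields a document the parser rejects.
try:
    ET.fromstring(f"<para>{raw}</para>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(escaped)
print(parsed_text == raw, unescaped_accepted)
```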
Extensible Stylesheet Language (XSL)
Extensible Stylesheet Language (XSL) is a language for creating a style sheet that describes how data sent to a user using the Extensible Markup Language is to be presented. XSL is based on, and extends, the Document Style Semantics and Specification Language (DSSSL) and the Cascading Style Sheet, level 1 (CSS1) standards. XSL provides the tools to describe exactly which data fields in an XML file to display and exactly where and how to display them. XSL consists of two parts: a language for transforming XML documents and an XML vocabulary for specifying formatting semantics. For example, in an XML page that describes the characteristics of one or more products from a retailer, a pair of start and end tags that designates the product manufacturer might contain the name of that manufacturer. Using XSL, it is possible to dictate to a web browser application on a computer the placement of the manufacturer's name on a page and the display style of the manufacturer's name.
As with any style sheet language, a style definition written in XSL can be created for one XML document or reused for many other XML documents.
Extensible Stylesheet Language Transformation (XSLT)
Extensible Stylesheet Language Transformation (XSLT) is a language for transforming XML documents into other XML documents. The specification of the syntax and semantics of XSLT is developed under the auspices of W3C.
XSLT is designed for use as part of XSL. XSL specifies the styling of an XML document by using XSLT to describe how the document is transformed into another XML document that uses the formatting vocabulary. However, XSLT is also designed to be used independently of XSL.
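The idea of transforming one XML document into another can be illustrated without an XSLT processor. The sketch below is not XSLT: it uses xml.etree.ElementTree to build a result tree from a source tree by hand, in the way a stylesheet rule would select source elements and emit result elements. The catalog, product, name and manufacturer element names, and the function catalog_to_html, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document describing products from a retailer.
CATALOG_XML = (
    "<catalog>"
    "<product><name>Widget</name><manufacturer>Acme</manufacturer></product>"
    "<product><name>Gadget</name><manufacturer>Globex</manufacturer></product>"
    "</catalog>"
)


def catalog_to_html(xml_text):
    """Build an HTML-like result tree with one list item per product."""
    source = ET.fromstring(xml_text)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    listing = ET.SubElement(body, "ul")
    for product in source.findall("product"):
        item = ET.SubElement(listing, "li")
        item.text = product.findtext("name") + ": "
        # Placement and style of the manufacturer's name: bold, after the name.
        maker = ET.SubElement(item, "b")
        maker.text = product.findtext("manufacturer")
    return ET.tostring(html, encoding="unicode")


result = catalog_to_html(CATALOG_XML)
print(result)
```

In XSLT the same mapping would be written declaratively as template rules in a stylesheet, so that the presentation can change without changing the program that applies it.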