XML is a descriptive markup language and, like HTML, is derived from SGML (Standard Generalized Markup Language, international standard ISO 8879). XML is now applied far more widely than HTML, owing to its advantages in extensibility, portability, and well-formedness.
An XML document consists of markup and content. XML has six kinds of markup: elements, attributes, entity references, comments, processing instructions, and CDATA sections. The most notable difference between XML and HTML is that an XML document can carry its own Document Type Definition (DTD), introduced by a document type declaration. A DTD lets a document pass meta-information about its content to a parser. DTDs give XML documents extensibility and verifiability (a parser can check a document against its DTD, in addition to checking that it is well-formed), so that XML acquires some database-like properties and can be used to organize and manage information. In addition, XML documents can be presented conveniently in web browsers, much like HTML pages, and can be transmitted and exchanged efficiently over the Internet.
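As a minimal sketch of the six constructs listed above, the following Python snippet parses a small document with the standard library's xml.dom.minidom and records which constructs the parser reports. The catalog document, the `render-hint` processing instruction, and the helper name `collect` are hypothetical and chosen only for illustration. Entity references do not survive as separate nodes here; the parser resolves `&amp;` into the text it stands for.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document containing all six constructs.
SAMPLE = """<?xml version="1.0"?>
<?render-hint compact?>
<!-- hypothetical catalog -->
<catalog>
  <book id="b1">
    <title>Tom &amp; Jerry</title>
    <note><![CDATA[1 < 2 is kept verbatim]]></note>
  </book>
</catalog>"""

KIND = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.CDATA_SECTION_NODE: "CDATA section",
}

def collect(node, found):
    """Walk the tree depth-first and record each construct kind seen."""
    kind = KIND.get(node.nodeType)
    if kind:
        found.add(kind)
    if node.nodeType == Node.ELEMENT_NODE and node.hasAttributes():
        found.add("attribute")
    for child in node.childNodes:
        collect(child, found)
    return found

doc = minidom.parseString(SAMPLE)
kinds = collect(doc, set())

# The entity reference &amp; has been resolved to "&" in the text content.
title = doc.getElementsByTagName("title")[0].firstChild.data

# A document with mismatched tags is not well-formed and is rejected outright.
try:
    minidom.parseString("<a><b></a>")
    well_formed = True
except ExpatError:
    well_formed = False
```

Note that this parser checks well-formedness only; it does not validate the document against a DTD, which would require a validating parser.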
At present, XML documents are processed mainly in two ways: SAX and DOM. SAX (Simple API for XML) is a stream-based, event-driven interface. SAX 2.0, released in May 2000, enhanced many functions and added support for namespaces. DOM (Document Object Model) parses an XML document, builds a complete tree structure in memory, and then carries out operations on that tree. In short, SAX demands fewer system resources and is faster than DOM, but it gives read-only access to the document; DOM has stronger processing capability but demands more system resources, especially for large documents. Later, XPath and XPointer emerged, mainly for addressing and searching within XML documents; XSL and XSLT were developed for transformation and presentation, and SOAP for XML-based remote object access; and with the emergence of XML query languages, queries can be run against any XML document.
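The contrast between the two models can be sketched with Python's standard library, which ships both a SAX interface (xml.sax) and a small DOM implementation (xml.dom.minidom). The orders document and the class name `TotalHandler` are assumptions made for this illustration. The SAX handler only observes events as they stream past and keeps no tree; the DOM code holds the whole document in memory and can therefore modify it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical input document.
ORDERS = b"""<orders>
  <order id="1"><total>10.50</total></order>
  <order id="2"><total>4.25</total></order>
</orders>"""

class TotalHandler(xml.sax.ContentHandler):
    """Sums the <total> values from parse events; no tree is built."""

    def __init__(self):
        super().__init__()
        self.in_total = False
        self.buffer = []
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "total":
            self.in_total = True
            self.buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it until the end tag.
        if self.in_total:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "total":
            self.total += float("".join(self.buffer))
            self.in_total = False

# SAX: a single read-only pass over the input.
handler = TotalHandler()
xml.sax.parseString(ORDERS, handler)
sax_total = handler.total

# DOM: the full tree is in memory, so it can be navigated and changed.
doc = minidom.parseString(ORDERS)
orders = doc.getElementsByTagName("order")
orders[0].setAttribute("status", "shipped")
doc.documentElement.removeChild(orders[1])
remaining_ids = [o.getAttribute("id") for o in doc.getElementsByTagName("order")]
status = doc.getElementsByTagName("order")[0].getAttribute("status")
```

The SAX handler needs explicit state (`in_total`, `buffer`) because it never sees more than one event at a time, which is the price of its low memory use.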
With the development of networks and the Internet, data portability has become an important requirement for new application systems. Data portability is one benefit of XML; from a data application perspective, XML also has the following advantages: (1) XML files are plain text files, and so are not tied to any operating system or software platform; (2) XML supports schema-based self-description, so that data semantics can be described easily and the description can be interpreted and processed automatically by computers; (3) XML can effectively describe not only structured data but also semi-structured and even unstructured data.
An XML file is a collection of data that is self-descriptive and portable and can describe data in tree or graph structures. XML provides many of the facilities found in databases: storage (XML documents), schemas (DTD, XML Schema, RELAX NG, etc.), query languages (XQuery, XPath, XQL, XML-QL, Quilt, etc.), and programming interfaces (SAX, DOM, JDOM, etc.). However, XML cannot completely replace database technology. It lacks features that a practical database must have: efficient storage, indexing, and data modification mechanisms; rigorous data security control; complete transaction and data consistency control; multi-user access mechanisms; triggers; sophisticated concurrency control; and so on. As a result, reading data from XML is slow, and this disadvantage becomes more apparent when the same XML document must be read several times. XML documents can serve as a database where the data volume is low, the number of users is small, and performance requirements are modest, but they are not suitable for environments with many users, a high level of data integration, and high performance requirements.
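The indexing point above can be illustrated with a rough sketch that compares a lookup against a plain XML document with the same lookup against an indexed table. SQLite (Python's sqlite3 module) stands in for "a database" here purely as an assumption for the example; the data set and the function names `lookup_xml` and `lookup_db` are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical data set: 1000 <item> records in one XML document.
XML_TEXT = "<items>" + "".join(
    f'<item id="{i}"><name>item-{i}</name></item>' for i in range(1000)
) + "</items>"

def lookup_xml(xml_text, item_id):
    """Every call reparses the whole document and scans it linearly."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        if item.get("id") == str(item_id):
            return item.findtext("name")
    return None

# Load the same records once into a table whose primary key is indexed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    ((int(e.get("id")), e.findtext("name"))
     for e in ET.fromstring(XML_TEXT).iter("item")),
)

def lookup_db(item_id):
    """Each call uses the primary-key index instead of scanning all rows."""
    row = conn.execute(
        "SELECT name FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None

xml_result = lookup_xml(XML_TEXT, 742)
db_result = lookup_db(742)
```

Both functions return the same answer; the difference is in the work done per call, which grows with document size for the XML version and is repeated every time the document is read again.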