The Web is the visual interface to the Internet's vast collection of resources. Today, HTML (HyperText Markup Language) is the predominant language for expressing Web pages. An HTML document comprises the textual content of the document embedded in matched display tags that specify the visual presentation of the content. A well-designed HTML document is visually appealing to a human viewer when displayed in a Web browser. However, automatically extracting information from HTML documents is difficult because HTML tags are designed to express presentation rather than semantic information. This makes HTML a less-than-ideal medium for general electronic interchange on the Internet.
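To make the contrast concrete, the sketch below marks up the same hypothetical product record twice and extracts its price with Python's standard `xml.etree.ElementTree` module. The HTML fragment is kept well-formed so the same parser can read it, and the element names in the XML version (`product`, `name`, `price`) are invented for illustration. Extraction from the HTML depends on a layout convention, while the XML tag names the field directly.

```python
import xml.etree.ElementTree as ET

# Presentation-oriented markup: the tags only say "paragraph" and "bold".
# (Kept well-formed, XHTML-style, so the same XML parser can read it.)
html_fragment = "<p><b>Widget</b> now only <b>19.95</b></p>"

# Structure-oriented markup: hypothetical tag names describe the data itself.
xml_fragment = "<product><name>Widget</name><price currency='USD'>19.95</price></product>"

def price_from_html(fragment: str) -> str:
    # Relies on a layout convention: the price happens to be the second <b>.
    # Any change to the page's presentation can silently break this rule.
    bold_runs = ET.fromstring(fragment).findall("b")
    return bold_runs[1].text

def price_from_xml(fragment: str) -> str:
    # The tag name itself identifies the field, independent of layout.
    return ET.fromstring(fragment).find("price").text

print(price_from_html(html_fragment))  # 19.95
print(price_from_xml(xml_fragment))    # 19.95
```

If a page designer later bolds another word ahead of the price, `price_from_html` returns the wrong value, whereas `price_from_xml` is unaffected.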
HTML is a specific application of the more powerful SGML (Standard Generalized Markup Language), a sophisticated tag language that separates view from content and data from metadata. Because of the complexity of SGML and of the tools it requires, SGML has not achieved widespread acceptance.
XML, the Extensible Markup Language, is a new format designed to bring structured information to the Web. It is a Web-based language for electronic data interchange. XML is an open technology standard of the World Wide Web Consortium (W3C), which is the standards group responsible for maintaining and advancing HTML and other Web-related standards.
XML is a subset of SGML that retains the important architectural aspects of contextual separation while removing nonessential features. The XML document format embeds the content within tags that express its structure. XML also provides the ability to express rules for the structure (i.e., the grammar) of a document. Together, these two features allow data to be separated automatically from metadata, and allow generic tools to validate an XML document against its grammar.
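The following sketch illustrates the idea of a generic tool checking a document against a grammar. Python's standard library does not validate against DTDs (in practice a validating parser, such as the `DTD` class in the third-party lxml library, would be used), so this example assumes a deliberately simplified grammar: each element's allowed children are written as a regular expression over child tag names, in the spirit of a DTD declaration such as `<!ELEMENT order (customer, item+)>`. The purchase-order vocabulary and the `validate` helper are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified grammar. Each entry maps an element name to a regular
# expression over the space-terminated names of its child elements, mimicking
# DTD declarations such as <!ELEMENT order (customer, item+)>.
# An empty pattern means "no child elements" (text content is not checked).
GRAMMAR = {
    "order": r"customer (item )+",
    "customer": r"",
    "item": r"",
}

def validate(element: ET.Element, grammar: dict[str, str]) -> list[str]:
    """Return the grammar violations found under element (an empty list if valid)."""
    model = grammar.get(element.tag)
    if model is None:
        return [f"undeclared element <{element.tag}>"]
    errors = []
    # Encode the child sequence as e.g. "customer item item " for matching.
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        errors.append(f"<{element.tag}> has invalid children: {children.split()}")
    # Check every child against its own declaration, recursively.
    for child in element:
        errors.extend(validate(child, grammar))
    return errors

good = ET.fromstring("<order><customer>Acme</customer><item>bolt</item><item>nut</item></order>")
bad = ET.fromstring("<order><item>bolt</item></order>")
print(validate(good, GRAMMAR))  # []
print(validate(bad, GRAMMAR))   # reports that <order> lacks its leading <customer>
```

Because the grammar is data rather than code, the same `validate` function works unchanged for any vocabulary declared this way.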
Unlike HTML, an XML document does not include presentation information. Instead, an XML document may be rendered for visual presentation by applying layout style information with technologies such as XSL (Extensible Stylesheet Language). Web sites and browsers are rapidly adding XML and XSL support to their functionality.
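XSL stylesheets are declarative, and Python's standard library includes no XSLT processor, so the sketch below hand-codes in Python the kind of mapping such a stylesheet would express: the same content-only XML rendered into HTML for display. The `catalog`/`product` vocabulary and the `render_html` function are hypothetical stand-ins, not XSL itself.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical catalog document: pure content, no presentation information.
CATALOG = """<catalog>
  <product><name>Widget</name><price>19.95</price></product>
  <product><name>Gadget</name><price>4.50</price></product>
</catalog>"""

def render_html(xml_text: str) -> str:
    """Map each <product> to an HTML table row, as an XSL template rule might."""
    rows = []
    for product in ET.fromstring(xml_text).iter("product"):
        # Escape text so that characters like '<' cannot break the HTML output.
        name = escape(product.findtext("name", default=""))
        price = escape(product.findtext("price", default=""))
        rows.append(f"<tr><td>{name}</td><td>{price}</td></tr>")
    return "<table>" + "".join(rows) + "</table>"

print(render_html(CATALOG))
```

A different rendering function (or, in practice, a different stylesheet) could present the same document as a list or a printed report without any change to the XML data.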
The XML approach to structured data interchange has been validated through wide experience with XML itself and with its relatives in the SGML family: SGML, which is used in high-end document processing, and HTML, the predominant language of the Web.
XML is widely believed to be the next step in the evolution of the Web. This view is reflected in announcements by Netscape and Microsoft that upcoming versions of the leading Web browsers, Netscape Navigator and Internet Explorer, will incorporate XML support.
While XML is still in its infancy, there are many well-documented applications of XML. Example application domains include Web commerce, publishing, repositories, modeling, databases and data warehouses, services, finance, health care, semiconductors, inventory access, and more.
XML is gaining widespread acceptance as the de facto standard for representing structured information on the World Wide Web and beyond. The XML language is defined by the World Wide Web Consortium's “Extensible Markup Language (XML) Recommendation 1.0” document [Rec-xml-19980210]. This definition includes a specification of XML in Extended Backus-Naur Form (EBNF) notation.
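As an illustration of how such EBNF productions translate into a mechanical check, the sketch below transcribes the Recommendation's `Name` production (which governs element and attribute names) into a Python regular expression. It is simplified by assumption: `Letter` and `Digit` are restricted to ASCII, and `CombiningChar` and `Extender` are dropped, so it accepts only a subset of the names XML 1.0 allows. The helper name `is_ascii_xml_name` is hypothetical.

```python
import re

# XML 1.0 defines names with EBNF productions:
#   Name     ::= (Letter | '_' | ':') (NameChar)*
#   NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
# Simplification (assumption): Letter and Digit are restricted to ASCII, and
# CombiningChar / Extender are omitted, so only a subset of legal names matches.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def is_ascii_xml_name(text: str) -> bool:
    """True if text matches the ASCII-only approximation of the Name production."""
    return NAME.fullmatch(text) is not None

for candidate in ["order", "xmi:id", "_item-2", "2nd-item", "unit price"]:
    print(candidate, is_ascii_xml_name(candidate))
```

The first three candidates match; `2nd-item` fails because a name may not start with a digit, and `unit price` fails because whitespace is not a `NameChar`.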
Repositories provide a central place for recording metadata and enable one to store, manage, share, and reuse information about the data (i.e., metadata) that an enterprise uses. A repository can store definitional, management, and operational information. Tools can be integrated with the repository to support information sharing and metadata reuse, and tool and technology models may be developed to manipulate the tool information in the repository. However, transferring data within models, from tool to tool, or from a tool to the repository has long been a cumbersome and inflexible task. Previous interchange mechanisms have typically used extensible structured references and the Meta Object Facility (MOF) user programming representation.
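One way to picture XML-based metadata interchange: a tool serializes its in-memory model to a neutral XML document, and a repository (or another tool) rebuilds the model from it. The sketch below assumes a hypothetical `Table`/`Column` model and a made-up element vocabulary; it is not MOF or any standardized interchange format, only an illustration of the round trip.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical metadata model that a design tool might hold for a table.
@dataclass
class Column:
    name: str
    sql_type: str

@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)

def export_metadata(table: Table) -> str:
    """Serialize the model to XML using a made-up, tool-neutral vocabulary."""
    root = ET.Element("table", name=table.name)
    for col in table.columns:
        ET.SubElement(root, "column", name=col.name, type=col.sql_type)
    return ET.tostring(root, encoding="unicode")

def import_metadata(xml_text: str) -> Table:
    """Rebuild the model from XML, as a repository or a second tool might."""
    root = ET.fromstring(xml_text)
    columns = [Column(c.get("name"), c.get("type")) for c in root.findall("column")]
    return Table(root.get("name"), columns)

customer = Table("customer", [Column("id", "INTEGER"), Column("email", "VARCHAR(255)")])
document = export_metadata(customer)
print(document)
assert import_metadata(document) == customer  # the round trip preserves the model
```

Because both sides agree only on the XML vocabulary, neither needs access to the other's internal classes or programming interface.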