The present invention relates generally to information systems, and more particularly, to techniques for providing and using schema data used for markup languages (e.g., Extensible Markup Language (XML)).
Recently, various “markup” languages have been developed. For example, HTML (Hypertext Markup Language) provides a set of markup symbols or codes inserted in a file intended for display on a World Wide Web browser page. The markup tells the Web browser how to display a Web page's words and images for the user. Each individual markup code can be referred to as an element (or a tag). Some elements come in pairs that indicate when some display effect is to begin and when it is to end.
HTML is a formal Recommendation by the World Wide Web Consortium (W3C) and is generally adhered to by more commonly used web browsers (e.g., Microsoft's Internet Explorer or Netscape's Navigator). XML is also a formal Recommendation from the World Wide Web Consortium (W3C). XML is similar to the language of today's Web pages, the Hypertext Markup Language (HTML). Both XML and HTML contain markup symbols to describe the contents of a page or file. HTML, however, can describe the content of a Web page (mainly text and graphic images) only in terms of how it is to be displayed and interacted with. For example, the letter “p” placed within markup tags starts a new paragraph. On the other hand, XML can describe the content in terms of what data is being described. For example, the word “phonenum” placed within markup tags could indicate that the data that followed was a phone number. This means that an XML file can be processed purely as data by a program, stored with similar data on another computer, or, like an HTML file, displayed. For example, depending on how the application in the receiving computer wanted to handle the phone number, it could be stored, displayed, or dialed.
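As a minimal sketch of the distinction drawn above, the following Python fragment treats an XML file purely as data and lets the receiving application decide how to handle a phone number. Only the “phonenum” tag comes from the example in the text; the `contact` and `name` elements, the sample values, and the `extract_phone` and `handle_phone` helpers are hypothetical and chosen solely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment; only the "phonenum" tag comes from the text above.
XML_TEXT = """<contact>
  <name>Example Person</name>
  <phonenum>555-0100</phonenum>
</contact>"""


def extract_phone(xml_text):
    """Return the text of the first <phonenum> child of the root, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("phonenum")
    if node is None or not node.text:
        return None
    return node.text.strip()


def handle_phone(number, action):
    """Illustrative dispatch: the receiving application picks the action."""
    if action == "store":
        return {"phonenum": number}
    if action == "display":
        return f"Phone: {number}"
    if action == "dial":
        return f"tel:{number}"
    raise ValueError(f"unknown action: {action}")


phone = extract_phone(XML_TEXT)
print(handle_phone(phone, "display"))
```

The same extracted value feeds all three actions, which is the sense in which the markup describes what the data is rather than how it is to be displayed.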
XML is “extensible” because, unlike HTML, the markup symbols are unlimited and self-defining. XML is a simpler and easier-to-use subset of the Standard Generalized Markup Language (SGML), the standard for how to create a document structure. It is expected that HTML and XML will be used together in many Web applications. XML markup, for example, may appear within an HTML page.
Early applications of XML include Microsoft's Channel Definition Format (CDF), which describes a channel, a portion of a Web site that has been downloaded to a hard disk and is then updated periodically as information changes. A specific CDF file contains data that specifies an initial Web page and how frequently it is updated. Another early application is ChartWare, which uses XML as a way to describe medical charts so that they can be shared by doctors. Applications related to banking, e-commerce ordering, personal preference profiles, purchase orders, litigation documents, part lists, and many others are anticipated.
As appreciated by those skilled in the art, XML (Extensible Markup Language) is a flexible way to create common information formats and to share both the format and the data on the World Wide Web, intranets, and elsewhere. For example, computer makers might agree on a standard or common way to describe the information about a computer product (processor speed, memory size, and so forth) and then describe the product information format with XML. Such a standard way of describing data would enable a user to send an intelligent agent (a program) to each computer maker's Web site, gather data, and then make a valid comparison.
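The comparison scenario described above can be sketched as follows, assuming two computer makers have agreed on one common format. The element names (`computer`, `maker`, `cpu_ghz`, `memory_gb`), the sample values, and the helper functions are all hypothetical; the documents are inlined as strings rather than gathered from any Web site.

```python
import xml.etree.ElementTree as ET

# Hypothetical product descriptions from two makers sharing one agreed format.
MAKER_A = (
    "<computer><maker>A</maker>"
    "<cpu_ghz>2.4</cpu_ghz><memory_gb>8</memory_gb></computer>"
)
MAKER_B = (
    "<computer><maker>B</maker>"
    "<cpu_ghz>3.1</cpu_ghz><memory_gb>16</memory_gb></computer>"
)


def parse_product(xml_text):
    """Convert one product description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "maker": root.findtext("maker"),
        "cpu_ghz": float(root.findtext("cpu_ghz")),
        "memory_gb": int(root.findtext("memory_gb")),
    }


def fastest(products):
    """Return the product with the highest processor speed."""
    return max(products, key=lambda p: p["cpu_ghz"])


products = [parse_product(text) for text in (MAKER_A, MAKER_B)]
print(fastest(products)["maker"])
```

The comparison is valid only because both descriptions use the same element names and units, which is the role the agreed format plays.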
Accordingly, XML can be used by any individual or group of individuals or companies that wants to share information in a consistent way. In other words, an XML file can be generated and exchanged between various entities to share information in a consistent way. In order to make sense of the XML file, however, typically XML Schema Data (or Definitions) pertaining to data references in the XML file are needed.
XSD (XML Schema Definition) is another recommendation of the World Wide Web Consortium (W3C). XSD specifies how to formally describe the elements in an Extensible Markup Language (XML) document. This description can be used to verify that each item of content in a document adheres to the description of the element in which the content is to be placed.
In general, a schema can be an abstract representation of an object's characteristics and relationship to other objects. As such, an XML schema can represent the interrelationship between the attributes and elements of an XML object (for example, a document or a portion of a document). To create a schema for a document, one can analyze its structure, defining each structural element as it is encountered. For example, within a schema for a document describing a Web site, one would define a Web site element, a Web page element, and other elements that describe possible content divisions within any page on that site. XML Schema Definition (XSD) is believed to offer several advantages over earlier XML schema languages, such as Document Type Definition (DTD) or Schema for Object-Oriented XML (SOX). For example, it is more direct: XSD, in contrast to DTD, is itself written in XML, which means that a schema can be processed by an ordinary XML parser rather than requiring a separate, special-purpose parser. Other benefits include self-documentation, automatic schema creation, and the ability to be queried through XSL Transformations (XSLT).
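Because an XSD is itself an XML document, it can be read with an ordinary XML parser. The following sketch, under stated assumptions, parses a small schema for the Web site example and checks only that every element name used in a document is declared in that schema. The `website` and `webpage` names and both helper functions are hypothetical, and the check is deliberately far weaker than real XSD validation, which also verifies nesting, data types, and occurrence constraints.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # XML Schema namespace URI

# Hypothetical schema for the Web site example in the text.
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="website">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="webpage" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

DOC_TEXT = "<website><webpage>Home</webpage><webpage>About</webpage></website>"


def declared_element_names(xsd_text):
    """Collect the name of every xs:element declared anywhere in the schema."""
    schema = ET.fromstring(xsd_text)
    return {
        el.get("name")
        for el in schema.iter(f"{{{XS}}}element")
        if el.get("name")
    }


def undeclared_elements(doc_text, allowed):
    """Return sorted element names used in the document but not declared."""
    doc = ET.fromstring(doc_text)
    return sorted({el.tag for el in doc.iter() if el.tag not in allowed})


allowed = declared_element_names(XSD_TEXT)
print(undeclared_elements(DOC_TEXT, allowed))
```

A production system would hand both documents to a full schema validator; the point here is only that the schema is reachable with the same parser used for the data.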
Conventionally, when an XML file is received, schema data (e.g., XSD, DTD files, etc.) needs to be accessed in order to make sense of the XML file (e.g., to verify its data). The schema data is generally made available by standards organizations. This means that there is extensive use of references to external XML schema data (e.g., XSD or DTD files). As a result, performance is adversely affected because, among other things, sockets (or similar mechanisms) are needed for downloading schema data files. In addition, conventional approaches do not allow for systematic validation of data because, among other things, the XML schema data is provided in accordance with different specifications by various entities.
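To illustrate the reliance on external references, the following sketch lists the schema locations that a received XML file points to through the standard `xsi:schemaLocation` and `xsi:noNamespaceSchemaLocation` attributes. Under the conventional approach, each listed location would typically require a separate download. The `order` document, the `example.org` URLs, and the `external_schema_urls` helper are hypothetical placeholders, and the code performs no network access.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"  # schema-instance namespace

# Hypothetical instance document; the example.org URLs are placeholders.
DOC_TEXT = """<order xmlns="http://example.org/order"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/order http://example.org/schemas/order.xsd
                        http://example.org/part http://example.org/schemas/part.xsd">
  <item>widget</item>
</order>"""


def external_schema_urls(doc_text):
    """List schema locations referenced on the root element (root only)."""
    root = ET.fromstring(doc_text)
    urls = []
    # xsi:schemaLocation holds whitespace-separated (namespace, location) pairs,
    # so every second token is a location.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    urls.extend(tokens[1::2])
    # xsi:noNamespaceSchemaLocation holds a single location.
    no_namespace = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if no_namespace:
        urls.append(no_namespace)
    return urls


for url in external_schema_urls(DOC_TEXT):
    print("would need to download:", url)
```

Each printed location stands for one socket connection in the conventional flow, which is the performance cost described above.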
Another problem is that the conventional approaches are generally not secure, as schema data is typically downloaded from various Web sites. This poses very serious security risks because the entity that downloads schema data can easily be given incorrect data through an honest mistake or be intentionally given corrupt data. In either case, lack of security can result in very adverse consequences. Yet another problem with the conventional approach is that privacy is greatly compromised because an entity's requests for schema data can be monitored. The information gathered in this way can be saved and analyzed for various reasons and applications. For example, an entity can be monitored for the XML schema data that it frequently uses, and the result can be used to profile the entity for marketing and advertising applications.
Accordingly, techniques for providing and using XML schema data are needed.