1. Field of the Invention
The present invention relates generally to serialization of XML data, and in particular to XML canonicalization.
2. Description of the Related Art
Databases allow data to be stored and accessed quickly and conveniently. Various query languages may be used to access the data. An example of a typical query language is the Structured Query Language (SQL), which conforms to a SQL standard as published by the American National Standards Institute (ANSI) or the International Organization for Standardization (ISO). The Extensible Markup Language (XML) can represent data in a serial text format. XML text can be conveniently exchanged with applications over the Internet. XML query languages, such as SQL/XML and XQuery, can be used to retrieve data from a database and represent that data in XML format.
XML query languages contain query features, called constructors, which are used to construct XML data based on input data. The constructed XML data may be stored in an internal format, such as a tree that conforms to the XQuery data model. In many cases, the constructed XML data is eventually serialized to generate equivalent XML text (or a binary stream), also referred to as serialized data or serialized text, for applications to consume.
An XML document may contain many element and attribute names, as well as other names that carry semantic information. To avoid name conflicts, XML provides a mechanism referred to as XML namespaces, defined in the W3C Recommendation “Namespaces in XML”, REC-xml-names-19990114, Jan. 14, 1999. An XML namespace provides a unique name so that the semantics of names associated with the namespace are well-defined. XML namespaces are a fundamental feature of XML and appear throughout constructed XML data. An XML namespace has a namespace name (a uniform resource identifier, i.e., URI) that is bound to a namespace prefix, and is sometimes represented as (prefix, URI). The URI is used to identify and locate resources on the Internet, and the prefix serves as a proxy for the URI.
In the serialized data, a namespace declaration, signified by the “xmlns” attribute name or prefix, is used to declare a namespace. Due to the syntactic and semantic requirements of query languages, XML text produced by literally serializing constructed data held in an internal XML format often contains redundant namespace declarations. If the data returned in response to a SQL/XML query is literally serialized into text format, the namespace declarations can sometimes account for most of the XML text. To reduce data volume and application processing cost, it is desirable to reduce the number of redundant or superfluous namespace declarations in the serialized XML text. Eliminating superfluous namespace declarations is also part of XML canonicalization as defined in the W3C Recommendation “Canonical XML Version 1.0”, 15 Mar. 2001.
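For illustration only, the following Python sketch shows how redundant declarations arise when each constructed node carries its own (prefix, URI) bindings, and how a serializer that tracks the bindings already in scope could omit them. The `Element` class, the `serialize` function, its `prune` flag, and the example URI are hypothetical; this is a minimal sketch under those assumptions, not the technique of the invention.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    # Hypothetical minimal tree node: a qualified name plus the namespace
    # bindings (prefix -> URI) that a constructor attached to this node.
    name: str
    bindings: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(elem, in_scope=None, prune=True):
    """Serialize elem to text. When prune is True, skip any declaration
    whose (prefix, URI) pair is already in scope from an ancestor."""
    in_scope = dict(in_scope or {})  # copy so siblings do not see each other's bindings
    decls = []
    for prefix, uri in elem.bindings.items():
        if prune and in_scope.get(prefix) == uri:
            continue  # superfluous: identical binding inherited from an ancestor
        in_scope[prefix] = uri
        attr = "xmlns" if prefix == "" else f"xmlns:{prefix}"
        decls.append(f' {attr}="{uri}"')
    inner = "".join(serialize(c, in_scope, prune) for c in elem.children)
    return f"<{elem.name}{''.join(decls)}>{inner}</{elem.name}>"


# Each constructed node repeats the same binding, as constructors often do.
NS = {"e": "http://example.com/emp"}  # hypothetical namespace
doc = Element("e:dept", dict(NS), [Element("e:emp", dict(NS)), Element("e:emp", dict(NS))])

print(serialize(doc, prune=False))  # literal serialization: three declarations
print(serialize(doc))               # pruned: one declaration on the root
```

The literal output declares `xmlns:e` on every element, while the pruned output declares it once on `e:dept`, because the child bindings are identical to the inherited ones.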
In addition, when a portion of the XML data, such as an XML fragment or sub-tree, is constructed without a default namespace but is later connected to a containing fragment or tree that has a default namespace, the fragment without the default namespace has to “undeclare” the default namespace of the containing fragment. If the default namespace is not undeclared, the XML fragment or sub-tree inherits that default namespace, which is incorrect and causes errors.
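As a hedged illustration, the Python sketch below shows the undeclaration case: a fragment built with no default namespace is placed under a container whose default namespace is in scope, so the serializer emits `xmlns=""` on the fragment, which is the XML syntax for undeclaring a default namespace. The `Node` class, the `serialize_with_undeclare` function, and the example URI are hypothetical assumptions, not the method described here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical node: default_ns is the default namespace the node was
    # constructed with; None means "constructed without a default namespace".
    name: str
    default_ns: Optional[str] = None
    children: list = field(default_factory=list)


def serialize_with_undeclare(node, inherited=None):
    """Emit xmlns only when the node's default namespace differs from the
    one in scope; emit xmlns="" to undeclare an inherited default."""
    attrs = ""
    if node.default_ns != inherited:
        attrs = f' xmlns="{node.default_ns or ""}"'
    inner = "".join(serialize_with_undeclare(c, node.default_ns) for c in node.children)
    return f"<{node.name}{attrs}>{inner}</{node.name}>"


PO = "http://example.com/po"  # hypothetical namespace
container = Node("order", PO, [
    Node("note", PO),  # same default as the container: no declaration needed
    Node("item"),      # built without a default namespace: must undeclare it
])

print(serialize_with_undeclare(container))
# <order xmlns="http://example.com/po"><note></note><item xmlns=""></item></order>
```

Without the `xmlns=""` on `item`, a parser would place `item` in `http://example.com/po`, which is the incorrect inheritance described above.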
Therefore, there is a need for a technique to eliminate redundant or superfluous namespace declarations. There is also a need for a technique to undeclare inherited default namespaces for fragments or sub-trees that are constructed without a default namespace.