This invention relates to methods and systems for generating Extensible Markup Language (XML) documents. More particularly, the invention concerns generating XML documents without building and saving in memory a hierarchical tree structure that represents the XML document.
Extensible Markup Language (XML) is a meta-markup language that provides a format for describing structured data. XML is similar to HTML in that it is a tag-based language. By virtue of its tag-based nature, XML defines a strict tree structure or hierarchy. XML is a derivative of Standard Generalized Markup Language (SGML) that provides a uniform method for describing and exchanging structured data in an open, text-based format. XML utilizes the concepts of elements and namespaces. Compared to HTML, which is a display-oriented markup language, XML is a general purpose language for representing structured data without including information that describes how to format the data for display.
XML "elements" are structural constructs that consist of a start tag, an end or close tag, and the information or content that is contained between the tags. A "start tag" is formatted as "<tagname>" and an "end tag" is formatted as "</tagname>". In an XML document, start and end tags can be nested within other start and end tags. All elements that occur within a particular element must have their start and end tags occur before the end tag of that particular element. This defines a strict tree-like structure. Each element forms a node in this tree, and potentially has "child" or "branch" nodes. The child nodes represent any XML elements that occur between the start and end tags of the "parent" node.
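The nesting rule above is what makes an XML document a tree. It can be checked in a single left-to-right pass with a stack: each start tag pushes its name, and each end tag must pop that same name. The following is a minimal Python sketch under simplifying assumptions. A regular expression stands in for a real tokenizer, and comments, CDATA sections, processing instructions, and self-closing tags such as <a/> are not handled. The function name is_properly_nested is hypothetical.

```python
import re

# Matches start tags such as <a> or <p:a x="1"> and end tags such as </a>.
# Simplification: comments, CDATA, processing instructions and self-closing
# tags (<a/>) are not recognized.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)[^>]*>")


def is_properly_nested(text: str) -> bool:
    """Return True if every end tag closes the most recently opened element."""
    stack: list = []
    for match in TAG_RE.finditer(text):
        is_end = match.group(1) == "/"
        name = match.group(2)
        if not is_end:
            stack.append(name)      # descending into a child node
        elif not stack or stack.pop() != name:
            return False            # end tag does not match the open element
    return not stack                # every opened element must be closed
```

The stack depth at any point equals the depth of the current node in the tree.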
XML accommodates an unlimited number of schemas. Within each schema, a "dictionary" of element names is defined. The dictionary of element names defined by a schema is referred to as a "namespace." Within an XML document, element names are qualified by namespace identifiers. When qualified by a namespace identifier, a tag name appears in the form "[namespace]:[tagname]". This model enables the same element name to appear in multiple schemas, or namespaces, and enables instances of these duplicate element names to appear in the same XML document without colliding. Start tags can declare an arbitrary number of "attributes", which declare "property values" associated with the element being declared. Attributes are declared within the start tag using the form "<[tagname] [attribute1] [attribute2] ... [attributeN]>", where attribute1 through attributeN are whitespace-separated declarations of an arbitrary number of tag attributes. Each attribute declaration is of the form [attributeName]="[attributeValue]": each attribute is identified by a unique name, followed by an "=" character, followed by the value of the attribute enclosed in quotation marks.
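The tag and attribute syntax just described can be sketched as a few small formatting helpers. This is an illustrative Python sketch, not part of the described system. The helper names qualified_name, start_tag, and end_tag are hypothetical, and attribute values are escaped with the standard library's xml.sax.saxutils.quoteattr.

```python
from typing import Optional
from xml.sax.saxutils import quoteattr


def qualified_name(tagname: str, prefix: Optional[str] = None) -> str:
    """Return "prefix:tagname", or just "tagname" when no prefix is given."""
    return f"{prefix}:{tagname}" if prefix else tagname


def start_tag(tagname: str, attributes: dict, prefix: Optional[str] = None) -> str:
    """Format <[prefix:]tagname name1="value1" ... nameN="valueN">."""
    parts = [qualified_name(tagname, prefix)]
    for attr_name, attr_value in attributes.items():
        # quoteattr escapes &, < and > and wraps the value in quotation marks.
        parts.append(f"{attr_name}={quoteattr(attr_value)}")
    return "<" + " ".join(parts) + ">"  # attributes are separated by whitespace


def end_tag(tagname: str, prefix: Optional[str] = None) -> str:
    """Format </[prefix:]tagname>."""
    return f"</{qualified_name(tagname, prefix)}>"
```

For example, start_tag("author", {"lang": "en"}, prefix="person") yields <person:author lang="en">.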
Within an XML document, namespace declarations occur as attributes of start tags. Namespace declarations are of the form "xmlns:[prefix]=[uri]". A namespace declaration indicates that the XML document contains element names that are defined within a specified namespace or schema. The prefix is an arbitrary designation that is used later in the XML document to indicate that an element name is a member of the namespace declared by uri. The prefix is valid only within the context of the specific XML document. The "uri", or Uniform Resource Identifier (URI), is either a path to a document describing a specific namespace or schema or a globally unique identifier of a specific namespace or schema. The URI is valid across all XML documents. Namespace declarations are "inherited", which means that a namespace declaration applies to the element in which it was declared as well as to all elements contained within that element.
Namespace inheritance within an XML document allows non-qualified names to use "default" namespaces. Default namespaces are explicitly declared as attributes of start tags. Default namespace declarations are of the form "xmlns=[uri]". Note that the declaration of a default namespace is equivalent to the declaration of a non-default namespace but the prefix is omitted. A namespace specification within an XML document is said to have a "scope" which includes all child nodes beneath the namespace specification.
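Declarations are inherited by descendants and can be overridden by nested declarations, so namespace scope behaves like a stack of dictionaries. The Python sketch below assumes that each start tag's declarations have already been extracted into a dictionary, with the empty string standing for the default namespace. The class NamespaceScopes and the urn:example URIs are hypothetical.

```python
from typing import Optional


class NamespaceScopes:
    """Tracks in-scope prefix -> URI bindings while walking nested elements.

    The empty string "" stands for the default namespace (xmlns="...").
    """

    def __init__(self) -> None:
        # One dictionary of declarations per currently open element.
        self._frames: list = []

    def enter_element(self, declarations: dict) -> None:
        # Declarations on a start tag apply to that element and all of its descendants.
        self._frames.append(dict(declarations))

    def leave_element(self) -> None:
        # The element's end tag closes the scope of its declarations.
        self._frames.pop()

    def resolve(self, qname: str) -> Optional[str]:
        """Return the namespace URI for a name such as "p:name", or None if unbound."""
        prefix = qname.split(":", 1)[0] if ":" in qname else ""
        # The innermost declaration wins, so search from the nearest element outward.
        for frame in reversed(self._frames):
            if prefix in frame:
                return frame[prefix]
        return None


# Usage: an outer element declares a default namespace and prefix p;
# a nested element rebinds p.
scopes = NamespaceScopes()
scopes.enter_element({"": "urn:example:default", "p": "urn:example:people"})
scopes.enter_element({"p": "urn:example:other"})
print(scopes.resolve("p:name"))   # urn:example:other
print(scopes.resolve("title"))    # urn:example:default (inherited default namespace)
scopes.leave_element()
print(scopes.resolve("p:name"))   # urn:example:people
```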
One exemplary use of XML is the exchange of data between different entities, such as client and server computers, in the form of requests and responses. A client might generate a request for information or a request for a certain server action, and a server might generate a response to the client that contains the information or confirms whether the action has been performed. The contents of these requests and responses are "XML documents", which are sequences of characters that comply with the XML specification. In many cases, generating these XML documents involves building a hierarchical tree structure in memory. Only once the hierarchical tree structure has been built in its entirety can the actual XML document, in proper syntactic form, be assembled. Consider the following exemplary XML code:
This code includes three XML namespace declarations, each designated with "xmlns". The declarations include a prefix, e.g. "person", "dsig", and "trans" respectively, and the expanded namespace to which each prefix refers, e.g. "http://www.schemas.org/people", "http://dsig.org", and "http://www.schemas.org/transactions" respectively. This code tells any reader that if an element name begins with "dsig:" its meaning is defined by whoever owns the "http://www.dsig.org" namespace. Similarly, elements beginning with the "person:" prefix have meanings defined by the "http://www.schemas.org/people" namespace, and elements beginning with the "trans:" prefix have meanings defined by the "http://www.schemas.org/transactions" namespace. Note that another XML document incorporating elements from any of the namespaces in this sample might declare prefixes that differ from those used in this example. As noted earlier, prefixes are arbitrarily defined by the document author and have meaning only within the context of the specific element of the specific document in which they are declared.
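To illustrate that prefixes are local to a document while the namespace URI carries the meaning, the sketch below parses two hypothetical one-element documents with Python's xml.etree.ElementTree. ElementTree reports element names in expanded "{uri}localname" form. The element name order is invented for this example; only the transactions URI comes from the text above.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that bind different prefixes to the same namespace URI.
doc_a = '<trans:order xmlns:trans="http://www.schemas.org/transactions"/>'
doc_b = '<t:order xmlns:t="http://www.schemas.org/transactions"/>'

tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

# The parser replaces each prefix with the URI it is bound to, so the prefix disappears.
print(tag_a)            # {http://www.schemas.org/transactions}order
print(tag_a == tag_b)   # True
```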
Namespaces ensure that element names do not conflict, and clarify who defined which term. They do not give instructions on how to process the elements. Readers still need to know what the elements mean and decide how to process them. Namespaces simply keep the names straight.
FIG. 1 shows how the structure of the above code can be represented in a hierarchical tree structure. In FIG. 1, all of the elements or nodes are set out in an exemplary tree that represents the XML document. Such a structure is typically constructed in memory, with each node containing all data necessary for the start and end tags of that node.
It has been typical in the past to build the entire tree structure, such as the one shown in FIG. 1, before generating the XML document itself. For large XML documents, this can consume a great deal of memory and processor time. Thus, it would be desirable to avoid this process if at all possible.
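For contrast, the tree-first approach can be sketched with Python's xml.etree.ElementTree: every node is created in memory, and only then is the document serialized. This is an illustrative sketch. The function name build_then_serialize and the element names request and properties are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_then_serialize(property_names: list) -> str:
    """Build the complete element tree in memory, then serialize it in one step."""
    root = ET.Element("request")
    container = ET.SubElement(root, "properties")
    for name in property_names:
        ET.SubElement(container, name)   # one in-memory node per property
    # Nothing is emitted until the whole tree exists.
    return ET.tostring(root, encoding="unicode")
```

Memory use grows with the number of nodes, because the complete tree must exist before ET.tostring produces any output.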
Accordingly, this invention arose out of concerns associated with providing improved methods and systems for generating XML documents that do not require a hierarchical tree structure to be built and stored in memory in order for the actual body of the XML document to be generated. This invention also addresses the algorithms and data representations used to manage and coordinate the generation of namespace declarations and prefix allocations when generating an XML document.
Methods and systems are described for generating an XML document that do not require a hierarchical tree structure to be built and stored in memory in order for the document to be generated. These include methods and systems for managing and coordinating the generation of namespace declarations and prefix allocations involved in generating an XML document. Aspects of the invention are particularly suited for use in the context of client/server architectures. The inventive aspects, however, can also be applied outside of client/server architectures.
In the described embodiment, a "request object" is provided and is used to receive information from a client that desires to generate an XML request and to organize the information into an XML request. Information is first accumulated by the request object and is then transformed into an appropriate XML document. The information that is accumulated by the request object includes the namespaces that are to be incorporated into the request. All of the namespaces are collected and organized into a data structure. Prefixes are assigned to and stored with each namespace value that is placed in the data structure. Some of the namespace values are reserved and have predefined or reserved prefixes that serve to support specific legacy servers. These specific legacy servers have non-compliant XML parsers that require specific, non-arbitrary, namespace prefixes to be used to identify specific namespaces or schemas.
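One way to picture the namespace data structure described above is a flat table from namespace URI to prefix, in which a few URIs have fixed prefixes and all others receive generated ones. The Python sketch below rests on several assumptions. The class NamespaceTable, the generated ns0, ns1, ... prefixes, and the single reserved entry (urn:example:legacy mapped to L) are placeholders, not the prefixes any actual legacy server requires.

```python
class NamespaceTable:
    """Hypothetical flat table mapping namespace URI -> prefix."""

    # Placeholder for the fixed prefixes some non-compliant legacy parsers expect.
    RESERVED_PREFIXES = {"urn:example:legacy": "L"}

    def __init__(self) -> None:
        self._prefix_by_uri: dict = {}   # insertion order = order of registration
        self._counter = 0

    def add(self, uri: str) -> str:
        """Register a namespace once and return the prefix stored with it."""
        if uri in self._prefix_by_uri:
            return self._prefix_by_uri[uri]
        prefix = self.RESERVED_PREFIXES.get(uri)
        if prefix is None:
            prefix = self._next_free_prefix()
        self._prefix_by_uri[uri] = prefix
        return prefix

    def _next_free_prefix(self) -> str:
        # Generate ns0, ns1, ... skipping prefixes that are reserved or already used.
        taken = set(self._prefix_by_uri.values()) | set(self.RESERVED_PREFIXES.values())
        while True:
            candidate = f"ns{self._counter}"
            self._counter += 1
            if candidate not in taken:
                return candidate

    def declarations(self) -> list:
        """Return (uri, prefix) pairs in registration order."""
        return list(self._prefix_by_uri.items())
```

Keeping reserved prefixes out of the generated sequence means a reserved namespace can be registered at any point without its prefix having already been handed out to another URI.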
In one embodiment of this invention, a client computer generates and sends a request to a server computer requesting information about objects that exist on the server. Specifically, the client requests the values of properties, such as author, last modification date, or subject, that are associated with documents on the server. The body of the request sent by the client to the server is an XML document that specifies the properties the client wishes to retrieve. The properties may be elements in one or more namespaces. In this case the request object is specialized to generate a specific type of XML document that is a request for property values.
In the described embodiment, a "namespace arbiter" is used by the request object to manage and oversee maintenance of the data structures that contain the namespace values and their prefixes. When a client wishes to generate a request for property values, it provides the names of all the namespaces (also referred to as "namespace values") to the request object. The client does this by invoking a method in the request object once per namespace to be added. Each method invocation returns to the client a moniker that uniquely identifies the namespace; the moniker represents the namespace value and is distinct for each namespace that is to appear in the request. The client then uses the moniker in additional calls to the request object. Once the client has added all of the namespaces to the request object and received a moniker for each namespace, the client invokes methods in the request object to add the properties, such as author, to the request. For each property requested, the client provides the moniker identifying the namespace in which the element exists as well as the name of the property requested.
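The moniker-based calling sequence can be sketched as follows. In this hypothetical Python sketch, the class PropertyRequest and its methods add_namespace and add_property are invented names, a moniker is simply the integer index of the namespace in a list, and prefix assignment by the namespace arbiter is left out.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyRequest:
    """Hypothetical request object: namespaces are registered first, properties second."""

    namespaces: list = field(default_factory=list)   # list index serves as the moniker
    properties: list = field(default_factory=list)   # (moniker, property name) pairs

    def add_namespace(self, uri: str) -> int:
        """Return a moniker that uniquely identifies the namespace."""
        if uri in self.namespaces:
            return self.namespaces.index(uri)
        self.namespaces.append(uri)
        return len(self.namespaces) - 1

    def add_property(self, moniker: int, name: str) -> None:
        """Record a requested property together with the moniker of its namespace."""
        if not 0 <= moniker < len(self.namespaces):
            raise ValueError(f"unknown namespace moniker: {moniker}")
        self.properties.append((moniker, name))
```

Because the client holds only monikers, it never needs to know which prefix a namespace will receive; that decision can be deferred until the document is rendered.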
The specified properties are maintained in a data structure that organizes the properties and the prefixes that are associated with the namespace to which the property pertains. In the described embodiment, data structures can be defined for adding new properties or for modifying property values of existing properties.
Thus, a collection of namespaces, associated prefixes, and associated properties is defined prior to building the XML document. These data structures are flat rather than hierarchical. Once all of the information has been collected by the request object, the request object can render it into an XML document and send the document to an appropriate server for processing.
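Rendering from the flat structures can then proceed in one forward pass that emits text as it goes, without ever building a node tree. The Python generator below is a sketch under stated assumptions. The function name render_property_request, the element names request and properties, and the generated ns0, ns1, ... prefixes are hypothetical. Reserved legacy prefixes are not handled, and property names are assumed to already be valid XML names.

```python
from typing import Iterator
from xml.sax.saxutils import quoteattr


def render_property_request(namespaces: list,
                            properties: list,
                            root: str = "request",
                            container: str = "properties") -> Iterator[str]:
    """Yield an XML property request piece by piece from flat lists.

    namespaces[i] is the URI for moniker i; properties holds (moniker, name) pairs.
    """
    prefixes = [f"ns{i}" for i in range(len(namespaces))]  # one prefix per moniker
    decls = "".join(f" xmlns:{p}={quoteattr(uri)}" for p, uri in zip(prefixes, namespaces))
    yield f"<{root}{decls}>"   # all declarations go on the root start tag
    yield f"<{container}>"
    for moniker, name in properties:
        yield f"<{prefixes[moniker]}:{name}/>"   # each requested property is an empty element
    yield f"</{container}>"
    yield f"</{root}>"
```

For example, "".join(render_property_request(["urn:example:people", "urn:example:dc"], [(0, "author"), (1, "subject")])) produces <request xmlns:ns0="urn:example:people" xmlns:ns1="urn:example:dc"><properties><ns0:author/><ns1:subject/></properties></request>. Because each chunk is yielded as soon as it is known, the chunks could be written directly to an output stream, and memory use is bounded by the flat lists rather than by a tree of nodes.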