Markup Languages have attained wide popularity in recent years. One type of markup language, Extensible Markup Language (XML), is a universal language that provides a way to identify, exchange, and process various kinds of data. For example, XML is used to create documents that can be utilized by a variety of application programs. Elements of an XML file typically have an associated namespace and schema.
A namespace is a unique identifier for a collection of names that are used in XML documents to define element/attribute names and types. The name of a namespace is commonly used to uniquely identify each class of XML document. The unique namespaces differentiate markup elements that come from different sources and happen to have the same name.
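As an illustration of the point above, the following minimal sketch parses a small document in which two elements share the local name "title" but come from different sources. The namespace URIs, the element names, and the helper function are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for illustration.
BOOK_NS = "http://example.com/ns/book"
PERSON_NS = "http://example.com/ns/person"

DOC = (
    f'<report xmlns:book="{BOOK_NS}" xmlns:person="{PERSON_NS}">'
    "<book:title>Markup Basics</book:title>"
    "<person:title>Dr.</person:title>"
    "</report>"
)

root = ET.fromstring(DOC)


def describe_children(element):
    """Return (namespace, local name, text) for each namespaced child."""
    result = []
    for child in element:
        # ElementTree stores qualified names as "{namespace}local".
        namespace, _, local = child.tag[1:].partition("}")
        result.append((namespace, local, child.text))
    return result


children = describe_children(root)

# The same local name resolves to different elements per namespace.
book_title = root.find(f"{{{BOOK_NS}}}title").text
person_title = root.find(f"{{{PERSON_NS}}}title").text
```

Both elements are named "title", yet a consumer that asks for the name in one namespace never receives the element from the other.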
XML Schemata provide a way to describe and validate data in an XML environment. A schema states what elements and attributes are used to describe content in an XML document, where each element is allowed, what types of content are allowed within it, and which elements can appear within which other elements. The use of schemata ensures that the document is structured in a consistent and predictable manner. Schemata may be created by a user and are generally supported by an associated markup language, such as XML. By using an XML editor, the user can manipulate the XML file and generate XML documents that adhere to the schema the user has created. In previous word processor applications, support for custom XML schemas was added to the application, enabling users to ‘tag’ contents of a document with custom XML markup (e.g. <title>), essentially giving semantic meaning to what was previously an unclassified run of text. This meant that a document, which was previously just text with formatting but no meaning for other applications to process, could now be a structured XML document containing specific pieces of XML markup from any user-defined XML schema that any other XML-aware application could locate and understand.
In a basic example, the text at the top of a document could be ‘tagged’ as a title with a <title> XML element from a user-defined XML schema, which means that other XML-aware applications can now easily understand that this range of text contains a “title” and extract it appropriately. This enables a backend process to intelligently extract parts of the document with appropriate semantics and context (e.g. this text is the <title>).
However, the drawbacks associated with prior word processor applications stem from the fact that the addition and persistence of custom XML markup is tied to the presentation of the document. That is, in the existing implementations there is an inextricable link between the XML markup of a word processor document (for example, the details of a customer invoice expressed in XML format) and its presentation on the document surface (for example, three paragraphs of plain text followed by a table with 5 columns and 4 rows with a specific table style). Therefore, the XML data represented in prior word processor applications (because it is tied to the presentation) must coincide exactly with the content of the document. For example, if the XML schema for the invoice states that <date> comes before <address>, which comes before <phoneNumber>, then those three XML elements must appear in exactly that order as presented in the document. This means that changes to the presentation format (e.g. moving a table row that contains <date>) will also cause changes to the structure of the XML data contained in that document, which requires extra steps on the part of the solution developer to ensure this data conforms to the structure of the associated XML schema. Thus, the end user of the document is not afforded the freedom to manipulate the presentation freely, because doing so might inadvertently change the semantics of the data, potentially violating the XML schema for that data.
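The ordering constraint described above can be sketched as follows. This is a hand-written order check standing in for a real XML Schema validator, and the <invoice> wrapper element, the sample values, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Order required by the hypothetical invoice schema from the text.
REQUIRED_ORDER = ["date", "address", "phoneNumber"]


def follows_sequence(xml_text, required_order=REQUIRED_ORDER):
    """Return True if the required children appear in schema order."""
    root = ET.fromstring(xml_text)
    found = [child.tag for child in root if child.tag in required_order]
    return found == required_order


# Data as originally laid out in the document.
original_invoice = (
    "<invoice><date>d</date><address>a</address>"
    "<phoneNumber>p</phoneNumber></invoice>"
)

# The same data after a presentation change moved <date> below <address>.
rearranged_invoice = (
    "<invoice><address>a</address><date>d</date>"
    "<phoneNumber>p</phoneNumber></invoice>"
)

original_ok = follows_sequence(original_invoice)
rearranged_ok = follows_sequence(rearranged_invoice)
```

Here a purely visual rearrangement is enough to make the same three values fail the order check, which is the coupling between presentation and data that the passage describes.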
Additionally, solutions developed on top of prior word processor applications need to more carefully consider the implications of the presentation when attempting to read/write data from a document for a backend application. So, if a paragraph of bold text is tagged as a title, the resulting XML saved by prior word processor applications would look like:
<w:p><Title><w:r><w:rPr><w:b/></w:rPr><w:t>This is the title.</w:t></w:r></Title></w:p>
As shown above, the custom XML tagging is surrounded on both sides by XML tags that are very specific to the prior word processor application; in this example, w:p, w:r, etc. This means that an XML-aware solution which is processing this data must not only understand its own data format (which contains the <Title> element), but must also understand the exact details of the prior word processor application formatting, so that it knows to traverse and ignore that information as it searches for its own data. Accordingly, this kind of implementation still imposes some requirements on the user, because small changes in the look of the text in the document (for example, dragging the contents of the <Title> element into a table cell) can result in significant changes in the location of the custom XML tags within the surrounding word processor's native tags. Thus, a programmer/code developer often needs to write additional code to anticipate and understand where the prior word processor application is going to put the custom XML elements based on the presentation, and to deal with all of the various permutations. This means the resulting solution may still need to contain significant logic for dealing with specific prior word processor application needs.
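To make this burden concrete, the following minimal sketch shows the kind of code a solution might need just to read the title text. The namespace URI bound to the w: prefix, the <w:body> wrapper, and the function name are hypothetical; the sample markup is a well-formed version of the fragment discussed above, and the second sample assumes the title has been split into two differently formatted runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI for the word processor's native "w:" prefix.
W_NS = "http://example.com/wordprocessor"

BOLD_TITLE = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><Title><w:r><w:rPr><w:b/></w:rPr>"
    "<w:t>This is the title.</w:t></w:r></Title></w:p>"
    "</w:body>"
)

# Same title, but only the first half is bold, so it spans two runs.
SPLIT_TITLE = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><Title>"
    "<w:r><w:rPr><w:b/></w:rPr><w:t>This is </w:t></w:r>"
    "<w:r><w:t>the title.</w:t></w:r>"
    "</Title></w:p>"
    "</w:body>"
)


def extract_title(xml_text):
    """Find the custom <Title> element and collect its visible text."""
    root = ET.fromstring(xml_text)
    title = root.find(".//Title")
    if title is None:
        return None
    # The text is not a direct child of <Title>; it sits inside the
    # native w:r / w:t elements, which the solution must know about.
    return "".join(t.text or "" for t in title.iter(f"{{{W_NS}}}t"))


bold_result = extract_title(BOLD_TITLE)
split_result = extract_title(SPLIT_TITLE)
```

Even in this small case the extraction logic depends on knowing that text lives in w:t elements nested inside runs, which is knowledge of the word processor's format rather than of the solution's own schema.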
Programmers/code developers working with the prior word processor applications also need to take into consideration the implications of a document's layout format when considering reading and writing operations. For example, a developer might attempt to grab the value of a <StockSymbol> element and use it to place the full name of a company in the <CompanyName> element in the same document as a simple enhancement for a user writing a company report. To maintain the document's integrity, the developer needs to consider, both when reading and when writing the desired data, the current layout format of the document before writing functional code to perform these actions. For example, the developer might need to know whether the value being written is in a table cell, a bulleted list, etc., in order to construct the prior word processor application's formatting information that, when inserted into the document, would produce the desired result. This is another potential reason for additional coding in order to understand the word processor application's presentation semantics.
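The read-then-write scenario above might look like the following minimal sketch, which assumes the simplest possible layout (plain paragraphs). The namespace URI, the <w:body> wrapper, the lookup table, and the symbol "EXMP" are all hypothetical; a table cell or list item would require different native markup than what is built here.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI for the word processor's native "w:" prefix.
W_NS = "http://example.com/wordprocessor"
ET.register_namespace("w", W_NS)

# Hypothetical lookup from stock symbol to company name.
COMPANY_NAMES = {"EXMP": "Example Corporation"}

DOC = (
    f'<w:body xmlns:w="{W_NS}">'
    "<w:p><StockSymbol><w:r><w:t>EXMP</w:t></w:r></StockSymbol></w:p>"
    "<w:p><CompanyName/></w:p>"
    "</w:body>"
)


def fill_company_name(xml_text):
    """Read <StockSymbol> and write the matching <CompanyName>."""
    root = ET.fromstring(xml_text)

    # Reading: the value is buried inside native run/text elements.
    symbol_el = root.find(".//StockSymbol")
    symbol = "".join(t.text or "" for t in symbol_el.iter(f"{{{W_NS}}}t"))

    # Writing: the value cannot be set directly on <CompanyName>; it
    # must be wrapped in the native markup that suits this layout.
    target = root.find(".//CompanyName")
    run = ET.SubElement(target, f"{{{W_NS}}}r")
    text = ET.SubElement(run, f"{{{W_NS}}}t")
    text.text = COMPANY_NAMES[symbol]

    return ET.tostring(root, encoding="unicode")


updated = fill_company_name(DOC)
```

The lookup itself is one line; the rest of the function exists only to navigate and reproduce the presentation markup.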
Yet another limitation of prior word processor applications is that XML elements' editing behaviors can sometimes be perceived as “fragile.” This is partly because, as discussed above, the positioning of the tags on the document surface determines the structure of the XML data based on the user-defined schema. Accordingly, a number of issues may arise. First, typical user operations (e.g. copy/paste from one section to another) may alter the XML structure and render the document invalid according to the associated XML schema. Second, in such word processor implementations, all elements required by the customer-defined XML schema need to be included in some form on the document surface. This means that developers may have a hard time creating associated XML data as a method for carrying around additional information about the document which is not displayed on the document surface, but serves more as metadata. And, third, elements which are semantically unnecessary on the document surface (e.g. non-leaf elements which are not marking up mixed content) need to be included as well in such word processor implementations, further increasing the likelihood that common user operations will modify the XML data.
In many cases, the schema which defines the XML data (for example, the data that comprises a memo document) tends to be rigidly defined by a single standards body in order to facilitate the communication of this data between multiple heterogeneous processing systems. However, in so facilitating the backend communication, the human readability and editability of the document data is often sacrificed, making it difficult for a user to understand and parse this data. For example, the XML standard might define a standard format for dates, such as: dd-mm-yyyyThh:mm:ss.ssss. All dates are required to be represented in this format to be parsed by XML-aware applications. Such a format is hard for the user to enter correctly, and it often clashes with the way in which the user typically enters dates (e.g. many locales typically use mm-dd-yyyy instead of dd-mm-yyyy, etc.).
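The mismatch between the stored date format and what a user types can be sketched as a pair of conversions. This is only an illustration: the two format strings follow the example patterns given above (dd-mm-yyyyThh:mm:ss.ssss for storage, mm-dd-yyyy for user entry), and the function names are hypothetical.

```python
from datetime import datetime

# What the user types, per the mm-dd-yyyy example in the text.
USER_FORMAT = "%m-%d-%Y"

# What the schema requires, per the dd-mm-yyyyThh:mm:ss.ssss example.
STORED_FORMAT = "%d-%m-%YT%H:%M:%S.%f"


def to_stored(user_text):
    """Convert a user-entered date into the schema's format."""
    parsed = datetime.strptime(user_text, USER_FORMAT)
    # %f produces six fractional digits; drop two to match "ss.ssss".
    return parsed.strftime(STORED_FORMAT)[:-2]


def to_display(stored_text):
    """Convert a stored date back into the form the user expects."""
    parsed = datetime.strptime(stored_text, STORED_FORMAT)
    return parsed.strftime(USER_FORMAT)


stored = to_stored("03-04-2005")
displayed = to_display(stored)
```

A user who types 03-04-2005 meaning March 4 would have the value stored with day and month swapped in position and a time component appended, which is why entering the stored form by hand is error-prone.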
Thus, what is needed is a way to enable developers to separate the XML data and the presentation of such data in an application, such as a word processor application.