This invention relates to specification and processing of markup.
The stored representations which are read and written by computer-implemented applications are often in the form of “markup.” A particular example of markup is the Extensible Markup Language (XML), which is in wide use. A useful reference for XML is the book “XML In a Nutshell, 3rd Edition”, by Elliotte Rusty Harold and W. Scott Means, published by O'Reilly, 2004, ISBN 0-596-00764-7, which is incorporated herein by reference.
XML markup consists of hierarchically organized markup elements. A markup element typically consists of a start tag, an optional body, and an end tag. Where the body is absent, the start tag and end tag may be combined into a single tag. The start tag includes a textual name and optional attributes. The name describes the markup element. Each attribute includes a textual key and a textual value. Attributes may provide additional descriptive information about the markup element. The body of the markup element may contain both textual content and nested markup elements. The end tag concludes the markup element.
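By way of illustration only, the anatomy of a markup element described above may be sketched with the ElementTree module of the Python standard library. The sample markup, the element and attribute names, and the helper function describe are hypothetical and are not drawn from any particular application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a start tag with two attributes, a body holding
# textual content and two nested elements, and an end tag.
# <empty/> shows the start tag and end tag combined into a single tag.
SAMPLE = '<book id="b1" lang="en">Intro <title>XML</title><empty/></book>'


def describe(element):
    """Summarize one markup element (illustrative helper, not a standard API)."""
    return {
        "name": element.tag,                  # textual name from the start tag
        "attributes": dict(element.attrib),   # textual key -> textual value
        "text": element.text,                 # textual content before the first child
        "children": [child.tag for child in element],  # nested markup elements
    }


book = ET.fromstring(SAMPLE)
summary = describe(book)
empty_summary = describe(book.find("empty"))
```

In this sketch, summary carries the name, both attributes, the leading textual content, and the names of the nested elements, while empty_summary reflects an element whose body is absent.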
The hierarchical organization of markup is reflected in the nesting of markup elements. The body of a markup element may contain nested markup elements as well as textual content. The containing markup element is denoted the parent. The nested markup element is denoted the child. A markup element which lacks a parent is denoted a root. In XML, a well-formed document is required to contain exactly one root markup element.
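The parent, child, and root relationships described above may be sketched as follows, again using the ElementTree module of the Python standard library. The sample document and the names parent_of and depth are assumptions made for illustration. ElementTree does not record parents directly, so the sketch builds a child-to-parent mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root, two children, two grandchildren.
DOC = "<library><shelf><book/><book/></shelf><shelf/></library>"

root = ET.fromstring(DOC)

# Map every nested (child) element to its containing (parent) element.
parent_of = {child: parent for parent in root.iter() for child in parent}


def depth(element):
    """Count the containing elements between this element and the root."""
    levels = 0
    while element in parent_of:
        element = parent_of[element]
        levels += 1
    return levels


first_book = root.find("shelf/book")
```

The root is the only element absent from parent_of, which corresponds to the statement that a root lacks a parent.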
XML is also an example of a metalanguage. A metalanguage is a foundation upon which languages may be built. XML specifies a syntax for markup, but it does not specify how markup should be processed or what interpretation should be attached to the markup elements. XML does not specify a set of valid element names or attributes (with a few minor exceptions). Nor does XML have much to say about relationships between markup elements. The only relationship between markup elements which is explicitly recognized in XML is hierarchy; markup elements may be nested within containing markup elements. Languages that are built upon XML may specify sets of valid names, interpretations for markup elements according to their names, relationships between markup elements, and even processing implications for markup elements, but such considerations are deliberately omitted from XML, which is limited to the syntax of markup.
A metalanguage for markup is of great practical value as it permits standardization of some aspects of processing markup. Applications which process XML may rely on a regular syntax and make use of numerous well-honed tools for application-specific processing of XML. Moreover, markup is readable and writable by humans as well as machines, which reduces the risk that data files will become unusable over time. Direct accessibility by humans also facilitates testing and debugging.
For these and other reasons, XML is very popular, but it does suffer from some shortcomings. The XML markup itself (as distinguished from the textual content) is somewhat redundant as names are duplicated in start and end tags. This redundancy is detrimental to human readers and writers. The implicit brackets that wrap a markup element consist of multiple characters, despite the availability of several distinct bracket character pairs in the ASCII character set. The semantics of attributes are murky; moreover, there is no facility for plural values in an attribute. XML requires a single root markup element in a document, which is a reasonable requirement for certain applications but is unnecessarily restrictive in general; for example, this requirement precludes the validity of a document formed by concatenation of two valid documents. These shortcomings are alleviated by the use of a new metalanguage for specification and processing of markup.
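The single-root restriction noted above may be observed with a standard XML parser. The following is a minimal sketch using the ElementTree module of the Python standard library; the helper is_well_formed and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a single XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Two hypothetical documents, each with exactly one root element.
first = "<note>one</note>"
second = "<note>two</note>"

# Their concatenation has two root elements and is rejected by the parser.
combined = first + second
results = (is_well_formed(first), is_well_formed(second), is_well_formed(combined))
```

Each document is accepted on its own, while the concatenation is rejected because it contains a second root markup element.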
Thus, it would be advantageous to alleviate some of the shortcomings which XML bears while maintaining a rough structural equivalence to XML, and preserving the benefits of a metalanguage for markup. It would also be advantageous to facilitate markup processing by precisely specifying the processing of markup to requests in an object-oriented application programmer's interface (API).