Extensible markup language (XML) is a markup language that defines a set of rules for encoding documents in a plain-text format that is both human-readable and machine-readable. One version of XML is defined in the XML 1.0 Specification produced by the World Wide Web Consortium (W3C) and dated Nov. 26, 2008, which is incorporated herein by reference in its entirety. The XML 1.0 Specification defines an XML document as a text that is well-formed and valid.
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by the XML 1.0 Specification itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, boolean predicates associated with the content, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints. The process of checking to see if an XML document conforms to an XML schema is called validation, which is separate from XML's core concept of syntactic well-formedness. All XML documents are defined as being well-formed, but an XML document is on check for validity where the XML processor is “validating,” in which case the XML document is checked for conformance with its associated schema.
Although the plain-text human-readable aspect of XML documents may be beneficial in many situations, this human-readable aspect may also lead to XML documents that are large in size and therefore incompatible with devices with limited memory or storage capacity. Efforts to reduce the size of XML documents have therefore often eliminated this plain-text human-readable aspect in favor of more compact binary representations.
Efficient XML interchange (EXI) is a binary XML format in which XML documents are encoded in a binary data format rather than plain text. In general, using an EXI format reduces the size and verbosity of XML documents, and may reduce the cost in terms of time and effort involved in parsing XML documents. EXI is formally defined in the EXI Format 1.0 Specification produced by the W3C and dated Mar. 10, 2011, which is incorporated herein by reference in its entirety. An XML document may be encoded in an EXI format as a separate EXI stream. Additionally, the XML document maybe later decoded from the EXI format to an XML document, which may be similar to the original XML document.
An XML document or set of XML documents may include an associated XML schema definition (XSD). The XSD may generally describe the structure of an XML document. When the associated XSD is available for an XML document, the XSD may speed parsing of the XML document. For example, the XSD may provide a “blueprint” according to which the XML document may be parsed.
Prior to being used in encoding XML documents into the EXI format or decoding an EXI stream or an EXI document into XML, the XSD may be normalized into one or more grammars. The grammars are rules that may be used to predict specific sequences of the XML document. An algorithm to generate the grammars for an XSD is included in the EXI Format 1.0 Specification. As used herein the algorithm included in the EXI Format 1.0 Specification is referred to herein as the standard grammar algorithm.