Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a plain-text format that is both human-readable and machine-readable. One version of XML is defined in the XML 1.0 Specification produced by the World Wide Web Consortium (W3C) and dated Nov. 26, 2008, which is incorporated herein by reference in its entirety. The XML 1.0 Specification defines an XML document as a text that is well-formed and valid.
An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by the XML 1.0 Specification itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, boolean predicates associated with the content, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints. The process of checking to see if an XML document conforms to an XML schema is called validation, which is separate from XML's core concept of syntactic well-formedness. All XML documents are defined as being well-formed, but an XML document is on check for validity where the XML processor is “validating,” in which case the XML document is checked for conformance with its associated schema.
Although the plain-text human-readable aspect of XML documents may be beneficial in many situations, this human-readable aspect may also lead to XML documents that are large in size and therefore incompatible with devices with limited memory or storage capacity. Efforts to reduce the size of XML documents have therefore often eliminated this plain-text human-readable aspect in favor of more compact binary representations.
EXI is a Binary XML format in which XML documents are encoded in a binary data format rather than plain text. In general, using a binary XML format reduces the size and verbosity of XML documents, and may reduce the cost in terms of time and effort involved in parsing XML documents. EXI is formally defined in the EXI Format 1.0 Specification produced by the W3C and dated Mar. 10, 2011, which is incorporated herein by reference in its entirety. An XML document may be encoded in an EXI format as a separate EXI stream.
When no schema information is available or when available schema information describes only portions of an EXI stream, EXI employs built-in element grammars. Built-in element grammars are dynamic and continuously evolve to reflect knowledge learned while processing an EXI stream. New built-in element grammars are created to describe the content of newly encountered elements and new grammar productions are added to refine existing built-in grammars. Newly learned grammars and productions are used to more efficiently represent subsequent elements in the EXI stream.
While useful for reflecting knowledge learned while processing an EXI stream, the dynamic and continuously evolving aspect of built-in element grammars may cause EXI streams to grow in dynamic memory to a point where the space allocated in dynamic memory for built-in element grammars is relatively large in size. This relatively large size may be problematic where an EXI stream is processed by an EXI processor that is employed in a device with limited memory capacity.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one example technology area where some embodiments described herein may be practiced.