XML (eXtensible Markup Language) is a language that enables a structured description of the contents of a document by means of XML Schema language definitions. A more precise description of XML Schema and of the structures, data types and content models used therein is found in references [1], [2] and [3], which are incorporated herein by reference in their entirety.
An XML Schema consists of components such as type definitions and element declarations. These can be used to assess the validity of well-formed element and attribute information items and furthermore may specify augmentations to those items and their descendants.
Schema-validity assessment has two aspects: (1) determining local schema-validity, that is, whether an element or attribute information item satisfies the constraints embodied in the relevant components of an XML Schema, and (2) synthesizing an overall validation outcome for the item, combining local schema-validity with the results of schema-validity assessments of its descendants, if any, and adding appropriate augmentations to an XML infoset to record this outcome. Reference [2] uses the word “valid” and its derivatives to refer to (1) above, the determination of local schema-validity, and uses the word “assessment” to refer to the overall process of local validation, schema-validity assessment and infoset augmentation.
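The two aspects of assessment described above can be illustrated with a minimal sketch. It is not the procedure of reference [2]; the `Item` class, the dictionary-based `schema`, the property names `local_valid` and `overall_valid`, and the treatment of undeclared names as invalid are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Item:
    """Hypothetical stand-in for an element information item."""
    name: str
    text: str = ""
    children: List["Item"] = field(default_factory=list)
    # Augmentations written back by assessment (illustrative property names).
    local_valid: Optional[bool] = None
    overall_valid: Optional[bool] = None


# Hypothetical schema: element name -> local constraint on that item.
Schema = Dict[str, Callable[[Item], bool]]


def assess(item: Item, schema: Schema) -> bool:
    # Aspect (1): local schema-validity of this item alone.
    # Simplifying assumption: an undeclared name counts as not locally valid.
    check = schema.get(item.name)
    item.local_valid = bool(check(item)) if check is not None else False
    # Aspect (2): combine the local result with the outcomes of all
    # descendants. The list is built first so every child gets annotated.
    child_outcomes = [assess(child, schema) for child in item.children]
    item.overall_valid = item.local_valid and all(child_outcomes)
    return item.overall_valid


schema: Schema = {
    "order": lambda item: len(item.children) > 0,
    "quantity": lambda item: item.text.isdigit(),
}
bad = Item("order", children=[Item("quantity", "three")])
assess(bad, schema)
print(bad.local_valid, bad.overall_valid)  # prints "True False"
```

In this example the "order" item is locally valid, yet its overall outcome is negative because a descendant fails its own local constraint, which is the distinction the two aspects are intended to capture.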
Three levels of conformance have been described for schema-aware processors. The first is required of all processors. Support for the other two will depend on the application environments for which the processor is intended. At the first level, “minimally conforming” processors must completely and correctly implement: Schema Component Constraints, which are constraints that describe each component of a schema; Validation Rules, which are rules that determine the validity of each component of a schema; and Schema Information Set Contributions, which are additional rules on each component of the schema that are a consequence of the validation and/or assessment of a given schema component. At the second level, processors that provide “conformance to the XML Representation of Schemas” are processors that accept schemas represented in the form of XML documents as described in [2].
Separating out the conformance requirements relating to the concrete syntax of XML schema documents admits as conforming those processors which use schemas stored in optimized binary representations, dynamically created schemas represented as programming language data structures, or implementations in which particular schemas are compiled into executable code such as C or Java. Such processors may be minimally conforming but not necessarily in conformance to the XML Representation of Schemas.
At the third level, “fully conforming” processors are network-enabled processors which are both minimally conforming and in conformance to the XML Representation of Schemas, and which additionally must be capable of accessing schema documents from the World Wide Web as described in document [2].
Document [3] further describes two of the three levels of conformance with respect to data types. Minimally conforming processors must completely and correctly implement the Constraint on Schemas, which are constraints that describe each component of a schema, and the Validation Rule, which provides constraints expressed by schema components which information items must satisfy to be schema-valid. Processors that provide conformance to the XML Representation of Schemas must completely and correctly implement all Schema Representation Constraints, which are constraints on the representation of schema components in XML as described in [3], and must adhere exactly to the specifications in XML Representation of Simple Type Definition Schema Components, which describes the rules for a Simple Type element information item.
A 3-layer architecture may be detailed as implied by the three conformance levels. The layers are: (1) the “assessment core”, relating schema components and instance information items, (2) schema representation, relating the connections between XML representations and schema components, including the relationships between namespaces and schema components, and (3) XML Schema web-interoperability guidelines, relating instance-to-schema and schema-to-schema connections for the World Wide Web.
Layer 1 specifies the manner in which a schema composed of schema components can be applied in the assessment of an instance element information item. Layer 2 specifies the use of schema elements in XML documents as the standard XML representation for schema information in a broad range of computer systems and execution environments. To support interoperation over the World Wide Web in particular, layer 3 provides a set of conventions for schema reference on the Web.
Reference [3] defines “datatypes” to be used in XML Schemas. A datatype is a 3-tuple, consisting of a) a set of distinct values, called its value space, b) a set of lexical representations, called its lexical space, and c) a set of facets that characterize properties of the value space, individual values or lexical items. A value space is the set of values for a given datatype. Each value in the value space of a datatype is denoted by one or more literals in its lexical space. A lexical space is the set of valid literals for a datatype. For example, “100” and “1.0E2” are two different literals from the lexical space of the “float” datatype which both denote the same value. A facet is a single defining aspect of a value space. Generally speaking, each facet characterizes a value space along independent axes or dimensions.
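The 3-tuple described above can be sketched as a small data structure. This is an illustrative approximation rather than the definition in reference [3]: the `Datatype` class and its field names are hypothetical, the regular expression covers only a subset of the “float” lexical space (it omits special literals such as INF and NaN), and Python's 64-bit `float` stands in for the value space.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Datatype:
    """Hypothetical model of a datatype as (value space, lexical space, facets)."""
    name: str
    lexical_pattern: str            # regular expression describing the lexical space
    to_value: Callable[[str], Any]  # maps a literal to a member of the value space
    facets: Dict[str, Any] = field(default_factory=dict)

    def is_literal(self, text: str) -> bool:
        # A string belongs to the lexical space only if the whole string matches.
        return re.fullmatch(self.lexical_pattern, text) is not None

    def value_of(self, text: str) -> Any:
        if not self.is_literal(text):
            raise ValueError(f"{text!r} is not a literal of {self.name}")
        value = self.to_value(text)
        # Bounding facets restrict the value space itself.
        low = self.facets.get("minInclusive")
        high = self.facets.get("maxInclusive")
        if low is not None and value < low:
            raise ValueError(f"{value} is below minInclusive={low}")
        if high is not None and value > high:
            raise ValueError(f"{value} is above maxInclusive={high}")
        return value


# Simplified "float": optional sign, digits with optional fraction, optional exponent.
FLOAT_PATTERN = r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?"
float_type = Datatype("float", FLOAT_PATTERN, float)

# Two different literals from the lexical space denote the same value.
print(float_type.value_of("100") == float_type.value_of("1.0E2"))  # prints "True"
```

A restricted variant of the same datatype could be modelled by passing, for example, `facets={"minInclusive": 0.0}`, in which case a literal such as “-1” remains in the lexical pattern but its value is rejected by the facet.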
Datatypes may be distinguished as “atomic” datatypes, which are datatypes having values regarded as being indivisible, “list” datatypes, which are datatypes having values each of which consists of a finite-length (possibly empty) sequence of values of an atomic datatype, and “union” datatypes, which are datatypes whose value spaces and lexical spaces are the union of the value spaces and lexical spaces of one or more other datatypes. Datatypes may also be distinguished as “primitive” or “derived”. Primitive datatypes are those that are not defined in terms of other datatypes and derived datatypes are those that are defined in terms of other datatypes. For example, with respect to Reference [3], a “float” datatype is a well-defined mathematical concept that cannot be defined in terms of other datatypes, while an “integer” datatype is a special case of the more general datatype “decimal”.
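The distinction between atomic, list and union datatypes can be sketched with parser functions. The helper names `parse_integer`, `parse_boolean`, `list_of` and `union_of` are hypothetical, the two atomic parsers cover only simple literals, and whitespace-separated list items and in-order trial of union members are assumptions of this sketch.

```python
import re
from typing import Any, Callable, List

Parser = Callable[[str], Any]


def parse_integer(literal: str) -> int:
    """Atomic datatype: the value is regarded as indivisible."""
    if re.fullmatch(r"[+-]?\d+", literal) is None:
        raise ValueError(f"{literal!r} is not an integer literal")
    return int(literal)


def parse_boolean(literal: str) -> bool:
    """Atomic datatype with a four-literal lexical space."""
    mapping = {"true": True, "1": True, "false": False, "0": False}
    if literal not in mapping:
        raise ValueError(f"{literal!r} is not a boolean literal")
    return mapping[literal]


def list_of(item_parser: Parser) -> Callable[[str], List[Any]]:
    """List datatype: a finite, possibly empty, sequence of atomic values."""
    def parse(literal: str) -> List[Any]:
        # Assumption: items are separated by whitespace.
        return [item_parser(token) for token in literal.split()]
    return parse


def union_of(*member_parsers: Parser) -> Parser:
    """Union datatype: a literal is accepted if any member datatype accepts it."""
    def parse(literal: str) -> Any:
        # Assumption: members are tried in order and the first match wins.
        for member in member_parsers:
            try:
                return member(literal)
            except ValueError:
                continue
        raise ValueError(f"{literal!r} matches no member datatype")
    return parse


integer_list = list_of(parse_integer)
integer_or_boolean = union_of(parse_integer, parse_boolean)

print(integer_list("10 20 30"))    # prints "[10, 20, 30]"
print(integer_or_boolean("true"))  # prints "True"
```

Because the literal “1” belongs to the lexical spaces of both members, the order of the members decides which value it denotes in this sketch; with the integer member listed first it is read as the integer 1.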
Methods, devices or systems for coding and/or decoding XML-based documents are known from publications relating to the MPEG-7 standard, in particular from document [4].
Known methods for the binary representation of MPEG-7 and other XML-based descriptions or documents have shortcomings in terms of compatibility, insofar as the schemas of the XML-based descriptions or documents to be coded are not fully known to the encoder and/or decoder at the start of transmission. In document [4], for example, a method for the binary representation of XML descriptions and XML documents is described which specifies code tables for XML descriptions and XML documents based on schemas and namespaces. Here, a namespace is a space in the document structure in which the names used therein are assigned unique meanings or declarations, it being possible for the names occurring in one namespace to appear in other namespaces with different meanings or declarations. By contrast, at least one part of the namespace is defined by means of a schema. In the method described in [4], the code tables for the data types, the global elements and the SubstitutionGroups are dependent on all the namespaces used. The schemas and namespaces must accordingly be known before the code tables are generated.
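The dependency described above can be made concrete with a deliberately simple sketch. It does not reproduce the method of document [4]; the function `build_code_table`, the example namespace URIs and type names, and the choice of fixed-length codes assigned in sorted order are all assumptions made purely to show why every namespace has to be known before codes are assigned.

```python
from math import ceil, log2
from typing import Dict, List, Tuple

QualifiedName = Tuple[str, str]  # (namespace URI, local type name)


def build_code_table(schemas: Dict[str, List[str]]) -> Dict[QualifiedName, str]:
    """Assign one fixed-length binary code to each type across ALL namespaces."""
    names = sorted(
        (namespace, type_name)
        for namespace, type_names in schemas.items()
        for type_name in type_names
    )
    if not names:
        return {}
    # Code width depends on the total number of types in all namespaces.
    width = max(1, ceil(log2(len(names))))
    return {name: format(index, f"0{width}b") for index, name in enumerate(names)}


# Hypothetical namespaces and type names, for illustration only.
known = {"urn:example:a": ["Title", "Creator"]}
extended = dict(known)
extended["urn:example:0"] = ["Thing", "Other", "More"]

before = build_code_table(known)
after = build_code_table(extended)

key = ("urn:example:a", "Creator")
print(before[key], after[key])  # prints "0 011"
```

Adding a single namespace changes both the code length and the code assigned to an already known type, so a bit stream coded against the first table cannot be decoded with the second.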