Extensible Markup Language (XML) is a syntax for defining computer languages. XML makes it possible to create languages that are adapted for different uses but which may be processed by the same tools.
An XML document is composed of elements, each element starting with a start-tag comprising the name of the element, for example, <tag>, and ending with an end-tag which also comprises the name of the element, for example, </tag>. Each element can contain other elements or character data.
An element may be specified by attributes, each attribute being defined by a name and having a value. The attributes are placed in the start-tag of the element they specify, for example <tag attribute="value">.
XML syntax also makes it possible to define comments, for example <!--Comment-->, and processing instructions which may specify to a computer application what processing operations to apply to the XML document, for example <?myprocessing?>.
The elements, attributes, text data, comments and processing instructions are grouped together under the generic name of item.
Several different XML languages may contain elements with the same name. To allow several different languages to be used together, an addition has been made to XML syntax making it possible to define namespaces. Two elements are identical only if they have the same name and are situated in the same namespace. A namespace is defined by a Uniform Resource Identifier (URI), for example http://canon.crf.fr/xml/mylanguage. A namespace is used in an XML document by defining a prefix, which is a shortcut to the URI of that namespace. The prefix is defined using a specific attribute. By way of illustration, the expression xmlns:ml="http://canon.crf.fr/xml/mylanguage" associates the prefix ml with the URI http://canon.crf.fr/xml/mylanguage. The namespace of an element or of an attribute is specified by having its name preceded by the prefix associated with the namespace followed by ':', for example, <ml:tag ml:attribute="value">.
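By way of illustration only, the following Python sketch shows how a namespace-aware parser resolves the prefix described above. It uses the standard xml.etree.ElementTree module; the sample document is hypothetical and merely reuses the prefix ml and the URI given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document reusing the prefix and URI from the text above.
DOC = (
    '<ml:tag xmlns:ml="http://canon.crf.fr/xml/mylanguage" '
    'ml:attribute="value">text</ml:tag>'
)
NS = "http://canon.crf.fr/xml/mylanguage"

root = ET.fromstring(DOC)

# ElementTree expands every prefixed name to the form "{URI}local-name",
# so the prefix itself plays no part in comparing two names.
element_name = root.tag
attribute_value = root.get("{%s}attribute" % NS)

# The same element written with a different prefix bound to the same URI.
other = ET.fromstring('<x:tag xmlns:x="%s"/>' % NS)
same_element_name = other.tag == element_name
```

Here element_name is "{http://canon.crf.fr/xml/mylanguage}tag", and same_element_name is True because both prefixes designate the same namespace.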
XML has numerous advantages and has become a standard for storing data in a file or for exchanging data. In particular, numerous tools are available for processing XML files. Furthermore, an XML document may be manually edited with a simple text editor. Moreover, an XML document, containing its structure integrated with the data, is easily readable even without knowing the specification.
However, the main drawback of the XML syntax is its verbosity. The size of an XML document may thus be several times greater than the inherent size of the data. This large size leads to a long processing time when XML documents are generated and especially when they are read.
To mitigate these drawbacks, mechanisms for coding XML documents have been sought. The object of these mechanisms is to code the content of the XML document in a more efficient form while still enabling the XML document to be easily reconstructed. However, most of these mechanisms do not maintain all the advantages of the XML format. Numerous new formats, enabling the data contained in an XML document to be stored, have thus been proposed. These different formats are grouped together under the appellation "Binary XML".
Among these mechanisms, the simplest consists of coding the structural data in a binary format instead of using a text format. Furthermore, the redundancy in the structural information in the XML format may be eliminated or at least reduced. Thus, for example, it is not necessarily useful to specify the name of the element in the start-tag and the end-tag. This type of mechanism is used by all the Binary XML formats.
Another mechanism consists of creating one or more index tables which are used, in particular, to replace the names of elements and attributes that are generally repeated in an XML document. Thus, at the first occurrence of an element name, it is coded normally in the file and an index is associated with it. Then, for the following occurrences of that element name, the index will be used instead of the complete string, reducing the size of the document generated, but also facilitating the reading. More particularly, there is no need to read the entire string in the file and, furthermore, determining the element read may be performed by a simple comparison of integers and not by a comparison of strings. This type of mechanism is implemented in several formats, in particular in the formats in accordance with the Fast Infoset and Efficient XML Interchange (EXI) specifications.
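The principle of such an index table may be sketched as follows in Python. This is a minimal illustration under simplifying assumptions: the function names and the token representation are hypothetical, and the sketch does not reproduce the actual bit-level encoding of Fast Infoset or EXI.

```python
def index_names(names):
    """Code a sequence of names using an index table.

    The first occurrence of a name is emitted as ("literal", name) and is
    given the next free index; later occurrences are emitted as
    ("index", i).
    """
    table = {}
    tokens = []
    for name in names:
        if name in table:
            tokens.append(("index", table[name]))
        else:
            table[name] = len(table)
            tokens.append(("literal", name))
    return tokens


def restore_names(tokens):
    """Rebuild the original names; the decoder grows the same table."""
    table = []
    names = []
    for kind, value in tokens:
        if kind == "literal":
            table.append(value)
            names.append(value)
        else:
            names.append(table[value])
    return names
```

For the names item, price, item, price, the sketch emits two literals followed by the indexes 0 and 1. The decoder never needs the table to be transmitted, since it rebuilds it in the same order as the encoder.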
This mechanism may be extended to the text values and to the values of the attributes. In the same way, at the first occurrence of a text value or an attribute value, this is normally coded in the file and an index is associated with it. The following occurrences of that value are coded using the index. This type of mechanism is implemented in several formats, in particular the formats in accordance with the Fast Infoset and EXI specifications.
Still another mechanism consists of using index tables for describing the structure of certain categories of items of the document. Thus, for example, it is possible to use an index table for each element item having a given name. At the first occurrence of a child item in the content of that item, a new entry describing that child item type is added to the index table. At following occurrences of a similar item, that new child item is described using the associated index. This type of mechanism is implemented in the formats in accordance with the EXI specification.
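A simplified Python sketch of this structural indexing is given below. It assumes, purely for illustration, that each child item is summarized by a hashable description such as ("element", "title"), and that one table is kept per parent element name. The names are hypothetical and the sketch is not the EXI grammar mechanism itself.

```python
from collections import defaultdict


def code_children(events):
    """Code child items using one index table per parent element name.

    events is an iterable of (parent_name, child_description) pairs in
    document order. A child description seen for the first time under a
    given parent name is emitted in full and added to that parent's table;
    later occurrences are emitted as an index into that table.
    """
    tables = defaultdict(dict)
    coded = []
    for parent, child in events:
        table = tables[parent]
        if child in table:
            coded.append((parent, "index", table[child]))
        else:
            table[child] = len(table)
            coded.append((parent, "new", child))
    return coded
```

Because the tables are keyed by parent name, the index 0 under one element name and the index 0 under another may designate different child items.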
The Scalable Vector Graphics (SVG) data format is an XML language for describing vector graphics. SVG uses the XML format and defines a set of elements and attributes making it possible in particular to describe geometric shapes, transformations, colors and animations.
A widely used construct in SVG is the path, which represents the outline of a shape. A graphics path is a set of commands and associated coordinates, making it possible to describe a complex graphic shape using segments, Bézier curves and elliptical arcs.
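To make the structure of a path concrete, the following Python sketch splits path data into commands and their numeric arguments. It is a deliberate simplification: the regular expression and the function name split_path are illustrative assumptions, and corner cases of the SVG path grammar (for example arc flags written without separators) are not handled.

```python
import re

# One token is either a command letter or a number.
TOKEN = re.compile(
    r"([MmZzLlHhVvCcSsQqTtAa])"
    r"|([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)


def split_path(d):
    """Split SVG path data into (command, [arguments]) pairs."""
    segments = []
    for command, number in TOKEN.findall(d):
        if command:
            segments.append((command, []))
        elif segments:
            # A number belongs to the most recent command.
            segments[-1][1].append(float(number))
    return segments
```

For example, split_path("M 10 20 l 5 -5 z") yields the absolute command M with arguments 10 and 20, the relative command l with arguments 5 and -5, and the command z with no argument.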
Binary XML formats may be used to code SVG documents. However, most of these formats have limitations with regard to the coding of SVG documents. This is because, in numerous SVG documents, the proportion of structure is small relative to the proportion of content, whereas Binary XML formats are mainly directed to compressing the structure of XML documents. In relation to content, Binary XML formats can index the values, so that a value repeated in the content is not coded several times. They may also code, in a specific way, certain contents of which the type is known and simple, for example an integer or a real number. But SVG contents satisfy none of these criteria: SVG contents which are large in size are rarely repeated and generally do not correspond to simple types. These contents of large size are for example paths, which mix simple graphics commands with coordinates, or lists of integer or real values.
For this reason, it is necessary to create new Binary XML formats that are specific to SVG documents or to adapt existing Binary XML formats to efficiently code SVG documents.
U.S. Pat. No. 6,624,769 describes a Binary XML format adapted to code SVG documents. This patent describes in particular a specific way to code SVG paths consisting of first coding the commands used in the path and assigning a code only to the commands present in the path. Furthermore, these codes are Huffman type codes, whose assignment is predefined for all the existing commands.
The command arguments are coded in binary manner, using the minimum number of bits enabling any argument present in the path to be coded. More precisely, the patent is limited to the coding of integer arguments, corresponding to the SVG profiles for mobile phones, and separates the arguments into two categories: the arguments corresponding to absolute commands and those corresponding to relative commands. In the case of an absolute command, the argument directly represents a position in the SVG coordinate system whereas in the case of a relative command, the argument represents the movement from the previous position. For each type of argument, calculation is made of the minimum number of bits enabling any argument of that type present in the path to be coded. Next, each argument is coded over a number of bits depending on its type.
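The computation of the two coding lengths may be sketched as follows in Python. The sketch assumes, without confirmation from the cited patent, that arguments are stored as two's-complement signed integers; the function names are hypothetical. It relies on the SVG convention that an uppercase command letter is absolute and a lowercase one is relative.

```python
def signed_bits(value):
    """Smallest two's-complement width able to hold value (assumption)."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def field_widths(segments):
    """Return (absolute_width, relative_width) in bits for a path.

    segments is a list of (command, [integer arguments]) pairs. Each width
    is the minimum number of bits able to code every argument of that
    category present in the path.
    """
    widths = {"absolute": 0, "relative": 0}
    for command, args in segments:
        kind = "absolute" if command.isupper() else "relative"
        for arg in args:
            widths[kind] = max(widths[kind], signed_bits(arg))
    return widths["absolute"], widths["relative"]


# Hypothetical path: two absolute and two relative commands.
PATH = [("M", [100, 200]), ("l", [5, -3]), ("l", [-8, 2]), ("L", [120, 90])]
ABS_WIDTH, REL_WIDTH = field_widths(PATH)
```

Under these assumptions the absolute arguments need 9 bits each (because of the value 200) and the relative arguments 4 bits each (because of the value -8), so the eight arguments occupy 4 × 9 + 4 × 4 = 52 bits in total. A single large argument thus fixes the width for every argument of its category, which is consistent with the limited efficiency noted for large paths.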
The format described in this patent enables compact SVG documents to be obtained, but only applies to a restricted category of documents and is still of limited efficiency in the case of large paths.
U.S. Patent application No. 20080063114 describes the Lightweight Application Scene Representation (LASeR) binary XML format. This format is targeted at coding SVG documents and provides a specific way of coding an SVG path. According to this document, an SVG path is coded using whichever of two methods provides the better result.
The first method consists in coding the first two arguments of the SVG path using a first fixed length coding scheme and coding all the remaining arguments as relative arguments using two other fixed length coding schemes. The coding length of the first fixed length coding scheme is the minimum coding length that allows the first two arguments to be coded. The other two coding lengths are computed in a similar way: the first of these other coding lengths is the minimum coding length allowing all the abscissa coordinates to be coded while the second of these other coding lengths is the minimum coding length allowing all ordinate coordinates to be coded.
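A rough Python sketch of the three coding lengths of this first method follows. It assumes integer coordinates stored as two's-complement signed values and ignores any header bits used to signal the lengths; these choices, and the function names, are illustrative assumptions rather than details taken from the LASeR specification.

```python
def signed_bits(value):
    """Smallest two's-complement width able to hold value (assumption)."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


def fixed_length_cost(points):
    """Return (first_width, x_width, y_width, total_bits) for a point list.

    The first point is coded with first_width bits per coordinate; every
    later point is coded as a displacement from the previous point, using
    x_width bits for the abscissa and y_width bits for the ordinate.
    """
    x0, y0 = points[0]
    first_width = max(signed_bits(x0), signed_bits(y0))
    dxs = [b[0] - a[0] for a, b in zip(points, points[1:])]
    dys = [b[1] - a[1] for a, b in zip(points, points[1:])]
    x_width = max(map(signed_bits, dxs), default=0)
    y_width = max(map(signed_bits, dys), default=0)
    total = 2 * first_width + len(dxs) * x_width + len(dys) * y_width
    return first_width, x_width, y_width, total


# Hypothetical path given as successive absolute points.
POINTS = [(100, 200), (105, 197), (97, 199), (120, 90)]
COST = fixed_length_cost(POINTS)
```

For these points the displacements are 5, -8, 23 in abscissa and -3, 2, -109 in ordinate, giving lengths of 9, 6 and 8 bits and a total of 2 × 9 + 3 × 6 + 3 × 8 = 60 bits. The single displacement of -109 imposes 8 bits on every ordinate, which illustrates the cost of a fixed length when a few displacements are much larger than the others.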
The second method consists in coding the first two arguments using a first fixed length coding scheme and coding all the further arguments as relative arguments using an exponential-Golomb coding scheme.
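For reference, an exponential-Golomb code can be sketched as follows in Python. The sketch uses the order-0 code and the signed mapping commonly used in video coding (positive v mapped to 2v - 1, zero or negative v mapped to -2v). Both choices are assumptions, since the order and the signed mapping used by LASeR are not stated here. Bits are represented as character strings for readability.

```python
def exp_golomb_unsigned(n):
    """Order-0 exp-Golomb code of a non-negative integer, as a bit string."""
    binary = bin(n + 1)[2:]
    # Prefix with as many zeros as there are bits after the leading one.
    return "0" * (len(binary) - 1) + binary


def exp_golomb_signed(v):
    """Map a signed integer to a non-negative one, then code it."""
    mapped = 2 * v - 1 if v > 0 else -2 * v
    return exp_golomb_unsigned(mapped)


def decode_exp_golomb_unsigned(bits, pos=0):
    """Decode one code word starting at pos; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end
```

With these definitions the values 0, 1, 2 and 3 are coded as 1, 010, 011 and 00100, and the relative argument -8 is coded on nine bits as 000010001. Small displacements therefore receive short code words, without a single large displacement lengthening the others as it does with a fixed length.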