1. Field of the Invention
The present invention relates to encoding technology for structured document data.
2. Description of the Related Art
Conventionally, data in the XML (Extensible Markup Language) format specified by the W3C is commonly encoded as text, using a character encoding such as UTF-8 or UTF-16.
In contrast, devices with limited hardware resources, such as mobile phones, digital cameras, and printers, demand smaller XML data and faster parsing. To satisfy these demands, an encoding technology referred to as “binary XML” has come into use in recent years. With binary XML, structures such as XML elements and attributes are encoded into binary data, and the values of elements and attributes are encoded in their original data types, such as integers and decimals. Encoding into binary data reduces the data size and speeds up parsing compared with text encoding such as UTF-8 or UTF-16.
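The size difference described above can be illustrated with a minimal sketch. The sample values and the fixed 32-bit big-endian layout are assumptions chosen only for illustration; they do not correspond to the format of any particular binary XML specification.

```python
import struct

# Hypothetical sample values, chosen only for illustration.
VALUES = [1024, 65535, 123456789]

# Text encoding: each integer is written as decimal digits,
# separated by spaces, and the result is encoded as UTF-8.
text_bytes = " ".join(str(v) for v in VALUES).encode("utf-8")

# Binary encoding (assumed layout): each integer is stored as a
# fixed-width 32-bit big-endian unsigned value.
binary_bytes = struct.pack(">%dI" % len(VALUES), *VALUES)


def decode_text(data):
    """Recover the integers by splitting and parsing the character string."""
    return [int(token) for token in data.decode("utf-8").split()]


def decode_binary(data):
    """Recover the integers directly from their 32-bit representation."""
    return list(struct.unpack(">%dI" % (len(data) // 4), data))


print(len(text_bytes), len(binary_bytes))
```

For these values the text form occupies 20 bytes and the binary form 12 bytes, and decoding the binary form requires no character string analysis. For very small values a fixed-width binary field can be larger than the text, which is one reason practical formats often use variable-length integers.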
Further, according to the Fast Infoset specification developed by ISO (ISO/IEC 24824-1, 2007), attribute values or element contents within XML data can be encoded in a binary format that suits the original data type, such as integers and decimals. It is therefore possible to realize further decreases in data size and reductions in data processing time.
However, the aforementioned data encoding technology has the following problems. In some cases, values with complex data structures, and not just simple data types such as integers or decimals, are described as attribute values or as the contents (values) of elements. For example, in document data in SVG (Scalable Vector Graphics) format, a complex value that combines drawing commands and coordinate information is described as an attribute value. In such a case, it is difficult to recognize and encode the data structure in a general-purpose manner. If the only goal is to encode a data structure, the structure can also be encoded into text-format XML using, for example, SOAP encoding. With this method, however, the data becomes larger than the original data and character string analysis processing is still required, so the original objectives of reducing the data size and making processing more efficient are not achieved. Consequently, such values have commonly been encoded as a series of character strings, as in the case of text XML.
More specifically, when text encoding is performed, data described as an attribute value or as element contents has had to be encoded as characters even when it is not character data, such as an integer or a decimal. When a numerical value is encoded using characters, the data size becomes larger than with a binary representation, and processing also takes more time. Therefore, in a case where these kinds of unique structures occupy a large portion of the document data, as in the case of the aforementioned SVG, it has been difficult to reduce the data size or to speed up parsing and the like.
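As an illustration of the problem described above, the following sketch takes a hypothetical SVG path attribute value, separates it into drawing commands and coordinates, and compares its text size with one possible binary packing. The sample path, the simplified tokenizer, and the packing layout (one byte per command, one 32-bit float per coordinate) are all assumptions made for this example; they are not taken from the SVG specification's full path grammar or from any binary XML format.

```python
import re
import struct

# Hypothetical path data of the kind found in an SVG "d" attribute.
PATH_DATA = "M 100.25 200.75 L 310.5 200.75 L 310.5 420.125 Z"

# Simplified tokenizer: a single command letter or a plain decimal number.
# Real SVG path data allows more forms (exponents, omitted separators).
TOKEN = re.compile(r"[A-Za-z]|-?\d+(?:\.\d+)?")


def parse_path(d):
    """Split path data into (command, [coordinates]) pairs."""
    segments = []
    for token in TOKEN.findall(d):
        if token.isalpha():
            segments.append((token, []))
        else:
            # Assumes the data starts with a command letter.
            segments[-1][1].append(float(token))
    return segments


def pack_path(segments):
    """Pack each command as 1 byte and each coordinate as a 32-bit float."""
    out = bytearray()
    for command, coords in segments:
        out += command.encode("ascii")
        for value in coords:
            out += struct.pack(">f", value)
    return bytes(out)


segments = parse_path(PATH_DATA)
packed = pack_path(segments)
print(len(PATH_DATA.encode("utf-8")), len(packed))
```

Under these assumptions the packed form is shorter than the character string, and a reader that knows how many coordinates each command takes can recover the values without parsing decimal text. A general-purpose encoder, however, has no way of knowing that the attribute value has this command-and-coordinate structure, which is the difficulty noted above.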