Transmitted information for current computer systems is often formatted using the Extensible Markup Language (XML) standard. The XML standard provides a powerful and efficient language through which to communicate a wide range of data and information in a standard format that can be recognized across a wide variety of different computing platforms. As such, XML provides a flexible and common framework for improving compatibility of data transfer between systems. One significant disadvantage of XML, however, is that the XML standard was not designed for communication efficiency, which is needed in certain environments such as narrow or limited bandwidth channels. The relative inefficiency of communicating XML-formatted data, therefore, causes problems with devices such as cell phones, dial-up modems, and other low- or narrow-bandwidth systems.
FIG. 1 (prior art) provides a block diagram for a prior art system in which XML-formatted data is communicated through a network. Block 104 represents an XML-formatted document, data or information that is to be communicated by one system to another through a network 102 and reconstructed or received as an XML-formatted document, data or information, as represented by block 106. The network 102 can be made up of any of a wide variety of communications systems and devices, both wired and wireless, that ultimately provide communication connectivity between two systems. As shown in FIG. 1, the defined technique for communicating this XML-formatted document 104 is to represent the text as ASCII or Unicode data words, to transmit this ASCII or Unicode data from a first system through the network as represented by line 108, and to receive this ASCII or Unicode data from the network by a second system as represented by line 110.
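The prior art flow of FIG. 1 can be sketched roughly as follows. This is a minimal illustration only: the sample document, its element names, and the choice of UTF-8 as the Unicode encoding are assumptions made for the example, and the network itself is reduced to passing a byte string from sender to receiver.

```python
import xml.etree.ElementTree as ET

# Block 104: an XML-formatted document held as text (hypothetical content).
document = "<reading><temp>21.5</temp><unit>C</unit></reading>"

# Line 108: the text is represented as Unicode (here UTF-8) bytes for transmission.
wire_bytes = document.encode("utf-8")

# Line 110 / block 106: the receiver decodes the bytes and parses the XML again.
received = ET.fromstring(wire_bytes.decode("utf-8"))

# Count the characters that carry actual values, as opposed to markup.
payload_chars = sum(len(e.text or "") for e in received.iter())

print(len(wire_bytes), payload_chars)
```

In this small example most of the transmitted bytes are markup rather than values, which is the kind of overhead that becomes a problem on narrow bandwidth channels.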
The ASCII and Unicode standards are two well-known textual coding schemes for representing text characters as sets of binary bits. The ASCII standard defines a 7-bit code for 128 characters, including the alphanumeric and punctuation symbols, and each character is typically stored in an 8-bit data byte; common 8-bit extensions provide character sets of 256 characters. The Unicode standard basically provides an extension of ASCII with similar encoding but additional 8-bit bytes representing additional characters, covering languages other than English such as Japanese, Chinese, and numerous other languages and lexicons. As with typical textual encoding schemes, ASCII and Unicode possess inherent inefficiencies in that they are limited in range and extension by their encoding scheme. For example, typical textual formats, such as ASCII, use fixed bit fields which are not easily extended. They are also extremely inefficient for encoding numbers, in that each numerical digit and each decimal point consumes at least 8 bits. Additionally, transmission systems typically use inflexible fixed-bit messages or field groups. Changes to such formats are also time-consuming and costly in terms of labor, shipping and installation. In addition, in fixed-bit message or field grouping constructs, many dependencies often exist between fields (e.g., separate accuracy, multiplier, unit, and overlay field indicators). To reduce the bandwidth required to transmit typical binary encoded data, the data is often manipulated in scale, units, etc. to reduce the total fixed bits for the transmission. The disadvantage is that this reduction in fixed bits typically results in a loss of accuracy relative to the measurement initially generated by the producing device.
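The two costs described above, one byte per digit for textual numbers and lost accuracy for scaled fixed-bit fields, can be illustrated with a minimal sketch. The sample value, the 16-bit field width, and the 0.01 scale factor are illustrative assumptions and are not taken from any particular message format.

```python
import struct

# Illustrative measurement value (an assumption for this example).
value = 123.456789

# Textual encoding: each digit and the decimal point occupies one 8-bit byte.
text_bytes = str(value).encode("ascii")

# Binary encoding: IEEE 754 double precision, always 8 bytes.
double_bytes = struct.pack(">d", value)

# Fixed-bit field: scale to hundredths and store in an unsigned 16-bit field.
SCALE = 0.01  # hypothetical resolution of the fixed field
scaled = round(value / SCALE)
field_bytes = struct.pack(">H", scaled)

# The receiver can only recover the value to the resolution of the field.
recovered = struct.unpack(">H", field_bytes)[0] * SCALE
error = abs(value - recovered)

print(len(text_bytes), len(double_bytes), len(field_bytes))
print(error)
```

The text form needs more bytes than the double even for this short number, and the 2-byte scaled field is smaller still but no longer reproduces the original measurement exactly.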
Previous efforts have been made to provide a binary XML content format. One such effort is called Wireless Application Protocol Binary XML (WBXML). The WBXML specification defines a compact binary representation of XML. This binary XML content format is designed to reduce the transmission size of XML documents, allowing more effective use of XML data on narrowband communication channels. The binary format is also designed to allow for compact transmission with no loss of functionality or semantic information. The format is designed to preserve the element structure of XML, allowing a browser to skip unknown elements or attributes. The binary format encodes the parsed physical form of an XML document, i.e., the structure and content of the document entities. Meta-information, including the document type definition and conditional sections, is removed when the document is converted to the binary format. Unfortunately, the WBXML content format does not adequately define a binary XML solution that meets user and operational requirements. For example, the WBXML structure utilizes a number of less efficient or less extensible encoding approaches, such as inclusion of null bytes to indicate the end of string values; inclusion of a string table in the binary transmission instance; and use of fixed tokenization with code spaces, code pages, and end tokens to replicate the textual structure of XML in the WBXML binary representation.
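The general style of encoding criticized here can be illustrated with a toy sketch. It is not an implementation of WBXML: the token values and the tag table are assumptions chosen for illustration, and the header, string table, and code pages of the real format are omitted. It shows only a fixed token table, null-terminated inline strings, and end tokens.

```python
import xml.etree.ElementTree as ET

# Token values are illustrative assumptions, not a reproduction of any specification.
END = 0x01          # marks the end of an element
INLINE_STR = 0x03   # introduces a null-terminated inline string
TAG_TOKENS = {"track": 0x05, "lat": 0x06, "lon": 0x07}  # fixed, pre-agreed table


def encode(elem):
    """Encode an element as: tag token, optional inline text, children, END."""
    out = bytearray([TAG_TOKENS[elem.tag]])
    text = (elem.text or "").strip()
    if text:
        out.append(INLINE_STR)
        out += text.encode("utf-8") + b"\x00"  # null byte terminates the string
    for child in elem:
        out += encode(child)
    out.append(END)
    return bytes(out)


xml_text = "<track><lat>38.8977</lat><lon>-77.0365</lon></track>"
binary = encode(ET.fromstring(xml_text))

print(len(xml_text.encode("utf-8")), len(binary))
```

Even in this toy form, each value still travels as text with a string marker and a null terminator, each element carries an end token, and a tag missing from the fixed table cannot be encoded at all, which mirrors the efficiency and extensibility concerns raised above.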
Another approach attempted to generate binary formatting through the tokenization of tags and attribute names, as described in published U.S. Patent Application No. 2003/0046317A1, which is entitled “Method and System for Providing an XML Binary Format.” The encoding approach described in this published application concentrates on minimizing processing time and thus also utilizes a number of less space-efficient or less extensible encoding methods and structures. Such undesirable methods and structures include a requirement for a pre-defined fixed set of tokens, inclusion of a length value to indicate the number of characters in every string, and inclusion of END tokens.