Today, the eXtensible Markup Language (XML) is the foundation of many Web Services architectures and plays a significant and widespread role in computer networking products and data exchanges. However, XML data tends to be verbose, and thus the size of any particular XML representation is likely to be several times the size of the raw data it represents. Therefore, it is desirable, and even critical for some applications, to compress XML data efficiently, i.e., to improve the compression ratio, in order to reduce network bandwidth and storage usage.
In consideration of this problem, a schema-aided XML compression scheme was developed to compress XML data efficiently. The scheme improves the compression ratio generally by separating the structure of an XML document from its content, improving the compression efficiency of the structure part by utilizing the XML schema, grouping the content into different groups with related meaning or type, and applying native encoding to the different types of content. For instance, U.S. patent application Ser. No. 10/177,830, filed Jun. 21, 2002, entitled “Method and System for Encoding a Mark-Up Language Document,” describes a method in which the structure of the mark-up language document is condensed by removing those parts of the structure that are fixed, and by expressing the variable parts of the structure in terms of which elements occur, whether elements occur, or how often certain elements occur. This may involve separating the structure of the mark-up language document from its content, and treating the structure and content differently. In various embodiments described in the '830 application, the content of the mark-up language document is, itself, compressed by grouping similar or related data items together.
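The separation of structure from content, and the grouping of related content, can be illustrated with a minimal sketch. It is not the method of the '830 application; it is an assumed simplification that uses no schema, ignores attributes and tail text, and groups text values by element name only. The function name split_structure_and_content and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_structure_and_content(xml_text):
    """Split an XML string into a structure stream and grouped content.

    Illustrative assumption: text is grouped by element name; attributes
    and tail text are ignored, and no XML schema is consulted.
    """
    root = ET.fromstring(xml_text)
    structure = []              # open/close markers only, no text
    groups = defaultdict(list)  # element name -> text values in document order

    def walk(elem):
        structure.append("<" + elem.tag)   # element opens
        text = (elem.text or "").strip()
        if text:
            groups[elem.tag].append(text)  # content goes to its group
        for child in elem:
            walk(child)
        structure.append(">")              # element closes

    walk(root)
    return structure, dict(groups)


# Hypothetical sample document.
doc = (
    "<orders>"
    "<order><id>1</id><city>Oslo</city></order>"
    "<order><id>2</id><city>Rome</city></order>"
    "</orders>"
)
structure, groups = split_structure_and_content(doc)
print(structure)
print(groups)
```

In this sketch, the structure stream holds only element boundaries, while each group holds values of a similar kind (all identifiers together, all city names together). A schema-aided scheme could then shrink the structure stream further by omitting parts the schema fixes, and could encode each group with a type-specific encoding.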
The key idea of such compression techniques is the utilization of the underlying XML schema to improve the compression ratio. In general, however, such systems can present difficulties when many cooperating machines are involved and the computers are not co-located with access to the same XML schema, or a representation thereof. Moreover, any improvement to XML compression or decompression speed, or to the compression ratio, can result in large savings for large amounts of data; thus, improvements to prior art compression systems are desirable.