1. Field of the Invention
The present invention relates generally to computer systems and, more particularly, to the communication of structured documents (or structured texts) between computers.
2. Description of the Related Art
Structured documents, such as XML and SGML documents, are documents that incorporate information about their own data structure. To describe this data structure, structured documents include symbols called “tags”. Structured documents offer improved flexibility and expandability of the data structure, and these advantages promote the use of structured documents for data exchange between different computers or different applications.
The structure of a structured document is defined by a document type definition. For an XML document, for example, a DTD (Document Type Definition) or an XML Schema is typically used as the document type definition. A document type definition may be incorporated into the structured document itself or prepared separately from it. A separately prepared file that describes the document type definition of a structured document is referred to as a document structure definition file.
A structured document is required to comply with its document type definition. A structured document that does not comply with the document type definition may cause a computer to misinterpret the contents of the document.
Therefore, there is a need for validating structured documents. Nishioka et al. disclose a structured document processor for validating structured documents in Japanese Laid-Open Patent Application No. JP-A 2001-75958. The disclosed structured document processor is provided with validation libraries for validating whether a structured document complies with a document structure declaration.
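The compliance check described above can be illustrated with a minimal sketch. It assumes a hypothetical, simplified form of document type definition in which each element's content model, loosely mirroring a DTD declaration such as `<!ELEMENT order (item+, total)>`, is written as a regular expression over the names of the element's children. The names `CONTENT_MODELS` and `validate` are illustrative only; they are not taken from any cited reference or from a real validator API, and attributes and text content are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, loosely mirroring DTD declarations such as
#   <!ELEMENT order (item+, total)>
# Each model is a regular expression over the child element names,
# where every child contributes the token "name;".
CONTENT_MODELS = {
    "order": r"(item;)+total;",
    "item": r"",   # no child elements allowed (text only)
    "total": r"",
}


def validate(elem: ET.Element, models: dict[str, str]) -> list[str]:
    """Return the violations found in elem and all of its descendants."""
    errors = []
    model = models.get(elem.tag)
    if model is None:
        errors.append(f"undeclared element <{elem.tag}>")
    else:
        children = "".join(f"{child.tag};" for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}> has children {children!r}, expected {model!r}")
    # Every element is visited once, so the work grows with document size.
    for child in elem:
        errors.extend(validate(child, models))
    return errors


if __name__ == "__main__":
    good = ET.fromstring("<order><item>pen</item><item>ink</item><total>2</total></order>")
    bad = ET.fromstring("<order><total>2</total><item>pen</item></order>")
    print(validate(good, CONTENT_MODELS))       # []
    print(len(validate(bad, CONTENT_MODELS)))   # 1
```

Because the check touches every element of the document, its cost scales with the size and repetition of the document, which is the concern raised for validation in the following paragraphs.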
Atsumi discloses a structure testing apparatus for validating structured documents in Japanese Laid-Open Patent Application No. JP-A-Heisei 8-190560. The disclosed testing apparatus comprises a test data generating module, which generates from a structured document a document structure table listing element IDs, element names, and element contents, and a structure test module, which validates the structured document using the document structure table.
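A document structure table of the kind described for this testing apparatus might be generated as in the following minimal sketch. The table layout, the function names `build_structure_table` and `check_structure`, and the particular check performed (that every element name is declared) are assumptions for illustration; they are not the method of JP-A-Heisei 8-190560.

```python
import xml.etree.ElementTree as ET


def build_structure_table(root: ET.Element) -> list[tuple[int, str, str]]:
    """List (element ID, element name, content) for every element in document order."""
    table = []
    # iter() walks the tree depth-first in document order, starting at root.
    for element_id, elem in enumerate(root.iter()):
        table.append((element_id, elem.tag, (elem.text or "").strip()))
    return table


def check_structure(table: list[tuple[int, str, str]], declared: set[str]) -> list[int]:
    """Hypothetical structure test: return IDs of elements whose names are not declared."""
    return [element_id for element_id, name, _ in table if name not in declared]


if __name__ == "__main__":
    root = ET.fromstring("<memo><to>Ann</to><body>Hi</body></memo>")
    table = build_structure_table(root)
    print(table)                                # [(0, 'memo', ''), (1, 'to', 'Ann'), (2, 'body', 'Hi')]
    print(check_structure(table, {"memo", "to"}))  # [2]
```

Building the table is a single pass over the document, after which the test step works on the flat table rather than on the tree.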
One of the problems in validating structured documents is that validation requires a considerable amount of processing. Structured documents tend to be large and to contain many repeated structures. This increased size and repetition undesirably increase the amount of processing required for validation, and thus increase the validation cost.
This problem is especially serious when a structured document is transferred between computers. In such a transfer, the structured document is preferably validated not only by the sending computer but also by the receiving computer, because a communication error may render the structured document received by the receiving computer invalid. Performing validation in both the sending and receiving computers, however, undesirably increases the total amount of processing required to confirm the validity of the document.
Other techniques for encoding or processing structured documents have been disclosed as follows. First, Imaoka discloses a method for encoding XML data in Japanese Laid-Open Patent Application No. JP-A 2002-244894. The disclosed encoding method involves converting a DTD into a type described in ASN.1 abstract syntax, dividing the XML data into element contents and structure, converting the structure into values described in the ASN.1 abstract syntax, converting those values into an ASN.1 transfer syntax, compressing the element contents, and combining the compressed element contents with the ASN.1 transfer syntax.
Liefke and Suciu disclose a method for efficiently compressing XML documents in “XMill: an Efficient Compressor for XML Data”, Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000. In this method, an XML document is first divided into structure and text, the texts are classified by type, duplicated texts are eliminated, and the texts of each type are then compressed separately. The method achieves efficient compression of XML documents in size, but does not address compression aimed at reducing validation cost.
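The separation of structure from text and the per-type compression described for this method can be sketched as follows. This is not the XMill implementation. As simplifying assumptions, texts are grouped by the name of their enclosing element, duplicates are removed through a per-group list of distinct values plus per-occurrence references, `zlib` stands in for the back-end compressor, attributes and tail text are ignored, and text values are assumed to contain no newline or NUL characters. The function name `split_and_compress` is hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def split_and_compress(xml_text: str) -> dict[str, bytes]:
    """Separate structure from text, group texts by element name, compress each group."""
    root = ET.fromstring(xml_text)
    structure: list[str] = []                              # element names; "/" marks an end tag
    containers: dict[str, list[str]] = defaultdict(list)  # texts grouped by enclosing element

    def walk(elem: ET.Element) -> None:
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
        for child in elem:
            walk(child)
        structure.append("/")

    walk(root)
    blobs = {"#structure": zlib.compress("\n".join(structure).encode())}
    for name, texts in containers.items():
        distinct = list(dict.fromkeys(texts))             # drop duplicates, keep first-seen order
        index = {value: i for i, value in enumerate(distinct)}
        refs = ",".join(str(index[t]) for t in texts)     # one reference per occurrence
        payload = "\n".join(distinct) + "\x00" + refs
        blobs[name] = zlib.compress(payload.encode())    # each group compressed separately
    return blobs
```

Keeping values of the same kind together tends to help a general-purpose compressor, which is the motivation for separate per-type compression; nothing in this layout, however, reduces the work of validating the document.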
David Mertz discloses similar compression techniques for XML documents at http://www-6.ibm.com/jp/developerworks/xml/020125/j_x-matters13.html, but likewise does not address compression aimed at reducing validation cost.
Satoh discloses a structured document processing system for efficiently compressing structured documents and reducing the amount of processing necessary for tag analysis in Japanese Laid-Open Patent Application No. JP-A 2002-163248. The structured document processing system includes a structured document compressing unit and a structured document decompressing unit. The compressing unit includes a tag list generating module, which generates a common tag list used for a plurality of structured documents; a compression module, which generates compressed documents from the plurality of structured documents by replacing tags with delimiter codes; and an output module, which combines the tag list and the compressed documents into a compression result document. The decompressing unit includes a reproducing module, which reproduces a data structure from the tag list, and a write module, which reproduces the element contents from the compressed documents by associating the positions of the tags in the compressed documents with those of the tags in the data structure.
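The idea of a common tag list combined with delimiter codes can be illustrated with the following simplified sketch, which is not the system of JP-A 2002-163248. The control characters used as delimiter codes, the index-based encoding of start tags, and the function names are assumptions; attributes are ignored, and text content is assumed not to contain the delimiter characters. Here decompression simply maps codes back to tag names rather than reproducing a separate data structure.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical delimiter codes; text content is assumed not to contain them.
START, SEP, END = "\x01", "\x02", "\x03"


def build_tag_list(docs: list[str]) -> list[str]:
    """Common tag list shared by all documents (sorted distinct element names)."""
    names: set[str] = set()
    for doc in docs:
        names.update(elem.tag for elem in ET.fromstring(doc).iter())
    return sorted(names)


def compress(doc: str, tag_list: list[str]) -> str:
    """Replace each start tag by START<index>SEP and each end tag by END."""
    code = {name: i for i, name in enumerate(tag_list)}
    out: list[str] = []

    def walk(elem: ET.Element) -> None:
        out.append(f"{START}{code[elem.tag]}{SEP}")
        out.append(elem.text or "")
        for child in elem:
            walk(child)
            out.append(child.tail or "")
        out.append(END)

    walk(ET.fromstring(doc))
    return "".join(out)


def decompress(data: str, tag_list: list[str]) -> str:
    """Rebuild XML text by mapping delimiter codes back to names in the tag list."""
    out: list[str] = []
    open_tags: list[str] = []
    for token in re.split(f"({START}\\d+{SEP}|{END})", data):
        if token.startswith(START):
            name = tag_list[int(token[1:-1])]
            open_tags.append(name)
            out.append(f"<{name}>")
        elif token == END:
            out.append(f"</{open_tags.pop()}>")
        else:
            out.append(escape(token))   # re-escape &, <, > in text content
    return "".join(out)
```

Because all documents share one tag list, each tag name is stored once, and every occurrence in a compressed document is reduced to a short code.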
Maruyama discloses a data compression apparatus for structured documents in Japanese Laid-Open Patent Application No. 2001-217720. The disclosed compression apparatus includes an encoder, which divides a tree-structured document into its structure and contents and encodes the structure, and a compression processor, which compresses the contents of the document.
Nishioka discloses an XML data converter in Japanese Laid-Open Patent Application No. 2001-331479. The XML data converter includes planarizing means for planarizing a DTD, DTD graph generating means for generating a DTD graph from the planarized DTD, schema generating means for generating a schema of an object-relational model from the DTD graph, XML document generating means for generating a well-structured XML document for the planarized DTD, and object-relational model generating means for generating data of the object-relational model from the well-structured XML document.
Finally, Imaoka et al. disclose a structured document processor in Japanese Laid-Open Patent Application No. JP-A-Heisei 10-214265. The disclosed processor includes document structure analyzing means for analyzing an input document to generate an input document tree structure, document processing instruction interpreter means for interpreting and executing instructions to generate an output document tree structure from the input document tree structure, and structured document reproducing means for reproducing an output document from the output document tree structure.
However, none of the prior art addresses the communication of structured documents with a reduced amount of processing for their validation.