1. Field of the Invention
The present invention relates to a system and method for managing a structured document such as an extensible markup language (XML) document. More specifically, the invention relates to a structured document management system, a structured document management method, and a program for managing a structured document database.
2. Description of the Related Art
A document having a logical structure is generally called a structured document. This logical structure is represented by tags described in the document. Such a structured document is well suited to processing by a computer.
The extensible markup language (XML) is widely used as a language for describing structured documents with tags. XML has the advantages that data can be structured hierarchically by meaningful tags and that the structure can be freely extended. A document described in XML is called an XML document. The XML document is a typical structured document whose logical structure is represented as a tree by means of the tags.
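To make the tree structure concrete, the following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample document and the helper `tree_paths` are hypothetical illustrations, not part of the invention; the sketch simply walks the tree formed by nested tags and lists the path of every element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<book>
  <title>XML Basics</title>
  <chapter><section>Tags</section><section>Trees</section></chapter>
</book>"""

def tree_paths(elem, prefix=""):
    """Return the slash-separated path of every element, in document order."""
    path = f"{prefix}/{elem.tag}"
    paths = [path]
    for child in elem:  # iterating an Element yields its child elements
        paths.extend(tree_paths(child, path))
    return paths

root = ET.fromstring(SAMPLE)
for p in tree_paths(root):
    print(p)
```

Each nested tag pair becomes a child node, so `<section>` appears under `/book/chapter` twice, once per occurrence.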
An XML document is generally stored as a plain-text file. In most cases, one XML document is described in one file; therefore, a plurality of XML documents are stored as separate text files, one per document.
A database that can store XML documents while retaining the advantages of XML, and from which an arbitrary logical structure (document structure) or an arbitrary element of an XML document can be retrieved, is called an XML database (XMLDB). Software that manages an XMLDB is called an XMLDB management system (XMLDBMS). In a conventional XMLDBMS, an XML document is generally stored in the XMLDB either in text format as it is or in a binary format unique to the system. An XMLDB adopting such a storing method inherits the storage structure in which the original XML documents are separated into their respective files. The XML documents are therefore stored, e.g., in text format, separately from one another.
When an application running on a client's terminal retrieves data from an XMLDB that adopts this storing method, an XML document is generally obtained in text format as the retrieval result. In an XMLDB that stores XML documents in a unique binary format, the retrieved XML document could be returned in that binary format. However, binary-format data is difficult for the application to process afterward. In general, therefore, the XMLDBMS converts the binary-format XML document retrieved from the XMLDB into text format and returns it to the application. In other words, an XML document retrieved from the XMLDB is returned to the application in text format whether it is stored in text format or in a unique binary format.
The application rarely uses the returned text-format XML document as it is. Most applications that process XML documents parse an acquired text-format XML document into parsed data using an XML processor (XML parser), which puts the data into a format the application can use. During this parse process, the XML parser also verifies whether the XML document conforms to the XML syntax. Depending on the content and size of the XML document, the parse process can place a heavy load on the application.
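The parse step described above can be sketched as follows, assuming Python's standard `xml.etree.ElementTree` parser stands in for the application's XML processor. The helper `parse_document` and the sample strings are hypothetical. The sketch shows both roles of the parser: producing usable parsed data, and rejecting text that is not well-formed XML.

```python
import xml.etree.ElementTree as ET

def parse_document(text):
    """Parse a text-format XML document into an element tree.

    Returns (root, None) on success, or (None, error message) if the text
    is not well-formed XML. ElementTree checks well-formedness only; it
    does not validate against a DTD or schema.
    """
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        return None, str(err)

# Well-formed input: the parsed data can be queried directly.
good, _ = parse_document("<order><item qty='2'>pen</item></order>")
print(good.find("item").get("qty"))  # 2

# Ill-formed input (mismatched end tag): the parser reports a syntax error.
_, error = parse_document("<order><item>pen</order>")
print(error)
```

This work is repeated every time a text-format document is received, which is why its cost grows with the size and number of retrieved documents.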
When the retrieval results span a plurality of XML documents in the XMLDB, the XML documents obtained as the retrieval results are separated into their respective files, because the storage structure of the XML documents in the XMLDB is inherited from that of the original XML documents. Retrieving and updating data of the XML documents stored in the XMLDB is therefore performed file by file, and the application always has to keep track of each file.
Jpn. Pat. Appln. KOKAI Publication No. 2003-157249 (paragraphs 0003 and 0013) discloses a method of converting a set of XML documents into a document object model (DOM) set and storing it in an XMLDB. With this storing method, an application need not keep track of each file when retrieving and updating data of the XML documents stored in the XMLDB.
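The exact method of the cited publication is not reproduced here. As a rough, hypothetical sketch of the general idea only, the following uses Python's standard `xml.dom.minidom` to parse several documents into DOM trees held together as one set, so that a query runs across the whole set without the caller handling individual files. The document names, `build_dom_set`, and `find_text` are illustrative assumptions.

```python
from xml.dom import minidom

# Hypothetical in-memory documents standing in for separate files.
DOCUMENTS = {
    "a.xml": "<catalog><item>pen</item><item>ink</item></catalog>",
    "b.xml": "<catalog><item>paper</item></catalog>",
}

def build_dom_set(docs):
    """Parse each text document once and keep the resulting DOM trees together."""
    return [minidom.parseString(text) for text in docs.values()]

def find_text(dom_set, tag):
    """Collect the text of every <tag> element across the whole DOM set."""
    results = []
    for dom in dom_set:
        for node in dom.getElementsByTagName(tag):
            # Join the direct text children of the element.
            text = "".join(c.data for c in node.childNodes
                           if c.nodeType == c.TEXT_NODE)
            results.append(text)
    return results

dom_set = build_dom_set(DOCUMENTS)
print(find_text(dom_set, "item"))  # ['pen', 'ink', 'paper']
```

The caller queries `dom_set` as a single collection; which original file an element came from is not visible in the query.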
Conventionally, an XML document is stored in the XMLDB in text format or in a unique binary format, as described above. For a client (an application running on a client) to retrieve and use an XML document, the document must be parsed by the client (or by the XMLDBMS).
By its nature, the parse process can impose a heavy load on the client (or the XMLDBMS). Since parsing is required each time an XML document is retrieved, the client may have to bear a large cumulative cost for parsing. Moreover, when an XML document stored in the XMLDB is updated and only part of its data is to be changed, the XML document must be parsed again to update that data and then converted back to its original format.
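The update cost described above can be illustrated with a minimal sketch, assuming a document stored in text format and Python's standard `xml.etree.ElementTree`. The function `update_one_value` and the sample document are hypothetical. Even though only one element's value changes, the whole document is parsed, modified in memory, and serialized back to text.

```python
import xml.etree.ElementTree as ET

def update_one_value(stored_text, path, new_value):
    """Change the text of one element in a text-format XML document.

    Even though only one element changes, the whole document is parsed,
    modified in memory, and serialized back to text for storage.
    """
    root = ET.fromstring(stored_text)       # full parse of the stored text
    target = root.find(path)
    if target is None:
        raise KeyError(path)
    target.text = new_value                 # the actual (small) change
    return ET.tostring(root, encoding="unicode")  # back to text format

stored = "<order><item>pen</item><status>open</status></order>"
print(update_one_value(stored, "status", "shipped"))
# <order><item>pen</item><status>shipped</status></order>
```

The parse and serialize steps scale with the size of the whole document, not with the size of the change.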