In recent years, with the spread of markup languages such as XML (Extensible Markup Language), the configuration of document data has diversified. Conventional document data generally consists of one file. Recently, however, document data has come to take various configurations, including:
(1) a configuration in which one document is made up of a plurality of files (XML, HTML (Hyper Text Markup Language), and the like), and
(2) a configuration in which a plurality of documents are contained in one file (XML, archive file, and the like).
Conventionally, a document management system that manages one file as one document is known. When document data with configuration (1) is stored in such a document management system, a plurality of files are stored and managed as independent documents. On the other hand, when document data with configuration (2) is stored, one file including a plurality of documents is stored and managed intact as one document. Note that document management mainly includes access right control, attribute assignment, classification/arrangement, and the like, and the same applies to the following description unless otherwise specified.
Another document management system is also widely known. In this system, when a processing instruction (e.g., move, copy, or delete) is issued to the main body file (e.g., an HTML file) of a document, such as an HTML document, in which an attached file (e.g., an image file) is associated with the main body file, the same processing is automatically applied to the attached file. Such a document management system allows the user to easily manage the main body file and the attached file together as one document.
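The cascading behavior described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular system: the in-memory `store` dictionary, the directory names, and the helper names `attached_files` and `move_document` are all hypothetical, and only `<img>` references are treated as attached files.

```python
from html.parser import HTMLParser


class AttachmentCollector(HTMLParser):
    """Collects the src of every <img> tag as an attached file name."""

    def __init__(self):
        super().__init__()
        self.attachments = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.attachments.append(src)


def attached_files(html_text):
    """Return the attached file names referenced by an HTML main body file."""
    collector = AttachmentCollector()
    collector.feed(html_text)
    return collector.attachments


def move_document(store, main_file, src_dir, dst_dir):
    """Move the main body file and, automatically, each attached file."""
    names = [main_file] + attached_files(store[f"{src_dir}/{main_file}"])
    for name in names:
        store[f"{dst_dir}/{name}"] = store.pop(f"{src_dir}/{name}")


# Hypothetical repository contents (path -> file content).
store = {
    "inbox/index.html": "<html><body><img src='logo.png'><img src='map.jpg'></body></html>",
    "inbox/logo.png": "<binary>",
    "inbox/map.jpg": "<binary>",
    "inbox/notes.txt": "unrelated",
}

# The user issues "move" for the main body file only.
move_document(store, "index.html", "inbox", "archive")
print(sorted(store))
```

In this sketch the attached files follow the main body file only because the parser knows in advance how an HTML file refers to its images; a file in an unknown format would be moved alone.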
Also known is a document management system which, upon storing a structured document including a plurality of entities (minimum units of a document), like document data with configuration (2), divides the file into the respective entities and manages each piece of divided data as a document (Japanese Patent Laid-Open No. 2001-167086).
In the aforementioned conventional document management systems, management uniformly uses a single management unit, i.e., either a file or a document, both for document data with configuration (1), in which one document is configured by a plurality of files, and for document data with configuration (2), in which a plurality of documents are configured by one file.
The system that manages one file as one document is not suited to managing document data in which one document is configured by a plurality of files.
The document management system which allows a main body file and an attached file to be managed together as one document can handle a plurality of files as one document only for a document with a specific format, such as an HTML document, which is defined in advance based on extensions and the like. Such a document management system cannot handle a plurality of files as one document for a document which is not compliant with the specific format.
In the document management system which, upon storing a structured document including a plurality of entities, divides the file into the respective entities and manages the divided data, like the system disclosed in Japanese Patent Laid-Open No. 2001-167086, the attributes that the original structured document file has as a single file cannot be utilized. For example, a system is available that derives a hash value from the binary information of a file itself and attaches a signature to that hash value so as to attain falsification detection. In this case, however, if a source file is divided and stored, the binary characteristics of the actual file itself are destroyed. When such divided files are stored in the document management system, falsification is undesirably detected even though the content has not been tampered with, resulting in poor collaboration between the two systems.
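The incompatibility between file division and hash-based falsification detection can be illustrated with a short sketch. The file content, the fragment boundaries, and the helper names `sha256_hex` and `verify` are assumptions made for illustration; SHA-256 stands in for whatever hash function such a system uses, and the signature step itself is omitted.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data, expected_digest):
    """Falsification check: the stored bytes must hash to the recorded digest."""
    return sha256_hex(data) == expected_digest


# Hypothetical structured file containing three entities.
original = (
    b"<News>"
    b"<Item id='1'>A</Item>"
    b"<Item id='2'>B</Item>"
    b"<Item id='3'>C</Item>"
    b"</News>"
)

# Digest recorded (and, in a real system, signed) for the whole file.
recorded_digest = sha256_hex(original)

# A divide-on-store system keeps only the per-entity fragments.
fragments = [
    b"<Item id='1'>A</Item>",
    b"<Item id='2'>B</Item>",
    b"<Item id='3'>C</Item>",
]

intact_ok = verify(original, recorded_digest)
fragment_ok = [verify(f, recorded_digest) for f in fragments]
rejoined_ok = verify(b"".join(fragments), recorded_digest)

print(intact_ok, fragment_ok, rejoined_ok)
```

Only the unmodified original passes the check. No fragment matches the recorded digest, and neither does the simple concatenation of the fragments, because the enclosing bytes of the original file are no longer present.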
The conventional document management system cannot easily manage a document whose internal structure combines configurations (1) and (2). This drawback will be described below with reference to FIG. 5. FIG. 5 illustrates an example of a document having a combined structure.
Reference numeral 501 denotes a structured file group which includes Base.xml as a main file (main body file); and 502 and 503 denote the data contents of Base.xml and Spec.xml, respectively.
Referring to FIG. 5, Spec.xml and Detail.xml are sub-files (attached files) of Base.xml. This relationship is described by reference tags 506 and 507, which refer to Spec.xml and Detail.xml, in the data contents 502 of Base.xml. The reference relationship between these files becomes visible when Base.xml is browsed using browse software, since the browse software automatically loads and merges Spec.xml and Detail.xml and displays them as one document. This is the same scheme as that of Internet browse software (a Web browser or the like), which automatically loads the image files of an HTML document and displays them as part of the page.
Concept.xml, Report.xml, and the JPEG files are likewise sub-files of their corresponding host files.
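The loading and merging of sub-files performed by the browse software can be sketched as follows. The syntax of reference tags 506 and 507 is not reproduced in this description, so the `<Ref href="..."/>` element, the file contents, and the function name `load_merged` are hypothetical and serve only to illustrate the scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory file group; the contents are invented for illustration.
FILES = {
    "Base.xml": "<Doc><Title>Base</Title><Ref href='Spec.xml'/><Ref href='Detail.xml'/></Doc>",
    "Spec.xml": "<Spec>specification text</Spec>",
    "Detail.xml": "<Detail>detail text</Detail>",
}


def load_merged(name, files, ancestors=frozenset()):
    """Parse `name` and replace each <Ref href=...> with the referenced file's root."""
    if name in ancestors:
        raise ValueError(f"circular reference to {name}")
    root = ET.fromstring(files[name])
    # Walk a snapshot of the elements, since children are replaced in place.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "Ref":
                merged = load_merged(child.get("href"), files, ancestors | {name})
                merged.tail = child.tail  # keep any text that followed the reference
                parent[index] = merged
    return root


merged = load_merged("Base.xml", FILES)
print(ET.tostring(merged, encoding="unicode"))
```

The merged tree exists only in the viewer. In the repository, Base.xml, Spec.xml, and Detail.xml remain three separate files, which is why a system that manages one file as one document registers them as three independent documents.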
Consider a case in which, in managing such documents, the user wants to manage Base.xml as independent documents corresponding to the entities 508, 509, and 510 between <News> and </News>, and to manage Spec.xml together with its sub-files Concept.xml, Package.jpg, and Map.jpg, i.e., four files, as one document.
However, when data with such a configuration is fetched by the conventional document management system 504 described above, each file is fetched as one document, so the files are registered as independent documents, as indicated by 505. In the conventional document management system, once files are stored in the system in units like 505, the document management units cannot be changed merely by changing the storage locations or status in the system.
Also, an arrangement has been proposed which separately comprises a document creation area for temporarily storing a document being created, and a document publish area for publishing a created document so that it is shared within a team of a plurality of users. In such an arrangement, the security of each individual piece of information must be enhanced in the process of creating a document, and there is a need for setting access rights for respective entities (e.g., chapters or clauses). However, if the respective entities are published separately, it is difficult to manage them as an integral product after publication. For this reason, once the document is "published", there is also a need for publishing these entities together as one document. The conventional document management system cannot satisfy these needs, since it cannot change the management unit in correspondence with changes in the storage location or status of a document.