1. Field of the Invention
Embodiments of the invention are generally related to managing a collection of data objects in a content management system. More specifically, embodiments of the invention are related to a method and system for managing XML documents stored in a content management system using configuration set relationships.
2. Description of the Related Art
Content management systems (CMS) allow multiple users to share information. Generally, a CMS allows users to create, modify, archive, search, and remove data objects from an organized repository. The data objects managed by a CMS may include documents, spreadsheets, database records, digital images, and digital video sequences, to name but a few. A CMS typically includes tools for document publishing, format management, revision and/or access control, along with tools for document indexing, searching, and retrieval.
A CMS may be configured with rules for processing documents whenever documents flow into or out of the repository. For example, rules may be defined for XML documents in the repository to provide additional functions such as bursting of XML fragments and synchronization of content with attributes. Often these rules are included with a logical collection of other XML configuration artifacts such as DTDs, schemas, style sheets, etc. This collection of XML configuration artifacts is referred to as a configuration set. In order to be processed correctly, XML documents must be associated with the proper configuration set; for example, a configuration set that matches the grammar or document type of the document being processed. Some content management systems select a collection of XML configuration artifacts based on the content of the XML document and other repository attributes. Such systems, however, typically rely on a static directory structure and a limited collection of attributes to manage the XML configuration artifacts for a given document. Problems may arise with this approach when the CMS contains multiple configuration sets that are very similar. A CMS typically tries to automatically determine the correct configuration set to associate with an XML document in the repository; however, if all of the factors that determine this association, such as DTD name, schema, or other repository attributes, are identical for more than one configuration set, then the CMS may not be able to determine which configuration set to use. This could be the case when two or more XML documents are very similar but have subtle differences; for example, they use different style sheets. Since a small difference such as a different style sheet would not likely affect the CMS's matching algorithm, the system may be unable to determine which configuration set to use or may choose an incorrect one.
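The ambiguity described above can be illustrated with a minimal sketch. It assumes a hypothetical matcher that compares only DTD name and schema; the class, function, and file names are invented for illustration and do not describe any particular CMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSet:
    """Hypothetical configuration set: a named bundle of XML artifacts."""
    name: str
    dtd: str
    schema: str
    stylesheet: str  # assumption: the matcher below never consults this field


def match_config_sets(doc_dtd, doc_schema, config_sets):
    """Return every configuration set whose matching factors equal the document's."""
    return [c for c in config_sets if c.dtd == doc_dtd and c.schema == doc_schema]


# Two sets differ only in their style sheet, which is not a matching factor.
config_sets = [
    ConfigSet("manual-print", "manual.dtd", "manual.xsd", "print.xsl"),
    ConfigSet("manual-web", "manual.dtd", "manual.xsd", "web.xsl"),
    ConfigSet("report", "report.dtd", "report.xsd", "report.xsl"),
]

candidates = match_config_sets("manual.dtd", "manual.xsd", config_sets)
ambiguous = len(candidates) > 1  # more than one candidate: no unique choice
```

Because both "manual" sets share the same DTD and schema, the matcher returns two candidates and has no basis for choosing between them.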
Further, document types using specialized or industry-specific grammars often include references to other documents. For example, document types may have a parent-child relationship, or a compound document backbone may be used to support any number of modules. A good example of this scenario occurs in the field of regulatory compliance in the pharmaceutical industry. The International Conference on Harmonisation of Technical Requirements (ICH) has published a standard set of XML files for governing electronic drug submissions to the FDA (known as eCTD, the electronic common technical document). The eCTD is basically an XML backbone that references additional supporting documents. Some of its supporting documents are also XML documents governed by their own grammar. When the user creates a new eCTD document, the supporting XML documents should be associated with their own sets of XML-related artifacts (DTDs, schemas, etc.), and these associations should be transparent to the user. Current approaches to managing documents, however, fail to address these types of complex compound document structures.
Similarly, situations often arise where multiple versions of a DTD or schema need to be effective at the same time for a particular document type. However, the approach of statically linking a document type to a particular set of related files does not provide the ability to associate different versions of a DTD or schema (or any XML configuration artifact for that matter) with documents of a particular type. For example, assume that a regulatory organization has published version 3.0 and version 4.0 of a document specification governing new drug applications. The organization may agree to accept documents based on version 3.0 until some specified date, at which point relevant parties must submit documents based on version 4.0. A company may have several in-progress documents that have been developed according to version 3.0 and that will be submitted until the new version is required, but the company would like to create all new documents using the 4.0 schema.
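The limitation of static linking can be sketched as follows. The example assumes a hypothetical registry that maps each document type to exactly one schema file; the document type and file names are illustrative only.

```python
# Hypothetical static registry: one schema file per document type.
static_links = {}


def link(doc_type, schema_file):
    """Statically link a document type to a single schema file."""
    static_links[doc_type] = schema_file


link("new-drug-application", "nda-3.0.xsd")
link("new-drug-application", "nda-4.0.xsd")  # replaces the 3.0 link

# An in-progress document authored against version 3.0 now resolves to 4.0,
# because the registry has no way to hold both versions for the same type.
in_progress_schema = static_links["new-drug-application"]
```

Under this assumption, registering version 4.0 discards the version 3.0 link, so documents still in progress under 3.0 can no longer be resolved to the schema they were written against.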
Accordingly, there remains a need for techniques for managing configuration sets (e.g., a collection of XML schemas, DTDs, style sheets, transforms, etc.) for documents stored in a content management system.