1. Field of the Invention
Embodiments of the invention are generally related to processing content stored in a Content Management System (CMS). More specifically, embodiments of the invention are related to dynamic external entity resolution in XML-based content management systems.
2. Description of the Related Art
Content management systems (CMS) allow multiple users to share information. Generally, a CMS allows users to create, modify, archive, search, and remove data objects from an organized repository. The data objects managed by a CMS may include documents, spreadsheets, database records, digital images, and digital video sequences. A CMS typically includes tools for document publishing, format management, revision control, indexing, search, and retrieval, among others.
Often, extensible markup language (XML) may be used by a CMS to describe and store the data objects it manages (which themselves may be XML documents). XML is a widely used standard for creating markup languages to describe the structure of data, and many organizations publish domain-specific XML grammars for specific datasets using a document type definition or XML schema. A document type definition (DTD) is used to provide a formal definition of the elements, structures, and rules for marking up a given type of XML document. In other words, a DTD provides a statement of rules specifying which elements (markup tags) and attributes (values associated with specific tags) are allowed in an XML document. XML schemas provide another mechanism that may be used to define XML document structure and limitations. XML schemas are themselves XML documents. An XML schema references a standardized XML namespace and may include a collection of supporting DTDs as well.
Frequently, data objects managed by a CMS may contain links to other files or documents. To give an example from the pharmaceutical industry, the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) has published an XML grammar for XML documents governing electronic drug submissions to the FDA (known as the eCTD, or electronic common technical document). The eCTD includes a standard set of XSLT (Extensible Stylesheet Language Transformations) transforms used to transform XML documents into HTML web pages viewed in a browser. However, when eCTD XSLT transforms are used as-is to transform XML data stored in the CMS, these transforms make certain assumptions about the location of linked documents. Unfortunately, this often leads to transformed HTML output that contains incomplete or unresolvable links. Sometimes, this occurs because the eCTD XSLT transforms generate HTML links with relative paths (i.e., links to other documents based on a current location, as opposed to an absolute location). These links function properly when all of the referenced documents are available and stored relative to the HTML file (for example, on the same client system), but when an eCTD is viewed from a Web-based CMS, these relative references fail to resolve properly. More generally, “broken” or unresolvable references to external entities may occur whenever a standardized set of XML DTDs or XSLT transforms assumes a particular operating environment, and the documents need to be used by individuals outside of that operating environment.
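The relative-path failure described above can be illustrated with a minimal Python sketch. The link text, the local folder layout, and the CMS URL are all hypothetical and are not taken from the eCTD specification or any particular CMS; the sketch only shows how the same relative link resolves to different targets depending on where the HTML file is viewed.

```python
from urllib.parse import urljoin

# Hypothetical relative link, of the kind a standardized transform might emit.
relative_href = "m2/summary.pdf"

# Case 1 (assumed layout): the HTML file sits in a local folder that also
# holds every linked document, so the relative link lands on a real file.
local_base = "file:///submissions/0000/index.html"

# Case 2 (assumed URL): the same HTML is served by a Web-based CMS, where
# documents are addressed by identifier rather than by folder position.
cms_base = "https://cms.example.com/view?docid=1234"


def resolve(base: str, href: str) -> str:
    """Resolve href against base the way a browser resolves a relative link."""
    return urljoin(base, href)


local_target = resolve(local_base, relative_href)
cms_target = resolve(cms_base, relative_href)

print(local_target)
print(cms_target)
```

In the second case the resolved address keeps the CMS host but drops the document identifier, so it points at a path the CMS has no reason to serve. This is the kind of unresolvable reference the paragraph above describes.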
One approach to the problem is to package all of the documents referenced by a document being checked out together with that document. In many cases, however, particularly in regulated industries, one document may include hundreds, if not thousands, of links to other documents. It is simply unrealistic to package thousands of associated documents with the HTML for online display.
Another approach to addressing this problem is to modify the XSLT to generate output documents with paths to content in the CMS. However, XSLT transforms are often created by third parties such as government agencies and other external bodies. Using the eCTD as an example, companies may wish to create documents according to the eCTD standard. In such a case, it is important that the XSLT transforms are not modified. For this reason, editing a standardized XSLT is generally not considered an acceptable solution to the problem. Further, this approach is difficult to maintain, as any time the original XSLT transform is modified by the external organization, the “custom” XSLT transform may need to be modified as well. Thus, creating a parallel set of XSLT transforms is not a sustainable approach.
Still another approach to addressing the problem is to insert code into the data object to allow the client requesting the object to resolve any external entity references (e.g., HTML hyperlinks to other documents). For example, embedded JavaScript could be used to rewrite links or other external entity references on the fly. However, this approach relies on a new set of assumptions about what the client program needs to be able to do, and these assumptions may not always be accurate. For instance, it assumes the client can run JavaScript, but a client that is not a web browser is unlikely to be able to do so. Given the assumptions that must be made, this solution is also unworkable in many cases.
Accordingly, there remains a need in the art for a method to provide dynamic external entity resolution in an XML-based content management system.