1. Field of the Invention
This invention relates to processing Web documents, and more specifically to a system, method, and program for dynamically generating a document type definition during runtime.
2. Description of the Related Art
The Internet, initially referred to as a collection of “interconnected networks”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network. When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
Currently, the most common method of transferring data over the Internet is the World Wide Web environment, referred to herein as “the Web”. Other Internet resources for transferring information exist, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients transfer data using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.).
These data files, referred to herein as Web pages or documents, are typically written in a markup language, such as Hypertext Markup Language (HTML), Extensible Markup Language (XML), or Standard Generalized Markup Language (SGML). SGML is the international standard for describing the structure of different types of electronic documents; it is very powerful but also complex. HTML is a small application of SGML that defines a simple type of report-style document, with section headings, paragraphs, lists, tables, illustrations, hypertext, and multimedia.
XML is likewise a simplified subset of SGML. XML retains enough of SGML's useful functions while removing some of the optional features that make SGML too complex to program for in a Web environment. Unlike HTML, XML does not have a fixed set of tags; hence, it is referred to as being extensible. XML is a meta language for describing customized markup languages that define the structure of a limitless number of different types of documents. XML documents are built from the following: elements, tags, attributes, entities, PCDATA, and CDATA. It should be noted that XML is not just for Web pages. It can be used to store any kind of structured information, and to enclose or encapsulate information in order to pass it between different computing systems.
XML is not dependent on a single, inflexible document type, as HTML is. Nor does XML have the complexity of full SGML. XML allows the flexible development of user-defined document types. It provides a robust, nonproprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web.
A Document Type Definition (DTD) is a set of rules that a document must follow. More specifically, these rules generally state the name and content of each element and the contexts in which it may or must appear. A DTD thus defines the document structure as a list of the elements permitted in the XML document. A DTD allows each XML file to carry a description of its own format with it. This enables independent groups of people to agree on a common DTD for interchanging data of interest to their own specific group, i.e., domain. An application can then use that domain's standard DTD to verify that its own data and/or the data it receives is valid. This verification is performed before the application processes and displays the document.
A well-formed XML document is a document that conforms to the XML syntax rules. A valid XML document is a well-formed XML document that also conforms to the rules of a DTD. The primary difference between valid and well-formed XML is therefore whether or not there is a DTD: well-formed XML is designed for use without a DTD, whereas valid XML requires one. The W3C XML specification states that a processor must not continue normal processing of an XML document once it detects a fatal (well-formedness) error. This helps ensure that browsers handling XML documents will be compatible. In contrast, many HTML browsers are not compatible with each other because HTML tolerates formatting errors such as the omission of end tags. HTML browsers interpret the intent of an erroneous document in different ways, and thus may process the same document differently, leading to incompatibility between browsers.
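The distinction between well-formed and malformed input can be illustrated with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, assuming Python 3; ElementTree is a non-validating parser, so it checks well-formedness only, and the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML (hypothetical helper).

    ElementTree does not validate against a DTD, so this checks
    XML syntax rules only.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# An HTML browser would tolerate the missing </p> end tag;
# an XML parser reports a fatal error instead.
print(is_well_formed("<doc><p>Hello</p></doc>"))  # True
print(is_well_formed("<doc><p>Hello</doc>"))      # False
```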
Consequently, for a browser to process an XML document reliably, the document must be well-formed and, where validation is required, also valid. A reliable way to check the well-formedness and validity of an XML document is to use a parser that checks for errors. A validating XML parser compares the document with its DTD to be sure that it is structured correctly and that all tags are used in the proper manner. This comparison process, performed by the parser, is called validation. Parsers are helpful in determining why an XML document is not being read properly, and can also be used while an XML document is being created to ensure that it is being created correctly.
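Validation can be sketched with a deliberately simplified rule table standing in for a DTD. This is an illustration under stated assumptions, not a real DTD processor: the `RULES` table and the `validate` function are hypothetical, and the sketch checks only which child elements and text each element may contain, ignoring ordering, cardinality, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a DTD: for each element name, the
# names of the child elements it may contain. "#PCDATA" means text is allowed.
# Ordering, cardinality, attributes, and text after child elements (tails)
# are not checked here, although a real DTD can constrain them.
RULES = {
    "memo": {"to", "from", "body"},
    "to": {"#PCDATA"},
    "from": {"#PCDATA"},
    "body": {"#PCDATA"},
}


def validate(elem, rules, errors=None):
    """Recursively check `elem` against `rules`; return a list of errors."""
    if errors is None:
        errors = []
    allowed = rules.get(elem.tag)
    if allowed is None:
        errors.append(f"element <{elem.tag}> is not declared")
        return errors
    if elem.text and elem.text.strip() and "#PCDATA" not in allowed:
        errors.append(f"<{elem.tag}> may not contain text")
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> is not allowed inside <{elem.tag}>")
        validate(child, rules, errors)
    return errors


ok = ET.fromstring("<memo><to>Ann</to><from>Bob</from><body>Hi</body></memo>")
print(validate(ok, RULES))  # []

extended = ET.fromstring("<memo><to>Ann</to><priority>high</priority></memo>")
print(validate(extended, RULES))
# ['<priority> is not allowed inside <memo>', 'element <priority> is not declared']
```

The second document fails only because `priority` has no declaration, which is the kind of failure that occurs whenever a document uses elements its DTD does not define.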
An advantage of utilizing DTDs for XML documents is that a given document can be readily validated against its DTD, and consequently be successfully processed and displayed. Accordingly, for an XML document that has no associated DTD, it is known in the art to construct a DTD from the document itself: existing programs analyze an XML document and create a DTD based upon that document.
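Constructing a DTD from an instance document can be sketched as follows. This is a minimal sketch, not the method of any particular existing tool: the function `infer_dtd` is hypothetical, and it assumes a deliberately permissive model in which each element may contain any of the children observed, in any order and number, and every attribute is optional character data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def infer_dtd(xml_text):
    """Infer a permissive DTD from one XML instance (hypothetical helper).

    Assumptions: each element's content model is a repeatable choice of the
    child names observed (mixed content if text was also seen), and every
    attribute is declared as optional character data (CDATA #IMPLIED).
    """
    root = ET.fromstring(xml_text)
    order = []                     # element names in first-seen order
    children = defaultdict(list)   # element name -> distinct child names
    has_text = defaultdict(bool)   # element name -> non-whitespace text seen
    attrs = defaultdict(list)      # element name -> distinct attribute names

    for elem in root.iter():
        if elem.tag not in order:
            order.append(elem.tag)
        if elem.text and elem.text.strip():
            has_text[elem.tag] = True
        for child in elem:
            if child.tag not in children[elem.tag]:
                children[elem.tag].append(child.tag)
            if child.tail and child.tail.strip():
                has_text[elem.tag] = True  # text between child elements
        for name in elem.attrib:
            if name not in attrs[elem.tag]:
                attrs[elem.tag].append(name)

    lines = []
    for tag in order:
        kids = children[tag]
        if kids and has_text[tag]:
            model = "(#PCDATA|" + "|".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + "|".join(kids) + ")*"
        elif has_text[tag]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for name in attrs[tag]:
            lines.append(f"<!ATTLIST {tag} {name} CDATA #IMPLIED>")
    return "\n".join(lines)


print(infer_dtd('<memo id="7"><to>Ann</to><body>Hi</body><sig/></memo>'))
# <!ELEMENT memo (to|body|sig)*>
# <!ATTLIST memo id CDATA #IMPLIED>
# <!ELEMENT to (#PCDATA)>
# <!ELEMENT body (#PCDATA)>
# <!ELEMENT sig EMPTY>
```

A DTD inferred from a single instance describes only what that instance happens to contain; a fuller tool would also infer sequences and occurrence indicators across many documents.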
A disadvantage of utilizing DTDs for XML documents, however, is that an XML document must conform to its associated DTD to be valid. It may be desirable to extend an XML document beyond what its associated DTD allows without creating a completely separate DTD.
It is known to merge one existing, complete, valid DTD with another to create a third DTD for later use. However, the resulting DTD is not processed in real time; the merging does not take place during runtime. Furthermore, because two DTDs can be merged without any knowledge of the domain associated with them, entities may be created that are not usable in the context of the specific domain of the XML documents.
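A purely syntactic merge of two DTDs can be sketched as below. The helpers `parse_elements` and `naive_merge` are hypothetical, and the regular expression assumes simple one-line `<!ELEMENT>` declarations without comments or parameter-entity references. The merge knows nothing about the domain; it can detect only textual conflicts, since XML 1.0 forbids declaring an element type more than once.

```python
import re

# Matches simple element declarations such as <!ELEMENT to (#PCDATA)>.
# Assumption: one declaration per line, no comments or parameter-entity
# references; a real DTD parser must handle both.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.:-]+)\s+(.+?)\s*>")


def parse_elements(dtd_text):
    """Map each declared element name to its content model string."""
    return {name: model for name, model in ELEMENT_DECL.findall(dtd_text)}


def naive_merge(dtd_a, dtd_b):
    """Union two DTDs' element declarations with no domain knowledge.

    Names declared differently in both inputs are reported as conflicts,
    and the declaration from `dtd_a` is kept.
    """
    merged = parse_elements(dtd_a)
    conflicts = []
    for name, model in parse_elements(dtd_b).items():
        if name in merged and merged[name] != model:
            conflicts.append(name)
        merged.setdefault(name, model)
    text = "\n".join(f"<!ELEMENT {n} {m}>" for n, m in merged.items())
    return text, conflicts


memo_dtd = "<!ELEMENT memo (to,body)>\n<!ELEMENT to (#PCDATA)>\n<!ELEMENT body (#PCDATA)>"
page_dtd = "<!ELEMENT body (p)*>\n<!ELEMENT p (#PCDATA)>"
text, conflicts = naive_merge(memo_dtd, page_dtd)
print(conflicts)  # ['body']
print(text)
```

In the result, `p` is declared, yet no element of the memo domain can contain it: an example of a declaration that is not usable in the context of that domain.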
It is easy to extend an XML document that has no DTD. However, in order for a parser to validate an XML document, a DTD must be provided, and once a DTD is provided, the XML document is no longer extensible. For example, to enable an XML document to be validated, a static DTD file, which cannot be extended or changed, may be provided. Alternatively, portions of a DTD may be provided in separate files, with a master DTD that includes references to those files on the file system. However, this is merely a static inclusion of text files.
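Such static inclusion is typically written with external parameter entities, for example `<!ENTITY % memo SYSTEM "memo.dtd">` followed by the reference `%memo;`. The sketch below mimics that textual substitution under stated assumptions: an in-memory dictionary stands in for the file system, the file names and the helper `expand_master_dtd` are hypothetical, and nested references are not expanded.

```python
import re

# <!ENTITY % name SYSTEM "file.dtd">  declares an external parameter entity;
# %name;                            includes that file's text at this point.
DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+SYSTEM\s+"([^"]+)"\s*>')
REF = re.compile(r"%([\w.-]+);")


def expand_master_dtd(master, files):
    """Textually include each referenced file (hypothetical helper).

    `files` maps system identifiers to file contents and stands in for the
    file system. References inside included text are not expanded.
    """
    targets = dict(DECL.findall(master))  # entity name -> system identifier
    body = DECL.sub("", master)           # drop the declarations themselves
    return REF.sub(lambda m: files[targets[m.group(1)]], body).strip()


master = """<!ENTITY % memo SYSTEM "memo.dtd">
<!ENTITY % sig SYSTEM "sig.dtd">
%memo;
%sig;"""
files = {
    "memo.dtd": "<!ELEMENT memo (to,sig)>\n<!ELEMENT to (#PCDATA)>",
    "sig.dtd": "<!ELEMENT sig (#PCDATA)>",
}
print(expand_master_dtd(master, files))
```

Any change to the combined DTD still means editing one of the included files by hand; nothing is generated at runtime.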
A problem therefore exists if one wants to allow customers to add new elements to a document, making the document extensible, while still performing validation. For example, if a DTD file is provided and customers are allowed to edit it manually, the customers could change existing element definitions, which may cause problems. Conversely, if the new elements are not defined in the DTD, then validation of documents containing these new elements will fail.
One problem, then, is how to create an XML document that can be validated, which requires a document type definition (DTD), while still allowing a given document type to be extended by plug-ins, i.e., by having users plug in to an architecture. Generally, DTDs are fixed constraints on XML documents. These fixed constraints can reside either externally to the document, such as in a file referenced by name from the document, or in-line within the document itself. Although it is desirable to be able to extend a document type definition, it is not desirable to allow a user to create arbitrary extensions to it. It is desirable to ensure that the types of documents being parsed are related to the subject matter at hand, i.e., the domain or context in which the XML document was originally created. As such, a user should not be able to define DTD extensions inline within a document, since the user could then insert arbitrary text into the internal DTD. On the other hand, an external, non-extensible DTD file is not desirable either.
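The concern about in-line DTDs can be demonstrated with Python's standard `xml.etree.ElementTree`, which applies entity declarations found in a document's internal DTD subset. This is a small sketch, assuming Python 3 and its bundled expat-based parser; the entity name `note` and its text are arbitrary examples. Whoever writes the document also writes these declarations, and the parser honors them.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset (between the square brackets) is part of the
# document itself, so the document's author controls these declarations.
doc = """<!DOCTYPE memo [
  <!ENTITY note "any text the author chooses">
]>
<memo><body>&note;</body></memo>"""

root = ET.fromstring(doc)
print(root.find("body").text)  # any text the author chooses
```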