1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system and computer-readable code for modifying a document to reflect transformations that are desired to account for dynamic factors such as a document user's current context (creating an Extensible Markup Language, or XML, dialect), and then dynamically generating a Document Type Definition (DTD) for this XML dialect.
2. Description of the Related Art
"DTD" is an acronym for "Document Type Definition". In general, a DTD is a definition of the structure of a document encoded in SGML ("Standard Generalized Markup Language") or an SGML derivative. SGML is an international standard for specifying document structure, which provides for a platform-independent specification of document content and formatting. An SGML derivative is a notation using a subset of the SGML notation. Examples of SGML derivatives are HTML ("HyperText Markup Language") and XML ("Extensible Markup Language"). HTML is a subset of SGML that is directed toward document interchange on the World Wide Web (hereinafter, "Web"), and is considered the primary publishing language of the Web. XML is a simplified version of SGML, tailored to structured Web document content. (Refer to ISO 8879, "Standard Generalized Markup Language (SGML)", (1986) for more information on SGML; to "HTML 4.0 Specification, W3C Recommendation, revised on Apr. 24, 1998" which is available on the Web at http://www.w3.org/TR/1998/REC-html40-19980424, for more information on HTML; and to "Extensible Markup Language (XML), W3C Recommendation Feb. 10, 1998" which is available on the Web at http://www.w3.org/TR/1998/REC-xml-19980210, hereinafter "XML Specification", for more information on XML.)
A DTD is written using SGML syntax. The DTD is encoded in a file which is intended to be processed, along with the file containing a particular document, by an SGML parser. The DTD tells the parser how to interpret the document which was created according to that DTD. DTDs may be used to describe any document type. For example, suppose a DTD has been created for documents of type "memo". Memos typically contain "To" and "From" information. The DTD would contain definitional elements for these items, telling the parser that these elements are valid syntax for this document type, as well as defining the syntax of subelements within these elements, etc.
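By way of illustration only, the memo example might be sketched as follows in Python, using the standard-library expat bindings. The element names `memo`, `to`, `from` and `body`, and the sample content, are assumptions made for this sketch and are not taken from any actual memo DTD. The DTD is carried as an internal subset of the document for brevity. Because expat is a non-validating parser, the sketch merely collects the declared element types and compares them with the element types that actually appear.

```python
import xml.parsers.expat

# Hypothetical memo document carrying its DTD as an internal subset.
MEMO = """<!DOCTYPE memo [
<!ELEMENT memo (to, from, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<memo><to>Ann</to><from>Bob</from><body>Meeting at noon.</body></memo>"""

declared = []  # element types declared by the DTD
used = []      # element types actually appearing in the document

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = lambda name, model: declared.append(name)
parser.StartElementHandler = lambda name, attrs: used.append(name)
parser.Parse(MEMO, True)

# expat does not validate, so the comparison is done by hand here.
undeclared = [name for name in used if name not in declared]
print(declared, undeclared)
```

A validating parser would additionally check the order and nesting of the elements against each content model, which this sketch does not attempt.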
HTML is a popular example of a notation for which an SGML DTD is defined. HTML is used for specifying the content and formatting of Web pages, where a software application commonly referred to as a "Web browser" processes the HTML DTD along with a Web page (i.e. a document encoded in HTML) in the same manner an SGML parser is used for other DTDs and document types. DTDs may also be used with documents encoded in XML. When a user wishes to print or display a document encoded according to an XML DTD, the software (i.e. the parser, compiler or other application) uses the XML DTD file to determine how to process the contents of the XML document.
HTML and XML are tag languages, where specially-designated constructs referred to as "tags" are used to delimit (or "mark up") information. In the general case, a tag is a keyword that identifies what the data is which is associated with the tag, and is typically composed of a character string enclosed in special characters. "Special characters" means characters other than letters and numbers, which are defined and reserved for use with tags. Special characters are used so that a parser processing the data stream will recognize that this is a tag. A tag is normally inserted preceding its associated data; a corresponding tag may also be inserted following the data, to clearly identify where that data ends. As an example of using tags, the syntax "<p>" in HTML indicates the beginning of a paragraph. In XML, "<email>" could be used as a tag to indicate that the character string appearing in the data stream after this tag is to be treated as an e-mail address; the syntax "</email>" would then be inserted after the character string, to delimit where the e-mail character string ends.
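The tag example above can be illustrated with a short Python sketch using the standard-library `xml.etree.ElementTree` module. The enclosing `contact` element and the sample address are assumptions introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <email> ... </email> pair delimits an address.
fragment = "<contact><email>user@example.com</email></contact>"

root = ET.fromstring(fragment)
email = root.find("email")

# The tag name identifies what the data is; the text is the delimited data.
tag_name = email.tag
address = email.text
print(tag_name, address)
```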
XML is an "extensible" markup language in that it provides users the capability to define their own tags. This makes XML a very powerful language that enables users to easily define a data model, which may change from one document to another. When an application generates the tags (and corresponding data) for a document according to a particular XML data model and transmits that document to another application that also understands this data model, the XML notation functions as a conduit, enabling a smooth transfer of information from one application to the other. By parsing the tags of the data model from the received document, the receiving application can re-create the information for display, printing, or other processing, as the generating application intended it. Conversely, HTML uses a particular set of predefined tags, and is therefore not a user-extensible language.
XML is a well-formed notation, meaning that all opening tags have corresponding closing tags (with the exception of a special "empty" tag, which is both opened and closed by a single tag, such as "<email/>"), and each tag that nests within another tag is closed before the outer tag is closed. HTML, on the other hand, is not a well-formed notation. Some HTML tags do not require closing tags, and nested tags are not required to follow the strict requirements as described for XML (that is, in HTML a tag may be opened within a first outer tag, and closed within a different outer tag).
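The well-formedness rules just described can be checked mechanically. The following Python sketch is one possible illustration: it treats a document as well-formed if the standard-library XML parser accepts it. The helper name `is_well_formed` and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts the text without error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every tag is closed, and the empty tag opens and closes itself.
nested_ok = "<memo><to>Ann</to><email/></memo>"
# <i> is opened inside <b> but closed outside it (tolerated by many HTML browsers).
nested_bad = "<memo><b><i>text</b></i></memo>"
# <p> is never closed (permitted in HTML, not in XML).
unclosed = "<memo><p>text</memo>"

print(is_well_formed(nested_ok), is_well_formed(nested_bad), is_well_formed(unclosed))
```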
A parser for SGML or an SGML derivative may create a Document Object Model (hereinafter, "DOM") tree representation of an input document during the parsing process. The Document Object Model is a language-independent application programming interface ("API") for use with documents specified in SGML or a derivative of SGML. In particular, DOM is intended for use with HTML and XML. DOM is published as a Recommendation of the World Wide Web Consortium, titled "Document Object Model (DOM) Level 1 Specification, Version 1.0" (1998) and available on the World Wide Web at http://www.w3.org/TR/REC-DOM-Level-1.
The DOM API enables application programs to access a tree-oriented abstraction of a document. It is this tree-oriented form that is created from the XML document by an XML parser. An application program can manipulate document structure and contents (that is, by changing, deleting, and/or adding elements in the DOM tree). Further, the DOM enables navigating the structure of the document by navigating the corresponding tree. While the term "document" is used herein when discussing XML (and the corresponding DOM trees), it is to be understood that the information represented using XML may represent any type of information, and is not limited to the traditional interpretation of the word "document". For example, XML may be used to represent the layout of records in a data repository, the layout of a user interface for an application program, or the data to be used with a program or to be used as the values of records in a repository. For ease of reference, the term "document" will be used herein to refer to these diverse types of information. "DOM tree" refers to the logical structure with which a document is modeled using the DOM. A DOM tree is a hierarchical representation of the document structure and contents. Each valid DOM tree has a root node and one or more leaf nodes, with zero or more intermediate nodes, using the terminology for tree structures that is commonly known in the computer programming art. A node's predecessor node in the tree is called a "parent" and nodes below a given node in the tree are called "child" nodes.
When an XML parser processes an input document, it reads the document and constructs a DOM tree based on the syntax of the tags embedded in the document and the interrelationships between those tags. The tag syntax is stored in the nodes of the DOM tree, and the shape of the tree is determined from the tag relationships. The DOM specification is defined to provide "structural isomorphism"; that is, any XML parser that parses a given XML input document will create an identical DOM tree.
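As an illustration of the navigation and manipulation operations described above, the following Python sketch uses the standard-library `xml.dom.minidom` module, which implements a subset of the DOM interfaces. The memo content and the added `subject` element are assumptions made for this example.

```python
from xml.dom.minidom import parseString

# Parsing produces a DOM tree whose shape follows the tag nesting.
doc = parseString("<memo><to>Ann</to><from>Bob</from></memo>")
root = doc.documentElement  # root node of the tree

# Navigate: list the element children of the root.
child_names = [n.tagName for n in root.childNodes
               if n.nodeType == n.ELEMENT_NODE]

# Add: create a new child element and append it to the root.
subject = doc.createElement("subject")
subject.appendChild(doc.createTextNode("Status"))
root.appendChild(subject)

# Delete: remove the <from> element from its parent.
root.removeChild(root.getElementsByTagName("from")[0])

result = root.toxml()
print(child_names, result)
```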
Due to the well-formed nature of XML documents, their corresponding DOM trees will also be well-formed. When HTML documents are not well-formed, however, DOM trees cannot be generated. Instead, a tedious manual translation of the source document must first be applied, where invalid syntax is located and corrected.
One type of further processing that is desirable is to transform an input document to account for a particular document user's context. By transforming a document for a specific user context, the document can be optimized for its intended user. User context information may include user-related preferences as well as various limitations, such as: who this user is; what type of network connection he is currently using (e.g. whether his connection may be limited in bandwidth); and what type of device and browser the user is currently using. As is evident from these types of factors, a user's context may vary from one user to another, and may also vary for a single user over time. Accordingly, the desired transformations to optimize a document for a user's context will also vary dynamically. As an example of a context-specific transform, suppose the user requests downloading of a Web page which embeds very large image or video files. Further suppose that the user is currently connected to a server over a limited-bandwidth connection. It would be desirable in this situation to transform the page before downloading it over the current connection, reducing the size of the embedded files (e.g. to reduce the time required for transmission). If, however, the user is connected by a high-speed link, this type of transform might not be desirable. A further complicating factor in performing these types of dynamic transformations on a document represented by a DOM tree is that the particular transform processing that will be available to perform on a given document may not, in some cases, be easily determined in advance.
For example, transform processing for particular context situations may be developed by different software vendors, so that one computing system may have different transforms available than are available in a different computing system. This unpredictability of available transforms, coupled with an HTML document that is not well-formed, prevents existing techniques from efficiently performing automated transformations on an input HTML document, such as transforming a document for a user context that varies dynamically (as described above).
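The limited-bandwidth example discussed above might be sketched as follows. This is a minimal illustration under stated assumptions and not a description of any particular implementation: the `UserContext` fields, the `transform` attribute, and its `reduce-size` value are all hypothetical names chosen for the sketch. A well-formed input document is annotated to indicate that a size-reducing transform is desired only when the user's current connection is bandwidth-limited.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class UserContext:
    # Hypothetical context fields, following the factors listed in the text.
    user_id: str
    limited_bandwidth: bool
    device: str
    browser: str

def annotate_for_context(xml_text: str, ctx: UserContext) -> str:
    """Mark embedded media with a desired transform, depending on context."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Hypothetical rule: shrink images and video on slow connections.
        if elem.tag in ("img", "video") and ctx.limited_bandwidth:
            elem.set("transform", "reduce-size")
    return ET.tostring(root, encoding="unicode")

page = '<page><p>Hello</p><img src="big.png"/></page>'
slow = annotate_for_context(page, UserContext("u1", True, "handheld", "text"))
fast = annotate_for_context(page, UserContext("u1", False, "desktop", "graphical"))
print(slow)
print(fast)
```

The same input yields different output for the two contexts, which is why a fixed, predefined DTD would not describe every annotated result.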
Accordingly, a need exists for a technique with which well-formed documents can be automatically transformed using dynamically-selected transformations (such as those that will indicate a user's current context). The present invention provides a novel way to translate a well-formed dialect of XML into a dialect which indicates dynamically-selected document transformations that are desired. Further, the present invention provides a novel technique for dynamically generating a DTD to describe the new XML dialect, so that the XML document created in this dialect can subsequently be processed by an XML parser for the desired manner of presentation to the user.
An object of the present invention is to provide a technique whereby an XML document can be modified to indicate document transformations that are desired, creating a dynamically-generated XML dialect.
Yet another object of the present invention is to provide a technique for reflecting a user's current context in the XML dialect.
Still another object of the present invention is to provide a technique whereby a DTD is dynamically generated to describe the XML dialect, so that the XML document created in this dialect can subsequently be processed by an XML parser for the desired manner of presentation to the user in his particular context.
It is another object of the present invention to provide this XML dialect and DTD generation in a manner that adapts dynamically to a particular user's context.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a software-implemented method, system, and computer-readable code for use in a computing environment for enabling efficient automated transformation of an input document to account for dynamic factors. This comprises: determining values of a set of dynamic factors; reflecting the values in a well-formed notation of the input document, such that a dynamically-generated dialect of the well-formed notation is created; and dynamically generating a Document Type Definition (DTD) to describe the dynamically-generated dialect. The well-formed notation and the dynamically-generated dialect may be Extensible Markup Language (XML) dialects. The dynamic factors may represent a user context, and this user context may comprise one or more of: one or more preferences of a user; a network connection of said user; a device type of said user; and a browser type of said user. The dynamic generation of the DTD preferably further comprises: creating element declarations for each detected element in the dynamically-generated dialect; creating attribute declarations for each detected attribute in the dialect; and compacting said DTD. Compacting the DTD preferably further comprises: generating entity declarations to reduce a size of the dialect or of the DTD; generating parameter declarations to reduce the size of the DTD; and generating attribute defaults for the attribute declarations.
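The element-declaration and attribute-declaration steps summarized above might be sketched as follows. This is only one possible approach, written under simplifying assumptions, and is not taken from the detailed description: the function name `generate_dtd` is hypothetical, every content model is a repeatable choice of the child elements actually observed, every attribute is declared as `CDATA #IMPLIED`, namespaces are not handled, and the compaction step (entity declarations, parameter declarations and attribute defaults) is omitted.

```python
import xml.etree.ElementTree as ET

def generate_dtd(xml_text: str) -> str:
    """Derive simple DTD declarations from the elements found in a document."""
    root = ET.fromstring(xml_text)
    children = {}  # tag -> child tags observed, in first-seen order
    has_text = {}  # tag -> True if character data was observed
    attrs = {}     # tag -> attribute names observed, in first-seen order

    for elem in root.iter():
        kids = children.setdefault(elem.tag, [])
        names = attrs.setdefault(elem.tag, [])
        text_seen = bool(elem.text and elem.text.strip())
        for child in elem:
            if child.tag not in kids:
                kids.append(child.tag)
            # Text following a child still belongs to the parent's content.
            if child.tail and child.tail.strip():
                text_seen = True
        has_text[elem.tag] = has_text.get(elem.tag, False) or text_seen
        for name in elem.attrib:
            if name not in names:
                names.append(name)

    lines = []
    for tag, kids in children.items():
        if kids and has_text[tag]:
            model = "(#PCDATA|" + "|".join(kids) + ")*"  # mixed content
        elif kids:
            model = "(" + "|".join(kids) + ")*"          # element content
        elif has_text[tag]:
            model = "(#PCDATA)"                          # text only
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {tag} {model}>")
        for name in attrs[tag]:
            lines.append(f"<!ATTLIST {tag} {name} CDATA #IMPLIED>")
    return "\n".join(lines)

sample = '<page><p>Hello</p><img src="big.png" transform="reduce-size"/></page>'
dtd = generate_dtd(sample)
print(dtd)
```

Because the declarations are derived from the document itself, any attribute added to reflect a dynamic factor (here the hypothetical `transform` attribute) is covered by the resulting DTD without prior knowledge of which transforms are available.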
The present invention will now be described with reference to the following drawings, in which like reference numbers denote the same element throughout.