A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to document format conversion. More particularly, the invention relates to converting between a structured language element and an object embeddable in the native format of a document editor.
Electronic documents may be composed according to a variety of document formats which differ in the way they represent a document. For example, two documents in two different formats may appear similar when displayed to the user. However, the internal representations of the documents may be very different. Since a document editor is typically able to load and save only those documents of a particular format (i.e., the document editor's "native" format), a common problem related to document formats is converting documents from one document format to another.
A particular type of document format is referred to herein as a structured language. Structured languages use plain text to denote how a document should be formatted and to indicate items within the document. Particular structured languages such as the Hypertext Markup Language (HTML) and the Virtual Reality Modeling Language (VRML) are widely used on the Internet, a world-wide network of cooperating computer networks. However, structured languages can also be used for non-Internet purposes.
HTML comprises plain text and HTML tags, the latter of which are comprised of HTML tag components (e.g., an HTML tag name, HTML attribute keyword or HTML attribute value). One function of the HTML tags is to format and organize the text of an HTML document. Another function of the HTML tags is to denote items within the HTML document. For example, a particular item called a hyperlink provides a link to another document and is denoted by a particular HTML tag. Hyperlinks can be thought of as cross references connecting HTML documents to facilitate traversal from one HTML document to another. As a result, a certain collection of interconnected HTML documents available on the Internet is often referred to as the World Wide Web (WWW). Although widely used on the Internet, HTML can also be used on non-networked machines or on intranets (networks separate from the Internet using Internet standards).
The text in an unrendered HTML document is sometimes referred to as "source" or "raw" HTML. Ordinarily, such source HTML is not viewed without the aid of an HTML document viewer (commonly called a web browser). The HTML document viewer interprets the unrendered HTML text to provide the user with a rendered HTML document. For example, the "<B>" HTML tag causes text associated with it to appear in boldface type, and the "<HR>" HTML tag results in a horizontal rule. HTML includes numerous other tags to represent headings, numbered lists, and other items. Commercially available HTML document viewers include the Internet Explorer by Microsoft Corporation of Redmond, Wash., Netscape Navigator by Netscape Communications Corporation of Mountain View, Calif., and Mosaic by the National Center for Supercomputing Applications (NCSA) of Champaign-Urbana, Ill.
Originally, HTML consisted of only text and text-related tags. The limitations of a text-only system were recognized, and new HTML tags were added to define images, forms, tables, and other items. As other limitations in existing tags are recognized, HTML is extended and enhanced. For example, new HTML tags are added, or existing tags are redefined. As a result, the definitions of HTML tags may change over time, and the rendering process evolves to accommodate the changes.
One or more HTML tags can be used to construct an HTML element. Certain HTML elements relating to forms are particularly useful because they can be used to collect information from users. For example, when the HTML document is rendered, an HTML element may appear as a text box or a graphical push button. A user can place identifying or search information in the text box and click on the push button (i.e., activate a user input device such as a mouse while pointing at the graphical push button) to send the information to a vendor. Such an arrangement has numerous uses, including conducting business over the Internet (e.g., using an HTML order form document), performing operations on non-networked computers (e.g., using an HTML form to search a database of personal contact information), or gathering information over an intranet (e.g., using an HTML form to collect information from a new employee).
A recurring problem associated with HTML is finding a convenient way to create and edit HTML documents. Many of the more powerful document editors (e.g., word processor, spreadsheet, or presentation programs) include a large set of useful features for editing and publishing a variety of documents, such as WYSIWYG (What-You-See-Is-What-You-Get) editing, spell checking, and extensive on-line help. Document editors such as Microsoft Word 95 and Microsoft Excel 95 by Microsoft Corporation of Redmond, Wash., offer these and many other useful features.
In addition, many document editors allow certain items to be embedded into documents. For instance, a word processor may permit an image or a video to be inserted into a text document. A certain item called an object may be embedded in a document if it is of a format recognized by the document editor as embeddable. Examples of embeddable objects are text boxes, pick lists, and graphical push buttons. A class identifier is used to identify like objects of a particular class.
Once embedded in the document, some document editors allow the user to manipulate the object within the document. For example, text boxes, pick lists, and graphical push buttons may be placed within a document to create a form. A user may define the appearance and behavior of the embedded objects by setting their properties. For example, Microsoft Forms3 by Microsoft Corporation provides the user with classes of objects relating to text boxes, push buttons, and other items that can be embedded in Microsoft Word 95 documents. Thus, the user may employ these document editors with embeddable object capability to create and edit forms.
However, using these editors to edit HTML documents presents two basic problems. First, the format these editors use to load or save documents (hereafter their "native" format) is typically not HTML, so the editors do not offer a way to create or edit rendered HTML documents. In other words, a document saved in the native format of one of these editors is typically not saved according to the HTML specification.
The second problem relates to embedding HTML into a document. Although some document editors allow items of various formats to be embedded into documents, HTML is generally not recognized as an embeddable format. In other words, document editors generally do not permit HTML elements to be embedded into a document.
Some document editors address the first problem by offering a mechanism, called a converter, for converting between the editor's native format and HTML. However, these converters typically convert only a subset of the possible HTML tags and elements, such as simple ones relating to formatting and hyperlinks. In this way, the user can take advantage of the features within the document editor to edit some HTML documents and need not use a separate application to edit HTML documents. However, more complex HTML elements, such as those relating to forms, cannot be converted to simple formatting or a hyperlink. Further, the second problem relating to the non-embeddable nature of HTML prevents such a converter from simply placing these HTML elements into a document.
To solve the conversion problem, a comprehensive program could be constructed to convert between an HTML document and a document in a particular native document format. For example, the Internet Assistant by Microsoft Corporation provides users with an HTML conversion software routine for Microsoft Word 95. A problem facing these programs is that the HTML-to-native document format converter must be able to convert a wide variety of HTML elements into the native document format. In addition, each of the HTML tag components may affect the conversion process. Further, the native document format may not include features that fully represent the HTML elements. Therefore, additional data must be associated with the native document features and additional logic must be included in the document editor to handle the additional data. The logic required to implement such a process can be relatively complex. Finally, if the HTML elements cannot be fully represented by native document features, the document editor may not provide WYSIWYG editing of the converted document.
The native document format-to-HTML converter faces similar problems because it requires logic to convert the native document format into a wide variety of HTML elements. Thus, a comprehensive converter becomes quite large and complex. In addition, when HTML is converted into a native document format and subsequently converted back into HTML, the resulting HTML may not be identical to the original HTML, due to idiosyncrasies of the conversion process. Finally, as HTML evolves, such a converter falls quickly into obsolescence, and the additional logic included in the document editor must be changed.
Several document editors bypass the problems associated with conversion and use HTML as their native format, allowing users to directly edit rendered HTML. Examples of these HTML editors include FrontPage by Microsoft Corporation, Netscape Navigator Gold by Netscape Communications Corporation, and HoTMetaL by SoftQuad Incorporated of Toronto, Canada. However, these specialized programs have their own drawbacks. They contain a reduced set of features compared to other document editors, and a new version of the editor must be provided when new HTML extensions or enhancements are added.
Another way to address the problems associated with conversion is to edit the source HTML without rendering it. A document editor may offer a way to manipulate plain text. If so, the editor may be used to open, save, edit, or create source HTML without rendering it. However, such an approach is problematic because the user cannot see how the HTML viewer will ultimately render the HTML. For example, the user may see the unrendered HTML element for a graphical push button (e.g., <INPUT TYPE="reset">) but cannot see where the push button will be placed or how it will appear when rendered. As a result, the user cannot perform WYSIWYG editing, and the editing process is often slowed by repeated switching between editing the source HTML with a document editor and viewing the rendered HTML with an HTML viewer. Forms are challenging to edit in any environment, and the inability to easily see the form compounds the difficulty.
Thus, users who wish to create or edit HTML documents containing certain HTML elements, such as those relating to forms, have been forced to choose among three undesirable situations: writing a comprehensive converter program with its associated problems, using a specialized HTML editor with limited features, or editing source HTML by switching back and forth between programs.
The present invention provides methods and systems for converting between a structured language document and a document of a native format while avoiding these problems. Conversion between structured language elements and objects embeddable in the native format of a document editor is provided. Thus the user is not required to use a separate, specialized program and can take advantage of the features in a document editor that does not use the structured language as its native format (e.g., a non-HTML document editor) and does not recognize the structured language as an embeddable format.
In one implementation, a method is provided for converting a structured language element to an object of an embeddable format recognized by the document editor. A converter first converts a structured language document to a stream in a format called Rich Text Format (hereinafter "RTF"). When a structured language element is encountered in the structured language document, a class identifier identifying an object of a format embeddable in the document editor is selected according to a selected tag component (e.g., an HTML tag name) of the structured language element with reference to a Structured Language Element-to-Embeddable Object Class Association Table. Then, the class identifier and the structured language element are placed into a storage. The storage is converted into a stream, and the stream is placed into the RTF stream. The converter does not instantiate an embeddable object during the conversion process or set the embeddable object's properties. In this way, the converter logic is kept small and need not be repeatedly updated with each change in the structured language. If new structured elements are created, a new Structured Language Element-to-Embeddable Object Class Association Table can be provided instead of providing a new converter. Another advantage to this arrangement is that some document editors already have a facility for converting RTF to their native format.
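By way of illustration only, the table lookup and storage steps described above might be sketched as follows. The sketch assumes a plain dictionary stands in for the Structured Language Element-to-Embeddable Object Class Association Table and for the storage; the class identifier strings, the function names, and the NUL-separated byte layout are hypothetical and are not taken from the disclosure. Placing the resulting stream into an RTF stream is outside the scope of the sketch.

```python
import re

# Hypothetical Structured Language Element-to-Embeddable Object Class
# Association Table, keyed by HTML tag name. The class identifiers are
# placeholders invented for this sketch.
ELEMENT_TO_CLASS = {
    "SELECT": "example-class-id-pick-list",
    "TEXTAREA": "example-class-id-text-box",
}


def tag_name(element: str) -> str:
    """Return the upper-cased tag name of a structured language element."""
    match = re.match(r"\s*<\s*([A-Za-z][A-Za-z0-9]*)", element)
    if match is None:
        raise ValueError("not a structured language element")
    return match.group(1).upper()


def element_to_stream(element: str, table: dict) -> bytes:
    """Pair the element with a class identifier and flatten the pair to bytes.

    No embeddable object is instantiated and no properties are set here.
    """
    storage = {"class_id": table[tag_name(element)], "source": element}
    # Flatten the storage: class identifier, a NUL separator, then the source.
    return (
        storage["class_id"].encode("ascii")
        + b"\x00"
        + storage["source"].encode("utf-8")
    )


stream = element_to_stream(
    '<TEXTAREA NAME="notes" ROWS=4></TEXTAREA>', ELEMENT_TO_CLASS
)
```

Because the converter only consults the table, supporting a new element in this sketch amounts to adding a table entry rather than changing the conversion function.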
In another aspect of the invention, the streams relating to embeddable objects in the RTF stream described above are converted into storages, each storage containing a class identifier and a structured language stream. When the embeddable object relating to the storage is to be displayed, the storage is used to instantiate an embeddable object of the class identified by the class identifier in the storage. The structured language stream is passed to the embeddable object using an interface to the object, and the structured language stream is stored within the embeddable object. An advantage to this arrangement is that a document editor may already have a facility for determining when an item is to be displayed and can already instantiate an object if provided an appropriate storage. Thus, this aspect of the invention has the advantage of providing an appropriate storage for instantiating the embeddable object without instantiating an object to produce the storage.
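The deferred instantiation described above might be sketched as follows, purely as an illustration. The sketch assumes a stream laid out as a class identifier, a NUL byte, and the structured language text; that layout, the `TextBox` class, the `load_source` interface method, and the registry are all hypothetical names chosen for the example.

```python
class TextBox:
    """Hypothetical embeddable object class that keeps its source stream."""

    def __init__(self) -> None:
        self.source = None

    def load_source(self, source: str) -> None:
        # Interface through which the structured language stream is passed in.
        self.source = source


# Hypothetical registry mapping class identifiers to embeddable object classes.
CLASS_REGISTRY = {"example-class-id-text-box": TextBox}


def stream_to_storage(stream: bytes) -> dict:
    """Split a flattened stream back into a storage (modeled as a dict)."""
    class_id, _, source = stream.partition(b"\x00")
    return {
        "class_id": class_id.decode("ascii"),
        "source": source.decode("utf-8"),
    }


def instantiate(storage: dict):
    """Create the object only when it is to be displayed, then hand it the source."""
    obj = CLASS_REGISTRY[storage["class_id"]]()
    obj.load_source(storage["source"])
    return obj


storage = stream_to_storage(
    b'example-class-id-text-box\x00<TEXTAREA NAME="notes"></TEXTAREA>'
)
obj = instantiate(storage)
```

The point of the arrangement in this sketch is that `stream_to_storage` never creates an object; only `instantiate`, invoked at display time, does.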
In another aspect of the invention, the structured language stream within the embeddable object is used to set the properties of the embeddable object. A property setting software routine is included in the document editor. The embeddable object provides the structured language stream and a Structured Language Attribute-to-Embeddable Object Property Association Table to the property setting software routine, which parses the structured language and sets the appropriate embeddable object properties according to the structured language stream. In this way, the embeddable object avoids containing certain logic for parsing the structured language because the property setting software routine handles the details of parsing the structured language with reference to the Structured Language Attribute-to-Embeddable Object Property Association Table. As a result, the size of the embedded object is reduced, and structured language extensions or enhancements can be processed by providing a new Table.
In another aspect of the invention, the property setting software routine stores a "found" value within the embeddable object indicating which of the properties were set with reference to the Structured Language Attribute-to-Embeddable Object Property Association Table. If the property setting software routine encounters a portion of the structured language stream that it is unable to process, it stores the portion separately within the embeddable object. The information relating to which of the properties were set and which portions of the structured language could not be processed is used if the embeddable object is later converted back into the structured language. In this way, idiosyncrasies of the conversion process are avoided, and the resulting structured language more closely resembles the original structured language, even if portions of it were not recognized during the property setting process.
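A minimal sketch of such a property setting routine is given below for illustration. It assumes the embeddable object can be modeled as a dictionary, that the "found" value is a list of property names, and that attributes appear as KEYWORD=value pairs inside a single tag; the table contents, the regular expression, and all names are hypothetical.

```python
import re

# Hypothetical Structured Language Attribute-to-Embeddable Object Property
# Association Table: HTML attribute keyword -> object property name.
ATTRIBUTE_TO_PROPERTY = {"NAME": "name", "SIZE": "width", "VALUE": "text"}

# Matches KEYWORD=value, where value is double-quoted or a run of non-spaces.
ATTRIBUTE_PATTERN = re.compile(r'([A-Za-z-]+)\s*=\s*("[^"]*"|\S+)')


def set_properties(obj: dict, source: str, table: dict) -> None:
    """Set object properties from the source and record what was recognized."""
    inner = source.strip().lstrip("<").rstrip(">")
    _tag, _, attribute_text = inner.partition(" ")
    found, unrecognized = [], []
    for keyword, raw_value in ATTRIBUTE_PATTERN.findall(attribute_text):
        prop = table.get(keyword.upper())
        if prop is None:
            # Keep the unprocessed portion verbatim for later regeneration.
            unrecognized.append(f"{keyword}={raw_value}")
        else:
            obj[prop] = raw_value.strip('"')
            found.append(prop)
    obj["found"] = found
    obj["unrecognized"] = unrecognized


text_box = {}
set_properties(text_box, '<INPUT NAME="q" SIZE=20 FOO="bar">', ATTRIBUTE_TO_PROPERTY)
```

In this sketch the parsing logic lives in one shared routine, so an object class contributes only its table; the unknown `FOO` attribute is preserved rather than discarded.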
In the case of an HTML document, since the HTML elements (e.g., those relating to forms) in the HTML document are converted to objects embeddable by a non-HTML document editor, the user can use the numerous features available within the non-HTML document editor. Since the embeddable object handles some of the details of the HTML-to-embeddable-object conversion, the same conversion logic may be used to process HTML streams containing new extensions or enhancements. If the details of the conversion process change due to new HTML extensions or enhancements, changes to the conversion process can be effected by providing a new embeddable object class (which may contain a new Structured Language Attribute-to-Embeddable Object Property Association Table). A mechanism for providing a new embeddable object class already exists, so the invention facilitates converting new HTML extensions or enhancements.
Other aspects of the invention relate to converting an embeddable object into a related structured language element. The embeddable object provides a stream of the structured language through an interface. If no changes have been made to the embeddable object since the structured language stream was stored in it, the embeddable object uses the stored structured language stream. Using the stored stream facilitates the conversion process and results in a structured language stream closely resembling (or identical to) the original.
If there have been changes to the embeddable object, or if no structured language stream is present within the embeddable object, a property saving software routine is provided by the document editor to assist the embeddable object in producing a structured language stream. The property saving software routine uses the Structured Language Attribute-to-Embeddable Object Property Association Table and the "found" value described above to generate a stream of the structured language relating to the embeddable object. The property saving software routine also retrieves any portion of structured language not recognized that was stored within the embeddable object. Thus, the embeddable object can use the property saving software routine to assist it in generating a stream of the structured language. In this way, the embeddable object avoids containing certain logic for generating a stream of the structured language. As a result, the size of the embeddable object is reduced.
In addition, using the "found" value and the portion of structured language not recognized results in a structured language element more closely resembling (or identical to) the original one.
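The conversion back into the structured language might be sketched as follows, again only as an illustration. The sketch assumes the embeddable object is modeled as a dictionary carrying its stored source, a hypothetical `dirty` flag marking changes since the source was stored, a "found" list of property names, and a list of unrecognized portions; none of these names come from the disclosure.

```python
# Hypothetical Structured Language Attribute-to-Embeddable Object Property
# Association Table: HTML attribute keyword -> object property name.
ATTRIBUTE_TO_PROPERTY = {"NAME": "name", "SIZE": "width", "VALUE": "text"}


def save_properties(obj: dict, tag: str, table: dict) -> str:
    """Regenerate an element from the found properties and unrecognized text."""
    property_to_attribute = {prop: attr for attr, prop in table.items()}
    parts = [tag]
    for prop in obj["found"]:
        parts.append(f'{property_to_attribute[prop]}="{obj[prop]}"')
    # Unrecognized portions are written back verbatim.
    parts.extend(obj["unrecognized"])
    return "<" + " ".join(parts) + ">"


def to_structured_language(obj: dict, tag: str, table: dict) -> str:
    """Reuse the stored source when unchanged; otherwise regenerate it."""
    if obj.get("source") is not None and not obj.get("dirty", False):
        return obj["source"]
    return save_properties(obj, tag, table)


unchanged = {
    "source": '<INPUT NAME="q" SIZE=20 FOO="bar">',
    "dirty": False,
    "name": "q",
    "width": "20",
    "found": ["name", "width"],
    "unrecognized": ['FOO="bar"'],
}
# Same object after the user widened the text box in the editor.
edited = dict(unchanged, width="30", dirty=True)

original_html = to_structured_language(unchanged, "INPUT", ATTRIBUTE_TO_PROPERTY)
regenerated_html = to_structured_language(edited, "INPUT", ATTRIBUTE_TO_PROPERTY)
```

In this sketch only the properties listed in "found" are emitted, so attributes absent from the original element are not introduced on the way back, and the unrecognized `FOO` attribute survives the round trip.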
Additional features and advantages of the invention will be made apparent from the following detailed description of an illustrated embodiment which proceeds with reference to the accompanying drawings.