This invention relates in general to managing an embedded file within an electronic document, and more specifically relates to simulating the characteristics of a file embedded within a primary file in response to saving the primary file in a Hyper Text Mark-up Language (HTML) format.
Users have clear expectations of how embedded content management should work. These expectations have been established from years of using traditional desktop productivity tools, such as word processing programs, which typically enable both embedding content in a primary document and editing the embedded content. In contrast, for Hyper Text Mark-up Language (HTML)-formatted documents, such as web pages, each piece of content is required to be a separate linked file. In other words, HTML does not directly support the concept of embedding content in the primary document. Nevertheless, the expectations of users have not changed in this HTML-formatted document environment because they still desire HTML documents to support the characteristics of embedded content.
Referring to FIG. 1, when a user saves an electronic document as a typical word processing file, such as the Microsoft "WORD 97" program file shown in a display 100, both a sunburst image 102 and a background image 104 are physically contained in the file as "embedded" files. In contrast, a linked logo 106 and a hyperlink 108 to another web page remain outside of the file as "linked" items. Users experienced with traditional desktop productivity applications have certain expectations about the characteristics exhibited by embedded content within an electronic document, such as the content presented by the display 100. For example, users typically expect the representative results shown in Table I in response to manipulating an electronic file containing an embedded file or operating directly upon an embedded file.
However, when the electronic document of FIG. 1 is saved as an HTML-formatted web page, the sunburst image 102 and the background image 104 cannot be physically embedded within the electronic document because of the inherent limitations of the HTML file format. Although the user may believe that the sunburst and background images 102 and 104 are embedded images, saving the document as an HTML file results in these images being linked to the document as separate files. Consequently, prior HTML-compatible editors fail to satisfy the above-referenced expectations of typical users for the behavior of embedded files in electronic documents. Because the HTML format forces files to be linked rather than physically embedded, a user's editing operations may leave behind multiple "orphaned" files that waste disk space and cause general user confusion.
Although the prior art has attempted to solve the problem of managing embedded content in several different ways, each prior solution suffers from key limitations. One prior solution is to present a dialog in response to an HTML save operation, prompting users to select the name and storage location of each embedded piece of content, while internally converting this content to linked content. For example, when a user initiates a save operation for a web page "Web Page.htm" having three pasted pictures, this prior solution typically presents a dialog prompting the user to select a file name and storage location for each picture.
This prior solution fails to satisfy user expectations regarding the behavior of the pasted pictures because, after the first save in HTML format, the pasted pictures become separate linked files. For example, deleting a link does not result in the removal of the linked content from the file system. A change to the linked content in one copy of a document can result in an unintended change to this linked content in other copies of the document. Unlike saving a copy of an electronic document having embedded content, saving a copy of a document with linked content does not result in saving a copy of the linked content. Likewise, saving a document over an existing document does not result in the deletion of linked content in the existing document. Adding new linked content to a document can result in an unexpected overwrite of existing content in the document. Also, this prior solution typically handles only embedded images and fails to support other varieties of embedded content, such as embedded stylesheets, embedded web pages, embedded framesets, etc.
A second prior solution supports the automated selection of file names and locations for each embedded piece of content of a primary file, but again internally converts each content piece to linked content. For example, if a primary file containing three embedded pictures is saved in HTML format as "Web Page.htm", this prior solution can automatically select file names, such as Image1.gif, Image2.gif, and Image3.gif, for the three pictures. Links are created for these images, which are stored as separate files on a storage mechanism, such as a hard disk drive. While this solution does not rely upon a dialog to prompt a user to select file names or storage locations in response to saving the primary file, the linked content fails to provide the expected behaviors of the original embedded content.
A third prior solution operates to save all content in an HTML-formatted document, both linked and embedded, in a special single file containing embedded files. Although this single file solution addresses some of the desired behaviors expected by users of embedded content, this solution also introduces unacceptable limitations because all content in the document is now treated as embedded content, even linked content. In other words, this single file solution satisfies selected user expectations for embedded content but violates all expectations for linked content. In addition, the single file is typically not formatted as an HTML document. This means that the file is not directly readable by browsers or editable by existing web page editors. Moreover, the single file is typically slower to save and slower to load than a similar HTML-formatted file, because of the inherent disadvantage of loading a large single file rather than progressively loading multiple files over a network connection.
In view of the foregoing, there is a need to fulfill users' expectations of how embedded content should work while also using HTML as the file format. The present invention solves this embedded content management problem for HTML-formatted files by placing information in a primary file that provides a cue to an editing program, such as a web page editor, that a particular file associated with that primary file should be treated as either embedded or linked content.
Although HyperText Markup Language ("HTML") files contain links to electronic files, rather than embedded files, the present invention can simulate the characteristics exhibited by an electronic document having one or more embedded files. For example, users of typical desktop productivity tools, such as word processing or spreadsheet programs, expect that opening an electronic file containing an embedded file will also open that embedded file. For a corresponding HTML-formatted file, which cannot contain an embedded file because of the inherent limitations of the HTML file format, the present invention achieves this desirable characteristic by saving the primary file to a storage mechanism, such as a hard disk drive, and saving each embedded file as a linked support file in a known location on the storage mechanism. In response to initiating a save operation for this primary file, a "file list" is created that references the primary file and each support file representing embedded content for the primary file. This file list is typically identified by an HTML tag placed in the header of the primary file. When the HTML file is opened during the next working session, an editor program module opens a link to each support file identified by the file list by use of traditional HTML mechanisms. In this manner, opening the HTML-formatted primary document also opens each linked support file, thereby giving the user the impression that files embedded within the primary file have been opened.
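As an illustration only, the Python sketch below shows one way an editor might write, and later locate, the header tag that identifies the file list. The text above does not specify the tag itself; the `<link rel="File-List" href=...>` form, the helper names `file_list_tag` and `find_file_list`, and the folder name used in the usage note are assumptions.

```python
from html import escape
from html.parser import HTMLParser
from typing import Optional

# Hypothetical marker: the specification says only that "an HTML tag placed in
# the header" identifies the file list; this <link rel="File-List"> form is assumed.
FILE_LIST_REL = "file-list"


def file_list_tag(href: str) -> str:
    """Return the header tag that points an editor at the file list."""
    return f'<link rel="File-List" href="{escape(href, quote=True)}">'


class _FileListFinder(HTMLParser):
    """Records the href of the first File-List <link> found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.href is None:
            attr_map = dict(attrs)
            if (attr_map.get("rel") or "").lower() == FILE_LIST_REL:
                self.href = attr_map.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_file_list(html_text: str) -> Optional[str]:
    """Return the file-list reference from an HTML primary file, if present."""
    finder = _FileListFinder()
    finder.feed(html_text)
    finder.close()
    return finder.href
```

An editor opening "Web Page.htm" would call `find_file_list` on the page text and, if a reference such as `Web Page_files/filelist.xml` comes back, load the support files that list names; a page without the tag is treated as having no simulated embedded content.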
Users of traditional desktop productivity tools also expect that deleting an embedded file from a non-HTML document during edit operations will also delete the embedded file from the storage mechanism. To achieve this desirable characteristic in a corresponding HTML-formatted file, the present invention can conduct an inquiry at save time to determine whether a prior file list is available for the primary file. This prior file list, which can be created during a previous save operation for the primary file, contains entries that identify each support file associated with the primary file at the time of that prior save operation. If this prior file list is available, the identifiers for the support files in the prior file list are compared to the identifiers for any support files created during the current save operation. Support files may be created during the current save operation for any embedded content that remains after edit operations on the primary file. Any support files that are identified by entries in the prior file list, but not by entries in the current file list, are deleted from the hard disk. In this manner, any support file saved during the previous save operation, but whose embedded content was deleted by the user during the current edit operations, will be deleted from the hard disk during the current save event. In other words, when the HTML-formatted primary file is saved again, the editor program module does not attempt to save the support file corresponding to an embedded file deleted by the user, because this support file is no longer referenced by the primary file. Consequently, the present invention can complete a clean-up operation that deletes from the hard disk those files which were embedded in a prior version of the primary file but are no longer referred to by that primary file.
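A minimal sketch of the clean-up step follows, assuming that all support files for a document live in a single per-document folder and are identified by bare file names. The function names `orphaned_support_files` and `clean_up` are hypothetical; the core of the technique is a set difference between the prior and current file-list entries.

```python
from pathlib import Path
from typing import Iterable, List


def orphaned_support_files(prior: Iterable[str], current: Iterable[str]) -> List[str]:
    """Names listed by the prior file list but absent from the current one."""
    return sorted(set(prior) - set(current))


def clean_up(support_dir: Path, prior: Iterable[str], current: Iterable[str]) -> List[str]:
    """Delete orphaned support files from support_dir; return the names removed.

    Assumes entries are bare file names inside support_dir (hypothetical layout).
    """
    removed = []
    for name in orphaned_support_files(prior, current):
        if Path(name).name != name:
            continue  # skip anything that is not a bare file name, e.g. "../x"
        path = support_dir / name
        if path.is_file():  # the file may already be gone; nothing to delete
            path.unlink()
            removed.append(name)
    return removed
```

For a page previously saved with image001.gif through image003.gif, from which the user then deleted the second picture, the current save lists only image001.gif and image003.gif, so `clean_up` removes image002.gif and leaves the other two in place.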
More particularly, the present invention is a computer-implemented process for simulating, in an HTML-formatted primary file, the characteristics of an electronic document containing an embedded file. The process can be initiated in response to a save operation for an electronic document that is to be saved as an HTML document. As part of saving the document as an HTML file, the embedded file is written to a hard disk of the computer as a support file and is automatically assigned a unique identifier, such as a file name, and a storage configuration. An automated naming system can be used to assign unique identifiers and storage configurations to embedded files in response to saving a primary file containing the embedded files, thereby avoiding possible file name collisions. A new file list is created and saved to disk in association with the support file. This new file list may reference both itself and the support file, and typically includes the unique identifier assigned to the support file. The file list, typically an XML file, can be used to track which content is embedded and which content is linked in a primary file saved in HTML format. This file list supports an automated process for cleaning up embedded content that has been removed from the HTML-formatted primary file as a result of edit operations.
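The automated naming and file-list steps might be sketched as follows. The three-digit `image001.gif` naming pattern, the `<FileList>`, `<MainFile>`, and `<File href=...>` element names, and the `filelist.xml` list name are all hypothetical; the text above states only that unique identifiers are assigned automatically to avoid collisions and that the file list, typically XML, may reference itself and the support files.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List


def assign_support_names(extensions: Iterable[str], taken: Iterable[str],
                         stem: str = "image") -> List[str]:
    """Give each embedded item a unique support-file name (hypothetical scheme).

    Names follow stem + three-digit counter + extension, skipping any name
    already in use (case-insensitively) so a save never overwrites content.
    """
    used = {name.lower() for name in taken}
    names, counter = [], 1
    for ext in extensions:
        while f"{stem}{counter:03d}{ext}".lower() in used:
            counter += 1
        name = f"{stem}{counter:03d}{ext}"
        used.add(name.lower())
        names.append(name)
        counter += 1
    return names


def build_file_list(main_file: str, support_files: Iterable[str],
                    list_name: str = "filelist.xml") -> str:
    """Serialize a file list naming the primary file, each support file,
    and the file list itself (hypothetical element names)."""
    root = ET.Element("FileList")
    ET.SubElement(root, "MainFile", href=main_file)
    for name in support_files:
        ET.SubElement(root, "File", href=name)
    ET.SubElement(root, "File", href=list_name)  # the list references itself
    return ET.tostring(root, encoding="unicode")
```

Skipping names already present in the document's support folder is one simple way to meet the collision-avoidance goal; a per-document folder plus a counter keeps the scheme predictable for the later clean-up comparison.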
An inquiry is conducted to determine if a prior file list is associated with the primary document. If a prior file list is located by this search, then the new file list is compared to the prior file list to determine whether support files identified by the prior file list are not identified by the new file list. The support files not identified in the current file list, but identified in the prior file list, can be deleted from the hard disk because the corresponding embedded files have been deleted from the primary file during current edit operations.
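The inquiry and comparison could look like the sketch below, which assumes a file list laid out as a `<FileList>` root holding `<File href=...>` entries (a hypothetical schema, as are the function names). A missing prior file list is treated as a first save, so nothing is marked for deletion.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional, Set


def read_file_list(path: Path) -> Optional[Set[str]]:
    """Return the <File href=...> entries of a saved file list, or None when
    no prior file list exists (the inquiry step)."""
    if not path.is_file():
        return None
    root = ET.parse(str(path)).getroot()
    return {f.get("href") for f in root.findall("File") if f.get("href")}


def files_to_delete(prior_list: Path, new_entries: Set[str]) -> Set[str]:
    """Entries named by the prior file list but not by the new file list."""
    prior = read_file_list(prior_list)
    if prior is None:  # first save: there is nothing to clean up
        return set()
    return prior - new_entries
```

The returned set would then be handed to whatever routine removes support files from the hard disk; because the new file list names itself, the list file is never reported as an orphan.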
The present invention offers advantages over the prior art for managing embedded content in an HTML-file environment. If the user deletes apparent embedded content in the authoring environment, the corresponding support file is also deleted from the storage mechanism. If the user changes apparent embedded content in one copy of a document, the change does not appear in other copies of the document, because a separate copy of this file, i.e., the corresponding support file, is maintained in a known storage location for each document. If the user saves a copy of a document with embedded content, a support file corresponding to the embedded file is created and maintained on the storage mechanism for future reference in connection with edit operations on the document copy. If the user saves a document over an existing document, the apparent embedded content of the existing document is cleaned up by deleting the support files corresponding to that embedded content. Adding new embedded content to a document does not result in an overwrite of existing content, either in that document or in any other document. The present invention can also process types of embedded content other than images, while correctly handling linked content and using standard HTML that is readable by browsers and web page editors.
The various aspects of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the appended drawings and claims.