1. Field of Invention
The invention relates generally to computer systems. More particularly, the invention relates to methods and apparatus for user-controlled conversion of a document in a computer-based system.
2. Description of Relevant Art
In the broadest sense, a document is a form of information that can be put into an electronic form and stored in a computer as one or more files. Often a single document is stored as a single file, although the entire document or its individual parts may also be treated as individual data items. Recent approaches for storing and manipulating computer-stored documents utilize a tree structure to organize the various individual data items. One such approach is referred to as the Document Object Model (DOM). The Document Object Model is a programming API for Hypertext Markup Language (HTML) and Extensible Markup Language (XML) documents that defines the logical structure of documents and the way a document is accessed and manipulated. In the DOM specification, the term “document” is used in the broad sense: increasingly, XML is being used as a way of representing many different kinds of information that may be stored in diverse systems, and much of this would traditionally be seen as data rather than as documents. Nevertheless, XML presents this data as documents, and the DOM may be used to manage this data. With the Document Object Model, programmers can create and build documents, navigate their structure, and add, modify, or delete elements and content, such that anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model.
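By way of illustration only, the navigate, modify, add, and delete operations described above may be sketched using the DOM implementation in the Python standard library (xml.dom.minidom). The sample document, the element names, and the function name edit_document are assumptions made for this example and do not appear in the DOM specification.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, chosen only for this illustration.
SOURCE = "<report><title>Q1</title><body>Draft text</body></report>"


def edit_document(xml_text):
    """Parse a document into a DOM tree and apply four basic operations."""
    dom = parseString(xml_text)
    root = dom.documentElement

    # Navigate: locate the <title> element and read its text node.
    title = root.getElementsByTagName("title")[0]
    original_title = title.firstChild.data

    # Modify: change the text held by the existing element.
    title.firstChild.data = "Q1 Summary"

    # Add: create a new element with text content and append it to the root.
    footer = dom.createElement("footer")
    footer.appendChild(dom.createTextNode("Page 1"))
    root.appendChild(footer)

    # Delete: remove the <body> element from the tree.
    body = root.getElementsByTagName("body")[0]
    root.removeChild(body)

    return original_title, root.toxml()


if __name__ == "__main__":
    print(edit_document(SOURCE))
```

In this sketch each part of the document is an addressable node of the tree, which is what allows individual data items to be changed without rewriting the document as a whole.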
It may be necessary on occasion to convert a document from one format to another such as, for example, converting a spreadsheet-based document into a text-based document, or vice versa. Unfortunately, since most document formats differ substantially from one another, conventional conversion processes are generally “lossy” in that valuable information is lost during conversion, or the document is restructured in a manner that discards information. An example of such a lossy conversion is when a text-based document having internal structure such as headers, footers, embedded figures, etc. is converted to a GIF image (or any raster-based format), which has no internal document structure, since all headers, footers, and embedded figures are “logically” the same. In this case, it is not possible to “edit” any of the text in the converted document, since the information identified as “text” in the original document has been lost in the conversion process.
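The loss of structure described above may be illustrated with a minimal sketch. It does not model GIF or any real converter; the document representation (a list of role and text pairs) and the function name flatten are assumptions made for this example. The point shown is that two differently structured documents can yield the same converted output, so the original structure cannot be recovered from that output.

```python
def flatten(document):
    """Hypothetical lossy conversion: keep each part's content, drop its role."""
    return [text for _role, text in document]


# Two hypothetical documents with the same content but different structure.
doc_a = [("header", "Annual Report"), ("body", "Page 1")]
doc_b = [("body", "Annual Report"), ("footer", "Page 1")]

flat_a = flatten(doc_a)
flat_b = flatten(doc_b)

if __name__ == "__main__":
    # The sources differ, yet their converted forms are indistinguishable.
    print(doc_a != doc_b, flat_a == flat_b)
```

Because the mapping is many-to-one, no inverse conversion can determine whether “Page 1” was body text or a footer in the source document.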
In addition to being lossy, conventional document converters are generally atomic in nature in that the conversion process is indivisible, affording no opportunity for a user to affect the conversion process or the eventual structure or organization of the converted document.
Therefore, in view of the foregoing, it would be desirable to have a non-atomic document converter that affords a user the ability to control the structure of the converted document.