Computer-based document handling systems are generally divided into four broad categories: text editors and word processing systems; formatters; syntax-directed editors; and specialized tools. Most systems combine features from more than one of these categories. A document whose presentation matters must be processed by a formatter before it is presented.
Formatters are non-interactive tools that process a document to produce either a display-independent or a device-dependent layout specification. A document is submitted to a formatter as a description stored in a file; the formatter processes it and returns the complete result after some delay. High-level formatters work from a logical description of the document, so the user is not required to specify the desired presentation details. Instead, the user deals with the logical organization of the document, i.e., the different types of elements that appear in it, such as sections, paragraphs, headings and summaries, and the formatter handles the layout of these elements. Low-level formatters, by contrast, allow commands to be embedded in the document description to change other characteristics of the document, such as font, spacing, margins and justification. The present invention is directed primarily to high-level formatters.
Most interactive systems allow the user to see the layout of a document as it is being prepared. These systems also separate the logical structure of a document from the specification of its presentation details. Typically, interactive systems, like high-level formatters, use a grammatical notation to describe the logical structure of documents. These logical structures are mostly hierarchical and are represented by tree structures. FIG. 1 is an illustrative example of a simple tree structure. The structure of a scientific article, for example, could be represented by the following set of grammar rules, or productions:
______________________________________
Article       → Header Body
Header        → Title Authors
Title         → CHAR*
Authors       → CHAR*
Body          → Section*
Section       → SectionTitle Paragraph*
SectionTitle  → STRING CHAR*
Paragraph     → Entity*
Entity        → Text | TableEntity | List
Text          → CHAR*
TableEntity   → TABLE Caption
Caption       → STRING CHAR*
List          → Item*
Item          → ItemMark Paragraph*
ItemMark      → STRING
______________________________________
In the above set of productions, the words appearing in upper case, such as CHAR, TABLE and STRING, are terminal symbols. Terminal symbols have no further internal structure. The remaining symbols are non-terminal symbols. Symbols are also referred to as element types. A production specifies the structure of the non-terminal symbol on its left-hand side. For example, according to the first production, an article is made up of a header followed by a body. The operator "*" denotes zero or more occurrences of the symbol preceding it. Thus, the non-terminal Title is made up of zero or more occurrences of the terminal symbol CHAR. The operator "|" denotes alternatives. Thus, according to the above productions, an Entity is either a Text, a TableEntity or a List. Some systems also allow attributes to be attached to element types. For example, a section could have an attribute called language whose value is drawn from the set of languages in which that section may be written.
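Because "*" and "|" behave like regular-expression operators, the productions above can be checked mechanically. The following Python sketch is one possible encoding, not the mechanism of any particular system: it assumes each right-hand side is written as a regular expression over the sequence of child element types, with every name followed by a space. The names `GRAMMAR`, `TERMINALS` and `conforms` are hypothetical.

```python
import re

# Hypothetical encoding of the article grammar: each right-hand side is a
# regular expression over child element types, every name followed by a space
# so that, e.g., "Item " cannot accidentally match the start of "ItemMark ".
GRAMMAR = {
    "Article": r"Header Body ",
    "Header": r"Title Authors ",
    "Title": r"(CHAR )*",
    "Authors": r"(CHAR )*",
    "Body": r"(Section )*",
    "Section": r"SectionTitle (Paragraph )*",
    "SectionTitle": r"STRING (CHAR )*",
    "Paragraph": r"(Entity )*",
    "Entity": r"(Text|TableEntity|List) ",
    "Text": r"(CHAR )*",
    "TableEntity": r"TABLE Caption ",
    "Caption": r"STRING (CHAR )*",
    "List": r"(Item )*",
    "Item": r"ItemMark (Paragraph )*",
    "ItemMark": r"STRING ",
}

# Terminal symbols have no further internal structure.
TERMINALS = {"CHAR", "TABLE", "STRING"}


def conforms(symbol, children):
    """Return True if the sequence of child element types is allowed for `symbol`."""
    if symbol in TERMINALS:
        return not children  # a terminal never has children
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(GRAMMAR[symbol], sequence) is not None


print(conforms("Article", ["Header", "Body"]))  # True
print(conforms("Entity", ["TableEntity"]))      # True
print(conforms("Entity", ["Text", "List"]))     # False: "|" selects exactly one
print(conforms("Section", ["Paragraph"]))       # False: SectionTitle is required
```

Checking a whole document then amounts to applying `conforms` at every node of its tree.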
Most systems also provide a facility for describing the logical structure of documents in the manner described above. A particular document then corresponds to a hierarchical structure, such as a tree, that conforms to the productions describing the relationships between its elements. Referring to FIG. 1, a generic tree structure is shown having a root node, internal nodes and children (also known as terminals), i.e., the leaf nodes of the tree. The root node is the base node of the tree; every other node, including the nodes shown in FIG. 1 as internal nodes, descends from it. The children, or terminals, are the lowest elemental units of the tree and are descendants of internal nodes. In this terminology, the grammar that describes the logical structure of a class of documents is called a generic logical structure. A document instance, i.e., a tree conforming to the grammar, is called a specific logical structure and describes one instance of the class of documents. For example, a particular article is an instance of the class of all articles. Similarly, a different grammar describing the structure of a form would be another generic logical structure, and a particular form would correspond to a specific logical structure.
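A specific logical structure is simply a tree whose nodes carry element types. The Python sketch below (the `Node` class and `classify` function are hypothetical names, not taken from any system) builds a small article instance conforming to the productions above and sorts its nodes into the root, internal nodes and leaves described for FIG. 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a specific logical structure (a document instance)."""
    element_type: str
    children: list = field(default_factory=list)


def classify(root):
    """Group element types by their role in the tree, listed in preorder."""
    roles = {"root": [], "internal": [], "leaf": []}

    def visit(node, is_root):
        if is_root:
            roles["root"].append(node.element_type)
        elif node.children:
            roles["internal"].append(node.element_type)
        else:
            roles["leaf"].append(node.element_type)
        for child in node.children:
            visit(child, False)

    visit(root, True)
    return roles


# A one-section article instance of the generic "article" structure.
article = Node("Article", [
    Node("Header", [
        Node("Title", [Node("CHAR")]),
        Node("Authors", [Node("CHAR")]),
    ]),
    Node("Body", [
        Node("Section", [
            Node("SectionTitle", [Node("STRING")]),
            Node("Paragraph", [
                Node("Entity", [Node("Text", [Node("CHAR")])]),
            ]),
        ]),
    ]),
])

roles = classify(article)
print(roles["root"])  # ['Article']
print(roles["leaf"])  # ['CHAR', 'CHAR', 'STRING', 'CHAR']
```

In this instance the leaves coincide with terminal symbols only because every text element holds at least one character; a Title with zero CHARs, which the grammar permits, would appear as a childless non-terminal.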
The image presented on the screen during editing, as well as the image printed at the end of processing in batch-oriented systems, is built automatically from the specific logical structure of the document. Typically, this is done using presentation rules that specify how each element type in a generic document structure is to be displayed or printed. The way presentation rules are specified varies from one system to another. Known systems typically use a property sheet or attribute table attached to each element type: for each type of element defined in the generic structure of the class, the table holds a set of characteristic formatting attributes. Using the article example given above for logical structures, a possible attribute table for some of the element types is set forth below. A question mark marks an attribute whose value the user can change; the value in parentheses following the question mark is the default.
______________________________________
Article:
  FontFamily           = ? (Times)
  MainTextFontShape    = ? (Roman)
  MainTextFontSize     = ? (10)
  MainTextLineSpacing  = ? (13)
  MainHeadingFontShape = ? (Bold)
  HeadingFontShape     = ? (Italic)
  TextAreaWidth        = ? (312)
  TextAreaHeight       = ? (528)
  AbsoluteTopMargin    = ? (100)
  LeftMargin           = ? (100)
  RightMargin          = LeftMargin + TextAreaWidth
  Language             = English
  PAGINATE (AbsoluteTopMargin, TextAreaHeight)
Paragraph:
  ParagraphNumber      = COUNTIN (Section)
  Indentation          = IF ParagraphNumber = 0 THEN 0 ELSE em(FontSize)
  JUSTIFY (LeftMargin, RightMargin, Indentation, Formatting, Mode, LineSpacing, Language)
______________________________________
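The attribute table above distinguishes user-changeable attributes (marked "?", with defaults) from fixed and derived ones. A minimal Python sketch of how such a table might be resolved for the Article element type follows. The dictionary layout, the function names and the treatment of an em as equal to the font size in points are assumptions made for illustration.

```python
# Hypothetical representation of the Article attribute table.
# User-changeable attributes ("= ?") with their default values:
USER_DEFAULTS = {
    "FontFamily": "Times",
    "MainTextFontShape": "Roman",
    "MainTextFontSize": 10,
    "MainTextLineSpacing": 13,
    "MainHeadingFontShape": "Bold",
    "HeadingFontShape": "Italic",
    "TextAreaWidth": 312,
    "TextAreaHeight": 528,
    "AbsoluteTopMargin": 100,
    "LeftMargin": 100,
}

# Attributes the user cannot change:
FIXED = {"Language": "English"}

# Attributes computed from others once defaults and overrides are merged:
DERIVED = {"RightMargin": lambda a: a["LeftMargin"] + a["TextAreaWidth"]}


def resolve_article_attributes(overrides=None):
    """Merge user overrides into the defaults, then compute derived attributes."""
    overrides = overrides or {}
    not_changeable = set(overrides) - set(USER_DEFAULTS)
    if not_changeable:
        raise KeyError(f"not user-changeable: {sorted(not_changeable)}")
    attrs = {**USER_DEFAULTS, **overrides, **FIXED}
    for name, rule in DERIVED.items():
        attrs[name] = rule(attrs)
    return attrs


def paragraph_indentation(paragraph_number, font_size):
    """Paragraph rule: no indent for paragraph 0, otherwise one em.

    Assumes an em equals the font size in points.
    """
    return 0 if paragraph_number == 0 else font_size


print(resolve_article_attributes()["RightMargin"])                    # 412
print(resolve_article_attributes({"LeftMargin": 72})["RightMargin"])  # 384
print(paragraph_indentation(0, 10), paragraph_indentation(2, 10))     # 0 10
```

Keeping the derived rule separate from the defaults means RightMargin follows any change to LeftMargin or TextAreaWidth, while an attempt to set RightMargin or Language directly is rejected.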
The above example shows an attribute table with the values of the presentation attributes for two logical element types, Article and Paragraph. The table contains two procedure calls: PAGINATE, at the end of the entry for Article, and JUSTIFY, at the end of the entry for Paragraph. These procedures break articles into pages and paragraphs into lines, respectively. Their parameters, which can be set in the attribute table, determine the results of these procedures: the overall image of the document, how it is laid out in pages, and how its paragraphs are broken into lines.
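How the JUSTIFY and PAGINATE parameters shape the output can be illustrated with a simple sketch. The Python code below uses greedy first-fit line breaking, which is not necessarily what any given system implements (TeX, for example, uses the Knuth-Plass total-fit algorithm); it also measures widths in characters rather than points and omits the stretching of interword spaces that full justification would add. The function names are hypothetical.

```python
def break_into_lines(words, line_width, first_line_indent=0):
    """Greedy line breaking: fill each line with as many words as fit.

    Widths are in characters (a monospace assumption). The first line is
    shortened by `first_line_indent`; a word longer than a line is placed
    on a line by itself and allowed to overflow.
    """
    lines, current = [], []
    available = line_width - first_line_indent
    for word in words:
        candidate = " ".join(current + [word])
        if current and len(candidate) > available:
            lines.append(" ".join(current))
            current = [word]
            available = line_width  # only the first line is indented
        else:
            current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


def paginate(lines, text_area_height, line_spacing):
    """Split lines into pages of text_area_height // line_spacing lines each."""
    per_page = max(1, text_area_height // line_spacing)
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]


print(break_into_lines("a bb ccc dddd".split(), 6))     # ['a bb', 'ccc', 'dddd']
print(break_into_lines("a bb ccc dddd".split(), 6, 3))  # ['a', 'bb ccc', 'dddd']
# With TextAreaHeight = 528 and MainTextLineSpacing = 13, a page holds 40 lines.
print([len(page) for page in paginate(list(range(100)), 528, 13)])  # [40, 40, 20]
```

The two calls to `break_into_lines` differ only in the indentation parameter, yet every line break moves, which is the kind of parameter-driven effect the attribute table controls.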
The prior-art approach described above has many disadvantages. The formatting procedures are buried in the implementation of the system and are inaccessible to the user. Any change to these procedures requires detailed knowledge of the internal data structures and other mechanisms of the system. The user controls only the attributes that act as parameters to these procedures, and it is difficult to predict how changing the values of one or more of these parameters will affect the formatted result without actually seeing that result. Furthermore, in most systems the attribute tables holding the presentation rules form part of the logical structure description of the document. This mixes processing information with logical structure information.