1. Field of the Invention
The present invention relates to electronic publishing systems and, more specifically, to an authoring system for creating structured documents in an on-line publishing system.
2. Description of the Related Technology
Many different systems exist for publishing documents on a computer system. These systems are used to, for example, create newsletters or brochures to promote a particular company. In addition, publications can be used to disseminate information to a variety of customers. A number of programs exist for allowing a user to design complicated layouts for a particular application. Well-known programs such as Microsoft Publisher®, Ventura Publisher®, PageMaker®, and PrintShop® help a user to produce attractive newsletters and brochures.
These publication systems let the user define particular regions of every page for a specific purpose. For example, the user can place a graphic frame that runs along the top of the page to hold a particular image. Such an image may include the title of the newsletter or another related aspect of the newsletter. In a similar way, the user may define other areas of the first page to include one or more text frames for holding text-based information such as the words from a particular story. The user designs the text frame to have certain properties, such as height, width, background color, foreground color and other such properties so that the text becomes attractively formatted for the customer. In addition, the user can format the text information within the text frame to have desired font and paragraph characteristics. For example, the user can highlight the characters within the text frame and define that font to be, for example, bold-faced. The user can also choose to apply a character format only to specific words or paragraphs within a text frame.
Some of these publication programs use a Microsoft Object Linking and Embedding (OLE) architecture to store their documents. A major feature of OLE is interoperability, the basis for integration between applications. This integration brings with it the need to have multiple applications write information to the same file on the underlying file system. OLE defines a model called OLE Structured Storage for treating a single file system entity as a structured collection of two types of objects: storages and streams. These objects act like directories and files, respectively. The OLE Structured Storage model generally implements these objects; applications rarely, if ever, need to implement them. These objects, like all others in OLE, implement interfaces: IStream for stream objects, IStorage for storage objects.
A stream object is the conceptual equivalent of a single disk file. Streams are the basic file system component in which data lives; each stream has access rights and a single seek pointer. Through its IStream interface, a stream can be told to read, write, seek, and perform a few other operations on its underlying data. Streams are named by using a text string; they can contain any internal structure because they are simply a flat stream of bytes. In addition, the functions in the IStream interface map nearly one-to-one with standard file-handle-based functions such as those in the ANSI C/C++ run-time library.
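The correspondence between stream operations and ordinary file-handle operations can be illustrated with a minimal sketch. Python's standard io.BytesIO is used here purely as an analogy for a flat byte stream with a single seek pointer; it is not the OLE IStream interface, and the record contents are invented for illustration.

```python
import io

# io.BytesIO stands in for a stream object: a flat sequence of bytes with
# one seek pointer. This is an analogy only, not the OLE IStream interface.
stream = io.BytesIO()

# Each write advances the seek pointer past the bytes written.
stream.write(b"HEADLINE:Launch day\n")
stream.write(b"BODY:Full story text\n")

# Seek back to the start and read the first record.
stream.seek(0)
headline = stream.readline()

# Seek directly to the offset of the second record and read it.
stream.seek(len(headline))
body = stream.readline()

print(headline)
print(body)
```

Because the stream imposes no internal structure, the line-oriented records above are a convention chosen by this sketch, not something the stream itself enforces.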
A storage object is the conceptual equivalent of a directory. Each storage, like a directory, can contain any number of substorages (subdirectories) and any number of streams (files). Furthermore, each storage has its own access rights. The IStorage interface describes the capabilities of a storage object, such as enumerate elements (dir), move, copy, rename, create, and destroy. A storage object itself cannot store application-defined data except that it implicitly stores the names of the elements (storages and streams) contained within it.
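The directory-like behavior of storages can be sketched with a small in-memory model. The class and method names below (Storage, Stream, create_storage, and so on) are hypothetical stand-ins chosen for this illustration; they are not the IStorage or IStream interfaces, and the element names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Stream:
    """Toy stand-in for a stream: a flat sequence of bytes."""
    data: bytes = b""


@dataclass
class Storage:
    """Toy stand-in for a storage: it holds no data of its own,
    only the names of the elements contained within it."""
    elements: Dict[str, Union["Storage", Stream]] = field(default_factory=dict)

    def create_storage(self, name: str) -> "Storage":
        child = Storage()
        self.elements[name] = child
        return child

    def create_stream(self, name: str, data: bytes = b"") -> Stream:
        child = Stream(data)
        self.elements[name] = child
        return child

    def enumerate(self) -> List[str]:
        # Comparable to listing a directory; sorted for a stable order.
        return sorted(self.elements)

    def rename(self, old: str, new: str) -> None:
        self.elements[new] = self.elements.pop(old)

    def destroy(self, name: str) -> None:
        del self.elements[name]


# Build a small hierarchy: a root storage with one substorage and one stream.
root = Storage()
chapter = root.create_storage("Chapter1")
chapter.create_stream("Text", b"Once upon a time")
root.create_stream("Summary", b"A short tale")
root.rename("Summary", "Abstract")

print(root.enumerate())
print(chapter.enumerate())
```

In this model, as in the description above, application data lives only in streams; a storage contributes nothing beyond the names of its children.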
The OLE Structured Storage technology solves problems associated with previous flat file systems through the extra level of indirection of a file system within a file. With OLE, a particular application can create a structured hierarchy where the root file itself has many substorages. Each substorage can have substorages within it, and so on.
This structure solves the problem of expanding information in one of the objects: The object itself expands the streams in its control, and the implementation of storage determines where to store all the information in the stream.
In this sort of storage scheme, the objects that manage the content always have direct incremental access to their piece of storage. That is, when the object needs to store its data, it writes it directly into its subfiles without having to involve the main application. The object can, if it wants to, write incremental changes to that storage, thus leading to much better performance.
If the user wants to make changes to that information later on, the object can then incrementally read as little information as necessary instead of requiring the application to read all the information into memory first. Incremental access, a feature that has traditionally been very hard to implement in applications, is now the default mode of operation.
Other categories of publication systems include software for electronically publishing stories across on-line networks such as CompuServe, America On-Line, or the Internet. Most of these systems create and display stories that are formatted in a Standard Generalized Markup Language (SGML) or Hypertext Markup Language (HTML). Both HTML and SGML are standards for tagging text in documents to be displayed in an on-line network. Documents that are formatted in HTML or SGML can be viewed by several widely distributed browsers such as Mosaic and NetScape for the Internet. These browser programs read SGML and HTML tagged documents and display them with proper formatting.
Several programs exist for producing documents that are tagged in either the SGML or HTML format. Programs such as Interleaf's WorldView 2 allow a user to create an SGML document with, for instance, bold-face text and hyperlinks to other documents. Once a document has been saved in an SGML format, it can be read by either the Mosaic or NetScape browser. Unfortunately, all of the formatting commands for text or graphics in an SGML or HTML document are embedded within the document. The Mosaic or NetScape browsers do not reformat these tagged documents, but rather only display the commands embedded in the SGML or HTML documents to a user. For this reason, the designers that produce the SGML and HTML documents must add formatting commands to every new document. In addition, there is little flexibility to change the document's formatting once the tagged document has been produced. Therefore, the process of creating documents for display using SGML or HTML is very inefficient for the document designer.
Other commercially available software programs for producing on-line publications are available in the marketplace. One type of electronic publisher that generates its own specific format of text while retaining the specific layout of the document is the Adobe Acrobat™ software package. Acrobat™ reads and stores documents in a specialized format known as the Portable Document Format (PDF) for use on the Internet. Other electronic publishing programs are produced by Interleaf, Inc. (Waltham, Mass.), Farallon Computing (Alameda, Calif.) and Common Ground Software (Belmont, Calif.).
In addition, a converter has been written by Charlesview (Boston, Mass.) to convert Microsoft Word® documents into HTML text. This converter works by mapping Word styles to HTML tags, and then produces a text document. However, since these documents are converted into a text form so they can be read by well-known browsers, they do not include embedded objects. In addition, HTML text documents do not have any associated keywords which would allow them to be found quickly across a large on-line system.
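The general idea of mapping paragraph styles to HTML tags can be sketched as follows. This is not the Charlesview converter; the style names, the tag choices, and the fallback to a paragraph tag are assumptions made only for illustration.

```python
import html

# Hypothetical mapping from word-processor style names to HTML tags.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
}


def convert(paragraphs):
    """Turn (style, text) pairs into tagged HTML text.

    Styles missing from the mapping fall back to <p> (an assumption of
    this sketch). Text is escaped so that characters such as & and <
    survive as literal text rather than markup.
    """
    lines = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


document = [
    ("Heading 1", "Quarterly News"),
    ("Normal", "Sales & marketing update"),
    ("Caption", "Figure one"),
]
print(convert(document))
```

The output is plain tagged text, which illustrates the limitation noted above: nothing in such a mapping carries embedded objects or search keywords along with the text.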
Another on-line information system is described in U.S. Pat. No. 5,347,632 by Filepp et al. This patent discusses an interactive computer system network which enables a user to display news information and perform transactional services through a personal computer. However, in the Filepp system the news information is integrated into display regions.
The invention described in U.S. Pat. No. 5,347,632 includes procedures for formulating objects that have been specially structured to include display data, control data and program instructions. Unfortunately, this system does not provide a separation of the content being displayed from the design.
Therefore, a need exists for an on-line system which provides separation of design from content. Moreover, a need exists for an authoring system to be used in an on-line network to provide content providers with increased flexibility for presenting their content to customers.
The present invention relates to a new authoring system for creating on-line stories. The preferred embodiment of the environment uses an enhanced version of Microsoft Word® to create Multimedia Document Files (MDF). These multimedia files are then used to provide content for displayed on-line titles as discussed below for a Multimedia Publishing System (MPS).
The enhanced Microsoft Word® includes a pair of converters to translate the Rich Text Format (RTF) input/output of Word® to a Multimedia Document File format. In addition, a Word template is included to help the author produce documents with valid embedded codes. A hypertext link embedding tool and a property editor for assigning find properties to the document are also included. These will be discussed below in more detail.
One object in the MDF file storage holds the text of the story, tagged in a newly designed markup language termed herein the Multimedia Publishing Markup Language (MPML). MPML is a version of HTML 2.0 with additional extensions for supporting more detailed tagging of structure as well as embedded OLE objects.
In addition to adding MDF content to a project by authoring in Word®, the present invention also includes programs for converting existing HTML documents to MPML when they are added to a project. These concepts will be explained in more detail below.
One embodiment of the present invention is a method of publishing structured documents in a computer network comprising publisher, server and customer computers, the method comprising creating tagged content, storing a plurality of tagged objects representative of the tagged content in a document in the publisher computer, adding at least one non-tagged object to the document, transferring the document to the server computer, and receiving, at the customer computer, from the server computer the non-tagged objects of the document independent of the tagged objects.
Another aspect of the present invention is a method of publishing structured documents in an electronic publication system, comprising inserting a plurality of text portions indicative of a story object into a document, tagging each text portion of the story object with a tag, inserting an embedded object into the story object, storing the tagged text portions into a first object storage of the story object, storing the embedded object into a second object storage of the story object, and displaying selected ones of the text portions and the embedded object, the selection dependent upon the tags.
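The steps of this aspect can be sketched in a few lines, under stated assumptions. The dictionary layout, the tag names (headline, abstract, body), and the select_text helper are hypothetical and chosen only to show tagged text and an embedded object held in separate storages, with display driven by the tags.

```python
# A toy story object: tagged text portions live in one storage and the
# embedded object lives in another. All names here are illustrative.
story = {
    "text_storage": [
        ("headline", "Storm hits coast"),
        ("abstract", "High winds close the harbor."),
        ("body", "Residents were advised to stay indoors overnight."),
    ],
    "embedded_storage": {
        "photo1": b"<binary image data placeholder>",
    },
}


def select_text(story_object, wanted_tags):
    """Return the text portions whose tag is in wanted_tags, in order."""
    return [
        text
        for tag, text in story_object["text_storage"]
        if tag in wanted_tags
    ]


# A front-page view might show only the headline and abstract, while a
# full view shows every text portion; the choice depends on the tags.
front_page = select_text(story, {"headline", "abstract"})
full_view = select_text(story, {"headline", "abstract", "body"})

print(front_page)
print(len(full_view), "text portions,",
      len(story["embedded_storage"]), "embedded object")
```

Keeping the embedded object outside the text storage means a view can choose text portions by tag without touching the embedded data at all.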
Yet another aspect of the present invention is a structured document in an electronic publication system, comprising a storage container having a root storage, a find properties object stream referenced by the root storage, a markup language object storage referenced by the root storage container, and an embedded object storage referenced by the root storage container.
Still another aspect of the present invention is a method for efficiently transmitting tagged content to a computer in an on-line publishing system, comprising creating a tagged document on a host computer, parsing the tagged document into a parse tree comprising a plurality of objects, and transmitting the objects to a second computer.
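Parsing a tagged document into a tree of objects and sending those objects separately can be sketched with Python's standard html.parser module. This is an illustration under assumptions, not the parser of the invention: the node layout, the JSON encoding, and the choice to treat each top-level node as one transmittable object are all hypothetical, the input is assumed to be well formed with no void tags such as br, and transmission is simulated by building a list of byte messages.

```python
import json
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Build a simple parse tree of dict nodes from well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": []}
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes every end tag closes the most recently opened element.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append({"text": text})


def parse(document):
    builder = TreeBuilder()
    builder.feed(document)
    builder.close()
    return builder.root


def serialize_objects(tree):
    """Yield one encoded message per top-level object in the parse tree."""
    for child in tree["children"]:
        yield json.dumps(child).encode("utf-8")


tree = parse("<h1>Title</h1><p>Some <b>bold</b> text</p>")
messages = list(serialize_objects(tree))  # stand-in for network sends

for message in messages:
    print(message)
```

Sending pre-parsed objects rather than the raw tagged text means the receiving computer does not have to repeat the parsing step, which is one plausible reading of why this approach is described as efficient.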