1. Field of the Invention
The present invention relates to automatically hyperlinking multimedia product documents and more particularly to a generalized hyperlinking system that creates hyperlinks either interactively (one at a time) or automatically (in mass production), and either statically (at authoring time) or dynamically (at browsing time).
2. Description of the Prior Art
Due to the growing popularity of the World Wide Web in recent years, hyperlinking has become a viable mechanism to access information in many applications on local machines, an intranet, and the Internet. In manufacturing, product documents are now being provided in electronic form on CDs and over networks to the engineers and customers. Typically, product documents in different media (including text, images, schematic diagrams, CAD drawings, audio, video, etc.) are identified and linked together manually, so that when browsing one document, related documents can be easily retrieved. Since textual documents can usually be represented (or exported) in the ASCII format, further processing can be performed manually or automatically to identify words and phrases to be linked to other textual and non-textual documents. In advanced multimedia documentation systems, non-textual documents are also processed with graphics and imaging algorithms to extract useful information which can also be represented in some ASCII form to facilitate further processing such as querying and hyperlinking.
In order to fully utilize technical contents to support various product-related activities and improve interchangeability of product information among vendors, standard markup languages such as HTML, SGML and XML are being used to represent product documents in all media. HTML is described in HTML 4.0 Reference Specification, W3C Recommendation, April 1998. SGML is described in SGML, ISO 8879:1986 Text and Office Systems-Standard Generalized Markup Language, Geneva, 1986. XML is described in XML 1.0 Reference Specification, W3C Recommendation, February 1998. SGML-based product documents are well-structured for specific application domains such that technical contents can be more precisely identified and related to one another. In general, an SGML document is a hierarchical structure of document objects of all types of media. A document hyperlink is a relationship between two document objects within a document or across two different documents. Some examples of hyperlinks within and across documents are shown in FIG. 1.
A complete hyperlinking process involves three major tasks: link authoring, link management, and link browsing. Link authoring refers to the task of recognizing the relationships between two document objects and generating the link information. Link management refers to the task of storing link information to support link authoring, link browsing and other applications. Link browsing refers to the task of activating links to retrieve the needed information. These three tasks can be performed separately or in one single step, depending on the applications and the complexity of hyperlinking supported.
In the link authoring process, there are three technical issues to be addressed: when to create the links, how to identify sources and destinations, and what link information to record. Currently, most multimedia tools that support hyperlinking allow the author to interactively select a segment of text or an object in a document as a source or destination, and insert some form of identification of the destination in the source or in a separate file. Thus, a link is established between the source object in one document and the destination object in the same or a different document. Such an interactive link-editing method is sufficient for a small number of arbitrary links; however, the process can be laborious and error-prone when handling a large number of documents for a complex product. A hyperlink that relates only one pair of source and destination in an arbitrary manner, and cannot be generalized to relate a large number of source-destination pairs, is referred to as a trivial link. Trivial links can be easily captured with interactive link-editing tools at either authoring time or browsing time.
An automatic hyperlinking system is disclosed in U.S. Pat. No. 5,794,257 entitled “Automatic Hyperlinking On Multimedia By Compiling Link Specifications” issued on Aug. 11, 1998 and assigned to the same assignee as the present invention. This system supports hyperlinking in a large number of product documents. Since product documents are well-structured and often refer to one another through precisely-defined technical terms, it is possible to specify patterns that exist in specific contexts to be linked together in the form of link specifications or rules. An automatic hyperlinker is invoked to process the link rules, generate link instances and insert link information in the source (and if necessary, the destination) documents. Such an automatic hyperlinking process is often performed at authoring time, and the author can also verify the link information before the hyperlinked documents are delivered to the end users, such as operators and engineers of complex machinery. (The quality of their work may depend on the accuracy of the technical information that is related through the links.)
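As a minimal sketch of rule-driven link generation in the sense described above, the following Python fragment applies one link rule to a source text and collects the resulting link instances. The `LinkRule` and `LinkInstance` names, the part-number pattern, and the destination template are illustrative assumptions; they are not taken from U.S. Pat. No. 5,794,257.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRule:
    """Hypothetical link rule: a source pattern plus a destination template."""
    pattern: str        # regular expression that identifies link sources
    destination: str    # template; {0} is replaced by the matched text


@dataclass(frozen=True)
class LinkInstance:
    """One generated link: where the source sits and what it points to."""
    start: int
    end: int
    source_text: str
    destination: str


def generate_links(text, rules):
    """Apply every rule to the text and collect the resulting link instances."""
    links = []
    for rule in rules:
        for match in re.finditer(rule.pattern, text):
            links.append(LinkInstance(
                start=match.start(),
                end=match.end(),
                source_text=match.group(0),
                destination=rule.destination.format(match.group(0)),
            ))
    # Sort by position so the links can be inserted in document order.
    return sorted(links, key=lambda link: link.start)


# Assumed convention: part numbers look like "N" followed by eight digits.
rules = [LinkRule(pattern=r"\bN\d{8}\b", destination="parts/{0}.sgm")]
text = "Replace valve N23509426 before servicing pump N10000001."
links = generate_links(text, rules)
```

Because one rule covers every occurrence of the pattern, adding a document to the collection requires no new manual link editing, which is the contrast with trivial links drawn above.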
Typical link information inserted in source objects includes references to destinations, applications for rendering the referenced destinations, and optionally, references to objects contained in the destinations and other bookkeeping information. The representation of the references to the (destination) objects is based on the addressing mechanism adopted by the hyperlinking process. It can be as simple as a unique id (e.g., N23509426) or a relative path from the root object to the referenced object (e.g., TEI), or it can be as complex as a script or program that issues a query to a document database. For documents that are distributed over the Web, URLs (Uniform Resource Locators) are used. These URLs refer to the host machines on the network and the directory paths where the documents are located on the host machines.
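The kinds of link information listed above can be pictured as a small record. The sketch below is only an illustration: the `LinkInfo` fields, the `locate` helper, the renderer names, and the example host are assumptions, and only the standard-library `urlsplit` call is an existing API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class LinkInfo:
    """Hypothetical record of the link information stored in a source object."""
    destination: str                  # unique id, relative path, query, or URL
    renderer: str                     # application used to render the destination
    sub_object: Optional[str] = None  # optional object inside the destination


def locate(link):
    """Return (host, path) for URL destinations, or (None, reference) otherwise."""
    parts = urlsplit(link.destination)
    if parts.scheme in ("http", "https"):
        return parts.hostname, parts.path
    return None, link.destination


by_id = LinkInfo(destination="N23509426", renderer="sgml-viewer")
by_url = LinkInfo(destination="http://docs.example.com/manuals/pump.html",
                  renderer="web-browser", sub_object="fig3")
```

For the URL case, `locate` separates the host machine from the directory path, the two parts of a URL named in the paragraph above; for a plain id the reference is passed through unchanged for the addressing mechanism to interpret.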
For consumer types of information retrieval, such as most applications on the Web nowadays, the source and the destination of a link are often “loosely related”. This is due to the fact that information on the Web is contributed voluntarily by individual organizations and is scattered across five continents. The structure of the (destination) documents is defined arbitrarily by the owners, and all document contents and structures can change at any time. Thus, it is not practical to attempt to verify the destinations at authoring time (other than by manually browsing the destination Web sites), or to ensure the existence of the destination documents at browsing time. In this case, URLs that refer to the “home pages” (i.e., entry points to Web sites) are used at authoring time. When such documents are browsed, the document browser makes use of the information in a URL to contact the destination site and attempt to retrieve the needed document.
In general, hyperlinks to destination information that cannot be precisely identified and guaranteed are referred to as semi-links. Although semi-links do not provide the same quality to the end users as fully verified links, they are simple to generate at authoring time and flexible to apply at browsing time. Typically, some information from the source objects is extracted as link information, and the document browser, based on such link information, can do whatever is necessary to retrieve the destination information without being completely bound to what is specified at authoring time. For example, based on the id of a machine part together with an indication of the type of product information needed, a document browser can issue a query to the product document database to retrieve the relevant information and present it to the user. As in Web applications, this type of link remains “valid” at all times (as long as some documents exist at the destinations), even if the information in the product document database is updated frequently. A hyperlinking process for generating semi-links is referred to as partial hyperlinking.
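The machine-part example can be sketched as follows. This is a minimal illustration under stated assumptions: the `SemiLink` record, the `resolve` function, and the in-memory list standing in for the product document database are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiLink:
    """Hypothetical semi-link: it records what to look for, not where it is."""
    part_id: str
    info_type: str   # e.g. "maintenance" or "schematic"


def resolve(link, database):
    """Query the database at browsing time and return matching document titles."""
    return [doc["title"] for doc in database
            if doc["part_id"] == link.part_id and doc["type"] == link.info_type]


# Stand-in for the product document database (assumed structure).
database = [
    {"part_id": "N23509426", "type": "maintenance", "title": "Valve service, rev. A"},
    {"part_id": "N23509426", "type": "schematic", "title": "Valve schematic"},
]
link = SemiLink(part_id="N23509426", info_type="maintenance")
before = resolve(link, database)

# The database is updated after authoring; the stored link is not touched.
database.append(
    {"part_id": "N23509426", "type": "maintenance", "title": "Valve service, rev. B"})
after = resolve(link, database)
```

The link itself never changes between the two calls; only the query result does, which is why such a link stays usable while the database is updated.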
In addition to the uncertainty of destination documents, there are also (source) documents that are created on the fly and cannot be processed “statically” beforehand, e.g., annotations that are added to a document by an expert, messages that are sent back and forth between two engineers, etc. In a manufacturing environment, this type of information has technical value and often becomes part of the product documentation. Thus, a dynamic hyperlinking mechanism is required to allow the viewer of this type of information to identify the source objects of potential links immediately after such “documents” are made available. Due to the nature of this type of document and the time constraint, dynamic hyperlinking applies simple link rules (or built-in knowledge) and inserts semi-links to relate to destination information. Recently, most word processors and desktop publishing software have also incorporated some limited capability of dynamic hyperlinking, e.g., they are able to recognize proprietary document structures, URLs, etc., and insert appropriate links on the fly automatically.
Hyperlinks in non-technical documents mostly relate sources to destinations directly. However, technical documents are complicated structures, and information can be related in many different ways, directly or indirectly, through, for example, a table of contents, a reference list, etc. An indirect link goes through one or more intermediate destinations to look up and collect more information before it reaches its final destination. This type of indirect link is referred to as a chain link. A chain link can be followed in two different ways, i.e., a document browser can stop at each intermediate destination and give the user a chance to view the intermediate information before moving forward, or it can work quietly behind the scenes and retrieve only the document at the final destination.
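The two ways of following a chain link can be sketched with a single traversal function. The `follow_chain` name, the dictionary used as a lookup table, and the example identifiers are assumptions made for illustration only.

```python
def follow_chain(first_stop, lookup, on_intermediate=None):
    """Follow a chain link from its first intermediate destination to the end.

    `lookup` maps each intermediate destination to the next one; a key that
    is absent from `lookup` is treated as the final destination.
    If `on_intermediate` is given, it is called at every intermediate stop
    (the "stop and view" mode); otherwise the chain is followed silently.
    """
    current = first_stop
    visited = set()
    while current in lookup:
        if current in visited:
            raise ValueError(f"cycle in chain link at {current!r}")
        visited.add(current)
        if on_intermediate is not None:
            on_intermediate(current)
        current = lookup[current]
    return current


# Hypothetical chain: table of contents entry -> reference list entry -> document.
lookup = {
    "toc#pump": "refs#12",
    "refs#12": "manuals/pump-overhaul.sgm",
}

stops = []
final_quiet = follow_chain("toc#pump", lookup)
final_stepwise = follow_chain("toc#pump", lookup, on_intermediate=stops.append)
```

Both calls reach the same final destination; the only difference is whether the intermediate destinations are surfaced to the caller along the way.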
It is an object of the present invention to develop a Generalized Hyperlinking Process (GHP) to address most of the technical issues involved in link authoring, link management, and link browsing as discussed above. In order to support various automation tasks and modularize various steps of the hyperlinking process, it is a further object of the present invention to develop a Generalized Hyperlink Specification Language (GHSL) so that the author can specify patterns and contexts for identifying sources and destinations of links in all media and define link information as interfaces between hyperlinking modules (or tools). In particular, it is an object of the present invention that a GHSL specification be processed by the hyperlinker to generate link instances, and that the link instances be managed by the link manager to support incremental hyperlinking at authoring time and interpreted by the link interpreter to effect hyperlinking behavior at browsing time.