The present invention relates to natural language processing of textual information in a data processing system. Specifically, the invention relates to a process comprising computer-mediated linguistic analysis of online technical documentation and extraction of representative text from the documentation to acquire knowledge essential to, for example, providing assistance to users in performing a task.
Reference books, user guides, instructional manuals, and similar types of technical documentation have long been a main source of background information (as opposed to foreground information, e.g., as found in newspapers) useful to individuals in developing the knowledge necessary to perform some task such as operating an apparatus or item of equipment, for example, a digital computer. The primary purpose of this genre of text is to assist a user of the apparatus to which the material is applicable in operating the apparatus.
More recently, with the proliferation of digital computers in all facets of modern society, and, more specifically, with the advent of desktop computers in the home and the workplace, such assistance has usually taken the form of an online help facility; that is, information useful in assisting the user in performing some task is made available at the user display device of the desktop computer by means of electronic retrieval. This type of assistance is commonly referred to as online assistance or online help. The text of the information may be stored locally in a database file (which may also be referred to as an online help database, or simply, help database) in electronic media on a memory storage device such as a hard disk drive or optical drive coupled to the desktop computer. Alternatively, the text of the information may be stored in a file on a memory storage device coupled to a server which the desktop computer, participating as a client, accesses by way of a data network. In either case, the information may be retrieved from the memory storage device and displayed on the user display device as directed by commands input by the user from an input device such as a keyboard, mouse, pen device, etc. In a desktop computing environment, some form of online assistance is provided, usually with respect to some aspect of operating the desktop computer or performing a specific task involving an application program, e.g., a word processor or spreadsheet application.
Early versions of online assistance generally provide information regarding what tasks or functions can be accomplished with the tools and commands of a computer operating system or software application, and/or the proper syntax or procedure for invoking such a command. For example, an early form of online assistance termed Balloon Help (in which explanatory text is displayed in a small pop-up window shaped like the balloons used for dialog in comic strips) is provided on Apple Macintosh computers operating under System 7 and later versions of the Apple Macintosh Operating System. Using Balloon Help, a user of an Apple Macintosh computer can determine the function of potentially any command, symbol, window, icon, or object visible on the user display device, i.e., the screen of the Apple Macintosh computer. When a user enables this form of online assistance, short, descriptive text messages appear on the screen describing the function performed by a particular command, symbol, or object whenever the user places the cursor on the command, symbol, or object in question.
More recent versions of online assistance are more comprehensive, providing assistance regarding not only the functions of objects, but also what tasks can be accomplished with these objects, as well as how to accomplish the tasks. For example, with reference to FIG. 1, a novel metaphor of online assistance termed Apple Guide is provided on Apple Macintosh computers operating under System 7.5 and later versions of the Apple Macintosh Operating System. Apple Guide provides online interactive instructions in response to user questions. An answer is provided to a user inquiry by leading the user through a series of interactive windows to a window or sequence of panels that contains explanatory text. An online help database behind the Apple Guide user interface provides the explanatory (coaching) text. Referring to FIG. 1, the user may begin the navigation through a series of windows upon selecting assistance by topic 102, index 103 or "look for" 104 (where an attempt is made to map a free form user query onto an appropriate answer script from the help database) from an access window 101 (here, the Full Access window as displayed by Macintosh Guide). Using Apple Guide, users of an Apple Macintosh computer are able to obtain online assistance in different forms, including task-oriented procedures on a software application's features, tutorials, advanced features for sophisticated users, and reference material of the type found on quick reference cards.
In early versions of online assistance such as the Balloon Help previously described, the process of determining the content of the database file (referred to herein as the help database), in which is stored the text of information that may be retrieved by online assistance, is relatively straightforward. Essentially, the content of the help database is governed by the commands that appear on the user display device or that can be invoked by the user from a user input device. It should be noted that the term command is used here to encompass any object through which a user can control the system or application software running on the digital computer, including, for example, a window, icon, symbol, or text string. The creator, or "author," of the help database simply catalogs each command and provides a short description of its function, or the appropriate syntax for invoking the command, thereby providing a complete enumeration of commands arranged systematically with descriptive details.
In the more recent versions of online assistance, the process of determining the content of the online help database is an arduous, time-consuming, and iterative task, typically involving a team of instructional designers. Whereas in earlier versions of online assistance, the author simply cataloged all possible commands and the like, in more recent versions of online assistance, the instructional designers or persons acting in that capacity are not provided with such finite boundaries regarding what information is important and, thus, should be included in the help database. Providing online assistance for questions such as, "how do I do this task?" involves more than just cataloging and describing the functionality of every possible command. The designers need to determine, for example, what task-oriented procedures, what tutorials, what advanced features, and what reference material should be included. This process is one of introspection by the instructional designers. Decisions are made typically on the basis of accumulated experience and intuition acquired primarily by trial and error. One way to proceed is to first determine the key terms in the application domain (which may be composed of one or more words, i.e., which may be phrasal units), the properties thereof, and the relations (i.e., actions) that can be performed on or with the objects defined by the key terms. For example, with reference to FIG. 1, the instructional design team may determine that the term "disk" shown highlighted at 105 in window 101 is important, and thus, should be a key term included in the help database. They may further determine that actions involving the disk such as preparing, ejecting, and erasing (displayed in the right half of window 101 at 106) are sufficiently important to include and relate to the key term disk in the help database.
Key terms, as well as relations and properties involving those key terms, essentially define the domain, i.e., the topic or application, for which online assistance is being developed. These key terms, relations and properties may be cataloged and then expanded upon in creating the help database. A domain catalog (i.e., a catalog comprising key terms, properties thereof, and relations involving those key terms, which essentially define an application domain) from which the help database is created also provides the basis for a suitable index, list of subtopics, or other means by which a user can initiate an inquiry into the help database. This process of determining the content of and index to the help database comprises a substantial, nontrivial component of the design and delivery of online assistance for user tasks. It should be noted that determining the content of the help database essentially comprises the steps of 1) determining the core of key terms, relations and properties involving the key terms, e.g., "disk", "ejecting a disk", and "name of disk", and 2) writing definitions for key terms and their relations, e.g., defining "disk" and describing the sequence for "ejecting a disk". As will be seen, it is the first step of the process of determining the content of the help database to which the present invention is directed.
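By way of illustration only, a domain catalog of the kind described above might be represented in software along the following lines. This is a minimal sketch and not a representation prescribed by this disclosure: the names KeyTerm, DomainCatalog, and build_example_catalog are hypothetical, and only the example entries ("disk", its actions, and its "name" property) are taken from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class KeyTerm:
    """One catalog entry: a key term with its properties and relations."""
    name: str
    properties: Set[str] = field(default_factory=set)  # e.g. "name" (of disk)
    relations: Set[str] = field(default_factory=set)   # e.g. "ejecting"


class DomainCatalog:
    """Hypothetical container for the key terms of one application domain."""

    def __init__(self) -> None:
        self._terms: Dict[str, KeyTerm] = {}

    def add_term(self, name: str) -> KeyTerm:
        # Create the entry on first use; return the existing one otherwise.
        return self._terms.setdefault(name, KeyTerm(name))

    def add_property(self, term: str, prop: str) -> None:
        self.add_term(term).properties.add(prop)

    def add_relation(self, term: str, relation: str) -> None:
        self.add_term(term).relations.add(relation)

    def index(self) -> List[str]:
        """Alphabetical list of key terms, usable as a help index."""
        return sorted(self._terms)

    def subtopics(self, term: str) -> List[str]:
        """Actions related to a key term, usable as a list of subtopics."""
        return sorted(self._terms[term].relations)

    def properties(self, term: str) -> List[str]:
        return sorted(self._terms[term].properties)


def build_example_catalog() -> DomainCatalog:
    """Populate a catalog with the 'disk' example from the text."""
    catalog = DomainCatalog()
    for action in ("preparing", "ejecting", "erasing"):
        catalog.add_relation("disk", action)
    catalog.add_property("disk", "name")
    return catalog
```

In this sketch, the index and the list of subtopics are both derived from the same catalog, which reflects the observation above that the domain catalog serves as the basis for the means by which a user initiates an inquiry. How the entries are identified in the first place is deliberately left out.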
The same difficulty in determining the content of an online help database to be accessed by an online user assistance facility occurs in other contexts as well. For example, in the publishing industry, determining the content of the index or glossary to a reference manual, textbook, or instructional guide involves the same arduous process of determining the key terms, relations, i.e., actions, and properties which are considered sufficiently important to place into the index or glossary.
In a computing environment, for example, the desktop computer environment referred to earlier, the same difficulty arises when providing online delivery of technical documentation, that is, online access to an electronic copy of the technical documentation itself, not a help database derived therefrom. To provide this feature, a facility must exist for mapping a user query onto the appropriate position in the text in the online documentation. This necessitates, at the very least, the creation of an index or catalog of the type discussed above that additionally possesses a mapping or linking of the key terms, relations and properties to the location, e.g., the chapter or section number, page number, paragraph, and potentially, the line number, in the online text document at which they occur.
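The mapping of key terms to locations described above can be sketched as follows. This is an illustrative assumption only: the Location record and the build_location_index function are hypothetical names, the document is assumed to be already split into sections, paragraphs, and lines, and plain case-insensitive substring matching stands in for whatever analysis actually identifies occurrences of a term.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Location(NamedTuple):
    """Position of an occurrence within the online document."""
    section: str
    paragraph: int  # 1-based paragraph number within the section
    line: int       # 1-based line number within the paragraph


def build_location_index(
    sections: Dict[str, List[List[str]]],
    key_terms: List[str],
) -> Dict[str, List[Location]]:
    """Link each key term to every location at which it occurs.

    `sections` maps a section number to its paragraphs, each paragraph
    being a list of text lines. Matching is a simple substring test,
    used here only as a placeholder.
    """
    index: Dict[str, List[Location]] = defaultdict(list)
    for section, paragraphs in sections.items():
        for p_no, lines in enumerate(paragraphs, start=1):
            for l_no, text in enumerate(lines, start=1):
                lowered = text.lower()
                for term in key_terms:
                    if term.lower() in lowered:
                        index[term].append(Location(section, p_no, l_no))
    return dict(index)
```

Given such an index, a user query that resolves to a key term can be answered by displaying the text at the first listed location, or by offering all listed locations as choices.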
In a programming environment where it is desired to exchange information or otherwise communicate in some manner between separate software programs or routines, e.g., a mail program and a calendar program, elicitation of the type and format of information operated on and derivation of the basic processes each application is capable of executing is necessary to develop a set of procedures for successful interapplication or interprocess communication. Here, too, software engineers must determine the key terms, relations and properties of each application in order to design appropriate software procedures for successful communication therebetween.
Finally, although this discussion is not intended to set forth an exhaustive list of the environments in which it is necessary to boil down the technical information to its key terms and relations, another environment to which the same process applies is that of information management involving a digital computer, e.g., a desktop computer. For example, a user has access to a file containing a short technical document. The filename or title associated with the file in which the document is stored may not readily convey its content. Furthermore, the content of the document may not be readily discernible without fully reading the document. A content stamp of the document, on the other hand, contains key terms, relations and properties such that it is clear what the document is generally about, without having to read it to determine its content. By content stamping documents then, one is able to more accurately and efficiently manage information accessible from the desktop, whether the documents reside, for example, on a local hard disk or a hard disk of a server accessible via a data network. However, creating a content stamp requires reading a document to pull out the key information which comprises the stamp.
From the foregoing discussion, it can be seen that it is desirable to develop a method of extracting pertinent information from technical documentation which does not require or rely on the discretion of, for example, a team of instructional designers, and which facilitates the creation of a domain catalog containing the information, i.e., the key terms, properties thereof, and relations (activities related to or involving key terms) of the domain. It is further apparent that this desire for another method of extracting and cataloging pertinent information from technical documentation exists regardless of how this cataloged information is put to use, whether it be to fashion the content of a help database for online user assistance, to create an index or glossary for a reference manual, textbook, or instructional guide, or some other use, including, for example, those uses discussed above.
As will be seen, given online technical documentation, the present invention overcomes the above-mentioned difficulty in creating the domain catalog from which, for example, the content of a help database underlying an online assistance tool may be determined and generated.