1. Technical Field
The invention disclosed herein broadly relates to data processing techniques, and more particularly relates to an improved method for creating key words in the process of document abstraction, and an improved method for relating query terms used in document retrieval to the key words derived during document abstraction. The invention is of use in any setting that requires key words to be related morphologically to one another. Such applications include, but are not limited to, document retrieval and natural language interfaces to database management systems.
2. Background Art
The related application is "Paradigm-Based Morphological Text Analysis for Natural Languages" by A. Zamora, Ser. No. 028,437, filed Mar. 20, 1987, assigned to IBM Corporation. Reference is also made to U.S. Pat. No. 4,731,735 to K. W. Borgendale, et al., assigned to IBM Corporation, entitled "Multilingual Processing for Screen Image Build and Command Decode in a Word Processor, With Full Command, Message and Help Support," for its disclosure of a data processing system in which the invention disclosed herein can be executed.
The disclosures of the above-cited patent application and patent are incorporated herein by reference to serve as a background for the invention disclosed herein.
For the last two decades the retrieval of documents using a computer has been a prominent application in both business and library science. Two methods of preparing and retrieving documents have become established in the state of the art. They are:
○ Keyword--At the time of document archival, operator intervention is required to attach to the document a set of terms that, in the opinion of the operator, describe the content/theme of the document being stored. The words or phrases may or may not occur within the document and represent a subjective judgment by the operator of how in the future the subject document may be queried.
○ Contextual--Prior to document archival, each word in the document is reviewed and based on a criterion or set of criteria, words and phrases are chosen as being retrieval terms for the subject document. In its simplest form, each word in the document can be viewed as a retrieval term. Alternatively, elaborate grammatical criteria can be used to scale down the selection of keywords to more specific words which, based on linguistic and information science methodology, are determined to have a greater level of specificity and of more use for later retrieval.
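By way of illustration only, the simplest form of contextual selection described above, in which each word of the document is viewed as a retrieval term, might be sketched as follows. The function name `contextual_terms`, the tokenizing pattern, and the stop list `STOP_WORDS` are hypothetical and are not taken from this disclosure; the stop list merely stands in for whatever selection criterion a given system applies.

```python
import re

# Hypothetical stop list; the disclosure does not specify any particular criterion.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "is", "in", "to"})


def contextual_terms(text, stop_words=frozenset()):
    """Return the distinct retrieval terms of a document, in order of first occurrence.

    With an empty stop list, every word of the document becomes a retrieval term.
    """
    terms = []
    for word in re.findall(r"[A-Za-z]+", text.lower()):
        if word not in stop_words and word not in terms:
            terms.append(word)
    return terms


# Simplest form: every word is a retrieval term.
all_terms = contextual_terms("The retrieval of the documents")

# Scaled-down selection using the hypothetical stop list.
selected_terms = contextual_terms("The retrieval of the documents", STOP_WORDS)
```

A grammatical or information-science criterion would replace the stop-list test with a more elaborate filter; the surrounding loop would be unchanged.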
An example of a keyword-based retrieval system is the current IBM PROFS system, and an example of a contextual system is the current IBM STAIRS program product offering. For purposes of this invention, we do not differentiate between keywords assigned by an operator and contextual references derived by an automatic criterion, whether empirical, linguistic, or of some other kind. Likewise, the remainder of this disclosure does not distinguish between keywords that are related back to the document through an inverted file with pointers to the paragraph, line, and position within the line, and keywords that are simply associated with the document as a whole, with no internal references or pointers.
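The two ways of associating keywords with a document mentioned above might be sketched as follows. This is a minimal sketch under stated assumptions, not the structure used by PROFS or STAIRS: the function names `build_inverted_file` and `document_level`, the representation of a document as a list of paragraphs (each a list of line strings), and the whitespace tokenization are all hypothetical.

```python
from collections import defaultdict


def build_inverted_file(doc_id, paragraphs):
    """Build an inverted file for one document.

    paragraphs: list of paragraphs, each a list of line strings (assumed layout).
    Returns a dict mapping each word to a list of pointers of the form
    (doc_id, paragraph, line, position within line), all counted from 1.
    """
    index = defaultdict(list)
    for p, lines in enumerate(paragraphs, start=1):
        for l, line in enumerate(lines, start=1):
            for pos, word in enumerate(line.lower().split(), start=1):
                index[word].append((doc_id, p, l, pos))
    return dict(index)


def document_level(index):
    """Drop the internal pointers, keeping only word -> set of document ids."""
    return {word: {ptr[0] for ptr in ptrs} for word, ptrs in index.items()}


inverted = build_inverted_file("D1", [["key words relate", "to key terms"]])
by_document = document_level(inverted)
```

In this sketch, `inverted["key"]` holds one pointer per occurrence of the word, while `by_document["key"]` records only that the word is associated with document `D1` as an entity.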