The exemplary embodiment relates to a method for linking text strings in a document abstract to corresponding text in the main body of the document. It finds particular application in the evaluation of the cohesiveness of an abstract and in the navigation of documents, such as journal articles.
Abstracts are widely used in research articles and other documents to summarize content that is described in detail in the text and any accompanying drawings in the main body of the document. Formally, an abstract can be defined as “an abbreviated, accurate representation of the contents of a document, without added interpretation or criticism, and without distinction as to who wrote the Abstract” (International Standard ISO-214). Abstracts have been said to have four functions: as stand-alone mini-texts, giving readers a short summary of a study's topic, methodology and main findings; as screening devices, helping readers decide whether to read the whole article; as previews for readers intending to read the whole article, giving them a road-map for their reading; and as indexing aids for professional abstract writers and editors (see Huckin, T., Abstracting from Abstracts, in M. Hewings, Ed., Academic Writing in Context, pp. 93-103 (2001)).
Due to these functions of the abstract, the abstract text is often stored as meta-data in dedicated content repositories of academic articles. This means that, in contrast to the full text, the abstract plays a key role in information retrieval. Thus, the quality of the abstract is of primary importance both for the readers and the authors. It is often the case, however, that the abstract does not parallel the body of the paper in content and order. Sentences from the abstract and the text rarely have an exact or even a fuzzy match with each other. There may also be inconsistencies between the data presented in the abstract and the data presented in the main body text. Testing the coherence of the abstract with the article body by hand would be labor-intensive, and consequently such testing has not been carried out systematically.
There remains a need for automated or semi-automated methods for evaluating the coherence of an abstract and its relationship to the main body.