(1) Field of the Invention
The present invention relates to a character string retrieval system, and more particularly, to a character string retrieval system which, when a full text search is performed on a structured document, can interactively display the structure unit to which a hit character string belongs.
(2) Description of the Related Art
In recent years, large quantities of documents have come to be processed electronically by means of personal computers or word processors. A problem with electronic documents is that documents produced on different types of machines are often difficult to convert or reuse. This problem can be solved by employing SGML (Standard Generalized Markup Language), an internationally agreed standard for document interchange.
A document written in SGML (SGML document) is composed of three parts, namely, an SGML declaration, a document type definition (DTD), and a document instance. The SGML declaration serves to ensure conversion between systems which are in different environments or use different languages, and declares the character set, the symbols used in the declaration and definition, and the characters used for tag names. The DTD defines the logical structure of the elements in a document and the attributes of the elements. The elements are, for example, chapters, sections or paragraphs, and the DTD defines the structure of the elements, the names of tags (element names), the number of occurrences of tags, the order of occurrence of tags, replacement words, whether tags may be omitted, etc. The document instance is an actual document tagged in accordance with the DTD.
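As a rough illustration of the relationship between a DTD and a document instance, the following minimal Python sketch holds a small DTD fragment and a matching instance as strings and lists the element names each one uses. The element names (chapter, section, title, para), the sample text, and the helper functions are hypothetical and chosen only for illustration; the SGML declaration is omitted, and the regular expressions are a simplification, not a validating SGML parser.

```python
import re

# Hypothetical DTD fragment: a chapter is a title followed by one or more
# sections; "- -" means both tags are required, "- O" means the end tag
# may be omitted.
DTD = """
<!ELEMENT chapter - - (title, section+)>
<!ELEMENT section - - (title, para+)>
<!ELEMENT title   - O (#PCDATA)>
<!ELEMENT para    - O (#PCDATA)>
"""

# Hypothetical document instance tagged in accordance with the DTD above.
INSTANCE = """
<chapter><title>Retrieval</title>
<section><title>Overview</title>
<para>A full text search scans every word.</para>
</section></chapter>
"""


def declared_elements(dtd):
    """Return the element names declared in the DTD, in declaration order."""
    return re.findall(r"<!ELEMENT\s+(\w+)", dtd)


def start_tags(instance):
    """Return the names of the start tags in the instance, in document order."""
    # "<" followed directly by a name character skips end tags such as </title>.
    return re.findall(r"<(\w+)", instance)


if __name__ == "__main__":
    print(declared_elements(DTD))
    print(start_tags(INSTANCE))
```

The sketch only checks names, not the order or number of occurrences that a DTD also constrains.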
When a full text search is performed on a structured document such as an SGML document, the result of the search is generally displayed by one of the following methods, just as for an unstructured document: (1) a method in which the searcher is informed, by means of the number of matching strings or the names of hit files, that the search string exists in documents; (2) a method in which the searcher is informed that the search string exists in a document, and the position at which the search string appears in the document is also displayed; and (3) a method in which the searcher is informed that the search string exists in a document, and the position at which the search string appears in the document is displayed together with a specified number of characters preceding and following the located character string.
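The three conventional display methods described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of any particular retrieval system: the function names, the in-memory dictionary of file names to text, and the `width` parameter are all hypothetical.

```python
def hit_positions(text, query):
    """Method (2): character offsets at which the query appears in the text."""
    positions = []
    start = text.find(query)
    while start != -1:
        positions.append(start)
        start = text.find(query, start + 1)
    return positions


def count_hits(documents, query):
    """Method (1): number of hits per file name, listing only files with a hit."""
    counts = {}
    for name, text in documents.items():
        hits = len(hit_positions(text, query))
        if hits:
            counts[name] = hits
    return counts


def hits_with_context(text, query, width):
    """Method (3): each hit with `width` characters before and after it."""
    return [
        text[max(0, pos - width): pos + len(query) + width]
        for pos in hit_positions(text, query)
    ]


if __name__ == "__main__":
    docs = {"a.txt": "full text search of text", "b.txt": "nothing here"}
    print(count_hits(docs, "text"))
    print(hit_positions(docs["a.txt"], "text"))
    print(hits_with_context(docs["a.txt"], "text", 3))
```

In method (3) the window is a fixed character count, so it can cut across sentence or element boundaries regardless of the document's structure.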
To determine whether the located character string is one which the searcher really needs, the context of the character string, for example, must be taken into consideration. With conventional techniques, however, the text preceding or following the located character string is either not displayed at all or displayed only within a specified range, so that the searcher is not given adequate information for making the determination.