The present invention relates to a technology for searching and displaying a structured document produced in the Standard Generalized Markup Language (SGML) or the Hypertext Markup Language (HTML), and more particularly to a method and an apparatus for searching a structured document and displaying the search result with highlighting.
With the spread of word processors and the like, more and more documents are being produced in electronic form. These electronic documents have formats unique to the equipment or software that produced them and cannot be used with other equipment or software. The need has therefore arisen for some means of conversion.
Various structured documents have been proposed as a common format for exchanging such documents. These structured documents can define the hierarchical structure including chapters, sections and paragraphs constituting a basic structure of documents and also can contain layout information.
One descriptive language for structured documents for which standardization is under way is the Standard Generalized Markup Language (SGML). SGML expresses a document element by embedding a specific character string, called a tag, in the text as element information of the structured document. In SGML, the names and contents of tags and the document elements indicated by tags can be defined by a document type definition (DTD).
The above-mentioned SGML and DTD are described in detail in “Practical, SGML” (edited and translated by the SGML Gathering, Working Group for Practical Application, Apr. 20, 1992, published by Japan Standards Association).
Assume that these structured documents are registered in the database of a search system and searched by specifying an element name. In the case where the DTD varies from one registered document to another, one processing method is to analyze the elements of each document, determine which portion of the document corresponds to the specified element name, and retrieve the character string to be searched from that portion.
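The per-document analysis described above can be illustrated with a minimal sketch. It assumes that every element has explicit start and end tags and that an element is never nested inside an element of the same name; SGML also permits tag omission, which this sketch does not handle. The function name `element_portions` and the sample document are hypothetical.

```python
import re

def element_portions(document: str, element_name: str) -> list[str]:
    """Return the content of every occurrence of element_name in a tagged document.

    Assumption: explicit start and end tags, no same-name nesting.
    """
    name = re.escape(element_name)
    pattern = re.compile(
        rf"<{name}\b[^>]*>(.*?)</{name}\s*>",
        re.DOTALL | re.IGNORECASE,
    )
    return [m.group(1).strip() for m in pattern.finditer(document)]

# Hypothetical sample document.
doc = (
    "<report><title>Search systems</title>"
    "<abstract>Full-text search of SGML.</abstract></report>"
)
print(element_portions(doc, "abstract"))
```

Because the whole document is scanned on every request, running this for each registered document at search time shows why the approach is slow for a large collection.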
This method, however, consumes considerable processing time. Alternatively, in a method using a table that lists the portion of each document corresponding to each element name, all the element names appearing in the documents must be managed collectively, and every corresponding portion of each document must be registered for each element name. This requires a management table of enormous size.
Further, documents registered with different DTDs do not necessarily all contain the element to be searched. Also, in the case where different names of the same meaning, such as “abstract” and “gist”, are attached to elements, all the different element names have to be specified for the search. In actual practice, therefore, a structured document cannot be searched easily.
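The naming problem can be seen in a small sketch. It assumes a hypothetical collection in which each document maps its own element names to text; the `documents` data and the `search` function are illustrative only.

```python
# Hypothetical collection: each document uses the element names of its own DTD.
documents = {
    "doc1": {"title": "Index structures",
             "abstract": "An n-gram index for full-text search."},
    "doc2": {"title": "Query processing",
             "gist": "Notes on full-text search over tagged text."},
    "doc3": {"title": "Release notes"},  # has no abstract-like element at all
}

def search(documents, element_names, query):
    """Return IDs of documents in which any of the named elements contains query."""
    hits = []
    for doc_id, elements in documents.items():
        if any(query in elements.get(name, "") for name in element_names):
            hits.append(doc_id)
    return hits

print(search(documents, ["abstract"], "full-text search"))
print(search(documents, ["abstract", "gist"], "full-text search"))
```

Searching only “abstract” misses the second document; the user has to know and list every synonymous element name to find it.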
For the search of a structured document, therefore, it is necessary to register only the documents generated according to the same document type definition. In this way, element names specified in advance are used to manage corresponding portions of each document.
At the time of search, an element name to be searched and a query are specified. If a character string meeting the query is contained in the portion of each document corresponding to the specified element, the query is judged as matching.
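A minimal sketch of this matching step follows. It assumes that each registered document stores its text with the tags removed, together with a table of character offsets for each predefined element name. The class `RegisteredDocument` and the function `matches` are hypothetical names, not taken from any cited technique.

```python
from dataclasses import dataclass

@dataclass
class RegisteredDocument:
    text: str                             # document text with tags removed
    portions: dict[str, tuple[int, int]]  # element name -> (start, end) offsets

def matches(doc: RegisteredDocument, element_name: str, query: str) -> bool:
    """True if query occurs entirely inside the portion recorded for element_name."""
    if element_name not in doc.portions:
        return False
    start, end = doc.portions[element_name]
    # str.find with start/end only reports occurrences lying wholly in text[start:end].
    return doc.text.find(query, start, end) != -1

text = "Search systemsFull-text search of tagged documents."
doc = RegisteredDocument(text, {"title": (0, 14), "abstract": (14, len(text))})

print(matches(doc, "abstract", "search"))
print(matches(doc, "title", "search"))
```

Restricting the comparison to the recorded offsets means a string that straddles two elements, such as "systemsFull" here, is not reported as a match for either element.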
An explanation will be given of conventional techniques having the function of displaying the contents of a document as the result of searching a structured document.
A first conventional technique that can be cited is JP-A-8-339369 entitled “Document display apparatus and document display method”.
This conventional technique discloses a method of analyzing the elements of an SGML document, converting the document into a layout for element-by-element display, and displaying the contents of a specified element. A structured document can be displayed element by element using this technique. Further, this conventional technique provides means for highlighted display of a specified element (an intensified display in which the color, style or size of a character is changed or a character is underlined).
The means for highlighted display disclosed in this conventional technique, however, controls the display method element by element: it specifies whether a particular element is displayed or not and whether or not it is highlighted. This conventional technique, therefore, fails to disclose a method of realizing highlighted display of a matching query term, which is required for displaying the result of searching a structured document.
A second conventional technique disclosed in JP-A-8-212230 entitled “Method of document search and document searching apparatus” is a method for highlighted display of the result of searching a document other than a structured document.
This conventional technique, however, only acquires the position of a matching string in the text to be displayed and adds highlight information at that position; it has no function of adding the highlight information to a document obtained as a result of searching a structured document.
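For plain, unstructured text, acquiring match positions and adding highlight information can be sketched as below. This is an illustration under stated assumptions, not the method of the cited publication: the function names and the `[[` and `]]` markers are hypothetical stand-ins for whatever highlight information a display program would use.

```python
def find_positions(text: str, query: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each non-overlapping occurrence of query."""
    positions = []
    if not query:
        return positions
    start = text.find(query)
    while start != -1:
        end = start + len(query)
        positions.append((start, end))
        start = text.find(query, end)
    return positions

def add_highlight(text, positions, open_mark="[[", close_mark="]]"):
    """Wrap each span in markers, working right to left so earlier offsets stay valid."""
    for start, end in reversed(positions):
        text = text[:start] + open_mark + text[start:end] + close_mark + text[end:]
    return text

text = "search a document and display the search result"
spans = find_positions(text, "search")
print(spans)
print(add_highlight(text, spans))
```

Applied directly to a tagged document, the same routine would also match strings inside tags and would insert markers that the document type definition does not allow, which is why it does not carry over to structured documents as is.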
A mere combination of these two conventional techniques cannot realize the function of adding the highlight information to a matching query term in a document output as the result of searching a structured document.
Specifically, highlight display of a structured document requires means for producing a DTD with element information for highlight added to the DTD used for producing a document to be displayed.
A method of altering the document type definition for adding highlight information to a structured document is disclosed in JP-A-8-159202 entitled “Method and apparatus for plate management of structured documents” constituting a third conventional technique, in which a DTD is produced by adding a new element to the original DTD.
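The idea of deriving a new DTD by adding an element can be sketched as follows. This is only an illustration, not the procedure of the cited publication. It assumes the DTD is held as a mapping from element name to content model, that only elements containing character data should accept the highlight element, and that the highlight element is named `HL`; all of these are hypothetical. Declarations are rendered without SGML tag-omission indicators for brevity.

```python
def add_highlight_element(dtd: dict[str, str], highlight_name: str = "HL") -> dict[str, str]:
    """Return a copy of dtd in which elements allowing character data also allow highlight_name."""
    extended = {}
    for name, model in dtd.items():
        model = model.strip()
        if model == "(#PCDATA)":
            # Text-only element becomes mixed content.
            extended[name] = f"(#PCDATA | {highlight_name})*"
        elif model.startswith("(#PCDATA") and model.endswith(")*"):
            # Existing mixed content: append the new element to the choice group.
            extended[name] = f"{model[:-2]} | {highlight_name})*"
        else:
            extended[name] = model
    extended[highlight_name] = "(#PCDATA)"
    return extended

def render(dtd: dict[str, str]) -> str:
    """Render the mapping as simplified element declarations."""
    return "\n".join(f"<!ELEMENT {name} {model}>" for name, model in dtd.items())

original = {
    "report": "(title, abstract)",
    "title": "(#PCDATA)",
    "abstract": "(#PCDATA | emph)*",
    "emph": "(#PCDATA)",
}
print(render(add_highlight_element(original)))
```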
The use of this conventional technique makes it possible to produce a document type definition with the highlight information added thereto.
It is seen that the first conventional technique permits a structured document to be displayed with its elements clearly distinguished, while the second conventional technique permits highlighted display of the position of a matching string in an unstructured document.
Further, the use of the third conventional technique makes it possible to specify a document type definition with highlight information added for each element.
By combining these techniques, it is possible to output a structured document with highlight information added to the result of searching a specified element thereof and thereby to realize a highlighted display of the structured document.
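A minimal sketch of such a combination is given below. It assumes explicit start and end tags, no same-name nesting, and a highlight element named `HL` that the document type definition has been extended to allow; the function name and sample document are hypothetical.

```python
import re

def highlight_in_element(document: str, element_name: str, query: str, tag: str = "HL") -> str:
    """Wrap each occurrence of query inside element_name in <tag>...</tag>."""
    if not query:
        return document
    name = re.escape(element_name)
    pattern = re.compile(rf"(<{name}\b[^>]*>)(.*?)(</{name}\s*>)", re.DOTALL)

    def mark(match):
        start_tag, content, end_tag = match.groups()
        # Split into text and tags; with a capturing group, tags land at odd indices.
        pieces = re.split(r"(<[^>]+>)", content)
        marked = [
            piece if i % 2 else piece.replace(query, f"<{tag}>{query}</{tag}>")
            for i, piece in enumerate(pieces)
        ]
        return start_tag + "".join(marked) + end_tag

    return pattern.sub(mark, document)

doc = (
    "<report><title>Document search</title>"
    "<abstract>A search method for tagged text.</abstract></report>"
)
print(highlight_in_element(doc, "abstract", "search"))
```

Only text inside the specified element is modified; the occurrence of the query term in the title and any text inside nested tags' angle brackets are left untouched.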
In recent years, use of the Internet has spread explosively as a way of acquiring the latest information. The function of searching information on the Web has also been improved as a means for quickly acquiring the information required by the user from the great amount of information available on the Internet.
The Hypertext Markup Language (HTML) is used on the World Wide Web (WWW) for describing the contents of a document and for expressing the document format and information for linking to other resources. HTML can be regarded as SGML described in accordance with a specific DTD. An HTML editor is a means for producing and processing an HTML document. An HTML browser, on the other hand, analyzes and displays the HTML document thus produced.
There is a type of HTML browser which is supplied with a character string (hereinafter referred to as “the query term”) and which has such functions as searching the HTML document on display and displaying the position of a matching string with emphasis, by reverse video or the like.
An SGML browser provides the functions of laying out and displaying an SGML document. The SGML browser conducts a full-text search of the SGML document on display and highlights the portions matching the query term. Such a browser analyzes a document and produces display data when the document is displayed. The display data on the browser are then searched, and the position of each matching string is highlighted on the screen.