In recent years, the use of electronic documents has become increasingly common. In general, electronic documents lend themselves to information processing such as analysis and search more readily than ordinary printed documents. For example, when keywords are extracted from an electronic document set and presented, the user can easily grasp an overview of the electronic document set without browsing each individual electronic document included in it. Furthermore, the user can easily perform a refined search of the electronic document set by using the keywords.
Various techniques for extracting keywords from electronic documents have been proposed. More specifically, a technique of extracting keywords based on statistical features such as frequencies of occurrence in electronic documents is known. For example, terms having higher frequencies of occurrence in an electronic document set are extracted as keywords. Also known is a technique of grouping keywords based on the degrees of correlation among them and presenting keyword groups, rather than simply enumerating and presenting the extracted keywords. Grouping of keywords helps the user ascertain an overview of the electronic document set.
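As an illustration only, frequency-based extraction of the kind described above can be sketched in Python as follows. The function name extract_keywords, the simple regular-expression tokenizer, and the sample documents are assumptions made for this sketch and are not taken from any particular known technique; a practical system would typically use morphological analysis and stop-word removal.

```python
from collections import Counter
import re

def extract_keywords(documents, top_n=3):
    """Return the top_n most frequent terms across a document set.

    Hypothetical helper for illustration; not a specific prior-art method.
    """
    counts = Counter()
    for text in documents:
        # Lowercase and keep runs of letters as terms (a simplifying assumption).
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # Terms with higher frequencies of occurrence are taken as keywords.
    return [term for term, _ in counts.most_common(top_n)]

# Hypothetical sample document set.
docs = [
    "printer error printer jam",
    "printer driver update",
    "network error timeout",
]
keywords = extract_keywords(docs, top_n=2)
print(keywords)
```

In this sample, the terms "printer" and "error" occur most often and are therefore selected, while less frequent terms such as "driver" or "timeout" are not.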
The technique of extracting keywords based on statistical features such as frequencies of occurrence tends to extract basic terms more readily than technical terms. In general, however, technical terms are more helpful than basic terms for conducting a detailed refined search of an electronic document set. Likewise, when keywords are grouped and hierarchized based on co-occurrence relationships between keywords, the co-occurrence relationships most easily determined are those between basic terms having higher frequencies of occurrence. Furthermore, keyword extraction based on statistical features is suited to a large-scale electronic document set such as Web pages, but is not always suited to a small-scale electronic document set such as in-house documents.
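The tendency described above can be illustrated with a minimal Python sketch that counts, for each pair of terms, the number of documents in which both terms occur. The function name cooccurrence_counts, the whitespace tokenizer, and the sample documents are assumptions for illustration; "system" and "data" stand in for basic terms, and "tokenizer", "parser", and "lexer" stand in for technical terms.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents):
    """Count, per term pair, the number of documents containing both terms.

    Hypothetical helper for illustration only.
    """
    pair_counts = Counter()
    for text in documents:
        # Use each distinct term once per document; sort so pairs are ordered.
        terms = sorted(set(text.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return pair_counts

# Hypothetical small-scale document set.
docs = [
    "system data tokenizer",
    "system data parser",
    "system data lexer",
]
pairs = cooccurrence_counts(docs)
top_pair, top_count = pairs.most_common(1)[0]
print(top_pair, top_count)
```

In this sample, the pair of basic terms ("data", "system") co-occurs in all three documents, whereas every pair involving a technical term co-occurs in only one document, so a grouping based on these counts alone is dominated by the basic terms.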