1. Field of the Invention
The present invention relates to a technique for searching for a character string in natural language text and, in particular, to a technique for displaying a search result appropriately and succinctly by using dynamic programming applied to a frequency-ordered context tree.
2. Description of Related Art
In a search for a character string in text, the context strings surrounding a hit provide useful information. For example, a document or set of documents can be searched for a character string such as “button.” Each occurrence of the character string found by the search is called a “hit,” and the words surrounding a hit are its context strings. These context strings are often useful and should be displayed together with the hit. For example, if a search finds the word “button” in a document, context strings such as “is clicked” or “is pressed” that follow “button” can also be displayed. In this way, a document can be checked for consistency of wording on the basis of which context strings follow “button.” In another application, a document can be checked to determine whether or not a definite article is attached to a particular English proper noun. Information about context strings surrounding a hit is also important in other searches, such as collocation searches and person name searches.
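The consistency check described above can be illustrated with a minimal sketch. It assumes whitespace-delimited English text and a fixed context width of two words; the function name `following_contexts` and the sample sentences are hypothetical and are used only for illustration.

```python
import re
from collections import Counter


def following_contexts(text, term, width=2):
    """Count the `width` words that follow each occurrence of `term`."""
    # Lowercase the text and keep alphabetic tokens only (an assumption
    # made for this sketch; real text would need a proper tokenizer).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == term:
            context = " ".join(words[i + 1 : i + 1 + width])
            if context:
                counts[context] += 1
    return counts


sample = (
    "When the button is clicked, the dialog closes. "
    "If the button is pressed twice, nothing happens. "
    "After the button is clicked, a message appears."
)

counts = following_contexts(sample, "button")
for context, n in counts.most_common():
    print(f"button {context}: {n}")
```

A tally in which one wording dominates and another appears only occasionally points to the passages whose wording may need to be unified.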
A conventional technique, KWIC (KeyWord In Context), is known in which character strings surrounding a search term are sorted and displayed.
For example, all context strings displayed when the term “button” is searched for using KWIC may be as follows:
However, KWIC has a drawback in that the overall trend cannot be seen at a glance when too many hits are found.
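For reference, a KWIC-style display can be sketched as follows. This is only an illustrative approximation under stated assumptions: the context window is a fixed number of characters, the lines are sorted by the context to the right of the hit, and the function name `kwic` and the sample text are hypothetical.

```python
def kwic(text, term, width=20):
    """Return (left, hit, right) triples sorted by the right-hand context."""
    rows = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        rows.append((left, term, right))
        # Continue searching after the current hit.
        start = text.find(term, end)
    # Sorting by the right-hand context groups similar continuations.
    return sorted(rows, key=lambda row: row[2])


sample = "Press the button twice. The button is red. Click the button once."

rows = kwic(sample, "button", width=12)
for left, hit, right in rows:
    print(f"{left:>12} [{hit}] {right}")
```

Because this sketch emits one line per hit, the output grows linearly with the number of hits, which is the situation in which the overall trend becomes hard to read.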
An extended KWIC method that enables the levels of importance of the context strings to be displayed to be measured is proposed in Masato Yamamoto, Kumiko Tanaka, Hiroshi Nakagawa, “KIWI: A Multilingual Usage Consultation Tool based on Internet Searching”, Annual Meeting of The Association for Natural Language Processing, 2005, and in Published Unexamined Patent Application No. 2004-164133. However, the extended method still has a drawback in that an optimum combination of multiple context strings cannot be selected and a large number of similar pieces of text are displayed.