In traditional text search systems (e.g., Google), query terms that occur in the retrieved documents are commonly highlighted to give the user feedback. This approach breaks down with Latent Semantic Analysis (LSA), which is the basis of a variety of document analysis and search techniques. In an LSA-based text search, a document may be deemed highly relevant to the specified query terms and yet not actually contain those terms, which leaves nothing to highlight.
There is a large body of work in text processing and information retrieval, much of it based upon LSA and similar techniques, including a topic of research commonly termed query-related summarization. The invention described here falls generally in that area, since identifying terms in document A that are relevant to another document B is, in some sense, a form of summarization of A. In general, however, the works in this field cover more traditional forms of summarization, i.e., identifying key sentences in the source text to use as a summary.
LSA is also used to identify synonyms for single-term queries by finding terms that lie close to the query term in the latent semantic space.
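The idea of finding closely related terms in the latent semantic space can be illustrated with a minimal sketch. It assumes a toy term-document count matrix, a truncated singular value decomposition computed with NumPy, and cosine similarity as the closeness measure. The corpus, the rank `k`, and the helper names `latent_term_vectors` and `related_terms` are hypothetical and chosen only for illustration; they are not taken from any particular prior system.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
# d1: "car engine", d2: "automobile engine", d3 and d4: "flower petal"
TERMS = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 1],  # petal
], dtype=float)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def latent_term_vectors(matrix, k):
    """Project each term (row) into a k-dimensional latent space via SVD."""
    U, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; scale term coordinates by them.
    return U[:, :k] * s[:k]


def related_terms(term, k=2, top_n=2):
    """Return the top_n terms closest to `term` in the latent space."""
    vectors = latent_term_vectors(A, k)
    query = vectors[TERMS.index(term)]
    scored = [
        (other, cosine(query, vectors[i]))
        for i, other in enumerate(TERMS)
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    print("raw cosine(car, automobile):", cosine(A[0], A[1]))
    print("latent neighbours of 'car':", related_terms("car"))
```

In this toy data, "car" and "automobile" never occur in the same document, so their raw row vectors are orthogonal. Because both co-occur with "engine", they end up aligned in the rank-2 latent space, so "automobile" and "engine" come out as the nearest neighbours of "car". The same effect explains why a document can be ranked as relevant to a query without containing any of the query terms.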
It is desirable to develop a system and method for identifying query-relevant keywords in documents with latent semantic analysis, where a query may comprise more than one search term. It is also desirable to filter the resulting list of query-relevant keywords according to the context of the query. It is also desirable to identify keywords in the found document that are most closely related to the query terms, regardless of whether the keywords appear in the set of query terms.