A. Field of the Invention
Systems and methods consistent with the principles of the present invention relate generally to information retrieval and, more particularly, to techniques for locating stopwords/stop-phrases.
B. Description of Related Art
Information retrieval systems (e.g., search engines) that use keyword-based queries typically ignore words (“stopwords”) or groups of words (“stop-phrases”) that occur very commonly and are usually unconnected to the information being sought. Typically, stopwords or stop-phrases appear in queries because users phrase their queries, even for keyword-based systems, as if the queries were intended for a human reader. For example, the word “a” in the query “a London hotel” is a stopword and the phrase “show me” in the query “show me London hotels” is a stop-phrase. Neither “a” nor “show me” contributes to the user's intent to find information about hotels in London.
Sometimes, however, stopwords and stop-phrases can be meaningful in a query. A search query “the matrix” is typically intended to find information relating to the movie “The Matrix,” and not the mathematical concept of matrices. Similarly, the search queries “show me the money,” “show me the way lyrics,” and “show me state” all contain meaningful uses of the phrase “show me.” The query “show me the way lyrics,” for instance, is probably a request for lyrics to a song titled “Show Me the Way,” such as the like-titled songs by the musician Peter Frampton or the musical group Styx.
One technique for handling stopwords and stop-phrases uses a list of known stopwords and stop-phrases. Stopwords or stop-phrases that are on the list are stripped from a search query before the query is given to the search engine. This simple technique can, however, strip stopwords and stop-phrases that are meaningful. One solution to this problem is to build a list of known exceptional phrases in which stopwords or stop-phrases are meaningful. The stopword policy may then be to not ignore a stopword or stop-phrase when the other terms of an exceptional phrase are also present in the query. For example, such a list could include “the matrix” or “show me the money.” This approach can also be problematic, however, as it can be difficult to identify phrases in which stopwords are meaningful and to maintain an up-to-date list of such phrases.
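The list-based technique described above, together with its exception list, can be illustrated with a minimal sketch. The sketch is not taken from any particular search engine. The list contents, the function names, the lowercasing of the query, and the choice to match an exceptional phrase only when its terms appear contiguously are all assumptions made for illustration.

```python
# Hypothetical lists for illustration only; a real system would maintain
# far larger lists, which is the maintenance burden noted above.
STOPWORDS = {"a", "the"}
STOP_PHRASES = [("show", "me")]
EXCEPTION_PHRASES = [("the", "matrix"), ("show", "me", "the", "money")]


def _find_spans(tokens, phrase):
    """Return (start, end) index pairs where phrase occurs contiguously."""
    n = len(phrase)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == tuple(phrase)]


def strip_stopwords(query, use_exceptions=True):
    """Strip listed stopwords and stop-phrases from a keyword query.

    With use_exceptions=True, tokens inside a listed exceptional phrase
    are protected and never stripped (assumed policy: contiguous match).
    """
    tokens = query.lower().split()  # assumption: case-insensitive matching

    # Mark token positions covered by an exceptional phrase.
    protected = set()
    if use_exceptions:
        for phrase in EXCEPTION_PHRASES:
            for start, end in _find_spans(tokens, phrase):
                protected.update(range(start, end))

    # Collect positions to drop: unprotected stop-phrases, then stopwords.
    drop = set()
    for phrase in STOP_PHRASES:
        for start, end in _find_spans(tokens, phrase):
            span = set(range(start, end))
            if not span & protected:
                drop.update(span)
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS and i not in protected:
            drop.add(i)

    return " ".join(t for i, t in enumerate(tokens) if i not in drop)
```

Under these assumed lists, “show me the money” is left intact because it is a listed exception, whereas “show me the way lyrics” is reduced to “way lyrics” because no one has added it to the exception list. The second case is an instance of the difficulty described above: the result is only as good as the list is complete.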
Accordingly, it would be desirable to more effectively determine when a stopword or stop-phrase is present in a query.