1. Field of the Invention
The present invention relates to a method and apparatus for searching records stored in an electronic database. More specifically, but not by way of limitation, the present invention relates to a method and apparatus for dynamic search result refinement by refining a vague text-search query into a more specific text-search query.
2. Description of the Related Art
In recent years, the amount of text-based data electronically stored has grown tremendously. This text-based information includes everything from E-mail messages to patient records to world wide web (Web) pages. Specifically, much of the growth in stored text-based data is a direct result of the explosion in the number of Web pages. As anyone who has attempted to search the Web or any large database knows, however, Web pages and database records are practically useless unless they can be searched rapidly, accurately and efficiently. To aid in such searching, a number of search engines using a variety of techniques are available.
A common and accurate searching technique is full-text searching, shown in FIG. 1. In a full-text search, the entire text that is to be searched, e.g., a database record, is scanned to find a match with a particular received query (Steps 100 and 110). When a match between the scanned text and the search query is located, the location of the match is returned as a search result, called a hit (Step 120). This location corresponds to a particular database record or, in terms of the Web, to the location or URL of a particular Web page.
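The full-text search described above can be illustrated with a minimal sketch. It assumes the records are held in memory as a mapping from location to text and that a match is a case-insensitive substring match; the function name `full_text_search` and the sample URLs are hypothetical and do not come from FIG. 1.

```python
from typing import Dict, List


def full_text_search(records: Dict[str, str], query: str) -> List[str]:
    """Scan the entire text of every record and return the locations of all hits."""
    needle = query.lower()
    hits = []
    for location, text in records.items():
        # A hit is any record whose text contains the query (case-insensitive).
        if needle in text.lower():
            hits.append(location)
    return hits


# Hypothetical sample data: location (e.g., a URL) mapped to record text.
records = {
    "http://example.com/a": "Commuter buses leave New York every hour.",
    "http://example.com/b": "A guide to hiking in Vermont.",
    "http://example.com/c": "New York transit maps and schedules.",
}
hits = full_text_search(records, "New York")
```

Here `hits` holds the locations of the two records that mention the query. On a real collection the same scan would return every matching location, which is why a vague query produces an unmanageable number of hits.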
Although full-text searching is very accurate, this accuracy sometimes presents major drawbacks. For a very specific query, the full-text search is accurate and efficient in that it returns only a few locations. Thus, a user can quickly scan through the records corresponding to these returned locations. In essence, full-text searching is most useful and most powerful when it is searching for a needle in a haystack. If, however, a user is not searching for such a needle, i.e., does not have a very specific query, the full-text search is cumbersome and inefficient. For example, if the user attempts to search the Web for the phrase “New York,” a full-text search could return millions of hits. Obviously, it would be virtually impossible for the user to wade through millions of records to find specific records of interest. Even if the particular site for which the user is searching is included in the hits, the user could easily overlook this site because it is buried among the other records.
One way to reduce the number of hits from a full-text search is to narrow the search term. For instance, the phrase “New York” could be narrowed to “New York transit.” Although it may seem elementary to modify the query by, for example, adding another term, many times it is not clear which term to add or how to otherwise modify the query. If a user is looking for commuter buses leaving New York City, it is not clear whether the proper query would include “buses,” “commuter,” “transit,” or any combination thereof. Very possibly, the proper query might involve terms that elude a user, e.g., “port authority,” because many commuter buses leave from the New York Port Authority.
Some current search engines provide a user with possible or suggested terms to be added to their search. For example, a search engine might suggest that for the query “law” one of the following terms be added: “legal,” “intellectual,” “court” or “appeals.” Thus, the new search could be, e.g., “law appeals” or “law legal.” The suggested query is therefore not a coherent phrase but rather a jumble of related words. The flaw with these current search engines is that the suggested terms do not necessarily focus a user’s search. An additional flaw with these current search engines is that their list of suggested terms is not dynamically updated to reflect the most popular or most requested records.
Accordingly, an invention is needed that gives the user the ability to focus a general search query, so that the user can readily refine the query until the search results contain a manageable number of record locations. Further, an invention is needed that dynamically generates lists of more specific queries for a given query, thereby giving a user the most up-to-date and most popular choices for query refinement.
The present invention overcomes the above identified problems as well as other deficiencies of existing technologies by providing, in one aspect, a method for generating an index useable to refine a vague query into more specific queries for searching stored records. The method includes the step of generating a phrase list. This phrase list can be constructed by receiving a number of queries from various users and then extracting phrases from those queries. These extracted phrases may be quoted phrases, i.e., search phrases that users placed in quotation marks. Further, the phrase list could include only those phrases that are determined to be statistically significant. The statistical significance of a search phrase may be determined by the number of times that phrase appears in the received user queries. This number is the query frequency.
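As an illustration of the phrase-list step, the following sketch extracts quoted phrases from a log of received queries and keeps those whose query frequency meets a threshold. The function name `build_phrase_list`, the lowercase normalization, the sample log, and the threshold of 2 are assumptions made for the example; the text above does not fix a particular significance test.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches text placed between straight double quotation marks.
QUOTED = re.compile(r'"([^"]+)"')


def build_phrase_list(queries: Iterable[str], min_query_frequency: int = 2) -> Dict[str, int]:
    """Return {phrase: query frequency} for quoted phrases seen often enough."""
    counts = Counter()
    for query in queries:
        for phrase in QUOTED.findall(query):
            # Assumed normalization: lowercase and collapse runs of whitespace.
            normalized = " ".join(phrase.lower().split())
            if normalized:
                counts[normalized] += 1
    # Keep only phrases that meet the (assumed) significance threshold.
    return {p: n for p, n in counts.items() if n >= min_query_frequency}


# Hypothetical query log received from various users.
query_log = [
    '"new york"',
    '"New York" hotels',
    '"new york transit"',
    '"new york transit" schedule',
    '"port authority"',
    'cheap flights',
]
phrase_list = build_phrase_list(query_log)
```

With this log, “new york” and “new york transit” each have a query frequency of 2 and are kept, while “port authority” appears only once and is dropped, as is the unquoted query.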
Another step of the method of this invention includes computing the index frequency of the phrases in the phrase list. The index frequency indicates to what degree a phrase is general or specific, and it can be calculated in a number of ways. For example, the index frequency for a particular phrase can be the number of times that the particular phrase appears in all the HTML pages associated with a particular search engine database.
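One of the ways mentioned above, counting how often a phrase appears across all pages, might be sketched as follows. The function name `index_frequency` and the sample pages are hypothetical, and the count uses simple case-insensitive substring matching on plain text as a simplifying assumption (a production system would more likely consult an inverted index and respect word boundaries).

```python
from typing import Iterable


def index_frequency(phrase: str, pages: Iterable[str]) -> int:
    """Count occurrences of the phrase across the text of all pages."""
    needle = phrase.lower()
    # str.count returns the number of non-overlapping occurrences in each page.
    return sum(page.lower().count(needle) for page in pages)


# Hypothetical page texts standing in for a search engine database.
pages = [
    "New York transit and New York hotels",
    "New York Port Authority bus terminal",
    "Vermont hiking trails",
]
general = index_frequency("new york", pages)
specific = index_frequency("new york transit", pages)
```

In this toy collection the general phrase “new york” occurs more often than the specific phrase “new york transit,” which is the property the index frequency is meant to capture.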
It is contemplated that in one embodiment of the present invention, a step is included that determines whether the frequency for a particular phrase is lower than that of another particular phrase. Good results have been achieved by utilizing either the index frequency or the query frequency. When the index frequency is used, if the index frequency for one particular phrase is lower than that of another particular phrase, the present invention stores the two phrases in an index, the two phrases being associated with each other in the index. In one embodiment, the two phrases are stored in the index only if the index frequency for one phrase is many times larger, e.g., 10 times larger, than that of the other phrase. Regardless of what difference is required in the index frequencies, for the two phrases to be stored in the index, the two phrases should be at least partial syntactic matches. In other words, the two phrases should share at least one common term before they can be stored in the index. It is contemplated to be within the scope of this invention that the two phrases can be stored in the index if the common terms are similar but not an exact match.
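The pairing rule described above can be sketched as follows, under stated assumptions: the index is modeled as a mapping from the more frequent (more general) phrase to the less frequent (more specific) phrases associated with it, a "common term" means an exactly matching whitespace-delimited word, and the required ratio defaults to 10. The names `build_refinement_index` and `share_term` and the sample frequencies are hypothetical.

```python
from itertools import permutations
from typing import Dict, List


def share_term(a: str, b: str) -> bool:
    """True if the two phrases have at least one word in common (exact match)."""
    return bool(set(a.split()) & set(b.split()))


def build_refinement_index(
    index_freq: Dict[str, int], min_ratio: float = 10.0
) -> Dict[str, List[str]]:
    """Associate each general phrase with more specific phrases that share a term."""
    index: Dict[str, List[str]] = {}
    for general_phrase, specific_phrase in permutations(index_freq, 2):
        if index_freq[specific_phrase] <= 0:
            continue  # assumed guard: ignore phrases that never occur
        if not share_term(general_phrase, specific_phrase):
            continue  # the phrases must be at least partial syntactic matches
        # Store the pair only if the general phrase is at least min_ratio
        # times as frequent as the specific phrase.
        if index_freq[general_phrase] >= min_ratio * index_freq[specific_phrase]:
            index.setdefault(general_phrase, []).append(specific_phrase)
    return index


# Hypothetical index frequencies for phrases in the phrase list.
index_freq = {
    "new york": 5000,
    "new york transit": 120,
    "new york port authority": 40,
    "port authority": 600,
    "law": 9000,
}
refinement_index = build_refinement_index(index_freq)
```

In this example “law” is far more frequent than every other phrase but shares no term with any of them, so it is associated with nothing. The sketch does not cover the variant in which common terms are similar but not exact matches; that would require replacing `share_term` with a fuzzier comparison.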
In an additional embodiment, the method of the invention includes the step of receiving a query from a user. If an entry similar to this query is stored in an index, the index entries associated with that entry are transmitted to the user. The user can then use the transmitted index entries either to search the database or to further refine the original query.
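The lookup step might look like the following minimal sketch. It assumes the index is a mapping from a general phrase to a list of associated, more specific phrases, and it reduces "similar" to an exact match after lowercasing, stripping quotation marks, and collapsing whitespace. The function name `suggest_refinements` and the sample index contents are hypothetical.

```python
from typing import Dict, List


def suggest_refinements(query: str, refinement_index: Dict[str, List[str]]) -> List[str]:
    """Return the index entries associated with the entry matching the query."""
    # Assumed similarity test: normalize the query, then look it up exactly.
    key = " ".join(query.lower().replace('"', " ").split())
    return list(refinement_index.get(key, []))


# Hypothetical index contents.
sample_index = {
    "new york": ["new york transit", "new york port authority"],
}
suggestions = suggest_refinements('  "New York" ', sample_index)
no_suggestions = suggest_refinements("vermont", sample_index)
```

Each returned entry can be submitted as a new query or passed back to `suggest_refinements` to refine the query further; a query with no matching entry simply yields an empty list.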