The ubiquity of computers in business, government, and private establishments has made massive amounts of information available from network-connected sources, such as data stores accessible through the World Wide Web on the Internet. The availability of, and dependency on, such massive amounts of information necessitate effective search techniques for accurately finding documents that contain desired information. In recent years, computer search methods and tools have become widely available. Most computer search tools depend on search engines: software components that take a query from a user as input, conduct a search based on the query, and return search results to the user. Internet search engines may be implemented as special sites on the World Wide Web that help users find information stored on other Web sites.
Search engines index information, such as keywords, attributes, and text, that the search engines conclude describes the content of documents, including locally stored documents and files as well as network-stored documents such as Web pages. Search queries that the user later supplies to the search engine are compared against the index to direct the user to documents likely to contain information of interest. As the number of search engine queries and the amount of indexed content increase, it becomes more difficult to return search results efficiently and accurately. The acceptability of the results returned by a search engine is highly dependent on the amount of information included in the returned results and on how those results are presented to the user. Limiting search results can be as important as not missing any relevant results. For example, limiting the results to focused information of interest to the user, or presenting them in a way that helps the user evaluate them more quickly, can increase the quality of the results.
In addition to general-purpose search engines, special-purpose search engines and/or indexed information exist to serve special search needs. One example is the trademark document clearance search, which is conducted to determine whether potential trademarks (or service marks) have been used in documents in a common-law and/or descriptive manner. Trademark document clearance searches differ from general-purpose searches in several respects. They are generally conducted by searching documents to determine whether the documents include any one of a list of potential trademarks or service marks in combination with one or more of a list of common industry terms that describe the goods or services with which the potential trademark or service mark is to be used. In addition to the marks themselves, the trademark/service mark list may include visual and/or phonetic equivalents. Various types of queries can be formed from the lists. For example, FIG. 2A illustrates a composite query 200 comprising a sequence of separate queries 202, each including one word chosen from each of the two lists: one from the list of proposed trademarks and one from the list of industry terms. In this example, a user researching a trademark for a new drug called “Bitox” is searching variations on the Bitox name, namely Pitox, Bitos, Bittox, etc., all included in the first list, in combination with applicable industry terms, namely medication, prescription, treatment, etc., all included in the second list. FIG. 2B depicts a single query 210 equivalent to the trademark sequence query of FIG. 2A, formed as a Boolean logical combination of the two lists. The FIG. 2B query results in the same number of independent queries as the FIG. 2A query.
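The two query forms can be sketched in Python. This is a minimal illustration, assuming a search engine that accepts `AND`/`OR` operators; the helper names `composite_query` and `boolean_query` and the short term lists are hypothetical, drawn from the Bitox example rather than from any particular search engine's API.

```python
from itertools import product

# Hypothetical term lists based on the "Bitox" example; a real clearance
# search would typically use far longer lists.
MARK_VARIANTS = ["Bitox", "Pitox", "Bitos", "Bittox"]
INDUSTRY_TERMS = ["medication", "prescription", "treatment"]


def composite_query(marks, terms):
    """Sequence of separate two-term queries, one per (mark, term) pair (cf. FIG. 2A)."""
    return [f"{mark} AND {term}" for mark, term in product(marks, terms)]


def boolean_query(marks, terms):
    """Single Boolean query combining both lists (cf. FIG. 2B)."""
    return f"({' OR '.join(marks)}) AND ({' OR '.join(terms)})"


queries = composite_query(MARK_VARIANTS, INDUSTRY_TERMS)
print(len(queries))  # 12
print(queries[0])    # Bitox AND medication
print(boolean_query(MARK_VARIANTS, INDUSTRY_TERMS))
# (Bitox OR Pitox OR Bitos OR Bittox) AND (medication OR prescription OR treatment)
```

With these short lists the composite form yields 4 × 3 = 12 queries. With, say, 300 mark variants and 200 industry terms (hypothetical figures), the same cross product would produce 60,000 separate queries, and the Boolean form, although written once, still expands to the same 60,000 term pairs.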
Trademark document clearance searches may include hundreds of trademark/service mark variations and hundreds of industry terms that, when combined, may result in tens of thousands, if not millions, of queries. This combinatorial explosion often makes trademark document clearance searches slow and inefficient, because each combination of terms is submitted to the search engine as a separate query. A second potential problem with this kind of search is that search engines often return redundant results, because more than one query matched the same documents. A third potential problem is that the search results may not identify which specific query terms caused a match. A further shortcoming is that the distance between the two terms in each query is not provided directly to the user. Generally, the closer together the two terms are, in word distance, the more related the terms are. For example, if the term “Bitox” is one or two words away from the term “medication,” then it can be concluded with a high degree of certainty that Bitox is associated with a medication in the related document. By contrast, if the terms “Bitox” and “medication” are separated by several hundred words, the use of Bitox in the document is more likely unrelated to medication.
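Three of the shortcomings described above (redundant results, missing attribution of the matching terms, and missing word distance) can be illustrated with a small Python sketch that scans documents directly rather than calling a search engine. It assumes plain-text documents, a simple lowercase word tokenizer, variant spellings supplied explicitly in the mark list, and a word distance defined as the difference in token positions, so adjacent words are at distance 1. All names and sample documents are hypothetical.

```python
import re
from collections import defaultdict
from itertools import product

# Hypothetical lists from the Bitox example.
MARKS = ["Bitox", "Pitox", "Bitos", "Bittox"]
TERMS = ["medication", "prescription", "treatment"]


def tokenize(text):
    """Split text into lowercase word tokens (deliberately simple)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_word_distance(tokens, a, b):
    """Smallest positional gap between tokens a and b, or None if either is absent."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [j for j, t in enumerate(tokens) if t == b]
    if not pos_a or not pos_b:
        return None
    # Two-pointer walk over both ascending position lists.
    i = j = 0
    best = len(tokens)
    while i < len(pos_a) and j < len(pos_b):
        best = min(best, abs(pos_a[i] - pos_b[j]))
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best


def clearance_hits(documents, marks, terms):
    """Map each matching document to its (mark, term, distance) matches, closest first."""
    hits = defaultdict(list)
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        for mark, term in product(marks, terms):
            distance = min_word_distance(tokens, mark.lower(), term.lower())
            if distance is not None:
                hits[doc_id].append((mark, term, distance))
    return {doc_id: sorted(m, key=lambda x: x[2]) for doc_id, m in hits.items()}


docs = {
    "doc1": "Ask your doctor whether Bitox medication is right for you. "
            "A prescription is required.",
    "doc2": "Pitox was a racehorse. " + "filler " * 300 + "No treatment was needed.",
    "doc3": "Unrelated text about gardening.",
}
hits = clearance_hits(docs, MARKS, TERMS)
for doc_id, matches in hits.items():
    print(doc_id, matches)
# doc1 [('Bitox', 'medication', 1), ('Bitox', 'prescription', 7)]
# doc2 [('Pitox', 'treatment', 305)]
```

Because matches are grouped per document, each document appears once however many term pairs match it, and each entry records the responsible mark, term, and distance. The two-pointer scan in `min_word_distance` relies on both position lists being in ascending order, which holds because they are built by enumerating the tokens in order.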
One way to increase the quality of search results and improve search efficiency is to improve the query process.