Generally described, computing devices and communication networks, such as the Internet, provide computer users with access to a wide variety of electronically accessible material. One tool typically used for searching electronically accessible material is a search engine. Search engines identify, from the available electronically accessible material, the set of items most relevant to a search query. Search engines typically process search queries by first obtaining search criteria from a user, such as keywords, Boolean logic, field/date restrictions, and other criteria. The search engine can then implement a search algorithm that associates the search criteria with a search engine index representative of the available electronically accessible material. By associating the search criteria with the search engine index, the search engine identifies the most relevant electronically accessible material from the search engine index.
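The association between search criteria and a search engine index can be illustrated with a minimal sketch. It assumes an inverted index (each term mapped to the entries that contain it) and Boolean AND semantics for keywords. The passage above does not specify an index structure, and the names `build_index` and `search_all` are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query keyword (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": "citizen soldiers by stephen ambrose",
    "d2": "the innocent man by john grisham",
    "d3": "the client by john grisham",
}
idx = build_index(docs)
print(sorted(search_all(idx, "john grisham")))
```

A real search engine would add ranking on top of this candidate set. The lookup only narrows the available material to entries that satisfy the criteria.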
In one embodiment, the algorithm implemented by a search engine can utilize a number of processing rules to identify the most relevant search engine index entries and/or to prioritize the search index entries that are most relevant to a search query. For example, a processing rule can assign a value, or score, based on whether the keywords in a submitted search query are found in a search engine index entry. Additionally, the processing rules can add a value and/or deduct a value based on a relationship between the matched keywords, such as their proximity to one another. The sum of the outputs of the processing rules can generally be referred to as the relevancy score of the search index entry for the keywords.
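The following is a minimal sketch of rule-based relevancy scoring. It assumes two processing rules: a presence rule that awards points per matched keyword, and a proximity rule that adds points when adjacent query keywords fall within a small token window and deducts points otherwise. The weights (`PRESENCE_POINTS`, `PROXIMITY_POINTS`, `PROXIMITY_WINDOW`) and the function names are hypothetical. The source does not give specific rules or values.

```python
# Hypothetical scoring weights; the source does not specify values.
PRESENCE_POINTS = 2     # added for each query keyword found in the entry
PROXIMITY_POINTS = 1    # added (or deducted) based on keyword proximity
PROXIMITY_WINDOW = 3    # max token distance counted as "near"


def presence_rule(tokens: list[str], keywords: list[str]) -> int:
    """Score each query keyword that appears anywhere in the entry."""
    present = set(tokens)
    return sum(PRESENCE_POINTS for kw in keywords if kw in present)


def proximity_rule(tokens: list[str], keywords: list[str]) -> int:
    """Add points when adjacent query keywords occur close together; deduct when far apart."""
    positions: dict[str, list[int]] = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    adjustment = 0
    for a, b in zip(keywords, keywords[1:]):
        if a in positions and b in positions:
            # Smallest distance between any occurrence of a and any occurrence of b.
            gap = min(abs(i - j) for i in positions[a] for j in positions[b])
            adjustment += PROXIMITY_POINTS if gap <= PROXIMITY_WINDOW else -PROXIMITY_POINTS
    return adjustment


# The processing rules; each returns a partial score for one entry.
RULES = [presence_rule, proximity_rule]


def relevancy_score(entry_text: str, query: str) -> int:
    """Sum the outputs of all processing rules for one index entry."""
    tokens = entry_text.lower().split()
    keywords = query.lower().split()
    return sum(rule(tokens, keywords) for rule in RULES)


print(relevancy_score("John Grisham The Innocent Man", "grisham innocent"))
```

Both entries below contain "grisham" and "innocent". The one where the keywords sit close together earns the proximity bonus, while the other is penalized. This mirrors the add-and/or-deduct behavior described above.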
In a typical embodiment, a search engine index can correspond to a set of structured fields representative of various aspects of the available materials, generally referred to as characterization fields. For example, entries in a search engine index storing information about printed publications could include structured fields such as title, author, publisher, content key terms, text of the publication, publication summaries, and the like. Depending on the format and criteria for the characterization fields, a search engine index entry may contain duplicative information across its characterization fields. For example, a full-text characterization field may include title and author information that is also found in the title characterization field and the author characterization field (e.g., a title characterization field of “Stephen E. Ambrose—Citizen Soldiers” and an author characterization field of “Stephen E. Ambrose”).
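An index entry with characterization fields can be modeled as a record of named text fields. The sketch below assumes that representation and a simple alphanumeric tokenizer, then reports which terms occur in more than one field, which is the duplication described above. `IndexEntry`, `duplicated_terms`, and the field names are hypothetical, and the full-text value is an illustrative stub, not data from the source.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """A search engine index entry made of named characterization fields."""
    entry_id: str
    fields: dict[str, str] = field(default_factory=dict)


def terms(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; punctuation and separators are dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def duplicated_terms(entry: IndexEntry) -> dict[str, list[str]]:
    """Return each term that occurs in more than one field, with the fields it appears in."""
    occurrences: dict[str, list[str]] = {}
    for name, text in entry.fields.items():
        for term in terms(text):
            occurrences.setdefault(term, []).append(name)
    return {t: sorted(names) for t, names in occurrences.items() if len(names) > 1}


entry = IndexEntry(
    "ambrose-citizen-soldiers",
    {
        "title": "Stephen E. Ambrose - Citizen Soldiers",
        "author": "Stephen E. Ambrose",
        # Hypothetical stub standing in for the publication's full text.
        "full_text": "Citizen Soldiers by Stephen E. Ambrose describes the US Army in Europe",
    },
)
print(duplicated_terms(entry)["ambrose"])
```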
In embodiments in which search engine indexes include multiple structured fields for each search index entry, search engine processing rules can become more difficult to apply. In one approach, the processing rules can be applied to each characterization field in the search index entry. For example, the processing rules can provide a score for the presence and proximity of search criteria keywords in each characterization field. The relevancy of each search engine index entry could then be based on the cumulative processing rule score across its characterization fields. However, this approach can result in the same keywords being counted in multiple characterization fields. For instance, in a search index corresponding to printed publications having multiple characterization fields, a search query for “John Grisham—The Innocent Man” could result in higher keyword relevancy scores for every search engine index entry having “John Grisham” in the title field, the author field, and the text of the document (e.g., “John Grisham—The Chamber”, “John Grisham—The Client”, etc.). Because a whole set of search engine index entries receives these elevated keyword relevancy scores, the most relevant search index entry (e.g., the entry corresponding to the “Innocent Man” publication) may not be readily apparent in the search engine results.