The teachings of patent applications US2006155684A1, US2007220045A1 and PCT/US2010/038279 are incorporated herein by reference in their entirety. Furthermore, where a definition or use of a term in a document, which is incorporated by reference, is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
The value of a search engine depends on the relevance of the results it produces. While numerous web pages may contain a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Search engines employ different, proprietary methods to rank the results.
Web search engines store information about many web pages, which they extract from each page's HTML. These pages are retrieved by a crawler, also known as a spider, which follows every link it finds on a site. When a user enters keywords into a search engine, the engine examines its index and lists the matching pages according to its algorithms and criteria, usually as a summary containing each document's title and sometimes parts of its text.
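The crawl described above can be sketched as follows. This is a minimal illustration only, not the method of any particular search engine: the `SITE` dictionary stands in for pages that a real crawler would fetch over the network, and the names `LinkAndTitleParser` and `crawl` are hypothetical.

```python
from html.parser import HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Collects the href of every <a> tag and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical in-memory site: path -> HTML. A real crawler would fetch these.
SITE = {
    "/": '<html><title>Home</title><a href="/about">About</a><a href="/news">News</a></html>',
    "/about": '<html><title>About</title><a href="/">Home</a></html>',
    "/news": '<html><title>News</title><a href="/about">About</a><a href="/missing">x</a></html>',
}


def crawl(site, start):
    """Follow every link reachable from `start` and record each page's title."""
    titles = {}
    pending = [start]
    while pending:
        path = pending.pop()
        # Skip pages already visited and links that lead nowhere.
        if path in titles or path not in site:
            continue
        parser = LinkAndTitleParser()
        parser.feed(site[path])
        parser.close()
        titles[path] = parser.title
        pending.extend(parser.links)
    return titles


titles = crawl(SITE, "/")
```

Here `titles` maps each reachable path to the page title that a results listing could display; the broken link to `/missing` is simply skipped.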
Data about web pages is stored in an index database for use in later queries. The purpose of an index is to allow information to be retrieved as quickly as possible. The index is built from the information stored with the data. The engine looks for the words or clusters of words as entered. Some search engines provide features such as proximity search, which allows users to define the distance between keywords. There is also concept-based searching, in which the search involves statistical analysis of pages containing specific keywords or phrases.
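A proximity search of the kind mentioned above might be sketched as follows. The sketch assumes that distance is measured in word positions and that a simple alphanumeric tokenizer is adequate; the function names and the sample pages are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_distance(text, first, second, max_distance):
    """Return True if `first` and `second` occur at most `max_distance` words apart."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


# Hypothetical page texts used only for illustration.
pages = {
    "page_a": "The search engine ranks each page by relevance.",
    "page_b": "An engine powers the car; a search of the garage found nothing.",
}

# Keep only pages where the two keywords are adjacent (distance of one word).
matches = [
    name for name, text in pages.items()
    if within_distance(text, "search", "engine", 1)
]
```

Both pages contain both keywords, but only `page_a` has them next to each other, so a distance limit of one word excludes `page_b`.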
How a search engine decides which pages are the best matches, and in what order the results should be shown, varies widely from one search engine to another. Two main types of search engine have evolved: (1) a system of predefined and hierarchically ordered keywords that humans have programmed extensively, and (2) a system that generates an “inverted index” by analyzing the texts it locates.
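The second type can be illustrated with a minimal inverted index, which maps each word to the set of documents containing it. The sketch below assumes a simple alphanumeric tokenizer and treats a multi-word query as requiring every word to be present; the document identifiers and function names are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents used only for illustration.
documents = {
    "doc1": "Search engines rank web pages",
    "doc2": "A crawler retrieves web pages",
    "doc3": "Engines of cars",
}
index = build_inverted_index(documents)
both = search(index, "web pages")
```

Because the lookup goes from word to documents rather than from document to words, answering a query requires only a few set intersections instead of a scan of every text.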
Search engines rely on metadata, which are strings of data or text associated with a particular web page. Usually a user has an idea of what he or she is looking for and uses keywords to guide the search engine in retrieving web pages. If those keywords are present in the metadata associated with a web page and said web page ranks high in the search engine index, then the web page will be displayed more prominently than other web pages also related to said search.
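The interplay between metadata matching and index rank might be sketched as follows. This is an assumption-laden illustration: the `Page` record, the numeric `index_rank` score, and the rule that every keyword must appear in the metadata are all hypothetical and do not describe any particular engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    metadata: frozenset  # lowercase metadata keywords (hypothetical field)
    index_rank: float    # higher means more prominent (hypothetical score)


def metadata_search(pages, keywords):
    """Return pages whose metadata contains every keyword, best-ranked first."""
    wanted = {k.lower() for k in keywords}
    hits = [p for p in pages if wanted <= p.metadata]
    return sorted(hits, key=lambda p: p.index_rank, reverse=True)


# Hypothetical pages used only for illustration.
pages = [
    Page("https://example.com/jazz", frozenset({"music", "jazz"}), 0.4),
    Page("https://example.com/radio", frozenset({"music", "jazz", "radio"}), 0.9),
    Page("https://example.com/cars", frozenset({"cars"}), 0.8),
]
results = [p.url for p in metadata_search(pages, ["Jazz", "music"])]
```

In this sketch the cars page is excluded despite its high rank because its metadata lacks the keywords, and the two matching pages are ordered by rank alone.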
Some software applications, such as Discovr or Liveplasma, provide discovery capabilities by associating items in a finite set. These systems operate according to a model whereby each item in the data set is connected to a small number of other items. These associations are illustrated by drawing lines from each item to similar items.
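The association model described above can be represented as a graph in which each item lists a few similar items. The sketch below is a generic illustration and is not the actual implementation of Discovr or Liveplasma; the item names, the `associations` table, and the `discover` function are hypothetical.

```python
from collections import deque

# Hypothetical data set: each item is connected to a small number of others.
associations = {
    "artist_a": ["artist_b", "artist_c"],
    "artist_b": ["artist_a", "artist_d"],
    "artist_c": ["artist_a"],
    "artist_d": ["artist_b"],
}


def discover(graph, start, max_hops):
    """List items reachable from `start` within `max_hops` links, nearest first."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        item, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(item, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


direct = discover(associations, "artist_a", 1)
nearby = discover(associations, "artist_a", 2)
```

With one hop only the items drawn with a line to the starting item are returned; raising the limit to two hops also reaches `artist_d` through `artist_b`.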