When performing an internet search using a search system, such as the search engine provided by Yandex™ (www.yandex.com), a server first receives the search query entered by the user on the user's device, such as a computer, smartphone or tablet. The server then retrieves documents responsive to the query, a search result ranker ranks the documents, and the server sends instructions to the client device to display a search engine results page (SERP). The SERP provides a list of links to the documents, ranked in order of relevance, and typically also a portion of each document (or a snapshot of information available on the document, also known as a “snippet”).
The documents that the search system finds can vary greatly in their usefulness. One of the major challenges in ranking search results is to determine how to place the most relevant documents at the top of the search output (i.e. the SERP).
In some search systems, relevancy is used by the search result ranker to rank the results. Relevancy defines how well a document matches the search query. The more relevant a document is, the higher it is positioned. In some relevancy-based search systems, relevancy is calculated based on a ranking formula, which is a function of multiple factors. A factor is a numerical characteristic of a query, document, or query-document pair that can be used for assessing the appropriateness of a result in the query output.
Examples of factors include the number of words from the search query that are in the document text and whether the query belongs to a certain category. In some cases, a document's relevancy may also depend on the user who submitted the query.
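To make the notion of a multi-factor ranking formula concrete, the following sketch combines the two example factors above into a single relevancy score. The linear form, the factor names, and the weight values are purely illustrative assumptions; ranking formulas used in actual search systems are far more complex.

```python
# Illustrative sketch only: a hypothetical linear ranking formula.
# Factor names, weights, and the linear combination are assumptions
# made for illustration, not any actual search system's formula.

def relevancy(factors: dict, weights: dict) -> float:
    """Combine numerical query/document factors into a relevancy score."""
    return sum(weights[name] * value for name, value in factors.items())

# A query-document pair characterized by the two example factors:
factors = {
    "query_words_in_text": 3.0,  # words of the query found in the document
    "query_in_category": 1.0,    # 1.0 if the query belongs to the category
}
weights = {"query_words_in_text": 0.5, "query_in_category": 0.2}

score = relevancy(factors, weights)  # higher score -> higher SERP position
```

The search result ranker would evaluate such a score for every candidate document and order the SERP by descending score.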
The number of indexed documents and the needs of users are constantly changing. For this reason, the ranking formula of the search result ranker must be updated regularly. Machine learning methods are used to update the formula. Using expert assessment data that provides the relevancy of a set of known documents to a set of known search queries, dependencies are identified between document characteristics and the documents' placement in the search output (i.e. the SERP). These dependencies are then used to make changes to the formula.
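As a minimal sketch of this learning step, the fragment below fits the weights of a hypothetical linear ranking formula to toy assessor ratings using stochastic gradient descent. The model, the data, and the learning parameters are all assumptions chosen for illustration; production systems use far more sophisticated learning methods.

```python
# Hedged sketch: adjusting ranking-formula weights so that the formula
# reproduces expert assessment data. The linear model and the simple
# gradient-descent update are illustrative assumptions only.

def fit_weights(samples, ratings, lr=0.05, epochs=500):
    """Fit one weight per factor so the formula approximates assessor ratings."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for factors, rating in zip(samples, ratings):
            predicted = sum(w * f for w, f in zip(weights, factors))
            error = predicted - rating
            weights = [w - lr * error * f for w, f in zip(weights, factors)]
    return weights

# Toy assessment data: factor vectors and the assessors' relevancy ratings.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ratings = [0.4, 0.2, 0.6]

weights = fit_weights(samples, ratings)  # converges toward [0.4, 0.2]
```

Once fitted, the weights define an updated ranking formula that can be applied to future queries.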
The expert assessment data used for machine learning are ratings that describe how appropriately and correctly documents are ordered in the search output for specific queries. Assessors make these ratings.
Besides being used in machine learning, the assessors' ratings are also used for evaluating the quality of search results, meaning the users' satisfaction with the search results and their placement in the output.
However, providing such expert assessment data is impractical and very complicated when large scale sample collection is necessary. Such large scale sample collection is necessary, for example, to personalize search results. Personalization makes it possible to rank search results based on users' personal interests, thus improving search quality. For example, depending on the user's profile, a search for the term zeppelin should rank documents relating to this type of airship highest for some users, while for other users the highest ranked documents should be those relating to the band Led Zeppelin.
In order to improve rankings when such large scale data collection is needed, some search systems now look at the users' interactions with the search results presented on the SERP, instead of or in addition to the expert assessment data. This data, sometimes referred to as post-impression features, is then used to improve the formula of the search result ranker and therefore the ranking of documents for future searches. Examples of post-impression features include whether or not a document has been clicked, or the amount of time a user has looked at a document, sometimes referred to as dwell time.
However, the mere fact that a document has been clicked on the SERP does not mean that it is relevant. Similarly, the fact that a document has not been clicked does not mean that it is irrelevant. Looking at features such as dwell time may offer a better indication of the relevance of a document and should therefore yield better rankings. However, the amount of dwell time required to consider a document relevant is selected somewhat arbitrarily.
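As a sketch of how such a dwell-time cut-off might be applied, the fragment below turns raw user interactions into binary features for the ranker. The 30-second threshold and the feature names are illustrative assumptions; as noted above, choosing that threshold value is precisely the difficulty.

```python
# Hedged sketch: deriving post-impression features from user interactions.
# The 30-second cutoff is an arbitrary illustrative choice -- selecting
# this value is exactly the problem described in the text.

MIN_DWELL_SECONDS = 30.0  # assumed, somewhat arbitrary, dwell-time threshold

def post_impression_features(clicked: bool, dwell_seconds: float) -> dict:
    """Turn raw interactions with one SERP result into ranker features."""
    return {
        "clicked": 1.0 if clicked else 0.0,
        "long_dwell": 1.0 if clicked and dwell_seconds >= MIN_DWELL_SECONDS
        else 0.0,
    }

# A click followed by a short visit does not count as a "long dwell",
# even though the document was clicked:
features = post_impression_features(clicked=True, dwell_seconds=12.0)
```

Any document whose interactions cross the threshold would be treated as relevant when the ranking formula is updated, which is why the choice of threshold matters.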
As such, although taking post-impression features into account should yield improved ranking results, it is difficult to determine which features should be used and, in the case of value-related features such as dwell time, what value should be assigned to a feature in order to consider a document relevant when improving the formula of the search result ranker.
There is therefore a need for a method for optimizing search result rankings obtained from a search result ranker.