The disclosures in this section merely provide background information related to the present disclosure and may not constitute prior art.
It is commonly required in the field of Information Technology to provide a service that searches through data sources. A data source herein may refer to data and/or document(s) on the Internet, an intranet, storage devices, and so on. To use a search engine, a user seeking information on a desired topic generally inputs a search query, consisting of terms relevant to the topic, into the search interface of the search engine. In response, the search engine typically displays a search results report with a prioritized list of links pointing to relevant documents containing the search query terms. Oftentimes, a short summary of text, i.e., an extract or snippet, is also included for each result. The extract or snippet is the portion or portions of the text in the document that contain the terms from the search query.
When search results are displayed for a query, the order in which the documents appear plays an important role in enhancing the user experience. There are many known methods for ranking the displayed documents based on their relevance to a given search query. One of the most common methods used to prioritize the documents is the Term Frequency-Inverse Document Frequency (TF-IDF) method. This method is widely used in various search engines, but it does not always produce the desired results. One primary disadvantage of the TF-IDF method is that it does not take into account the positions of the terms in the documents. This is particularly relevant for shorter documents.
The TF-IDF method may hence work well for long documents, but may not work well for short documents. Because the TF-IDF method scores a document on term frequency, it overlooks the position of the terms, which in short documents may be more important than how often the terms occur. Hence it may not be an accurate method of ranking in the case of short documents.
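The position-insensitivity described above can be illustrated with a minimal sketch. It is not taken from any particular search engine: the function name `tf_idf_score`, the sample documents, and the smoothed form of the inverse document frequency are all assumptions chosen for illustration, and TF-IDF has several other common variants. The sketch scores two short documents that contain the same words in a different order.

```python
import math
from collections import Counter


def tf_idf_score(query_terms, doc_tokens, corpus):
    """Sum the TF-IDF weights of the query terms for one document.

    Hypothetical helper for illustration only. Uses a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is one of several common variants.
    """
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)  # bag of words: token order is discarded here
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = counts[term] / len(doc_tokens)
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        score += tf * idf
    return score


# Two short documents with the same words in different positions (made-up data).
doc_a = "java tutorial with coffee examples".split()
doc_b = "coffee examples with java tutorial".split()
doc_c = "gardening tips for spring".split()
corpus = [doc_a, doc_b, doc_c]

query = ["java", "tutorial"]
score_a = tf_idf_score(query, doc_a, corpus)
score_b = tf_idf_score(query, doc_b, corpus)

# Both documents receive the same score even though the query terms
# lead doc_a and trail doc_b.
print(math.isclose(score_a, score_b))
```

Because the score depends only on term counts, reordering the words of a document cannot change its rank, which is the limitation noted above for short documents.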
In view of the above drawbacks, there remains a need for an effective method of ranking short documents based on criteria other than term frequency alone, which would place relevant results at the top and thereby make it easier for the user to find the desired information.