I. Technical Field
The present disclosure generally relates to the field of search assistance technologies, such as query-recommendation systems. More particularly, the disclosure relates to computerized systems and methods for applying a proximity-ranking function to documents in order to provide relevant search results based on a query.
II. Background Information
The advent of the Internet has resulted in collections of networked computer systems through which users can access vast amounts of information. The information accessible through the Internet is stored in electronic files (e.g., documents) that are accessible through the computer systems. With advances in storage capacity technology, the amount of information stored on each computer system has dramatically increased. Due to this increasing volume of information, as well as the sheer number of documents being stored on computer systems, it is becoming more difficult than ever to locate information that is relevant to a particular subject.
To locate documents that are relevant to a particular subject, a user may conduct a search using an information retrieval system that is typically referred to as a search engine. Search engines attempt to locate and index as many of the documents provided by as many computer systems of the Internet as possible. In the past, search engines would typically perform a Boolean search based on terms entered by a user, and results from the search engine would be ranked by the number of search query terms matched in a document. An occurrence of a particular search query term in a particular document is considered a “hit,” and the number of hits contributes to the document's similarity score for determining relevance of the document. The resulting documents would then be ranked and presented to a user in descending order of relevance.
In the above process, the scoring of the documents would not take into account the proximity, or “density,” of the hits in the actual document. If hits are located close to one another in a document, this may indicate that the document is more relevant than a document in which hits are not located near each other. However, a typical search engine would not benefit from this additional analysis because the document containing the most hits overall would be ranked highest, as the rank (R) for a particular document would simply be a function of the frequency of hits in the document:
R = f(hits)  (1)
Thus, the search engine would not differentiate between situations where hits are located farther apart from one another in the document and situations where the hits are closer to one another.
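The limitation of a frequency-only rank can be illustrated with a minimal Python sketch. The helper names (tokenize, hit_positions, frequency_rank, hit_span), the sample documents, and the choice of f as a plain hit count are assumptions made only for illustration; they are not taken from any particular search engine, and hit_span is merely one simple way to expose hit spacing, not a proposed ranking function.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hit_positions(document, query):
    """Return the token positions in the document that match any query term."""
    terms = set(tokenize(query))
    return [i for i, token in enumerate(tokenize(document)) if token in terms]


def frequency_rank(document, query):
    """Rank as in equation (1), assuming f is simply the number of hits."""
    return len(hit_positions(document, query))


def hit_span(document, query):
    """Distance in tokens between the first and last hit (None if no hits)."""
    positions = hit_positions(document, query)
    return positions[-1] - positions[0] if positions else None


query = "proximity ranking"
doc_close = "proximity ranking orders documents by how near the matched terms are"
doc_far = "proximity matters a great deal when users judge a long document ranking"

# Both documents contain the same number of hits, so a frequency-only rank
# cannot tell them apart even though the hits are spaced very differently.
print(frequency_rank(doc_close, query), frequency_rank(doc_far, query))
print(hit_span(doc_close, query), hit_span(doc_far, query))
```

Under these assumptions, both sample documents receive the same frequency-based rank, while the span between their hits differs widely, which is the positional information that equation (1) discards.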
More modern search engines permit users to perform a search and to explicitly request phrase searching (e.g., a user submits words surrounded by quotes). Upon receiving a request for phrase searching, search engines may then take into account the positional information of hits found in the documents, and rank the documents accordingly. However, requiring a user to indicate a preference for phrase searching is undesirable. Furthermore, the proximity-ranking functions of most search engines are not sufficiently precise to fully assist a user in determining the most relevant documents for a search. That is, most hit-density estimators used in existing search engines do not use complete information about all hits in the document and can therefore lead to biased ranking functions and improperly ranked documents.
Accordingly, proximity-ranking search engines suffer from drawbacks that limit their efficiency and usefulness. Therefore, there is a need to develop improved search systems and methods that overcome the above drawbacks.