As the amount of information available on the World Wide Web grows and search engines continue indexing the data, it becomes apparent that the limited number of slots on the search engine result page does not always promote the most relevant documents.
At one end of the spectrum, a search engine user may provide a query that is too generic. Internet search engines like Google.com (Google Inc.) and Yahoo.com (Yahoo! Inc.) discriminate between pages before displaying the results. The discrimination is done by distinguishing very popular pages, pages with many inbound links, and pages with just a few inbound links. However, the top results may still appear to be of low quality, partly because the initial query is inexact or too generic, and partly because the search engine favors pages with aggressive Search Engine Optimization (SEO). At the other end of the spectrum, a query that is too specific will almost always yield less “popular” results with few inbound links. Ranking these results by inbound link quality is challenging. As a result, the search engine may again fall into the trap of considering primarily pages with aggressive SEO.
In the advertisement space, the problem of overly generic queries is being addressed by identifying publication clusters with a matrix of correlation coefficients that ties together advertisers' bids and search queries (e.g., U.S. Pat. No. 7,225,184 to Carasco et al. (2007)). However, the implementation of the invention discussed therein is not applied to the generic content retrieval service that search engines provide.
The results of a search are skewed by the SEO of content. Typically, an artificial ranking algorithm leaves room for cheating, and SEO is an entire industry based on this. Only live human beings can reliably rank document content. After all, it takes us 11-12 years at school and a few more years at a university to first become capable of doing this and later become better at it. Further, the rank only makes sense in terms of the query that led to the document fetch in the first place (“query” here refers to one or more of the end user query sent to a search engine, the document title, keyword or phrase tags provided by the document author, and inbound link descriptions). However, hiring people to assist with World Wide Web document ranking is inefficient. Moreover, semantic diversity and fragmentation continue at an ever-increasing pace. In addition, peer-to-peer networks are gaining popularity and are found in enterprise software applications. In a peer-to-peer network, however, content may not be publicly visible and cannot be indexed by the existing Web search engine servers that gather Internet content, referred to as crawlers.
Typically, a crowd-sourcing effort can mitigate the SEO issue. Presently, human-assisted content tagging is the predominant solution. Sites like Digg.com (Digg Inc.), Delicio.us, and XMarks.com (XMarks Inc., formerly FoxMarks) do exactly this and continue to gain popularity. In some solutions, document bookmarks local to the user's browser are uploaded to the site and then used for ranking. A social component is also being introduced, for instance as discussed in U.S. patent application 2005/0091202. On other web sites, such as Digg.com, the user must manually submit the document location and tag it with appropriate keywords. Thereafter, other Digg.com users vote for the document content, thus increasing the document's rank and accessibility. Whatever the solution is, an explicit user action is required for the tagging to work. However, requiring explicit action lowers the coverage of content.
Further, manual tagging can be incomplete or incorrect. It reintroduces and reinforces the ranking issues seen before Google released its inbound-link-based page-ranking algorithm. An even worse problem lies in the fact that the search query (the document link that led to the content promotion in the first place) is not captured. Manual reintroduction of the keywords during the tagging process, by somebody other than the author, can introduce spam. Some solutions, like that provided by Jookster.com, mitigate the spam issue by limiting the search scope.
To summarize, the prior art in this technology domain has one or more of the following disadvantages:
(a) Document ranking relies on an artificial ranking algorithm based on inbound links, which is subject to repeated attacks. Document ranking is currently being abused by aggressive SEO, which leads to less relevant results.
(b) Inexact user queries, being either too generic or too specific, help SEO techniques gain an edge, because the search engine relies on the already skewed document content rank. As a result, the search engines provide less relevant documents as top recommendations in the result set.
(c) Crowd-sourcing efforts, though popular, require explicit action to be taken by the user. As a result, the coverage is lowered.
(d) Tagging breaks the link between the lookup query and the document, and results in a rank that is not bound to the document semantics as the user or the author perceives them.
(e) Addressing the semantics problem by injecting relevant keywords during the manual tagging process introduces spam.
(f) Most tagging services operate as separate web sites, and to use their services the user must abandon a search engine of choice and migrate to the tagging service site. As a compromise, the user visits two or more search engines and the tagging service site interchangeably. This is inconvenient and can further lower the coverage of the tagging service.
Systems and methods are therefore desirable to manage user assisted ranking for document relevance recommendations and searching.