Digital electronic devices such as desktop computers, laptops, tablets, and smartphones have an ever-increasing amount of digital memory built into the device. As memory capacity increases, more and more specialized applications are being downloaded and installed. At the same time, the increased memory capacity allows storing more personal and licensed digital documents and sound or visual media content in the form of email, notes, presentations, spreadsheets, electronic books and magazines, songs, videos, clipbooks, pictures, tweets, Facebook posts, and web documents, to name a few. Organizing and navigating such a collection of documents is not a straightforward task, especially considering that a discrete application deals with each individual data type or piece of content. Further, in many modern smartphones, tablets, and operating systems, the document location is abstracted from the user by the operating system and applications. Due to such content location abstraction, the line between an application and content/data can become fuzzy.
Traditional content search approaches vary depending on the device, operating system, and application. Some scan document content/meta-information for matches when the user enters a query (e.g., the Unix grep command line utility; Windows search in non-indexed locations). Other solutions index new content as the content is persisted on the device, and later consult the compiled index to retrieve documents related to the user query. Typically the content indexing is offered by an operating system component, for instance Spotlight in Apple's Mac OS or Windows Indexing Service/Windows Search, or by specialized third party search applications like Google Desktop. These engines index all the well-known content available, working around the discrete content type/application “silo” problem. Regardless of the approach, search applications do not always perform well when it comes to delivering the most relevant results for a user query. Their weakness comes in part from their strength: they index all the content and treat every document as equal.
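The two approaches described above can be contrasted with a minimal Python sketch. The document names, contents, and function names below are hypothetical and chosen only for illustration; the sketch does not reproduce how grep, Spotlight, or Windows Search are actually implemented.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "device": document name -> text content.
DOCS = {
    "notes.txt": "Budget meeting notes for the quarterly review",
    "todo.txt": "Buy milk and review the budget spreadsheet",
    "poem.txt": "Roses are red, violets are blue",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def scan_search(docs, term):
    """Scan-on-query: read every document each time a query arrives."""
    term = term.lower()
    return sorted(name for name, text in docs.items() if term in tokenize(text))

def build_index(docs):
    """Index-on-persist: map each token to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def index_search(index, term):
    """Answer a query with one dictionary lookup in the prebuilt index."""
    return sorted(index.get(term.lower(), set()))

INDEX = build_index(DOCS)
print(scan_search(DOCS, "budget"))
print(index_search(INDEX, "budget"))
```

Both functions return the same documents for the same term. The difference is when the work is done: the scan pays the cost at query time, the index pays it when content is stored. Neither distinguishes between documents the user cares about and documents that merely contain the term.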
As the amount of information available on the device grows and the content search engine continues indexing the data, it becomes increasingly apparent that the limited number of slots on the search engine result screen will not always promote the documents most relevant to the context the user intended for the search. The user query terms play an important role.
At one end of the spectrum, the user may enter too generic a query. On the Web, search engines can discriminate between pages based on authority rules, typically involving inbound link statistics. On a single device, however, no such external (to the user) authority exists. At the other end of the spectrum, too narrow a query may return no results at all, or may return a long document of potentially little relevance.
As an illustration, one may consider the impact of persistent Twitter sample feed data on Spotlight. In less than a month, within the sample data feed, virtually every English word may be found at least once. If the feed has been persisted locally, there is a good chance it is also indexed. Because the feed files contain a rich variety of English words, a user searching for content receives personal documents interleaved with data files from the Twitter sample feed! In fact, from experience, the Twitter sample feed files would even rank higher than the personal document being sought! This is a surprising result, considering that there is no interaction between the end user and the Twitter feed data files.
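The effect can be reproduced with a small Python sketch. It assumes a deliberately naive relevance score (the total number of query-term occurrences in a document); this scoring rule, the file names, and the texts are hypothetical and are not a description of how Spotlight ranks results.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_count_score(query, text):
    """Naive relevance: total occurrences of the query terms in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[term] for term in tokenize(query))

def rank(query, docs):
    """Return names of matching documents, highest naive score first."""
    scored = [(term_count_score(query, text), name) for name, text in docs.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]

# Hypothetical corpus: one short note written by the user and one
# word-rich feed file the user has never opened.
docs = {
    "my_budget_note.txt": "Quarterly budget draft for the team",
    "feed_sample.json": (
        "budget cuts announced today. a review of the new phone. "
        "quarterly earnings beat forecasts. budget airline review goes viral."
    ),
}

print(rank("quarterly budget review", docs))
```

Under this assumed scoring, the feed file scores 5 and the user's own note scores 2, so the feed file is listed first even though the user never interacted with it.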
While the Twitter data sample is an extreme example, it nevertheless illustrates a problem that persists when the Twitter feed is replaced by, for example, PDF documents downloaded from the Web. While the content downloaded from the Web is important, a Word or Excel document or a Note that is related to the lookup query and written by the user himself or herself should in practice be considered far more relevant than content from any other party.
The above examples demonstrate that it would be advantageous for the level of user interaction to be considered when indexing and ranking documents on desktops, laptops, smartphones, tablets or any other computing device. The human interacting with the device should orchestrate the document ranking. In a good search application design, however, the user cannot be interrupted and asked to tag or provide explicit feedback on a particular document. Two reasons against such manual tagging/ranking are low coverage and annoyance. Further, explicit user-based tagging can be incomplete or incorrect and may introduce spam, because the user considers only a small subset of options while tagging.
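As a purely illustrative sketch of the idea, and not the method of this disclosure, the following Python snippet re-ranks text-match scores using interaction events assumed to be collected passively, without prompting the user. The event log, the file names, the input scores, and the boost formula `score * (1 + interactions)` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical passive event log: (document name, action).
# No tagging or explicit feedback is requested from the user.
EVENTS = [
    ("my_budget_note.txt", "open"),
    ("my_budget_note.txt", "edit"),
    ("my_budget_note.txt", "open"),
]

def interaction_counts(events):
    """Count how many recorded interactions each document has."""
    return Counter(name for name, _action in events)

def rerank(text_scores, events):
    """Boost each text-match score by the document's interaction count."""
    counts = interaction_counts(events)
    boosted = {
        name: score * (1 + counts[name])  # assumed boost formula
        for name, score in text_scores.items()
    }
    return sorted(boosted, key=lambda name: (-boosted[name], name))

# Hypothetical text-match scores in which a never-opened feed file
# outranks the user's own note.
text_scores = {"feed_sample.json": 5, "my_budget_note.txt": 2}

print(rerank(text_scores, EVENTS))
```

With these assumed inputs, the note's score becomes 2 × (1 + 3) = 8 while the feed file stays at 5, so the document the user actually worked with moves to the top.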
With the advent of content provider/aggregator applications on mobile devices, it is apparent that not all applications on the market deliver the same user experience or the same quality content. There exists a need to discriminate between various applications. One frequently used measure of application relevance is the total number of application installations. However, the installation count tends to reinforce a ‘rich becomes richer’ situation. Being late in the marketplace, even with a quality product, can require additional marketing and promotion (beyond those embedded into the application marketplace content promotion mechanisms) in order to succeed. For instance, in the oversaturated Apple App Store, it is rare these days for an application to be successful (that is, to achieve a large number of installations) by relying solely on user ratings and the search capabilities provided by the store.
The content ranking problem is also seen in application verticals, for instance in games. It is difficult to locate an engaging casual game. Startups emerge, aiming to solve the inefficiency.
Additionally, the problem of algorithmically ranking media content like pictures, video, sound, etc., remains largely undeveloped.
To summarize, the prior art in this technology domain presents one or more of the following disadvantages:
(a) Document ranking relies on an artificial ranking algorithm that does not necessarily align with the user's intent.
(b) Because inexact user queries are either too generic or too specific, and because no external authority exists in the context of the device, the search engine provides less relevant documents, applications, sound or visual media as top recommendations in the result-set.
(c) Content bookmarking and tagging breaks the natural link between a lookup query and retrieved content and results in a rank that is not bound to the document semantics as the user or the author perceives them.
(d) Addressing the semantics problem by injecting relevant keywords during the manual tagging process introduces spam.
(e) Introducing content that is long, rich in words, and uninteresting in its current form pollutes the search engine index and pushes content that is unique, user generated, and frequently interacted with out of the search results.
(f) Content or application stores/markets perpetuate a ‘rich becomes richer’ scenario and may delay quality content and applications from reaching their targeted audience(s).
(g) Ranking media content remains largely undeveloped.
Systems and methods are therefore desirable to manage user assisted ranking for document relevance recommendations and searching.
It is with respect to these and other considerations that the disclosure made herein is presented.