There was a time when the broadcast sources that an individual could use to obtain information could be counted on a short list. However, once the Internet became a vehicle for providing information, many more sources that might have existed previously but were unavailable to a given user, such as an out-of-town local newspaper, became readily available. In some sense, that was still manageable. For example, while out-of-town local newspapers moved their content online and might have sent it to individual users, any given user would only be interested in a few towns (such as the town the user grew up in), and each newspaper had an editorial function that ensured that only articles of general interest were published.
Social networking and information broadcasting sites are now prevalent and widely accessed by Internet users. Significantly, a social network can involve communication emanating from millions of users and this is, in most cases, too much for any one reader to handle, absorb or use. As a result, there is a need for mechanisms to control the deluge of possible information.
Sometimes, the information is filtered in a “request” manner, i.e., while there might be billions of pages of information available to a user, the user does not have to deal with all of those pages because the user selects a specific page, uses a search engine to identify a specific page to request, or relies on some other mechanism so that the user's request is for a specific piece of information. For example, the user can type in a URL (Uniform Resource Locator), or select one from the user's list of bookmarked URLs, for the page specific to tomorrow's weather in the user's local town or the page specific to events related to a specific celebrity.
However, at other times, social networks and other tools provide the information to the user in the form of streams of messages. Examples of such message streams include e-mail, instant messages, mobile phone calls, SMS (Short Message Service) messages, and/or the like. Such messages are broadcast from a source to destinations or sent from one source to one destination. In the general case, messages originate at a source and are received at a destination if that source and that destination are linked in a message graph.
In some cases, the source is an individual writing a message assumed to be of interest to the destinations that are linked in the message graph to that source, but the source can also be a business entity, government entity, organizational entity, and/or a computer entity (examples of the latter being hardware and/or software running a program that determines what messages to send and when—often useful for automated alerts triggered by computer programming).
While not explicitly spelled out, there is an electronic component that actually sends the message. For example, while it might be said that “celebrity movie star C.M.S. sent a message announcing her presence at a fashion show,” it is more accurate to say that C.M.S. typed a message into some electronic device, such as her smart phone, and pressed “Send,” causing the device to generate the message. Thus, in typical parlance, saying that a person sent a message implies that some electronic device generated the message and sent it into a networked environment, e.g., one where servers know that when a message is received from a particular source (or appears to be received from a particular source), it is forwarded, and/or replicated and forwarded, to destinations according to a message graph. Likewise, at the destination side, there are users (who can be individuals, entities and/or computer elements) that receive messages on destination devices.
One such messaging service is operated by Twitter™, which offers a service by which members can broadcast content in 140-character chunks known as “Tweet™” messages to anyone in the Twitter™ community. Individual members can choose which user feeds to subscribe to, resulting in a type of information stream that suits the tastes and topics of interest of that user. In the Twitter™ system, destinations are devices (cell phones, web browsers, Twitter™ apps, etc.) that receive Tweet™ messages and likewise the sources are devices that push Tweet™ messages into the Twitter™ system. The destination and/or source devices can be cell phones with SMS capability, devices with web browser capability, devices that can run specialized Twitter™ apps, or the like.
Twitter™ maintains a message graph mapping sources and destinations. For each edge in the Twitter™ message graph, the destination is said to be “following” the source (sometimes referred to as the “followee”). In other words, if user A “follows” user B, then when user B posts a Tweet™ message, it is provided to user A's list of Tweet™ messages. The graph is a directed graph, i.e., user A following user B does not necessarily imply that user B follows user A. Twitter's™ message graph is colloquially thought of as the lists of everyone's followers.
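The directed “following” relationship described above can be illustrated with a minimal sketch. It assumes an in-memory graph and uses hypothetical names (MessageGraph, follow, post, timelines) chosen only for illustration; it is not a description of how Twitter™ or any other service is actually implemented.

```python
from collections import defaultdict


class MessageGraph:
    """Hypothetical directed message graph.

    An edge from a follower to a source means that the follower
    receives every message the source posts.
    """

    def __init__(self):
        self._followers = defaultdict(set)  # source -> followers of that source
        self.timelines = defaultdict(list)  # destination -> received messages

    def follow(self, follower, source):
        # Directed edge: this does NOT make `source` follow `follower`.
        self._followers[source].add(follower)

    def followers_of(self, source):
        return set(self._followers[source])

    def post(self, source, text):
        # Replicate the message to every destination linked to the source.
        for destination in self._followers[source]:
            self.timelines[destination].append((source, text))


graph = MessageGraph()
graph.follow("A", "B")       # user A follows user B
graph.post("B", "hello")     # delivered to A
graph.post("A", "hi back")   # delivered to nobody: B does not follow A
```

In this sketch, user A's list contains the message from user B, while user B's list stays empty, which reflects the point that following is not necessarily reciprocal.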
Another example is the message wall provided to users of Facebook™. Yet another example is comment boards that allow users to post messages and respond to posted messages. Similar considerations are found in multi-media content provider systems, which attempt to introduce media (e.g., a new song or movie) to users based on consumption habits of other users in the community. Internal knowledge systems, which allow employees to enroll and receive selected emails from co-workers on particular topics, are yet another example.
Additional references in this area, which are incorporated by reference herein, include:
United States Patent Application 20100299432 to Dotan, which is directed to managing user information streams.
United States Patent Application 20110029636 to Smyth, which discloses a real time information feed system.
United States Patent Application 20110153646 to Honq, which is directed to a system for triaging information feeds.
United States Patent Application 20110252027 to Chen, which is directed to recommending interesting content in an information stream.
United States Patent Application 20110093520 to Doyle, which is directed to automatically identifying and summarizing content published by key influencers.
A common problem in these kinds of information following systems is that users (particularly new users) are challenged to identify appropriate content sources to follow for the topics they are interested in. Twitter™ has addressed this problem, in part, by assembling its own “lists” of entities that it deems most suitable for certain categories of content. For the most part, however, these lists tend to be dominated by the celebrity status of the listed entities rather than by the useful information those entities actually contribute. Twitter™ also lets users make their own lists of people to follow, and one can review and “mine” the lists of others for leads as well. However, in the end, this just pushes the problem back to the end user to find and identify content of interest.
Generalized recommendation engines are known. For example, U.S. Pat. No. 5,583,763 entitled “Method and Apparatus for Recommending Selections Based on Preferences in a Multi-User System” disclosed that music purchaser selections could be recommended to one user based on a commonality of prior purchases between that one user and other users, for example, recommending song S1 to user U1 because user U2 bought many songs in common with user U1, but user U2 also bought song S1 and user U1 has not yet bought song S1.
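The commonality-of-purchases idea in the example above can be sketched as follows. This is a simple overlap heuristic written for illustration only, not the method claimed in the cited patent; the function name recommend, the min_overlap threshold, and the sample data are all assumptions.

```python
def recommend(purchases, user, min_overlap=2):
    """Suggest items bought by users whose purchases overlap with `user`'s.

    purchases: dict mapping each user to the set of items that user bought.
    min_overlap: hypothetical threshold for treating two users as similar.
    """
    mine = purchases[user]
    scores = {}
    for other, theirs in purchases.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap < min_overlap:
            continue  # too little in common to trust this user's tastes
        for item in theirs - mine:
            # Weight each candidate by how similar the other user is.
            scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


purchases = {
    "U1": {"S2", "S3", "S4"},
    "U2": {"S1", "S2", "S3", "S4"},
    "U3": {"S4", "S9"},
}
print(recommend(purchases, "U1"))  # ['S1']
```

Here song S1 is suggested to user U1 because user U2 shares three purchases with U1 and also bought S1, while user U3 shares only one purchase and is ignored under the assumed threshold.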
Follower recommendation systems might perform a similar action with respect to users and the sources each of them follows, but there can still be a tendency for lists to become nothing more than popularity rankings, in that the same sources appear on every list without regard to their actual utility to the user or topic. In addition, early users/adopters tend to be rewarded beyond their real value, since they will artificially appear in successive lists without regard to their contributions.
The problem of designating which sources to follow will become even more unmanageable as message services become more popular and users start to “follow” more and more publishers of content. At some level of participation, the user's information stream (and overall experience) becomes degraded by the proliferation of duplicate content. Duplicate content threatens the utility of information streams. If content that is useful and nonduplicative to a user (i.e., information, rather than just bits and bytes of data) is considered signal, and the duplicate, irrelevant and uninteresting (to that user) messages are considered noise, a desirable goal is to raise the signal-to-noise ratio (“SNR”), ideally with something better than requiring the user to manually read and delete the noise or read and scroll through the noise to get to the signal.
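As a rough illustration of the signal-to-noise notion above, the following sketch counts a message as signal when it is relevant to the user and not a repeat of an earlier message, and as noise otherwise. The function name signal_to_noise, the is_relevant callback, and the whitespace and case normalization used to detect duplicates are assumptions made for this example; a real system would likely need far more robust duplicate and relevance detection.

```python
def signal_to_noise(messages, is_relevant):
    """Return the ratio of signal messages to noise messages in a stream.

    messages: iterable of message texts, in arrival order.
    is_relevant: hypothetical per-user predicate deciding relevance.
    """
    seen = set()
    signal = 0
    noise = 0
    for text in messages:
        # Naive duplicate detection: ignore case and extra whitespace.
        key = " ".join(text.lower().split())
        if key in seen or not is_relevant(text):
            noise += 1
        else:
            signal += 1
        seen.add(key)
    return signal / noise if noise else float("inf")


stream = [
    "Rain expected tomorrow",
    "rain expected  tomorrow",   # duplicate of the first message
    "Buy cheap watches",         # irrelevant to this user
    "Road closed on Main St",
]
ratio = signal_to_noise(stream, lambda text: "watches" not in text.lower())
print(ratio)  # 1.0
```

In this sample stream, two messages count as signal and two as noise, so the ratio is 1.0; removing the duplicate and the irrelevant message before delivery would raise it.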
Similar problems exist in other fields as well, including social networking sites, internal emailing lists, etc. In fact, where the number of sources is on the same order as the number of destinations, a high percentage of the content can be duplicative, and even suggesting sources can still result in a high percentage of duplicate content. It can be expected that in any data crawling/aggregation field (including search engines) the identification and selection of appropriate and optimal content sources is a prime concern. Given a finite amount of time and resources to characterize or identify relevant content for a topic, it is desirable to know which sources are more likely to have relevant material.
Clearly, there is a need for systems and methods to improve the signal-to-noise ratio in such systems. Existing approaches might attempt to do so, but they are not sufficient.