The Internet is increasingly used as a platform for social media. Web logs (blogs) and wikis are two common forms of social media. More generally, however, social media can take many different forms and may include interactive aspects such as voting, comments, and trackbacks. Referring to FIG. 1A, social media generally describes online technologies and practices that people use to share opinions, insights, experiences, and perspectives with each other. Examples of social media include social networks, blogging systems, media sharing platforms, online forums, and meme aggregators.
Social media is based on widely available tools that give users the ability to create links and trackbacks that tend to foster and describe their trust relationships. Several aspects of social media foster trust relationships. One such aspect is the level of dedication of individual publishers. Publishing social media content is an expression of unique interest in a topic. Individuals participating in a conversation around this content invest time to read, trackback, tag, rate, and/or comment on what is being shared. The level of dedication of the publishers of social media and of the individuals participating in the conversation around it is one factor that promotes trust within social media. Trust relationships also develop because individuals participating in a conversation can comment on postings to add context and correct errors. Additionally, social media permits links to be established between publishers. The links between publishers foster the spread of ideas and also permit rapid feedback within the community. Moreover, in social media, influential and/or trusted publishers and other participants in the conversation can lend their weight to the veracity of the postings of other publishers via links, comments, voting, and the like. In the blogosphere, for example, an influential blogger can include links in a posting to other blogs, which increases the influence of the linked blog post on a discussion.
One aspect of social media is that it is highly conversational in nature. As used in this patent application, an individual conversation in social media is a networked discussion about a specific topic between social media publishers. A conversation can also include an interaction between at least one social media publisher and conventional online media, such as an online news source like CNN. A conversational network is comprised of the individuals, sites, and pages participating in online discussions about all topics. A conversation within the network is about a specific topic. An individual publication corresponds to a post that is a single piece of media that can be located by a permalink and which may also contain additional links. An individual publisher is a person or entity that posts social media (e.g., the person or entity associated with one or more permalinked posts).
FIG. 1B illustrates a hypothetical example of how a conversation can flow within social media and also interact with conventional online mainstream media and corporate media. The illustrative topic in FIG. 1B is a problem with a laptop battery. In social media, the links between publishers within the social network permit different publishers to post Web content, provide comments, and post links. As a result, a conversation about a topic can flow and be amplified through the social media and also interact with conventional online media. In the example of FIG. 1B, a publisher in a social network 150 can vouch for the veracity of a posting of a blogger 152, increasing the level of trust in the story posted by blogger 152. Blogger 152 can include a link to another site, such as a media sharing website 154 having a video clip of the laptop battery problem, and also to a corporate media website 153 having additional information about the problem. An online forum 156 may have a favorable comment about the video clip and include a link to the media sharing website 154 along with another link to mainstream online media 158 posting the same clip. In this example, a meme aggregator 166 may also have a link to online mainstream media 158. In the example of FIG. 1B, some of the aspects of trust relationships can be observed, such as publishers making comments supporting the veracity of the postings of others, publishers making comments to correct errors, and publishers providing links to other publishers within social media and to conventional online mainstream media 158 and corporate media 153.
Conventional Internet search tools have proven inadequate for examination of conversations within social media in terms of understanding the interactions within a dynamic conversation. Conversations in social media can propagate and amplify with astonishing speed. However, the information destination-oriented implementation of conventional Internet search engines does not permit many characteristics of conversations in social media to be adequately understood.
A traditional Internet search engine has a crawling strategy for indexing a broad cross-section of the Internet likely to be of interest to general purpose users. Search engines typically generate results for a query that are described as relevant based on the search criteria and distributed from “most relevant” to “least relevant,” which can be drawn on a relevancy curve, as in FIG. 1C. As a hypothetical example, consider again the example of FIG. 1B. If a user inputs a search query into a conventional search engine with the query terms “Apple Laptop Exploding,” the user might receive 500,000 hits ranked by relevance. A conventional search engine would present a relevant result by seeking pages on which the search term occurs most frequently and would also take into account some other relevance factors to rank the hits. Google's PageRank algorithm, for example, combines the number of sites pointing to each page with relevant search terms to identify the site most pointed to by the greatest number of sites with high numbers of inbound links, using those pointers as a proxy for the reliability of the data on the page. If so many other sites point to the page, the reasoning goes, it must be the most correct result for the search. This approach skews results to the top of the power curve in FIG. 1C, giving sites that produce large numbers of articles and that are pointed to by other sites a disproportionate influence on the results, often long after the site stops producing new relevant content. Thus, referring again to the hypothetical example of FIG. 1B, a conventional search engine might give disproportionate relevance to old articles about laptop batteries.
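The link-based ranking described above can be illustrated with a minimal sketch. The code below uses the textbook power-iteration form of PageRank as publicly described, not any search engine's production system; the function name, the damping value of 0.85, and the small link graph are assumptions introduced only for illustration. It shows how a page with many inbound links outranks a newer page with few.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (illustrative only).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: an old article with many inbound links and a
# new post with a single inbound link.
EXAMPLE_LINKS = {
    "site_a": ["old_battery_article", "new_blog_post"],
    "site_b": ["old_battery_article"],
    "site_c": ["old_battery_article"],
    "old_battery_article": [],
    "new_blog_post": [],
}

if __name__ == "__main__":
    ranks = pagerank(EXAMPLE_LINKS)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this hypothetical graph the old article ranks highest regardless of whether it is still the most useful source on the topic, which is the skew the paragraph above describes.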
Another problem with conventional search engines is that they can be gamed. Consider, for example, the Google search engine. Google is primarily a ranking of web pages based on volumetric analysis. Google's PageRank calculates the rank of information on a page in response to a search query by combining the number of explicit links from other pages associated with the search topic to an undisclosed number of degrees (pages pointing to other pages through a Uniform Resource Identifier, or “URI”). The concept of authority in information has thus been built on the volumetric notion that the greater the number of links pointing to a given page, the more likely it is to be correct. This approach can be gamed by launching sites that point to a page in order to raise its authority (hence, Google must constantly adjust its indexing algorithms to prevent gaming), and it also suffers from historical skewing. Volumetric determination of authority is prone to many errors and can be skewed by many factors that do not contribute to the user's understanding of how the information reached its current form and authority.
Various modifications of conventional search engine technology have been proposed. For example, search engines have been developed that examine the popularity of links by timeframe. Determining popularity by the number of links pointing at a page within a given timeframe, such as two weeks or a month from the current date, limits historical skewing. However, this improvement is still inadequate to understand a conversation in social media. The links counted within the given timeframe may include both general links, meaning all links to a site, and topic-specific links, meaning just the links that deal with a target search phrase. As a consequence, sites that have many general links will be over-weighted and will drown out topic-specific conversation.
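The difference between general and topic-specific link counts within a timeframe can be sketched as follows. This is a minimal illustration under stated assumptions: the `Link` record, the use of anchor text as a proxy for whether a link deals with the target phrase, the example sites, and the dates are all hypothetical and are not drawn from any particular search engine.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    created: date
    anchor_text: str  # assumed proxy for the topic of the link


def windowed_link_count(links, target, today, window_days=14, phrase=None):
    """Count links to target created within the last window_days.

    If phrase is given, count only links whose anchor text contains it.
    """
    cutoff = today - timedelta(days=window_days)
    count = 0
    for link in links:
        if link.target != target:
            continue
        if not (cutoff <= link.created <= today):
            continue  # outside the timeframe, so ignored
        if phrase is not None and phrase.lower() not in link.anchor_text.lower():
            continue  # a general link, not about the target phrase
        count += 1
    return count


TODAY = date(2007, 6, 15)  # hypothetical "current date"
LINKS = [
    Link("a.example", "portal.example", date(2007, 6, 10), "sports scores"),
    Link("b.example", "portal.example", date(2007, 6, 12), "weather"),
    Link("c.example", "portal.example", date(2007, 6, 14), "celebrity news"),
    Link("d.example", "portal.example", date(2006, 1, 5), "laptop battery recall"),
    Link("e.example", "niche.example", date(2007, 6, 11), "laptop battery fire video"),
    Link("f.example", "niche.example", date(2007, 6, 13), "exploding laptop battery"),
]

if __name__ == "__main__":
    for site in ("portal.example", "niche.example"):
        general = windowed_link_count(LINKS, site, TODAY)
        topical = windowed_link_count(LINKS, site, TODAY, phrase="laptop battery")
        print(site, "general:", general, "topic-specific:", topical)
```

With these hypothetical records, the portal receives more recent links overall, yet none of them concern the target phrase, while every recent link to the niche site does. A count that does not separate the two kinds of links would rank the portal above the niche site for the topic.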
Conventional search engines also have another limitation in that they typically do not completely index social media. That is, the index in a conventional search engine does not capture sufficient information to properly represent and/or analyze a conversation. Conventional search engines are designed as general purpose engines to search the entire Web and have crawling policies that typically do not adequately index social media. One limitation is that conventional search engines rely on crawling of sites directly or capturing new information via Really Simple Syndication (RSS) feeds to generate indices, which limits the reach of search in several important ways.
First, one limitation of conventional crawling is that recency overwhelms context. No Web index is complete; the best represent perhaps 20 percent of the information on the Web, because the contents of pages must be captured by crawling sites from the home page through the last archive page in order to be comprehensive. Because of limited resources and the more general focus of most search indices, crawls tend to cover only a part of the total contents of many Web sites; a crawler, for example, may only look at pages that are up to three pages below the home page of a site. Since older information tends to reside on archival pages that may be more than three links deep on a site, a site's coverage of a topic will be judged only on the content of the most recent postings rather than on the entire body of work the site represents. This underweights sites that are deeply focused on a few narrow topics, such as “IT Management” or “Legal Practice,” when other sites become interested in those topics over a short period of time.
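The effect of a depth-limited crawl can be illustrated with a short sketch. The site structure, page names, and the `crawl` function below are hypothetical; the crawl is a plain breadth-first traversal over an in-memory link map rather than a real Web crawler, and the depth limit of three follows the example given above.

```python
from collections import deque


def crawl(site, home, max_depth=3):
    """Breadth-first crawl that stops max_depth links below the home page.

    site maps each page to the pages it links to.
    Returns a dict of each reached page to its depth below home.
    """
    seen = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        depth = seen[page]
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for linked in site.get(page, []):
            if linked not in seen:
                seen[linked] = depth + 1
                queue.append(linked)
    return seen


# Hypothetical blog: recent posts sit near the home page, older posts
# are reachable only through a chain of archive pages.
SITE = {
    "home": ["recent_post", "archive_index"],
    "archive_index": ["archive_2006"],
    "archive_2006": ["archive_2006_05"],
    "archive_2006_05": ["old_post_on_topic"],
}

if __name__ == "__main__":
    reached = crawl(SITE, "home", max_depth=3)
    print(sorted(reached, key=reached.get))
    print("old post indexed:", "old_post_on_topic" in reached)
```

In this hypothetical site the older post sits four links below the home page, so a crawl limited to three levels never indexes it, and the site's earlier coverage of the topic is invisible to the index.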
Second, another limitation of conventional crawling is that social media often limits the comments exposed through RSS, which means that conventional crawlers may not adequately index social media. In particular, few blogs expose their comments through RSS and those that do tend to separate the comments from the RSS feeds of main postings, eliminating or making far more difficult the analysis of comments in relation to topics discussed on the site. This undercuts the indexer's ability to track cross-linking of discussions within comments and minimizes the role of communities that exist around particular sites when measuring the discussion of topics.
Third, another limitation of conventional crawling is ping dependence. Indices that rely solely on RSS feeds depend on bloggers and publishers to “ping” the index server (that is, to send an Extensible Markup Language Remote Procedure Call (XML-RPC) command asking the index to review recent changes on the target site). Because there are many such indices and more appearing all the time, pinging has fragmented the market and forced search companies to form a coalition to share pings, distributing updated posting information to all members. Ping-based systems that are not supplemented by direct crawls of sites do not successfully capture all activity on and around sites in networked conversations.
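For illustration, a ping of this kind can be sketched with Python's standard `xmlrpc.client` module. The sketch assumes the commonly used `weblogUpdates.ping` method, which takes a site name and a site URL; the text above does not name a specific method, and the blog name and URL are hypothetical. The code only builds and parses the request body and does not contact any server.

```python
import xmlrpc.client


def build_ping_request(site_name, site_url):
    """Build the XML-RPC request body for a blog update ping.

    Assumes the weblogUpdates.ping convention (site name, site URL).
    """
    return xmlrpc.client.dumps(
        (site_name, site_url),
        methodname="weblogUpdates.ping",
    )


def parse_ping_request(body):
    """Decode a ping request the way an index server might receive it."""
    params, method = xmlrpc.client.loads(body)
    return method, params


if __name__ == "__main__":
    body = build_ping_request("Example Blog", "http://blog.example/")
    print(body)
    print(parse_ping_request(body))
```

The sketch makes the dependence visible: the index learns about a change only if the publisher sends such a request, and each separate index must be pinged, or must share pings with others, to stay current.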
The various drawbacks of conventional search tools severely limit the capability of individuals to analyze conversations in social media. At one level, conventional search engines will often produce too many hits. For example, a conventional search engine, such as Google, may produce millions of hits from a simple query in which a few search terms are input. On the other hand, a conventional search engine may fail to identify many web postings, due to the previously described problems associated with RSS feeds and the fact that conventional search engines index only a fraction of the Web.
An even more serious weakness of conventional search engines is that a conventional search engine does not provide information directly relevant to understanding the dynamics of a conversation in social media. In particular, the prior art search technology does not provide a capability to understand how conversations in social media are influenced and does not provide an understanding of potential trusted points of entry into a conversation.
Therefore, in light of the previously described problems, the apparatus, method, system, and computer-readable medium of the present invention were developed.