Digital sensemaking is sensemaking mediated by a digital information infrastructure, such as the World Wide Web (“Web”). Through the Web, users can access both “traditional” Web sites that post information from diverse sources and interactive Web sites, including moderated Web logs or “blogs,” user forums, and Web sites with voting, which allow users to actively rank new information.
As a digital information repository, the Web continually evolves as events occur, ideas get synthesized, and new trends emerge. New information is posted continuously. Mainstream media Web sites generally cover popular topics, such as news, business, politics, sports, entertainment, and weather, but a host of additional topics exist through other Web sources. These topics range from slightly less popular topics, for instance, technology news, to specialized or obscure topics that are relevant to a comparatively small number of people, such as evening class schedules for a local community college.
The demand for items in many markets follows a “Long Tail” distribution, such as described in C. Anderson, The Long Tail: Why the Future of Business is Selling Less of More, (Hyperion Press) (2006), the disclosure of which is incorporated by reference. FIG. 1 is a graph showing, by way of example, a hypothetical long tail distribution 10 for digital information. The x-axis represents digital information and the y-axis represents popularity level. Items appearing at the head of the distribution 11, although few in number, enjoy the greatest popularity, such as media stories falling into a small number of popular categories. However, the items along the “long tail” 12, which cover niche topics with smaller readerships, outnumber head items 11. Although any single head item 11 enjoys greater popularity than any one of the long tail items 12, the aggregate popularity of a large enough group of long tail items 12 will exceed the aggregate popularity of all head items 11, which implies that a larger overall audience could be reached by focusing on long tail topics, provided the audience can be made aware of them.
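By way of illustration only, the aggregate-popularity argument above can be checked numerically. The following sketch assumes that popularity falls off with rank according to a Zipf-like law (popularity proportional to 1/rank), which is a modeling assumption made here and is not taken from FIG. 1 or from the cited reference. The function names, the head size of ten items, and the item limit are likewise hypothetical.

```python
def zipf_popularity(rank, exponent=1.0):
    """Assumed popularity of the item at a given 1-based rank (Zipf-like law)."""
    return 1.0 / rank ** exponent


def tail_items_needed(head_size, max_items, exponent=1.0):
    """Count how many long tail items must be aggregated before their
    combined popularity exceeds that of all head items combined.

    Returns None if the tail never overtakes the head within max_items.
    """
    # Combined popularity of the head items (ranks 1..head_size).
    head_total = sum(zipf_popularity(r, exponent)
                     for r in range(1, head_size + 1))

    # Walk down the tail, accumulating popularity one item at a time.
    tail_total = 0.0
    for rank in range(head_size + 1, max_items + 1):
        tail_total += zipf_popularity(rank, exponent)
        if tail_total > head_total:
            return rank - head_size
    return None


if __name__ == "__main__":
    # Hypothetical scenario: 10 head items out of 100,000 items overall.
    print(tail_items_needed(head_size=10, max_items=100_000))
```

Under this assumed distribution, each head item is individually more popular than any tail item, yet a sufficiently large group of tail items overtakes the head in aggregate. A steeper falloff (a larger `exponent`) requires more tail items, and the tail may never overtake the head at all.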
Consumers of information have only a limited amount of time and cannot pay attention to everything. As more topics become available, mainstream topics receive a shrinking fraction of readers' attention. Analogously, prime time television audiences are currently shrinking, as cable and satellite networks improve their programming and increase their viewership. Similarly, musical “hits” today sell fewer copies than were sold a decade ago, as more choices and purchasing options become available. The economics and popularity trends from these observations can be succinctly summarized: “if you give people choices, they take them” and “the head of the distribution is shrinking.”
The problem is not only finding new or popular information: the problem is finding new information that is relevant to a user's specific needs, that is, new information on the “long tail.” Existing approaches fall short. Web search engines, for example, passively retrieve Web content in response to user queries and frequently favor old information. The Google search engine, for instance, is based on the PageRank algorithm, which depends on inter-page hyperlinks to estimate authoritativeness and popularity. Web pages that are most cited by other Web pages are assumed to be the best, yet may not actually be the most relevant.
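By way of illustration only, the dependence of link-based ranking on citation counts can be sketched with a simplified, textbook-style power iteration. This is not the implementation used by any actual search engine. The damping factor of 0.85, the iteration count, the handling of pages without outbound links, and the four-page link graph are all assumptions chosen for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking by power iteration.

    links maps each page name to the list of pages it links to.
    Returns a dict mapping each page name to its rank score.
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: an established page cited by everyone, and a
    # newly posted niche page that no other page links to yet.
    graph = {
        "old_hub": [],
        "page_a": ["old_hub"],
        "page_b": ["old_hub"],
        "new_niche": ["old_hub"],
    }
    for name, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(name, round(score, 3))
```

In this hypothetical graph, the heavily cited page receives the highest score, while the newly posted page receives a low score no matter how relevant its content may be to a particular user, because no other page links to it yet.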
Similarly, online news services are frequently aligned with mainstream media sources, which group news into a handful of popular topics, although specialized topics are sometimes available through syndication feeds. Online news aggregators correspondingly provide consolidated summarizations of news from multiple sources, but often fail to coherently group news under appropriate topics or to categorize news at a fine grain. As a result, readers are faced with a confusing blend of articles on disparate topics whenever they try to follow a story or topic. A reader may begin by reading articles under a technology topic to follow a new computer phone. However, the phone articles may end up mixed in with other technology articles and be scattered across the news aggregator's Web site. Searching for phone articles by keywords also may not correctly match all relevant articles. Thus, online news services and news aggregators lack sufficient granularity to enable the reader to receive only the best and most relevant articles, delivered in a way that makes it easy to follow developments on a topic.
Finally, news Web sites with voting invite users to vote on news stories. The highest ranking content is promoted to the front page, such as through the Digg Web site. Digg categorizes articles into a handful of topics, which each use different front page promotion algorithms. Only articles that have received sufficient “diggs” appear on a front page and only registered users can submit, comment on, and promote articles. The topics consequently reflect popular topics at the head of a long tail distribution. Voting has been criticized as susceptible to collusion, suppression, and paid promotion, such as described in C. Mezel, “The Digg Algorithm—Unofficial FAQ,” SeoPedia, (Nov. 2, 2006); N. Patel, “There's More to Digg Than Meets the Eye,” Pronet Advertising, (Jan. 15, 2007); and J. Dowdell et al., “Digg's Kevin Rose on Recent Indiggnation Fact vs. Fiction,” Marketing Shift, (Sep. 7, 2006), the disclosures of which are incorporated by reference.
Therefore, a need remains in digital sensemaking for discovering new, relevant, and authoritative digital information that is automatically categorized within topics for a particular subject area and emphasized at a personal level.