1. Field of the Invention
This invention relates to computing systems and, more particularly, to identifying keyword relationships among sources of computer-accessed content according to usage patterns associated with that content.
2. Description of the Related Art
As the reach and accessibility of computer networks such as the Internet have increased, the amount of information accessible via such networks has grown exponentially. For example, as commercial enterprises increasingly embrace electronic commerce techniques, numerous websites offering information and purchasing opportunities for various products and services have appeared. Major media outlets commonly provide web-based versions of content previously available only through print or broadcast channels, and in some instances generate considerable volumes of content exclusively for web-based distribution. The reduction of cost, complexity and other barriers to entry into web-based content publishing has also facilitated the generation and dissemination of content by individual creators. This phenomenon is perhaps best illustrated by the increasing number and popularity of individually-authored web logs or “blogs,” which offer content on a wide range of topics and in a wide range of styles and perspectives, from objective journalism to near-real-time autobiography.
As the amount of online content increases, the difficulty of locating content that is of general or specific interest also increases. Unlike libraries, which may employ standardized systems of content classification such as the Library of Congress Classification or the Dewey Decimal System, no standard for organizing and representing web-based content exists. Numerous search engines have evolved to attempt to index web pages according to the page contents (e.g., as given by the textual content actually displayed by the page when loaded into a browser or client, or by concealed metadata such as tags associated with or embedded within the page). Such search engines have further attempted to qualify the relevance of a given indexed page using other features of the page, such as its age and/or the number of links to the given indexed page from other indexed pages. For example, for a given keyword search, a page that satisfies the search criteria and is linked to from many sources may be considered a more relevant search result than a page having fewer external references.
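By way of illustration only, the link-based relevance ordering described above may be sketched as follows. The sketch assumes a small in-memory corpus; the page addresses, the `PAGES` table, and the functions `inbound_link_counts` and `search` are hypothetical names introduced for this example and do not describe the implementation of any particular search engine.

```python
from collections import defaultdict

# Hypothetical toy corpus (an assumption for illustration):
# page address -> (displayed text, list of outbound links).
PAGES = {
    "a.example/home": ("hiking boots review", ["b.example/boots", "c.example/gear"]),
    "b.example/boots": ("hiking boots for sale", ["c.example/gear"]),
    "c.example/gear": ("camping gear and hiking boots", []),
    "d.example/blog": ("my weekend trip", ["c.example/gear", "b.example/boots"]),
}


def inbound_link_counts(pages):
    """Count, for each indexed page, how many other indexed pages link to it."""
    counts = defaultdict(int)
    for source, (_text, links) in pages.items():
        # set() ensures each linking page is counted at most once per target.
        for target in set(links):
            if target != source and target in pages:
                counts[target] += 1
    return counts


def search(pages, keyword):
    """Return pages whose text contains the keyword, most-linked-to first."""
    counts = inbound_link_counts(pages)
    word = keyword.lower()
    matches = [
        url for url, (text, _links) in pages.items()
        if word in text.lower().split()
    ]
    # Sort by descending inbound-link count; break ties by address.
    return sorted(matches, key=lambda url: (-counts[url], url))


if __name__ == "__main__":
    for url in search(PAGES, "boots"):
        print(url)
```

In this sketch, three pages satisfy the keyword "boots", and the page linked to from the most other pages is listed first. A page that no other indexed page references sorts last regardless of how well its text matches, which reflects the limitation of link-based ranking for new or little-referenced content.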
Conventional index-based approaches to organizing online content suffer from a number of limitations. For example, such approaches are relatively static. Typical search engines gather information for indexing by “crawling” through web pages over periods of days or weeks, which may be insufficient to capture fast-moving or transient content. Further, numerous sources of content may be excluded from the indexing process, rendering the excluded content inaccessible to users of that search engine. For example, content hosts may deliberately refuse access to web-crawling tools, or a host may simply be too new or insufficiently relevant (e.g., according to absolute number of visitors or number of inbound links to content) to warrant indexing according to a search engine's indexing policy or strategy. Thus, users unaware of how to directly access excluded content (e.g., via a specific Uniform Resource Locator, or URL) may never be able to locate it.
Moreover, static indexing approaches that focus solely or predominantly on indexing content may overlook other possibly useful sources of information about content, such as patterns of user behavior with respect to content. Such patterns may emerge dynamically and in real time as users interact with one another and are influenced by factors internal and external to the content with which they interact.