1. Field of the Invention
This invention relates to computing systems and, more particularly, to identifying the level of interest in computer-accessed content according to access characteristics associated with that content.
2. Description of the Related Art
As the reach and accessibility of computer networks such as the Internet increase, the amount of information accessible via such networks has grown exponentially. For example, as commercial enterprises increasingly embrace electronic commerce techniques, numerous websites offering information and purchasing opportunities for various products and services have appeared. Major media outlets commonly provide web-based versions of content previously available only through print or broadcast channels, and in some instances generate considerable volumes of content exclusively for web-based distribution. The reduction of cost, complexity and other barriers to entry into web-based content publishing has also facilitated the generation and dissemination of content by individual creators. This phenomenon is perhaps best illustrated by the increasing number and popularity of individually-authored web logs or “blogs,” which offer content in a wide range of topics, styles and perspectives ranging from objective journalism to near-real-time autobiography.
As the amount of online content increases, the difficulty of locating content that is of general or specific interest also increases. Unlike libraries, which may employ standardized systems of content classification such as the Library of Congress Classification or the Dewey Decimal System, the web has no standard for organizing and representing its content. Numerous search engines have evolved to attempt to index web pages according to the page contents (e.g., as given by the textual content actually displayed by the page when loaded into a browser or client, or by concealed metadata such as tags associated with or embedded within the page). Such search engines have further attempted to qualify the relevance of a given indexed page using other features of the page, such as its age and/or the number of links to the given indexed page from other indexed pages. For example, for a given keyword search, a page that satisfies the search criteria and is linked to from many sources may be considered a more relevant search result than a page having fewer external references.
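By way of illustration only, the link-count heuristic described above might be sketched as follows. The function name, the page and link data, and the simple substring match are hypothetical assumptions made for this sketch; they do not represent the implementation of any particular search engine.

```python
from collections import defaultdict


def rank_by_inbound_links(pages, links, keyword):
    """Return URLs of pages containing `keyword`, most-linked first.

    pages: dict mapping URL -> page text (hypothetical index)
    links: iterable of (source_url, target_url) pairs
    """
    # Collect the distinct sources linking to each target, ignoring self-links.
    inbound = defaultdict(set)
    for source, target in links:
        if source != target:
            inbound[target].add(source)

    # Keep only pages that satisfy the search criteria (naive substring match).
    matches = [url for url, text in pages.items()
               if keyword.lower() in text.lower()]

    # More inbound links means a higher rank; ties are broken by URL.
    return sorted(matches, key=lambda url: (-len(inbound[url]), url))


# Hypothetical data for illustration.
pages = {
    "a.example": "Garden tools and tips",
    "b.example": "Garden furniture",
    "c.example": "Car parts",
}
links = [
    ("c.example", "a.example"),
    ("b.example", "a.example"),
    ("a.example", "b.example"),
]
ranked = rank_by_inbound_links(pages, links, "garden")
```

In this sketch, both matching pages satisfy the keyword search, and the page with two inbound links is placed ahead of the page with one.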
Conventional index-based approaches to organizing online content suffer from a number of limitations. For example, such approaches are relatively static. Typical search engines gather information for indexing by “crawling” through web pages over periods of days or weeks, which may be insufficient to capture fast-moving or transient content. Further, numerous sources of content may be excluded from the indexing process, rendering the excluded content inaccessible to users of that search engine. For example, content hosts may deliberately refuse access to web-crawling tools, or a host may simply be too new or insufficiently relevant (e.g., according to absolute number of visitors or number of inbound links to content) to warrant indexing according to a search engine's indexing policy or strategy. Thus, users unaware of how to directly access excluded content (e.g., via a specific Uniform Resource Locator, or URL) may never be able to locate it.
Additionally, conventional approaches for determining the relevance of content may not correlate well with the actual usage of content. As described above, a given web page that is conventionally indexed may be assigned a high degree of relevance if there are a large number of links to that page from other pages. However, if few users actually navigate those links to reach the given page, the significance of the links in determining relevance is questionable. In fact, it is a commonplace tactic to distort the overall relevance of a particular web page by widely distributing specious links to that page across the Internet, thus elevating the ranking of the page within search results despite content that might not otherwise justify such a ranking.
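The divergence between link-based relevance and actual usage may be illustrated with a minimal sketch. The host names, the counts, and the field names `inbound_links` and `link_traversals` are hypothetical values assumed solely for this example.

```python
def rank_pages(stats, key):
    """Order page URLs by the given statistic, highest first (ties by URL)."""
    return sorted(stats, key=lambda url: (-stats[url][key], url))


# Hypothetical per-page statistics: how many links point at the page,
# and how many times users actually followed a link to reach it.
stats = {
    "popular.example": {"inbound_links": 12, "link_traversals": 900},
    "spammed.example": {"inbound_links": 500, "link_traversals": 3},
}

by_links = rank_pages(stats, "inbound_links")
by_usage = rank_pages(stats, "link_traversals")
```

Under these assumed figures, ranking by inbound links places the heavily linked page first, whereas ranking by the number of times users actually navigated to each page reverses the order.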