1. Field of the Invention
This invention relates to computing systems and, more particularly, to identifying undesirable sources of computer-accessed content according to usage patterns associated with that content.
2. Description of the Related Art
As the reach and accessibility of computer networks such as the Internet have increased, the amount of information accessible via such networks has grown exponentially. For example, as commercial enterprises increasingly embrace electronic commerce techniques, numerous websites offering information and purchasing opportunities for various products and services have appeared. Major media outlets commonly provide web-based versions of content previously available only through print or broadcast channels, and in some instances generate considerable volumes of content exclusively for web-based distribution. The reduction of cost, complexity and other barriers to entry into web-based content publishing has also facilitated the generation and dissemination of content by individual creators.
While the reduction of barriers to accessing network-accessible content (or simply, network content) has encouraged the development of useful content, malicious actors have also capitalized on such accessibility. Numerous websites purporting to offer legitimate content instead surreptitiously cause users to unwittingly install malicious software such as spyware, viruses, worms, or keystroke loggers. Phishing sites masquerade as legitimate business sites and attempt to lure users into disclosing sensitive information such as passwords, account numbers or social security numbers. Other sites may offer content that, while not malicious, may be offensive or otherwise objectionable to some users.
As the amount of network content increases, the difficulty of locating content that is of general or specific interest also increases, as does the difficulty of distinguishing desirable content from undesirable content. Unlike libraries, which may employ standardized systems of content classification such as the Library of Congress System or the Dewey Decimal System, no standard for organizing and representing web-based content exists. Numerous search engines have evolved to attempt to index web pages according to the page contents (e.g., as given by the textual content actually displayed by the page when loaded into a browser or client, or by concealed metadata such as tags associated with or embedded within the page). However, there is no assurance that the information according to which a page is indexed accurately reflects the nature of its content. For example, a malicious actor might seed a web page with innocuous keywords in an attempt to have the page indexed according to those keywords in various search engines, while at the same time embedding malicious or otherwise undesirable content in the page. Users may be led to the web page based on seemingly germane search engine results, resulting in their exposure to undesirable content.