The Information Age has been characterized by the explosion of the internet. The internet has given rise to an increasing amount of data being exchanged and shared electronically. With the internet, users have access to a potentially endless source of information. Today, information may be exchanged through multiple digital and electronic modes, including but not limited to emails, blogs, web pages, and the like. However, as the internet provides more connectivity, users have become an easier target for unsolicited information.
Misuse of the internet may come in many forms. Spamming is an example of how the internet may be employed to bombard internet users with unsolicited information. In an example, web advertisers may indiscriminately send unsolicited email messages to unsuspecting email account holders to promote products. Spamming has become a serious problem resulting in unnecessary time and resources being dedicated to blocking and filtering the unsolicited messages.
In an attempt to address spamming, users may implement anti-spam programs. The techniques employed by anti-spam programs may vary. Some anti-spam programs block incoming messages. In an example, messages sent from pre-defined web sites known for spamming may be blocked. Anti-spam programs may also employ filtering techniques. In other words, the anti-spam programs may have the intelligence to automatically analyze the incoming messages to determine whether the incoming messages are spam messages. In an example, the content (e.g., specific words, specific phrases, etc.) of an incoming message may be analyzed to determine if the incoming message is actually a spam message. In another example, rules may be established by analyzing a plurality of spam messages to identify patterns.
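The blocking and filtering techniques described above can be illustrated with a minimal Python sketch that first checks the sender against a blocklist and then scores the message body against a list of known spam phrases. The blocklist (`BLOCKED_DOMAINS`), the phrase list (`SPAM_PHRASES`), the count-based scoring rule, and the `threshold` parameter are hypothetical assumptions chosen for illustration; they are not drawn from any particular anti-spam program.

```python
from dataclasses import dataclass

# Hypothetical blocklist of sender domains assumed to be known for spamming.
BLOCKED_DOMAINS = {"spam-example.test", "bulk-ads.test"}

# Hypothetical phrases treated as spam indicators (illustrative only).
SPAM_PHRASES = ("free money", "act now", "limited time offer", "click here")


@dataclass
class Message:
    sender: str  # e.g. "promo@bulk-ads.test"
    body: str


def sender_domain(address: str) -> str:
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def is_blocked(msg: Message) -> bool:
    """Blocking: reject messages whose sender domain is on the blocklist."""
    return sender_domain(msg.sender) in BLOCKED_DOMAINS


def spam_score(msg: Message) -> int:
    """Filtering: count how many known spam phrases appear in the body."""
    text = msg.body.lower()
    return sum(1 for phrase in SPAM_PHRASES if phrase in text)


def classify(msg: Message, threshold: int = 2) -> str:
    """Apply blocking first, then content filtering with an assumed threshold."""
    if is_blocked(msg):
        return "blocked"
    if spam_score(msg) >= threshold:
        return "spam"
    return "ok"
```

For example, `classify(Message("a@friend.test", "Act now to confirm the meeting; click here for the agenda"))` returns `"spam"` even though the message may be legitimate, which shows how naive phrase matching can produce false positives.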
In general, anti-spam programs have produced inconsistent results and may result in a high number of false positives. Some anti-spam programs are unable to catch all spam messages. Other anti-spam programs may unintentionally block legitimate incoming electronic messages. The techniques employed by anti-spam programs may be time-consuming and may require constant updates. As a result, most anti-spam programs may be unable to anticipate changes in spam attacks and usually require time to incorporate new spam attack techniques before the updated anti-spam programs are released to the general public.
Besides spamming, users may also experience spamdexing. In spamdexing, users of search engines may receive misleading search results, such as search results that are unrelated to the search terms/phrases. Search engines usually rank the web pages in the search results in order of relevancy. Due to spamdexing, some of the web pages displayed with high relevancy may have little or no information related to the search term. Consider, for example, a user who wants to search for web sites that offer information about tennis. Due to spamdexing, the returned search results may include web sites that do not pertain to tennis, such as pornographic sites.
Spamdexing may be implemented by different techniques. In an example, spamdexing may include stuffing a web page's meta keywords with a plethora of words. In another example, spamdexing may include hidden content. In yet another example, spamdexing may include URL (uniform resource locator) redirection. Although the techniques may vary, the purpose of spamdexing is usually to increase the relevancy of a web site in a search result.
The techniques that may be employed to identify spamdexing may be similar to the anti-spam techniques discussed above. In an example, search engines may analyze the content of a web page to determine if spamdexing has been employed. In another example, search engines may identify patterns among a group of web pages to establish rules about spamdexing. Identifying spamdexing may be a long and tedious process that may require considerable time and resources, both to identify new spamdexing techniques and to update the detection rules. Thus, search engines have continued to fight an uphill battle in identifying spamdexing and maintaining the relevancy of their search results.
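Content analysis of the kind described above can be sketched in Python with the standard library's `html.parser`. The sketch looks for two of the spamdexing techniques mentioned earlier, meta keyword stuffing and hidden content. The signal names, the `max_meta_keywords` threshold, the inline-style test for hidden elements, and the assumption of a single-word query are all hypothetical choices for illustration, not the method of any particular search engine. It does not attempt to detect URL redirection, styles applied via external CSS, or badly malformed HTML.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PageScanner(HTMLParser):
    """Collects meta keywords and splits page words into visible and hidden text."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_keywords: list[str] = []
        self.visible_words: list[str] = []
        self.hidden_words: list[str] = []
        self._stack: list[bool] = []  # one entry per open element: is it hidden?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords += [k.strip().lower() for k in content.split(",") if k.strip()]
        if tag in VOID_TAGS:
            return
        # Assumption: only inline styles are checked for hiding.
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style or "visibility:hidden" in style
        parent_hidden = self._stack[-1] if self._stack else False
        self._stack.append(hidden or parent_hidden)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        words = re.findall(r"[a-z0-9]+", data.lower())
        if self._stack and self._stack[-1]:
            self.hidden_words += words
        else:
            self.visible_words += words


def spamdexing_signals(html: str, query: str, max_meta_keywords: int = 20) -> list[str]:
    """Return hypothetical spamdexing signals for a single-word query."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    term = query.lower()
    signals = []
    if len(scanner.meta_keywords) > max_meta_keywords:
        signals.append("meta keyword stuffing")
    if term in scanner.hidden_words:
        signals.append("query term in hidden content")
    if term in scanner.meta_keywords and term not in scanner.visible_words:
        signals.append("query term in metadata but not in visible text")
    return signals
```

Using the tennis example, a page whose meta keywords and hidden `<div>` mention "tennis" while its visible text advertises something else would trigger all three signals, whereas a page that visibly discusses tennis would trigger none.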