The field of the invention relates to computer systems and computer networks, and more particularly, to systems and methods for categorizing content of computer and network traffic.
Many organizations face the challenge of dealing with inappropriate content and network misuse, such as email spam, browsing or downloading of inappropriate content, and use of the network for non-productive tasks. These organizations struggle to control access to inappropriate content without unduly restricting access to legitimate material and services. Currently, a common solution for blocking unwanted Web activity is to block access to a list of banned or blacklisted web sites and pages based on their URLs. However, such an approach may be unnecessarily restrictive, preventing access to valid content in web sites that may contain only a limited amount of undesirable material. Also, the list of blocked URLs requires constant updating.
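By way of illustration only, URL-based blacklisting of the kind described above can be sketched as follows. The host names, page paths, and the function name `is_blocked` are hypothetical and are not taken from any particular product; the sketch merely shows why a host-level entry blocks every page on that host and why a site absent from the list passes unchecked.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries; hosts and paths are illustrative only.
BLOCKED_HOSTS = {"banned.example"}                # every page on these hosts is blocked
BLOCKED_PAGES = {("mixed.example", "/bad-page")}  # only these specific pages are blocked


def is_blocked(url: str) -> bool:
    """Return True if the URL's host, or its exact (host, path) pair, is blacklisted."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return host in BLOCKED_HOSTS or (host, path) in BLOCKED_PAGES


if __name__ == "__main__":
    for url in (
        "http://banned.example/harmless-article",  # valid content, still blocked
        "http://mixed.example/bad-page",           # blocked page
        "http://mixed.example/other-page",         # allowed page on the same host
        "http://new-site.example/bad-page",        # not yet listed, so allowed
    ):
        print(url, is_blocked(url))
```

In this sketch the first URL is blocked even if its content is legitimate, and the last URL is allowed until someone adds it to the list, which corresponds to the two drawbacks noted above.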
Many email spam elimination systems also use blacklists to eliminate unwanted email messages. These systems match incoming email messages against a list of mail servers that have been pre-identified as spam hosts, and prevent users from accessing messages from these servers. However, spammers often launch email spam from different hosts every time, making it difficult to maintain a list of spam servers.
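A server blacklist of this kind may be sketched, under stated assumptions, as a simple membership test on the sending host. The addresses below are from reserved documentation ranges, and the names `SPAM_HOSTS` and `accept_message` are hypothetical; real systems may identify hosts differently.

```python
import ipaddress

# Hypothetical list of pre-identified spam hosts (reserved documentation addresses).
SPAM_HOSTS = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("198.51.100.7"),
}


def accept_message(sending_host: str) -> bool:
    """Return True if the message's sending host is not on the spam-host list."""
    return ipaddress.ip_address(sending_host) not in SPAM_HOSTS


if __name__ == "__main__":
    print(accept_message("192.0.2.10"))   # listed host: message is withheld
    print(accept_message("203.0.113.5"))  # unlisted host: message is delivered
```

Because the check depends only on the sending host, a message sent from a host that has not yet been listed is accepted regardless of its content, which is why rotating hosts defeats the list.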
It would be desirable to categorize network traffic content, and to prevent undesirable network traffic content (e.g., content that belongs to an undesirable category) from being passed to users. Currently, many content detecting systems use human-based categorization to categorize network content. In such systems, an operator manually analyzes network content, then uses the results of the analysis to categorize the network content. Although such techniques may produce reliable results, they are labor intensive and time consuming.
In another technique, HTML links are analyzed to determine a characteristic of network content. However, such a technique may mischaracterize network content. Companies have also used other techniques for characterizing network content, but these techniques may not produce reliable results.
Accordingly, new systems and methods for categorizing content of computer and network traffic would be useful.