There is a need to categorize information for a variety of purposes. For example, there is a need to categorize webpages and/or websites in order to refine the results of searches for information within a body of webpages and/or websites.
Other needs for the categorization of information include, for example, the need to categorize e-mails and other items of information which may be pushed to a user through a computer-operated system (e-mail being just one example; others include social media services and the like). The user may receive a large amount of information from a variety of sources in this manner. This information may include information in which the user has an interest but also information in which the user has no interest.
For example, it is common for a user to receive a variety of e-mail advertisements for various shops and services, some of which may be of interest but many of which are likely to be speculative and of no interest to the user.
Furthermore, the information may include malicious or illegitimate information items. These may, for example, be information items (e.g. e-mails) which direct the user to a particular webpage and attempt to trick the user into entering confidential or personal information (such as bank details and the like), e.g. a so-called phishing attack. The operator of the webpage may then use that information to gain access to a secure service associated with the user (e.g. an online banking portal) through which the operator can then cause the user damage (e.g. transfer funds out of bank accounts, etc.). As another example, the information item could encourage the user to contact another person who will then engage the user in an attempt to cause the user damage (e.g. to convince the user to transfer funds to that person under false pretences). Alternatively, the information item could provide a legitimate service but in relation to an illegal or disreputable product or service (e.g. the online sale of prescription medication or medication without approval, etc.).
Many users would like to be able to filter out information which is of no interest to them or which is potentially malicious, so that they are only presented with information which is potentially of interest to them and/or not malicious in nature. As a result of this need, so-called spam filters have been developed. These filters attempt to remove information which is potentially harmful, or potentially of no interest to the user, from the information which is presented to the user. This reduces the risk of the user suffering damage as a result of malicious information items and reduces the volume of information items, so that the user can more easily see the information of interest.
The origin of some information may allow a system to categorize the information and to use the categories associated with the information to determine whether or not that information is of interest to a user and/or potentially malicious.
There is also a need for service operators to identify potential users of their services and to target those users with advertisements and the like. Clearly, if a service operator directs advertisements to the users who are most likely to be interested in its service, then those advertisements become more effective. As a result, there is also a need to categorize users.
Current methods for categorizing information are generally either computationally expensive or overly simplistic and prone to error.
Embodiments of the present invention seek to ameliorate one or more problems associated with the prior art.