This invention relates to differentiating between good content and bad content in a user-provided content system based on features identified in the content.
Some systems allow users to interact by sending messages to each other. For example, a social networking system allows its users to interact with other users via status updates, wall posts, or private messages. Messages comprise data, which represents the content of the messages, and metadata, which represents information describing the messages. The data sent in a message is also called content, and the system that allows users to send messages is called a user-provided content system. Metadata associated with a message can comprise information describing the sender of the message, the recipient of the message, attributes of an interface used to send the message, attachments to the message, a level of urgency or importance of the message, and the like. Some messages are intended to be private messages delivered to either one recipient or a small set of recipients, whereas other messages are broadcast messages intended for a large number of recipients. For example, a wall post message in a social networking system can be accessed by all friends of the recipient in the social networking system, or even friends of the friends of the recipient, subject to the privacy settings of the users.
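The split between message content and message metadata described above can be illustrated with a minimal sketch. All class names, field names, and the `small_set_limit` threshold below are hypothetical and chosen only for illustration; the description does not prescribe any particular representation or any specific cutoff between private and broadcast messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MessageMetadata:
    """Information describing a message (field names are illustrative)."""
    sender_id: str
    recipient_ids: List[str]
    # Attributes of the interface used to send the message, e.g. {"client": "web"}.
    interface_attributes: Dict[str, str] = field(default_factory=dict)
    # Names of any files attached to the message.
    attachment_names: List[str] = field(default_factory=list)
    # Level of urgency or importance; the default value is an assumption.
    urgency: str = "normal"


@dataclass
class Message:
    """A message: its content (the data) plus the metadata describing it."""
    content: str
    metadata: MessageMetadata


def is_broadcast(message: Message, small_set_limit: int = 5) -> bool:
    """Treat a message as broadcast when it has more recipients than a
    'small set'. The limit of 5 is an arbitrary, assumed threshold."""
    return len(message.metadata.recipient_ids) > small_set_limit
```

For example, a message with a single recipient would be classified as private under this sketch, while a wall post visible to twenty friends would be classified as broadcast.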
Messages allow a user to send information to other users. For example, a user aware of an event may send information describing the event to other users. Similarly, a user can share an interesting webpage with other users by sending them the uniform resource locator (URL) of the webpage in a message. A user may share an interesting document with other users by sending the document as an attachment to a message. A user may share an application with other users by sending an executable file of the application as an attachment to a message.
Some users represent businesses and organizations that send information associated with the business or organization to users in messages. For example, a business may advertise new products by sending messages to subscribers of a certain mailing list. Alternatively, the business may send a different message to each subscriber by customizing the content of each message to the recipient. The number of messages sent by an organization can be significantly larger than the number of messages sent by a user representing a person.
Messages can be sent by malicious users for purposes harmful to other users. For example, a malicious user can send harmful or offensive content to users who never requested the content. The harmful content may comprise executables that could have an undesired effect on the recipient's computing device. Malicious users also attempt to steal credentials of existing users of the system and send messages from the stolen accounts. A stolen account continues to be a valid account until the system realizes the account is stolen and locks it. This gives the malicious user a window of opportunity to use the account for illegal purposes. Malicious users are likely to use stolen accounts for sending messages since a recipient is more likely to look at a message if the message appears to be sent by an acquaintance. Usually, the fact that a message is malicious is determined only after the message has been delivered to the recipient and a harmful effect of the message has already occurred.