Online systems, such as social networking systems, allow users to connect to and communicate with other users of the online system. For example, an online system may allow users to share content with other users of the online system by providing content items to the online system for presentation to other users. In addition, content providers may submit content items to the online system to be provided to users likely to interact with them. Often content items include text data, as well as image data, audio data, video data, and/or any other type of content that may be communicated to a user of the online system.
To ensure a high-quality user experience, an online system may remove certain content items, or prevent them from being displayed to users of the online system, based on text data associated with each content item. The presentation of content items within the online system may be restricted by one or more policies, for example a policy that disallows content items having text associated with certain categories of content (e.g., adult content, illegal content, and/or the like).
The online system may maintain a review process to identify content items that include text violating one or more policies and that are thus ineligible for presentation to users. Conventional systems require human reviewers to manually review content items received from content providers to determine their eligibility for presentation. However, as the number of content providers using the online system increases, so does the number of content items to be reviewed, which may reach, for example, hundreds of thousands of content items in a few days or a week. Existing attempts to automate the review process, for example by searching for offensive keywords, are often unable to identify complex policy violations, and they require a large amount of processing time and a large amount of storage space in computer memory to review the hundreds of thousands of content items. Therefore, conventional techniques for identifying content items that violate policies of the online system are ineffective, expensive, and time-consuming.