This disclosure relates generally to online content systems, and more specifically to pre-filtering image content items to detect content that violates a content policy of a digital content system.
Digital distribution channels disseminate a wide variety of digital content, including text, images, audio, links, videos, and interactive media (e.g., games, collaborative content), to users. Users often interact with content items in a content system, e.g., a digital magazine, provided by various sources, such as social networking systems, online publishers, and blogs. However, some of the content items curated and displayed by a content system may be considered illegal, inappropriate for viewing in a working or professional environment, or unsuitable for a particular group of viewers, such as young viewers under a certain age. Images and/or text that contain nudity or pornography, child pornography, offensive language, or profanity are commonly considered unsafe or inappropriate for users of a content system.
Detecting digital content that violates a policy of a content system, such as images of child pornography, can be computationally expensive. For example, some existing solutions rely on computationally expensive verification services, e.g., the MICROSOFT™ PHOTODNA® service, which computes a hash that uniquely identifies an image and compares the computed hash with reference hashes for detection. Additionally, verifying image content in this way requires a large amount of network bandwidth to upload the image content to the verification service, which can congest the network traffic of a content system. An effective digital content management system requires the ability to efficiently identify inappropriate content and to take remedial actions on the identified content.
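To illustrate the hash-and-compare pattern described above, the following is a minimal sketch. It does not reproduce PHOTODNA®, whose algorithm is proprietary; instead it uses a simple average-hash scheme, and the function names (`average_hash`, `matches_reference`) and the bit-distance threshold are illustrative assumptions, not part of any existing service.

```python
def average_hash(pixels):
    """Compute a 64-bit perceptual hash from an 8x8 grayscale image.

    `pixels` is an 8x8 list of brightness values (0-255). Each bit of
    the hash is 1 if the pixel is above the image's mean brightness.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming_distance(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def matches_reference(h, reference_hashes, threshold=5):
    """True if `h` is within `threshold` bits of any known reference hash."""
    return any(hamming_distance(h, r) <= threshold for r in reference_hashes)

# Example: a near-duplicate image (one pixel altered) still matches,
# which is why hash comparison tolerates small perturbations.
img = [[(r * 8 + c) * 4 % 256 for c in range(8)] for r in range(8)]
tweaked = [row[:] for row in img]
tweaked[0][0] = 255 - tweaked[0][0]

ref = average_hash(img)
print(matches_reference(average_hash(tweaked), [ref]))  # → True
```

A hash of this kind is only a few bytes, which is why such services compare hashes rather than raw images; the bandwidth cost noted above comes from having to upload the full image to the remote service in the first place.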