In recent years, we have witnessed an explosion of visual information, including images and video. Some of this content is offensive or otherwise inappropriate for certain sectors of the population, such as children.
The need for content moderation has grown accordingly, since the Internet is now accessible to everyone, including young children.
Existing solutions are based mostly on textual analysis or Internet domain filtering. However, with the emergence of Web 2.0 and user-generated content sites (such as YouTube, Facebook and Flickr), these traditional solutions are insufficient. A single such site hosts both acceptable and objectionable material, so filtering by domain is too coarse. Moreover, in many cases the visual content is not accompanied by any text, and in other cases misleading text is inserted deliberately, which defeats textual analysis. Other methods, such as crowdsourcing and collaborative filtering, are inefficient for the long tail of user-generated visual content, since most of those items receive too few views or ratings to be judged reliably.
Section 2: Related and Previous Work
There has been some prior work on content moderation and filtering based on visual analysis. Traditional methods mainly use simple image features such as skin color (e.g. (1)), or combine skin information with texture and color histograms (see, for example, (2)).
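A minimal sketch of such a skin-plus-color feature follows, assuming pixels are given as (r, g, b) tuples with channels in the range 0 to 255. The skin test is one widely used explicit RGB thresholding rule and is not necessarily the rule used in (1); texture features are omitted. The function names, the bin count and the 0.3 decision threshold are hypothetical choices for illustration only.

```python
# Sketch of a skin-based image feature. Assumes pixels are (r, g, b)
# tuples with 0-255 channels. Names and thresholds are hypothetical.

def is_skin(r, g, b):
    """Explicit RGB skin rule (uniform daylight variant), for illustration."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of pixels classified as skin (0.0 for an empty image)."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if is_skin(*p)) / len(pixels)


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Quantize each channel from 0..255 to 0..bins-1, then flatten.
        qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(qr * bins + qg) * bins + qb] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def image_features(pixels, bins=4):
    """Feature vector: skin fraction followed by the color histogram."""
    return [skin_ratio(pixels)] + color_histogram(pixels, bins)


def flag_by_skin(pixels, threshold=0.3):
    """Hypothetical rule: flag images whose skin fraction exceeds threshold."""
    return skin_ratio(pixels) > threshold
```

In a real system the feature vector from `image_features` would be passed to a trained classifier rather than to the single fixed threshold in `flag_by_skin`.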
More recent methods use a stronger model known as the “bag-of-features” model: a dictionary of quantized image descriptors (such as SIFT (3)) is created, each image is represented in terms of this dictionary, and statistical tools such as SVM or PLSA are then used to learn a model of porn/non-porn images from these representations (4). Some effort has been put into improving the image descriptors (e.g. (5)), improving the learning method (e.g. using PLSA, as suggested by (5)), and improving the run time (e.g. by using image features that can be computed very quickly, such as the amount of edges (6)).
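The bag-of-features pipeline can be sketched as follows. This is a minimal, stdlib-only sketch under several assumptions: descriptors are short numeric tuples standing in for SIFT vectors (real SIFT descriptors are 128-dimensional and would come from a feature extractor not shown here), the dictionary is built with plain k-means (a common choice, assumed here), and a nearest-class-mean classifier is swapped in for the SVM purely to keep the example self-contained. All function names and labels are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means over pooled descriptors; returns k codewords."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in descriptors:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bof_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest codeword; normalized counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        j = min(range(len(codebook)), key=lambda c: sq_dist(d, codebook[c]))
        hist[j] += 1
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


def train_nearest_mean(histograms, labels):
    """Stand-in for an SVM: store the mean histogram of each class."""
    sums, counts = {}, {}
    for h, y in zip(histograms, labels):
        if y not in sums:
            sums[y], counts[y] = [0.0] * len(h), 0
        sums[y] = [s + v for s, v in zip(sums[y], h)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, histogram):
    """Assign the label whose class mean is closest to the histogram."""
    return min(model, key=lambda y: sq_dist(histogram, model[y]))
```

In practice the dictionary would typically contain far more codewords than this toy setting, and the normalized histograms would be fed to an SVM (or modeled with PLSA) rather than to the nearest-mean stand-in used here.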
All of these methods share a model-based (parametric) approach.