1. Field of the Invention
The present invention relates generally to methods of detecting pornographic images transmitted through a communications network, and more particularly to a detection method wherein pixels of a questionable image are compared with a color reference database and an area surrounding a questionable image is subjected to a texture analysis, and images with questionable areas are subjected to a shape analysis.
2. Description of the Prior Art
A variety of methods have been used to deter the display of “objectionable” images at a work site. “Pornography-free” web sites, such as sites targeting families and children, have been set up to shield children from viewing objectionable material. Although a particular site may be pornography-free and considered acceptable for access by children, it is still possible to reach an objectionable web site by starting from an acceptable site. Software applications and Internet services such as Net-Nanny and Cyber-Sitter were created and marketed to help parents prevent their children from accessing objectionable documents by blocking access to specific web sites. One type of protective software stores the addresses of objectionable web sites and blocks access to these sites. Another type of protective software blocks access to all “unapproved” sites, permitting access only to a limited selection of approved sites.
These approaches are not highly effective, because it is a practical impossibility to manually screen all of the images on all of the web sites that are added to the web each day. They also rely either on storing a local database of web site URLs or on referencing such a database over the Internet. Many next-generation Internet terminals for the consumer market have limited local storage capability and cannot store the database locally. Where the database is referenced over the Internet, there are two disadvantages: (i) the database must be consulted before each web page is displayed, causing a significant delay in the display of web pages on a browser; and (ii) these database lookups significantly increase the network bandwidth used by such an Internet terminal.
Various algorithms have been investigated for use in detecting objectionable media. For example, algorithms have been tested for recognizing shapes, such as people in general and specific body parts. Detailed summaries of this work are found in David A. Forsyth and Margaret Fleck, Finding Naked People, Journal Reviewing, 1996; Margaret Fleck, David A. Forsyth, and Chris Bregler, Finding Naked People, Proceedings of the 4th European Conference on Computer Vision, 1996; and David A. Forsyth et al., Finding Pictures of Objects in Large Collections of Images, Proceedings, International Workshop on Object Recognition, Cambridge, 1996.
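The two kinds of protective software described above, one blocking listed sites and one permitting only approved sites, can be illustrated with a minimal Python sketch. The host names and function names below are hypothetical placeholders, not taken from Net-Nanny, Cyber-Sitter, or any other product; the point is only that both policies decide on the URL alone and never inspect the images themselves.

```python
from urllib.parse import urlparse

# Hypothetical example hosts; a real product would ship a much larger,
# regularly updated list, stored locally or queried over the network.
BLOCKED_HOSTS = {"objectionable.example"}
APPROVED_HOSTS = {"kids.example", "family.example"}


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL (empty string if none)."""
    return (urlparse(url).hostname or "").lower()


def allowed_by_blocklist(url: str) -> bool:
    """Blocklist policy: permit everything except hosts on the block list."""
    return host_of(url) not in BLOCKED_HOSTS


def allowed_by_allowlist(url: str) -> bool:
    """Allowlist policy: permit only hosts on the approved list."""
    return host_of(url) in APPROVED_HOSTS


# A site created today is not yet on either list.
print(allowed_by_blocklist("http://new-site.example/a.jpg"))  # True: slips through
print(allowed_by_allowlist("http://new-site.example/a.jpg"))  # False: blocked regardless of content
```

A newly added site passes the blocklist until someone manually reviews and lists it, while the allowlist rejects it whether or not its images are objectionable. Every lookup also requires the list to be available, either in local storage or via a network request.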
For an algorithm to be useful for screening objectionable images, it must achieve a very high ratio of the number of objectionable images correctly identified to the total number of objectionable images in a database. This ratio will be referred to as the “recall”, or otherwise as positive identification. In addition, for a system to be useful, it should not misclassify non-objectionable images as objectionable, since each such misclassification generates a “false alarm”. The ratio of the number of objectionable images correctly identified to the total number of images flagged will be referred to as the “precision”; the more false alarms a system generates, the lower its precision.
A perfect system would have full positive identification (100% of objectionable images flagged) and 100% precision (no non-objectionable images flagged). Of course, no system is perfect. It is therefore a balancing act to maximize positive identification without overloading the system with false alarms. However, when only a small fraction of the images are objectionable, it is especially important to maximize positive identification, even if the false-alarm rate increases.
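Recall and precision can be computed directly from counts of correctly flagged images (true positives), missed objectionable images (false negatives), and wrongly flagged innocent images (false positives, i.e. false alarms). The following is a minimal Python sketch; the function names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an arbitrary convention chosen here.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Objectionable images correctly flagged / all objectionable images."""
    total_objectionable = true_positives + false_negatives
    return true_positives / total_objectionable if total_objectionable else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Objectionable images correctly flagged / all images flagged."""
    total_flagged = true_positives + false_positives
    return true_positives / total_flagged if total_flagged else 0.0


# Hypothetical counts: 100 objectionable images, of which 90 are flagged,
# plus 30 innocent images flagged by mistake (false alarms).
print(recall(90, 10))     # 0.9
print(precision(90, 30))  # 0.75
```

A perfect system has no false negatives and no false positives, so both functions return 1.0. Lowering the detection threshold typically trades more false positives (lower precision) for fewer false negatives (higher recall).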
One algorithmic system reported by Forsyth had 43% recall with 57% precision. According to this report, it took about 6 minutes of analysis per image to determine whether an image pre-selected by a skin filter was an image of a person. To put this in perspective, for a web site that handles 100,000 images per day, such figures mean that more than half of the objectionable images would go undetected, so the algorithm may not be useful in practice.
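The practical impact of 43% recall, 57% precision, and 6 minutes per image can be made concrete with a little arithmetic. The Python sketch below assumes, purely for illustration, that 1% of the 100,000 daily images are objectionable and that 5,000 images per day pass the skin filter; neither fraction comes from the cited report, and the function names are hypothetical.

```python
def daily_outcome(images_per_day: int, objectionable_fraction: float,
                  recall: float, precision: float) -> tuple[int, int, int]:
    """Return (caught, missed, false_alarms) per day, rounded to whole images."""
    objectionable = images_per_day * objectionable_fraction
    caught = objectionable * recall          # true positives
    missed = objectionable - caught          # false negatives
    flagged = caught / precision             # all images the system flags
    false_alarms = flagged - caught          # false positives
    return round(caught), round(missed), round(false_alarms)


def analysis_hours(candidate_images: int, minutes_per_image: float = 6.0) -> float:
    """Serial analysis time, in hours, for images that pass the skin filter."""
    return candidate_images * minutes_per_image / 60.0


# Hypothetical: 1% of 100,000 daily images are objectionable.
print(daily_outcome(100_000, 0.01, recall=0.43, precision=0.57))  # (430, 570, 324)
# Hypothetical: 5,000 of those images pass the skin filter each day.
print(analysis_hours(5_000))  # 500.0
```

Under these assumptions, 570 objectionable images would slip through each day, about 324 innocent images would be flagged, and the shape analysis alone would need about 500 processor-hours per day.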