In the technique of detecting a person in an image by pattern recognition, detection is performed by detecting, for example, parts of the face (such as the eyes, nose, and mouth), the head, or the skin color (see, for example, Patent Document 1). In this type of person detection, an image patch of a predetermined region is generally shifted in small steps across the image data to be examined (the original image), and at each position it is determined whether or not the patch shows the head or a similar target. This determination is made, for example, by referring to learning images acquired in advance.
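The scanning described above can be illustrated with a minimal sketch. It assumes that the "learning image" is reduced to a single grayscale template and that the determination at each position is a simple mean-absolute-difference test against a threshold. Real detectors typically use trained classifiers instead. The function names (`extract_patch`, `mean_abs_diff`, `sliding_window_detect`) and the parameters `step` and `threshold` are hypothetical and are not taken from Patent Document 1.

```python
from typing import List, Tuple

Image = List[List[float]]  # hypothetical: grayscale image as rows of pixel values


def extract_patch(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out the image patch whose upper-left corner is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average absolute pixel difference between two equally sized patches."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def sliding_window_detect(image: Image, template: Image,
                          step: int = 1, threshold: float = 0.1
                          ) -> List[Tuple[int, int, float]]:
    """Shift a template-sized patch across the image in `step`-pixel increments.

    Assumption: a position counts as a detection when the patch differs from the
    learning template by at most `threshold` (mean absolute difference).
    Returns (top, left, score) for each detection, in scan order.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    if th == 0 or tw == 0 or th > ih or tw > iw:
        raise ValueError("template must be non-empty and fit inside the image")
    hits = []
    for top in range(0, ih - th + 1, step):
        for left in range(0, iw - tw + 1, step):
            patch = extract_patch(image, top, left, th, tw)
            score = mean_abs_diff(patch, template)
            if score <= threshold:
                hits.append((top, left, score))
    return hits


if __name__ == "__main__":
    # 5x5 dark image containing a 2x2 bright block at rows 2-3, columns 1-2.
    img = [[0.0] * 5 for _ in range(5)]
    for r in (2, 3):
        for c in (1, 2):
            img[r][c] = 1.0
    tmpl = [[1.0, 1.0], [1.0, 1.0]]
    print(sliding_window_detect(img, tmpl))          # fine scan (step=1)
    print(sliding_window_detect(img, tmpl, step=2))  # coarse scan (step=2)
```

With `step=1` the block is found at `(2, 1)`, whereas with `step=2` that position is never visited and nothing is reported. This is why the patch is shifted finely, at the cost of evaluating many more positions.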