An image obtained by virtually observing a specific structure, such as a road surface, a wall surface, or an inner wall of a tunnel, from a vertical direction is known (hereinafter referred to as an ortho-image). The ortho-image is generated from a taken image, which is taken by an imaging device, such as a camera, mounted on a moving object such as a vehicle or a robot. The ortho-image is, for example, an image obtained by looking down on the object from above, and can therefore be used to recognize a pattern on the road surface such as a stop line or a character. An image processing device that processes such an ortho-image is desired to accurately perform, for example, crack inspection of the road surface.
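
As an illustration only, the generation of an ortho-image from a taken image can be sketched as an inverse warp. The sketch assumes that the observed structure is planar, so that the mapping from ortho-image pixels to taken-image pixels can be expressed by a 3x3 homography, and that this homography is already known, for example from calibration of the imaging device. The function name `warp_to_ortho`, the parameter `H_ortho_to_taken`, and the use of nearest-neighbour sampling are assumptions made for this example and are not taken from the description above.

```python
import numpy as np


def warp_to_ortho(taken, H_ortho_to_taken, out_shape):
    """Inverse-warp a taken image onto an ortho-image grid.

    taken            : 2-D (grey) or 3-D (colour) array, the taken image.
    H_ortho_to_taken : hypothetical 3x3 homography mapping ortho-image pixel
                       coordinates (x, y, 1) to taken-image pixel coordinates.
    out_shape        : (height, width) of the ortho-image to produce.
    """
    out_h, out_w = out_shape

    # Homogeneous coordinates (x, y, 1) of every ortho-image pixel, shape (3, N).
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])

    # Project each ortho-image pixel into the taken image.
    mapped = H_ortho_to_taken @ pts
    w = mapped[2]
    valid = np.abs(w) > 1e-12            # skip points mapped to infinity
    safe_w = np.where(valid, w, 1.0)     # avoid division by zero
    src_x = np.rint(mapped[0] / safe_w).astype(int)
    src_y = np.rint(mapped[1] / safe_w).astype(int)

    # Keep only samples that fall inside the taken image.
    in_h, in_w = taken.shape[:2]
    valid &= (src_x >= 0) & (src_x < in_w) & (src_y >= 0) & (src_y < in_h)

    # Pixels with no valid source sample are left at zero.
    ortho = np.zeros((out_h * out_w,) + taken.shape[2:], dtype=taken.dtype)
    ortho[valid] = taken[src_y[valid], src_x[valid]]
    return ortho.reshape((out_h, out_w) + taken.shape[2:])
```

With an identity homography the function returns the taken image unchanged, which is a convenient sanity check. In practice, interpolated sampling would usually be preferred over nearest-neighbour sampling when fine detail such as cracks must be preserved.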