Some imaging devices, such as digital cameras, shoot a moving image of a subject and detect a motion vector between frames of the moving image, thereby detecting the motion of the subject or estimating the motion amount of the entire screen. Such an imaging device can realize a gesture user interface (UI) function, in which an electronic device is operated by the motion of a finger, by using the finger as the subject, and can realize a camera shake correction function by using the motion vector of the entire screen.
A block matching method is generally used to detect the motion vector between frames. The block matching method detects, as the motion vector, the vector between the on-screen positions of the pair of blocks, one from each of the images of two frames, whose difference is minimum. Therefore, in the block matching method, when the images whose difference is to be obtained change due to a cause other than motion, the detection accuracy of the motion vector deteriorates. For example, when the moving speed of the subject changes over time and the blur amount of the subject differs between the images of the two frames, the difference becomes large even between blocks of the same subject, and the detection accuracy of the motion vector deteriorates.
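For illustration only, the block matching described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular device: the sum of absolute differences (SAD) is assumed as the difference measure, the search is exhaustive over a small window, and the names `sad`, `block_matching`, and `make_frame`, as well as the synthetic frames, are hypothetical.

```python
def sad(prev, curr, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in prev
    and the block displaced by (dx, dy) in curr."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(prev[by + y][bx + x] - curr[by + dy + y][bx + dx + x])
    return total


def block_matching(prev, curr, bx, by, size, search):
    """Return the displacement (dx, dy) within +/-search pixels that
    minimizes the SAD for one block, together with that minimum SAD."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), sad(prev, curr, bx, by, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(prev, curr, bx, by, dx, dy, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(width, height, ox, oy, size=4):
    """Synthetic frame (assumption): a textured square at (ox, oy) on black."""
    frame = [[0] * width for _ in range(height)]
    for y in range(size):
        for x in range(size):
            frame[oy + y][ox + x] = 100 + 10 * y + x
    return frame


prev_frame = make_frame(16, 16, 4, 5)
curr_frame = make_frame(16, 16, 6, 4)  # subject displaced by (+2, -1)
vector, cost = block_matching(prev_frame, curr_frame, bx=4, by=5, size=4, search=3)
print(vector, cost)
```

In this sketch the subject looks identical in both frames, so the minimum difference is zero at the true displacement. If the blur amount of the subject differed between the two frames, the difference at the true displacement would no longer be small, which is the accuracy problem noted above.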
Therefore, a method of detecting the motion vector by using images of the same frame with different exposure times has been proposed (for example, refer to Patent Document 1). In this method, the motion vector is detected on the assumption that the difference between the images of the same frame with the different exposure times arises from the difference in the exposure time and the motion vector of the subject.
However, in the method of Patent Document 1, when the difference in the exposure time is small, the difference between the images of the same frame with the different exposure times also becomes small, and the accuracy of the motion vector deteriorates. On the other hand, when the difference in the exposure time is large, the linear equation obtained by approximating each image, on the assumption that the difference between the images arises from the difference in the exposure time and the motion vector of the subject, becomes less accurate, and the accuracy of the motion vector likewise deteriorates.
Furthermore, no consideration has been given to using images with different shooting conditions, such as the exposure time, to generate with high accuracy a depth map indicating the depth value of each pixel.