The present disclosure relates to an image processing device and an image processing method, and more particularly, to an image processing device and an image processing method capable of rapidly generating a depth image using a depth image of a preceding frame.
Technologies have been suggested that detect motions of arms, hands, fingers, or the like of human beings through image recognition and use the motions as user interfaces of various applications. The image recognition uses high-precision depth images that are generated using images of a plurality of viewpoints and that indicate the positions of subjects in depth directions in the images.
In such technologies, since a depth image with high precision is necessary, a position in a depth direction is desirably detected in units of sub-pixels smaller than units of pixels. One method of detecting a position in a depth direction in the units of sub-pixels is, for example, to increase the resolution of the images of a plurality of viewpoints in a pseudo manner through linear interpolation or the like and to detect the position in the depth direction using the images of the plurality of viewpoints after the increase in the resolution.
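The resolution-increase approach described above can be illustrated on a single scanline. The following is a minimal sketch, not an implementation taken from any cited publication: the function names `upsample_linear`, `best_shift`, and `subpixel_disparity`, the mean-absolute-difference cost, and the default `factor` are all assumptions chosen for illustration.

```python
def upsample_linear(row, factor):
    """Insert (factor - 1) linearly interpolated samples between neighbours."""
    out = []
    for i in range(len(row) - 1):
        for k in range(factor):
            t = k / factor
            out.append((1 - t) * row[i] + t * row[i + 1])
    out.append(float(row[-1]))
    return out


def best_shift(ref, tgt, max_shift):
    """Return the integer shift s that minimises the mean |ref[i] - tgt[i + s]|."""
    best_s, best_cost = 0, float("inf")
    for s in range(min(max_shift, len(ref) - 1) + 1):
        n = len(ref) - s  # number of overlapping samples at this shift
        cost = sum(abs(ref[i] - tgt[i + s]) for i in range(n)) / n
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


def subpixel_disparity(left, right, max_disp, factor=4):
    """Match on the upsampled scanlines, then convert back to pixel units."""
    l_up = upsample_linear(left, factor)
    r_up = upsample_linear(right, factor)
    return best_shift(l_up, r_up, max_disp * factor) / factor
```

In this sketch, both the number of samples and the number of candidate shifts grow by `factor`, so the matching cost per scanline grows roughly with the square of `factor`.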
There is also a method of interpolating positions in a depth direction in units of sub-pixels based on a function indicating a position detected in the units of pixels in the depth direction. There is also a method of detecting a position in a depth direction in the units of sub-pixels by a phase-only correlation method (for example, see JP 2013-19801A).
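One common way to interpolate a pixel-unit detection result based on a function is to fit a parabola through the matching cost at the detected position and at its two neighbours. The following is a minimal sketch under that assumption. The parabola model and the name `refine_subpixel` are illustrative and are not stated in the text above.

```python
def refine_subpixel(costs, d):
    """Refine integer position d using a parabola through costs[d-1], costs[d], costs[d+1].

    costs: matching cost per integer candidate (lower is better).
    d:     index of the minimum cost found in units of pixels.
    """
    if d <= 0 or d >= len(costs) - 1:
        return float(d)  # no neighbour on one side, keep the integer result
    c_prev, c_min, c_next = costs[d - 1], costs[d], costs[d + 1]
    denom = c_prev - 2.0 * c_min + c_next
    if denom <= 0:
        return float(d)  # flat or non-convex neighbourhood, no reliable vertex
    # Vertex of the parabola through the three points, relative to d.
    return d + 0.5 * (c_prev - c_next) / denom
```

For example, with costs sampled from a parabola whose minimum lies between two integer candidates, the refined position moves from the integer minimum toward the true minimum.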
However, when a position in a depth direction is detected in the units of sub-pixels, the calculation amount is larger than when the position in the depth direction is detected in the units of pixels. Accordingly, when detection of a position in a depth direction in the units of sub-pixels is performed on an entire screen, the detection time may increase and a depth image may not be generated at a high frame rate. Thus, in the above-described technologies, it is difficult to use motions of arms, hands, fingers, or the like of human beings sufficiently as user interfaces.
On the other hand, methods of easily generating a depth image using motion vectors have been suggested (for example, JP 2000-261828A and JP 2012-109788A). In these methods, the position of a subject in a depth direction is set as a front position in a region with a large motion vector and as a rear position in a region with a small motion vector.
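The idea of assigning front and rear positions from motion vector size can be sketched as follows. This is only an illustrative assumption of how such a mapping might look, not the procedure of the cited publications: the name `depth_from_motion`, the linear normalisation, and the 0 to 255 range with larger values meaning nearer to the front are all hypothetical.

```python
import math


def depth_from_motion(vectors, near=255, far=0):
    """Map per-region motion vectors (vx, vy) to depth values.

    Regions with large motion get values close to `near` (front);
    regions with small motion get values close to `far` (rear).
    """
    mags = [[math.hypot(vx, vy) for vx, vy in row] for row in vectors]
    lo = min(min(row) for row in mags)
    hi = max(max(row) for row in mags)
    if hi == lo:
        # Uniform motion carries no ordering information; place everything at the rear.
        return [[far for _ in row] for row in mags]
    scale = (near - far) / (hi - lo)
    return [[round(far + (m - lo) * scale) for m in row] for row in mags]
```

Such a mapping needs no stereo matching, which is why it is inexpensive, but it yields only a relative ordering of regions rather than a measured position in the depth direction.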