The embodiment relates to an image processing method and an image processing apparatus for detecting a target, and more particularly, to a technology for promptly and conveniently extracting a target such as a hand region by using only depth information of an image, and providing a user interface using the extracted hand region.
In recent years, as the range of applications and the technical level of display devices have increased, interactive devices, such as game players and computing devices, which can detect and reflect a user's input in real time have been actively developed. To receive user input, motion-recognizing user interface devices capable of recognizing a motion of a user even when the user does not physically contact the device have been developed, in addition to buttons, keyboards, mouse devices, and touch screens.
Among them, a motion-recognizing user interface device is based on a technology that manipulates functions of a display screen by recognizing an initial region and operation of a user using depth information acquired by a 3D sensing camera, and then tracking the user from the initial region to recognize the user's motion as an input.
In this technology, a part of the user's body (an elbow, a wrist, or a forearm) is detected by using 3D sensing information, and the motion region of the corresponding part is utilized in motion recognition. However, a technology capable of accurately recognizing the motion and shape of a hand is required to implement a more precise user interface, and a method that detects only a body part as described above is very limited in extracting the hand region, which must first be extracted before the motion or shape of the hand can be recognized.
Most existing methods for extracting the hand region use both a 3D depth image and an RGB color image as input information. However, when both pieces of information are utilized, the amount of information to be processed is vast and the algorithm becomes complex, increasing the computational load and decreasing operation speed. Further, since the user's body parts and hand region, that is, the region of interest (ROI), cannot be accurately extracted by using RGB color information alone, it is essential to utilize depth information to improve accuracy (reference paper: H. An and D. Kim, “Hand Gesture Recognition using 3D Depth Data”).
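The depth-only extraction discussed above can be illustrated with a minimal sketch. The code below is not the method of the embodiment; it merely shows, under assumed values, how a hand-distance range applied to a depth map alone yields a candidate hand ROI without any RGB processing. The function name `extract_hand_region` and the depth thresholds (400 mm to 700 mm) are hypothetical choices for illustration.

```python
import numpy as np

def extract_hand_region(depth, near=400, far=700):
    """Return a binary mask of pixels whose depth (in mm) falls in an
    assumed hand-distance range [near, far].  Zero-depth pixels (no
    sensor reading) are excluded.  A real pipeline would refine this
    mask further, e.g. by keeping the largest connected component."""
    valid = depth > 0
    return valid & (depth >= near) & (depth <= far)

# Synthetic 8x8 depth map: a "hand" blob at ~500 mm in front of a
# background at ~2000 mm, with one invalid (zero) pixel.
depth = np.full((8, 8), 2000, dtype=np.uint16)
depth[2:5, 2:5] = 500   # 3x3 blob at hand distance
depth[0, 0] = 0         # no depth reading

mask = extract_hand_region(depth)
print(mask.sum())       # prints 9: only the hand-distance blob survives
```

Because only a single depth channel is thresholded, the per-pixel work is a few comparisons, which is consistent with the motivation above for avoiding the heavier combined depth-plus-RGB processing.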
Existing technologies for extracting a hand region have not yet described in detail a method that uses only depth information from a stereo camera.