In the field of image processing, image segmentation is both an important issue and a sticking point that is not easily solved. Image segmentation technology has been researched for decades. However, a practical solution must contend with real-time operation, varying illumination, varying background environments, and the need for prior knowledge from a user, and moreover, no single integrated solution addresses all of these issues. At present, the available products operate only under limited conditions.
However, now that depth cameras have become widely distributed, it is possible to extract an object such as a person or a gesture in real time in an ordinary home environment. A representative example is the Kinect sensor developed by Microsoft.
Each Kinect sensor combines an RGB camera sensor with an infrared camera sensor and recognizes the gestures and motions of users. The area of a person alone can be extracted very easily from an image by using the learner extraction method provided by the Kinect sensor. Because the low-priced Kinect hardware is widely distributed and a publicly released library is supplied, it is possible to develop a number of applicable gesture recognition technologies.
However, the Kinect sensors have a problem in that satisfactory performance is not obtained at the outer portion of the extracted area of a person. This is because the depth sensor is noisy and because the person and the background actually contact each other, so that similar depth values are obtained for both. Due to this problem, it is not easy to separate the person from the background area.
In particular, the foot region of a person contacts the floor in the background, and due to sensor noise and the non-uniform characteristics of the floor, it is difficult to accurately detect the foot region of the person by using a related-art detection method.
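The difficulty described above can be illustrated with a minimal sketch. It assumes a simple depth-threshold segmentation, which is a stand-in chosen for illustration and is not the extraction method of the Kinect sensor or of any cited publication. The function name `segment_by_depth`, the depth values in millimeters, and the tolerance are all hypothetical.

```python
def segment_by_depth(depth_row, person_depth_mm, tolerance_mm):
    """Mark each pixel whose depth lies within a tolerance of the person's depth."""
    return [abs(d - person_depth_mm) <= tolerance_mm for d in depth_row]


# Hypothetical depth values in millimeters (assumptions, not measured data).
PERSON_MM = 2000   # assumed distance from the sensor to the person
WALL_MM = 3500     # assumed distance from the sensor to the back wall
TOL_MM = 50        # assumed tolerance that absorbs depth-sensor noise

# Row at torso height: wall, wall, person, person, wall, wall.
torso_row = [WALL_MM, WALL_MM, PERSON_MM, PERSON_MM, WALL_MM, WALL_MM]

# Row at foot height: the floor next to the feet lies at almost the same
# depth as the feet, with small deviations from noise and an uneven floor.
foot_row = [2030, 1985, PERSON_MM, PERSON_MM, 2015, 1970]

torso_mask = segment_by_depth(torso_row, PERSON_MM, TOL_MM)
foot_mask = segment_by_depth(foot_row, PERSON_MM, TOL_MM)

print("torso:", torso_mask)  # only the two person pixels are selected
print("foot: ", foot_mask)   # floor pixels are selected along with the feet
```

In this sketch the torso row separates cleanly because the wall is far behind the person, whereas every pixel in the foot row passes the threshold, so the floor cannot be told apart from the feet on depth alone.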
In this context, Korean Patent Publication No. 2013-0043394, “image processing method and apparatus for detecting target, and method and apparatus for user interface,” discloses a technique in which a target is extracted by using only depth information of an image obtained from a stereo camera.