The present disclosure relates to object recognition.
Today many computer systems and machines rely on person recognition techniques for a variety of applications. In some example applications, machines and computer systems need to know whether a human is present (or which human is present) at a particular location in order to turn a particular program on or off, or to activate it. As a further example, person recognition is often a fundamental skill in human-robot interaction: in general, a robot needs to know where a person is in order to interact with that person. Some domains, however, pose particularly difficult challenges. In healthcare, for instance, given the sensitive nature of that environment, a robot that works with people needs to reliably identify the presence of people in the environment, including people who are sitting, lying down, partially occluded, or actively moving. The robot also needs to perform these tasks quickly and accurately without depending on people changing their behavior to suit it, so as not to add to what is likely already a difficult time for them.
Many systems use existing facial recognition, action recognition, and other recognition techniques to recognize people. However, these techniques have limitations that prevent reliable recognition of people in less-than-optimal conditions. For instance, some existing depth-image facial recognition techniques depend on the subject's face, or at least the eyes and/or nose, being visible in the target image in order to register the image. If the subject's face is not visible, either because the person is turned away from the camera or because the person's face lies fully or partially outside the frame, recognition is often not possible using these techniques.
Further, some existing action recognition techniques can be used to recognize people by the way they walk (e.g., using gait recognition). However, these techniques require capturing the target subject's movements over some distance and/or time, which can introduce a significant and undesirable lag in recognition response time. Various head recognition techniques also exist that are capable of recognizing people's heads independently of their faces, provided the heads are visible in the image. However, some of these head recognition techniques rely on a camera positioned above people's heads, which is not always feasible, especially when equipping a mobile platform. Other head recognition techniques use contour recognition. However, these techniques require both a person's shoulders and head to be visible in the image, which is not always practicable, particularly when the camera and/or the target subject are moving.
Other people recognition systems use geometric shapes to detect people in 2D scan data and in 3D long-range Velodyne data. However, these systems are often unreliable when the detection target is at close range or is partially occluded in the image environment (e.g., by image artifacts or by objects in the environment), and they are generally unable to uniquely recognize the detection target even when it is detected.
Some solutions, such as gait recognition, can in some cases recognize people even when both the face and part of the head or shoulders are missing. However, these solutions are unreliable because their recognition failure rate is high, and they generally have slow response times because they require a sequence of images or a video to be analyzed.
Additionally, many existing depth-based recognition solutions are severely limited during mobile perception. For instance, when the sensor and/or the subject or object is moving, the relative pose of the person is constantly changing. This causes problems for both face recognition and head recognition: occlusions are bound to occur, so recognition must remain possible in their presence.