Tracking the orientation of a person's head, especially the face, is helpful in many applications. It can inform an application about the direction in which a person is looking or speaking, and it helps locate facial features for recognizing facial expressions or performing face recognition. It can also allow a system to infer where a user's attention is directed. This is important, for example, for an agent to know whether a user is looking at a projected display, some other object, or another person. At other times, it may be valuable for an agent to be able to determine the source of a voice.
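As an illustration of how face-based orientation estimates are commonly derived, the sketch below uses a simple geometric heuristic, not a method described in this document: as the head turns, the nose tip shifts toward the far eye, so its horizontal offset from the eye midpoint, normalized by the inter-eye distance, roughly approximates the yaw angle. All landmark names and values are assumed for illustration.

```python
import math

def estimate_yaw(left_eye, right_eye, nose_tip):
    """Rough head-yaw estimate (radians) from three 2D facial landmarks.

    Illustrative heuristic only: the nose tip's normalized horizontal
    offset from the eye midline approximates the yaw of the head.
    """
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    eye_dist = math.hypot(right_eye[0] - left_eye[0],
                          right_eye[1] - left_eye[1])
    # Normalized horizontal offset of the nose from the eye midline.
    offset = (nose_tip[0] - mid_x) / eye_dist
    # Clamp before asin to stay inside the function's domain.
    offset = max(-1.0, min(1.0, offset))
    return math.asin(offset)

# A frontal face: nose tip centered between the eyes gives yaw ~ 0.
print(estimate_yaw((100, 120), (160, 120), (130, 150)))  # -> 0.0
```

Such a heuristic, of course, presumes the facial landmarks are visible, which is exactly the limitation discussed next.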
However, camera and/or microphone coverage of a space is sometimes limited, and if a person does not face a camera directly, exposing the face, systems that rely on face detection or face recognition algorithms have difficulty determining which direction the person is looking. Also, depending on camera position, it may be desirable to track which direction a person is facing without the face being visible. This could be done by training on a person's head shape, but a reference value (“ground truth”) for the orientation of a person's head is difficult to establish without substantial cooperation (training sessions) from people. The same is true of ground truth about which direction a user is facing for voice-source determination.
Modern systems that use machine learning technologies to train a recognition system usually need a significant amount of training data to achieve acceptable performance. With deep learning approaches, this need has increased even further. For vision techniques, for example, training under different lighting conditions and occlusions is desirable, so training data captured under realistic conditions (rather than lab data or data from small training sessions) is needed to obtain state-of-the-art performance. Furthermore, it is recommended to continue collecting training data even while the system is performing recognition, such as when using personalized or adaptive models.
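One common way to keep collecting training data during recognition is to bank high-confidence predictions as pseudo-labeled samples for later model adaptation. The sketch below illustrates that general pattern; the class and parameter names are hypothetical and not taken from this document.

```python
from collections import deque

class AdaptiveRecognizer:
    """Sketch of collecting training data while performing recognition.

    Hypothetical example: confidently recognized samples are banked as
    pseudo-labeled training data, so a personalized model can later be
    refit without dedicated training sessions.
    """
    def __init__(self, base_predict, confidence_threshold=0.9, capacity=1000):
        self.base_predict = base_predict      # returns (label, confidence)
        self.threshold = confidence_threshold
        self.buffer = deque(maxlen=capacity)  # pseudo-labeled samples

    def recognize(self, sample):
        label, confidence = self.base_predict(sample)
        if confidence >= self.threshold:
            # Bank high-confidence predictions for later adaptation.
            self.buffer.append((sample, label))
        return label

# Usage with a stand-in predictor that is always confident.
recognizer = AdaptiveRecognizer(lambda s: ("facing-left", 0.95))
recognizer.recognize("frame-1")
print(len(recognizer.buffer))  # -> 1
```

The bounded deque keeps memory use fixed; a real system would also guard against reinforcing its own mistakes, for example by periodically validating the banked samples.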
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications and variations thereof will be apparent to those skilled in the art.