Recently, techniques for performing remote photoplethysmography (remote PPG or rPPG) have been developed. These techniques enable a PPG signal to be obtained from a video sequence of image frames captured using an imaging unit (e.g. a camera). It is desirable for a video sequence to be processed and rPPG signals extracted automatically so that subjects can be automatically monitored. However, this requires areas of living skin tissue to be automatically identified in the video sequence.
The task of detecting subjects in a video, one of the fundamental topics in computer vision, has been extensively studied over the past decades. Given a video sequence containing subjects, the goal is to locate the regions corresponding to the body parts of a subject. Most existing work exploits human appearance features to discriminate between subject and background in a supervised training mechanism. However, a common problem with these methods is that their trained features are not unique to human beings: any feature that is similar to human appearance can be misclassified. Moreover, supervised methods are usually restricted to previously known samples and tend to fail when unpredictable samples occur, e.g. a face detector trained with frontal faces cannot locate faces viewed from the side, while a skin classifier trained with bright-skinned subjects fails with dark-skinned subjects. Other types of methods require an area of skin to be manually selected in the video sequence, with this area being tracked over time to compensate for motion. However, this technique clearly requires manual input, and it is not easy to correctly track the selected area when there is substantial motion.
Based on the development of rPPG techniques, it has been observed that, compared to physical appearance features, invisible physiological features (e.g. pulse) can better differentiate humans from non-humans in a video sequence. In the natural environment, only the skin tissue of a living subject exhibits pulsatility, so any object that does not show a pulse signal can be safely classified as non-human. This can prevent the false detection of objects with an appearance similar to humans, as shown, for example, in FIG. 1.
FIG. 1 provides two examples of how a living tissue detection technique should successfully operate. In the left-hand image, a human face and an artificial face are presented face-on to the camera, and only the human face should be identified (despite the artificial face having similar physical appearance features to the human face), as indicated by the dashed box and outline of the area corresponding to living skin tissue. In the right-hand image, a human face and an artificial face are presented side-on to the camera, and only the human face should be identified.
In the paper “Face detection method based on photoplethysmography” by G. Gibert, D. D'Alessandro, and F. Lance, 10th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 449-453, (2013), a hard threshold is set to select segmented local regions (e.g. grids, triangles or voxels) with higher frequency spectrum energy as skin regions. In the paper “Automatic ROI detection for rPPG measurements” by R. van Luijtelaar, W. Wang, S. Stuijk, and G. de Haan, ACCV 2014, Singapore, pre-defined clustering parameters are used to cluster regions sharing similarities as skin regions.
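By way of illustration only, the general idea of applying a hard threshold to the spectral energy of local regions can be sketched as follows. This is a minimal sketch and not the implementation of either cited paper: the function names, the assumed pulse band of 0.7 to 4.0 Hz, the use of an energy ratio rather than absolute energy, and the threshold of 0.5 are all assumptions made for the example. Each region is assumed to have already been reduced to a one-dimensional trace (e.g. its mean pixel value per frame).

```python
import numpy as np


def pulse_band_energy_ratio(trace, fps, band=(0.7, 4.0)):
    """Fraction of a trace's spectral energy inside an assumed pulse band (Hz)."""
    trace = np.asarray(trace, dtype=float)
    trace = trace - trace.mean()  # remove the DC component
    spectrum = np.abs(np.fft.rfft(trace)) ** 2
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fps)
    total = spectrum.sum()
    if total <= 1e-12:  # flat trace: no energy to classify (assumed tolerance)
        return 0.0
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(spectrum[in_band].sum() / total)


def select_skin_regions(traces, fps, threshold=0.5):
    """Indices of regions whose in-band energy ratio exceeds a hard threshold."""
    return [
        i
        for i, trace in enumerate(traces)
        if pulse_band_energy_ratio(trace, fps) > threshold
    ]


# Usage with synthetic traces: 10 seconds of video at 30 frames per second.
t = np.arange(300) / 30.0
pulsatile = 0.5 + 0.01 * np.sin(2 * np.pi * 1.2 * t)  # 1.2 Hz, i.e. 72 beats/min
flicker = 0.5 + 0.01 * np.sin(2 * np.pi * 10.0 * t)   # periodic but outside the band
flat = np.full(300, 0.5)                               # static background
selected = select_skin_regions([pulsatile, flicker, flat], fps=30.0)
```

In this sketch only the first trace is selected. A fixed threshold of this kind has to be tuned in advance, which is one reason such an approach may not generalise across recording conditions.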
However, these methods still struggle to extract a useful pulse signal when there is significant movement of the subject or skin area.
Therefore, it is an object to provide an improved method and apparatus for determining a pulse signal from a video sequence.