Since it was shown in [Verkruysse2008, Poh2010] that changes invisible to the naked eye can be used to estimate the heart rate (HR) from a video of human skin, this topic has attracted considerable attention in the computer vision community. These subtle changes encompass both color [Wu2012] and motion [Balakrishnan2013], and they are induced by cardiac activity. Because faces appear frequently in videos, and owing to recent significant improvements in face tracking and alignment methods [Asthana2013, Tulyakov2015, Jeni2015, Jourabloo2015, Xiong2013], face-based remote heart rate estimation has recently become very popular [Li2014, Xu2014, DeHaan2013, Wang2015].
Classical approaches successfully addressed this problem under laboratory-controlled conditions, i.e. by constraining the subject's movements and requiring the absence of facial expressions [Poh2010, Wu2012, Balakrishnan2013]. Such methods may therefore be unsuitable for real-world applications, such as monitoring drivers inside a vehicle or people exercising. Reliance on long analysis windows constitutes a further limitation of existing works [Li2014, Poh2010, Poh2011]: instead of estimating the instantaneous heart rate, they provide the average HR over a long video sequence. The main disadvantage of a long analysis window is its inability to capture interesting short-time phenomena, such as a sudden HR increase or decrease due to specific emotions [Valenza2014revealing].
In practice, another problem faced by researchers developing automatic HR measurement approaches is the lack of publicly available datasets recorded under realistic conditions. A notable exception is the MAHNOB-HCI dataset [Soleymani2012], a multimodal dataset for research on emotion recognition and implicit tagging, which also contains HR annotations. Importantly, an extensive evaluation of existing HR measurement methods on MAHNOB-HCI has been performed by Li et al. [Li2014]. However, the MAHNOB-HCI dataset has its own limitations, since the recording conditions are quite controlled: most of the video sequences contain no spontaneous facial expressions, illumination changes or large target movements [Li2014].
HR Estimation from Face Videos
Cardiac activity measurement is an essential tool for monitoring a subject's health and is actively used by medical practitioners. Conventional contact methods measure the cardiac cycle with high accuracy. However, they require specific sensors to be attached to the human skin, be it a set of electrocardiogram (ECG) leads, a pulse oximeter or, more recently, a fitness tracker. To avoid the use of such contact sensors, non-contact remote HR measurement from visual data has recently been proposed by computer vision researchers.
Verkruysse et al. [Verkruysse2008] showed that ambient light and a consumer camera can be used to reveal the cardiovascular pulse wave and to remotely analyze the vital signs of a person. Poh et al. [Poh2010] proposed applying blind source separation to the color changes caused by heart activity in order to extract the HR signal from a face video. In [Wu2012], an Eulerian magnification method is used to amplify subtle changes in a video stream and to visualize the temporal dynamics of the blood flow. Balakrishnan et al. [Balakrishnan2013] showed that subtle head motions are affected by cardiac activity, and that these motions can be used to extract HR measurements from a video stream.
However, none of these methods addresses HR estimation in the presence of facial expressions and subject movements, even though both are common in real-world applications. This limits the use of these approaches to laboratory settings. In [DeHaan2013, Wang2015], a chrominance-based method was introduced to relax the motion constraints. However, this approach was tested on only a few sequences that are not publicly available, making comparison difficult. Li et al. [Li2014] proposed an approach based on adaptive filtering to handle illumination and motion issues, and evaluated it on the publicly available MAHNOB-HCI dataset [Soleymani2012]. Although this work represents a valuable step towards remote HR measurement from visual data, it shares several major limitations with the previous methods. The output of the method is the average HR, whereas capturing short-term phenomena (e.g. HR variations due to instantaneous emotions) requires processing shorter time intervals. A further limitation of [Li2014] is the MAHNOB-HCI dataset itself, since it was collected in a laboratory setting and the subjects were required to wear an obtrusive EEG measuring device on their head. Additionally, subjects perform neither large movements nor many spontaneous facial expressions.
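The difference between an average HR and short-interval HR estimates can be illustrated with a minimal sketch. It is not the method of [Li2014] or of any other cited work: it simply takes the strongest spectral peak of a one-dimensional pulse trace, once over the whole sequence and once over short sliding windows. The function names, the 0.7-4 Hz search band, the 6 s window, the 1 s step and the synthetic trace are all assumptions made for illustration.

```python
import numpy as np


def dominant_bpm(signal, fps, lo_hz=0.7, hi_hz=4.0):
    """Strongest spectral peak of `signal` inside [lo_hz, hi_hz], in beats per minute.

    The band limits are an illustrative assumption (roughly 42-240 bpm).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                      # remove the constant offset
    windowed = x * np.hanning(len(x))     # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    return 60.0 * freqs[band][np.argmax(power[band])]


def sliding_bpm(signal, fps, window_s=6.0, step_s=1.0):
    """One estimate per short window; window and step lengths are hypothetical."""
    win = int(round(window_s * fps))
    step = int(round(step_s * fps))
    starts = range(0, len(signal) - win + 1, step)
    return np.array([dominant_bpm(signal[s:s + win], fps) for s in starts])


# Synthetic stand-in for a pulse trace: 20 s at 1.0 Hz, then 20 s at 1.5 Hz.
fps = 30.0
n = int(20 * fps)
t = np.arange(n) / fps
trace = np.concatenate([np.sin(2 * np.pi * 1.0 * t),
                        np.sin(2 * np.pi * 1.5 * t)])

whole = dominant_bpm(trace, fps)   # a single number for the whole sequence
local = sliding_bpm(trace, fps)    # one number per 6 s window
```

On this synthetic trace the first windows report about 60 bpm and the last ones about 90 bpm, whereas the whole-sequence estimate collapses to a single value and hides the change. The price of a short window is coarser frequency resolution: a 6 s window resolves only 1/6 Hz, i.e. 10 bpm, without further interpolation.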