Heart rate (HR) can be used to infer physiological parameters associated with diseases such as myocardial infarction, diabetic neuropathy, and myocardial dysfunction. Traditional HR estimation based on electrocardiography (ECG) or photoplethysmography (PPG) requires contact with the skin, which is not only uncomfortable for the user but also infeasible when multiple users must be monitored or when the subject's condition is extremely sensitive, as in the monitoring of: i) neonates; ii) sleeping subjects; and iii) patients with damaged skin. These scenarios require a non-invasive, contactless mechanism for HR measurement. This can be accomplished by estimating HR from face videos acquired with any camera, such as a webcam, a smartphone camera, or a surveillance camera.
Existing face-video-based HR estimation systems usually work in the following manner. Facial skin pixels are detected in the face video and referred to as the region of interest (ROI). Temporal signals depicting the motion or color variations across frames are estimated from the ROI using either a Eulerian or a Lagrangian approach. In a Lagrangian approach, temporal signals are determined by explicitly tracking the ROI or discriminating features over time. Such tracking is computationally expensive; hence, temporal signals are usually estimated using a Eulerian approach, i.e., they are obtained by fixing the ROI and analyzing the variations within it. The Eulerian approach works accurately for small variations. Noise in the temporal signals is then filtered out to enable accurate HR estimation. The PPG signal is extracted from the filtered temporal signals and subsequently used to estimate the HR, using either R-R intervals or the Fast Fourier Transform (FFT) spectrum. The confidence in the HR estimate, known as quality, provides a useful indicator of the efficacy of the estimated HR. Several quality measures have been proposed to evaluate the predicted HR in fitness monitoring environments. Existing systems, however, do not use any quality parameter to improve the HR estimate; they use it only to assess the effectiveness of the estimated HR.
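The Eulerian pipeline described above can be illustrated with a minimal sketch. It is not the method of any particular system: the fixed rectangular ROI, the use of the green channel as the color signal, the 0.7 to 4 Hz pass band (42 to 240 beats per minute), and the function names `eulerian_temporal_signal` and `estimate_hr_fft` are all assumptions made for illustration. Noise filtering is reduced here to mean removal and to restricting the spectral peak search to the pass band.

```python
import numpy as np


def eulerian_temporal_signal(frames, roi):
    """Return one value per frame: the mean green intensity in a fixed ROI.

    frames: array of shape (T, H, W, 3), assumed to be in RGB order.
    roi:    (top, bottom, left, right) pixel bounds, held fixed over time
            (the Eulerian assumption: no tracking of the region).
    """
    top, bottom, left, right = roi
    frames = np.asarray(frames, dtype=float)
    return frames[:, top:bottom, left:right, 1].mean(axis=(1, 2))


def estimate_hr_fft(signal, fps, low_hz=0.7, high_hz=4.0):
    """Estimate HR in beats per minute from the dominant in-band FFT peak.

    low_hz and high_hz are assumed plausible heart-rate limits; components
    outside this band (e.g. slow respiration drift) are ignored.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC component

    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2

    band = (freqs >= low_hz) & (freqs <= high_hz)
    if not band.any():
        raise ValueError("signal too short to resolve the heart-rate band")

    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz
```

Given a video decoded into an array `frames` at `fps` frames per second, the estimate would be obtained as `estimate_hr_fft(eulerian_temporal_signal(frames, roi), fps)`. The frequency resolution of this estimate is `fps / T` Hz, so a longer window gives a finer HR estimate at the cost of latency.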
In addition to the color and motion variations of interest, the camera also captures noise introduced by respiration, expression changes, eye blinking, and environmental factors. Further, the variations observed in different parts of the face depend on the facial structure, such as the placement of arteries and bones. These factors make HR estimation a challenging problem, especially when it is required in near real time.