Facial analysis of video image data is used in many applications, such as facial animation capture, human activity recognition, driver assistance systems, and human-computer interaction. Facial analysis typically includes head pose estimation and facial landmark localization. Conventional techniques for facial analysis in videos estimate facial properties for individual frames and then refine the estimates using temporal Bayesian filtering. In these techniques, the two inter-related tasks of visual estimation and temporal tracking are isolated from each other, and the Bayesian filtering requires careful manual model design and parameter tuning. There is a need for addressing these issues and/or other issues associated with the prior art.