Human-computer interaction uses many modalities, including language (typing, voice recognition, on-screen text display, speech synthesis, and the like) and vision (still and video cameras, graphic displays, and the like). Faces, their detection, recognition, and expressions, form an important part of human-to-human communication and are therefore important for human-machine interaction as well. Many methods and applications exist for detecting, tracking, and recognizing faces in still images and video, including emotion detection, gender classification, lip reading, and eye/gaze tracking.
Traditional systems for face or general object detection and tracking rely on compute-intensive algorithms that require high-speed, power-hungry microprocessors, consume substantial data-path bandwidth moving large numbers of operands, and make heavy use of memory. Such systems typically use a still or video camera that captures intensity-based images and delivers them to a general-purpose microprocessor, which analyzes the images, displays the results on a screen, or otherwise acts on the scene information thus retrieved. Given the high power consumption of existing systems, the use cases and applications for face or general object detection have remained fairly limited. For example, in portable, battery-powered devices the camera and processor subsystem cannot be kept turned on most of the time.
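To make the bandwidth cost concrete, a quick back-of-the-envelope calculation helps; the sensor resolution, frame rate, and pixel depth below are illustrative assumptions, not figures from the text:

```python
# Back-of-the-envelope data-rate estimate for a conventional
# intensity-image pipeline (illustrative numbers, not from the text).

def pixel_data_rate(width, height, fps, bytes_per_pixel=1):
    """Raw bytes per second the camera must push to the processor."""
    return width * height * fps * bytes_per_pixel

# A modest VGA grayscale sensor at 30 frames per second:
rate = pixel_data_rate(640, 480, 30)   # 9,216,000 bytes/s
print(f"{rate / 1e6:.2f} MB/s")        # prints "9.22 MB/s"
```

Even this modest configuration streams roughly 9 MB of pixel data per second continuously, before any detection algorithm runs; sustaining that data movement and the downstream processing is what makes always-on operation impractical on battery power.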