Systems and methods for detecting reactions of animate objects, such as humans, animals and certain types of robots, have been developed. The systems frequently employ wearable still and/or video cameras having a computer and sensing system. Wearable still and/or video cameras are known which are head mounted, and can be incorporated in wearable headgear such as a hat, or a pair of sunglasses. Autonomous wearable cameras and other detectors which are able to capture moments of interest by inferring situations of interest to their wearers have the potential to revolutionize the way in which image capture (including visual, sonic and olfactory images) is conducted. However, there are major technological challenges to be faced in how to detect such situations of interest.
Some known wearable image capture devices that are activated by sensors include the following:
“StartleCam: A Cybernetic Wearable Camera”, MIT Media Laboratory Perceptual Computing Section Technical Report No. 468, in Proceedings of the International Symposium on Wearable Computers, pages 42–49, 1998, discloses a wearable video camera having a computer and sensing system which enables the camera to be controlled via both conscious and pre-conscious events involving a human wearer of the camera.
Wearable cameras activated by a sensor which measures brain waves are disclosed in prior art document “Summarizing Wearable Video” IEEE International Conference on Image Processing, III: 398–401, Thessalonika, Greece, 2001.
Prior art systems for detecting the attention of a person wearing a wearable camera are known from “LAFCam: Leveraging Affective Feedback Camcorder”, A. Lockerd, F. Mueller, ACM CHI, 2002. In this disclosure, human body language, including laughing, skin conductivity and facial expressions, is used to record affective data from a camera operator, and that data is then used to determine which sequences of video will be interesting to the camera operator at a later time.
Prior art methods for estimating the attention of an animate object are also known, for example: “Estimating focus of attention based on gaze and sound”, R. Stiefelhagen, J. Yang, A. Waibel, Proceedings of the Workshop on Perceptive User Interfaces, 2001.
Other prior disclosures relate to work on interpretation of hand gestures by humans, for example see “Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review”, V. Pavlovic, R. Sharma, T. Huang, Department of Electrical and Computer Engineering and the Beckman Institute, University of Illinois at Urbana, USA, IEEE Transactions PAMI, vol. 19, no. 7, pp. 677–695, July 1997.
Another body of prior art research, which looks at understanding animate object behavior through vision and by analyzing audio, is disclosed in “Looking at People: Sensing for Ubiquitous and Wearable Computing”, A. Pentland, IEEE Transactions on Pattern Analysis and Machine Intelligence, Los Alamitos, Calif., January 2000, pp. 107–118.
Vertegaal et al. have developed conversational agents which are able to detect what an animate object is looking at and act accordingly. Such conversational agents are disclosed in the prior art document “Why Conversational Agents Should Catch the Eye”, R. Vertegaal, R. Slagter, G. van der Veer, A. Nijholt, Summary of ACM CHI Conference on Human Factors in Computing, The Hague, 2000.
Prior work also reports experiments that validate and extend the classic model of gaze during dyadic social interaction by Kendon, as exemplified in “Some functions of gaze direction in social interaction”, A. Kendon, Acta Psychologica, 32: 1–25, 1967. Using eye trackers and a keyboard with which participants specified to whom they were paying attention, Kendon et al. analyzed the relationship between gaze and attention, and found that the probability that the subject was looking at a speaker, or at a listener in the case where the subject was speaking, was between 77% and 88%.
Computational approaches have been made to implement social attention interpretation models, such as that disclosed by Baron-Cohen, in a humanoid robot. Such approaches are disclosed in the COG project at the Massachusetts Institute of Technology: “The COG project”, V. Adams, C. Breazeal, R. Brooks, B. Scassellati, IEEE Intelligent Systems, 15(4): 25–31, 2000. Work has also been carried out in the prior art on gaze following and its extension to deictic gestures, to distinguish between animate, inanimate and self-motion. Investigation of pointing gestures and deictic behaviors in general for learning has also been carried out and has provided evidence of the relevance of interpreting social interaction cues in early child development; see, for example, “Deictic codes for the embodiment of cognition”, D. Ballard, M. M. Hayhoe, P. K. Pook, R. P. N. Rao, Behavioral and Brain Sciences, 20: 723–742, 1997.
Known work in the field of attention detection focuses on situations from a first person perspective, that is, from the perspective of a person or other animate object whose attention is being captured, or alternatively from the perspective of an external observer, observing a person who may be wearing an image capture device, that is, from an ‘observer perspective’.