In the fields of video and still photography, the use of small, lightweight cameras mounted on a person's body is now well known. Furthermore, systems and methodologies for automatically processing the visual information captured by such cameras are also developing. For example, it is known to automatically determine the subject within an image and to zoom and/or crop the image, or stream of images in the case of video, to maintain the subject substantially within the frame of the image, or to smooth the transition of the subject across the image, regardless of the actual physical movement of the camera. This may occur in real time or as a post-processing procedure using recorded image data.
Although such small cameras often include a microphone, or are able to receive an audio input signal from a separate microphone, the audio signal captured tends to be very simple in terms of the captured sound stage. Typically, the audio signal simply reflects the strongest set of sound sources captured by the microphone at any given moment in time. Consequently, it is very difficult to adjust the sound signal to be consistent with the manipulated video signal.
The same problem is faced even if it is desired to capture only an audio signal, using a small microphone mounted on a person. In this situation, the audio signal tends to vary markedly as the person moves. This is particularly true if the microphone is mounted on the person's head. Even when concentrating visually on a static object, a person's head may still move sufficiently to interfere with successful sound capture. Additionally, there may be instances where a user's visual attention is momentarily diverted away from the main source of interest, on which it is desirable to maintain the focus of the sound capture system. These motions of a user's head thus cause rapid changes in the sounds detected by the sound capture system.