In a monitoring system installed in a factory, a store (for example, a retail store or a bank), or a public place (for example, a library), a plurality of monitoring cameras (for example, pan-tilt cameras or omnidirectional cameras) are connected to each other via a network, so that video data (including still images and moving images; the same applies hereinafter) of the vicinity of a monitoring target is obtained with high image quality and a wide angle of view.
In addition, since the amount of information obtainable by monitoring with video alone is limited, monitoring systems have recently appeared in which a microphone is disposed in addition to the monitoring camera, so that both video data and audio data of the vicinity of a monitoring target are obtained.
As related art for obtaining audio data of the vicinity of a monitoring target, a sound processing apparatus is known which includes an imaging unit that obtains a captured image and a plurality of microphones (sound collecting units) that collect audio data, and which uses the audio data collected by each microphone to generate audio data having directivity in a predetermined sound collection direction designated by a sound reproducing apparatus serving as a client (for example, refer to Patent Literature 1).
In Patent Literature 1, the sound processing apparatus combines the audio data items collected by the plurality of sound collecting units (microphones) based on a control command for a predetermined sound collection direction, which is received in advance from the client (sound reproducing apparatus) connected thereto via a network, thereby generating audio data having directivity in that direction, and transmits the combined audio data to the client (sound reproducing apparatus).
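For illustration only, one common way of combining the audio data of a plurality of microphones so as to obtain directivity in a designated direction is delay-and-sum beamforming. The following is a minimal sketch under stated assumptions and is not the method of Patent Literature 1, whose combining method is not specified here. It assumes a linear microphone array, a far-field sound source, and delays rounded to whole samples; the function names `steering_delays` and `delay_and_sum`, the speed-of-sound constant, and all parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Return per-microphone delays (in whole samples) for a linear array.

    angle_deg is the assumed sound collection direction, measured from the
    array broadside toward the positive position axis.
    """
    theta = math.radians(angle_deg)
    # Far-field arrival time of the wavefront at each microphone,
    # relative to the array origin (earlier arrival = more negative).
    arrivals = [-x * math.sin(theta) / SPEED_OF_SOUND for x in mic_positions_m]
    latest = max(arrivals)
    # Delay every channel so that all of them line up with the latest arrival.
    return [round((latest - a) * sample_rate_hz) for a in arrivals]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay and average the shifted channels."""
    length = len(channels[0])
    output = [0.0] * length
    for samples, delay in zip(channels, delays):
        for n in range(delay, length):
            output[n] += samples[n - delay]
    return [value / len(channels) for value in output]
```

In such a sketch, sound arriving from the designated direction is added in phase across the channels, whereas sound from other directions is added out of phase and is attenuated. A client would designate only `angle_deg`; the delays and the combined audio data would be computed on the apparatus side.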