1. Technical Field
The present disclosure relates to an audio processing system and an audio processing method for adjusting the volume of audio collected by a microphone array device.
2. Description of the Related Art
In a monitoring system installed at a predetermined position (for example, a ceiling) of a factory, a store (for example, a retail store or a bank), or a public place (for example, a library), a plurality of camera devices (for example, pan-tilt camera devices or omnidirectional camera devices) are connected over a network to obtain image data (still images and moving images; the same applies hereinafter) with a wide angle of view over a predetermined range of a monitoring target.
The amount of information obtained by monitoring images alone is inevitably limited. Accordingly, there is a high demand for a monitoring system in which a microphone array device is arranged together with the camera device so that audio data can be obtained from the direction in which the camera device images a specific subject.
Here, for example, the information processing device shown in Unexamined Japanese Patent Publication No. 2004-180197 is known as related art. When data recorded by a plurality of microphones is reproduced, this device performs reproduction focused on audio from the direction of a point of interest that is indicated on the reproduced image.
The information processing device shown in Unexamined Japanese Patent Publication No. 2004-180197 includes a microphone array including a plurality of microphones, a plurality of holding means that hold, for each microphone, input acoustic signals from the individual microphones constituting the microphone array, input means that inputs position information, focusing means that performs acoustic focusing in an acquired positional direction using the held acoustic signals of a plurality of channels, and processing means that processes the acoustic signals in order to apply an acoustic effect to the acoustic signals after focusing. Examples of types of processing of the acoustic signals may include generally used acoustic processing, such as echoes, vibrato, or distortion.
In Unexamined Japanese Patent Publication No. 2004-180197, the output (volume) of the audio signal in a target direction is subjected to an emphasis process (for example, a directivity forming process; the same applies hereinafter) and therefore becomes relatively greater than that of audio signals in directions other than the target direction. However, the difference between the outputs (for example, volumes; the same applies hereinafter) of the audio signal in the target direction before and after the emphasis process is not considered.
In Unexamined Japanese Patent Publication No. 2004-180197, an emphasis process using a delay and sum scheme is used. When audio before the emphasis process (non-directional audio) and audio after the emphasis process (directional audio) are compared, the noise included in the audio signal collected by each microphone has a low correlation between microphones, and thus the output of the directional audio is increased by an amount corresponding to the audio signals added across the microphones.
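The level increase described above can be illustrated with a minimal sketch of a delay and sum process without any division by the number of microphones. Everything in this sketch is an assumption made for illustration and is not taken from the cited publication: the function names `rms` and `delay_and_sum`, the 8-microphone array, the 16 kHz sampling rate, the 500 Hz target tone, the integer sample delays, and the noise level. Under these assumptions the summed output should come out roughly 16 to 17 dB above a single microphone.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def delay_and_sum(channels, delays):
    """Advance each channel by its integer sample delay, then add the channels.

    No division by the number of microphones is applied here.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays))
        for n in range(length)
    ]


# Hypothetical setup (assumed values, for illustration only).
random.seed(0)
num_mics, fs, freq, num_samples = 8, 16000, 500.0, 4000
delays = list(range(num_mics))  # target sound reaches microphone d with a d-sample delay

channels = []
for d in delays:
    tone = [math.sin(2 * math.pi * freq * (n - d) / fs) for n in range(num_samples)]
    # Noise is drawn independently for each microphone (low correlation).
    noise = [random.gauss(0.0, 0.5) for _ in range(num_samples)]
    channels.append([t + w for t, w in zip(tone, noise)])

non_directional = channels[0]                  # one microphone, before the emphasis process
directional = delay_and_sum(channels, delays)  # after the emphasis process

# Level change caused by the emphasis process, in decibels.
gain_db = 20 * math.log10(rms(directional) / rms(non_directional))
print(f"level change: {gain_db:.1f} dB")
```

In this sketch the aligned target tone adds in amplitude while the independent noise adds only in power, so the directional output is far louder than the non-directional output.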
Further, in the emphasis process using a delay and sum scheme, the output of the audio signal after the addition process may be divided by the number of microphones and averaged so as to be the same as the output of one microphone. In this case, the volume of an audio signal in a direction other than the target direction is suppressed through the emphasis process using a delay and sum scheme, and thus the output of the directional audio is reduced when the output of the non-directional audio and the output of the directional audio are compared.
Therefore, in the emphasis process using a delay and sum scheme, the output of the non-directional audio and the output of the directional audio differ greatly due to the emphasis process, regardless of whether the output of the audio signal after the addition process is divided by the number of microphones and averaged.
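The averaged case can be sketched in the same spirit. The following is an idealized illustration, not the method of the cited publication: the function names, the 8-microphone array, the 16 kHz sampling rate, and the two tones are all assumed. The off-target tone (2000 Hz, reaching every microphone at the same time) is deliberately chosen so that it cancels completely after steering; in practice the amount of suppression depends on the array geometry and the frequency.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def delay_and_sum_averaged(channels, delays):
    """Align channels by integer sample delays, add them, and divide by the channel count."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    count = len(channels)
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / count
        for n in range(length)
    ]


# Hypothetical setup (assumed values, for illustration only).
fs, num_mics, num_samples = 16000, 8, 4000
steer = list(range(num_mics))  # delays that align sound from the target direction

channels = []
for d in steer:
    channels.append([
        # Target tone: 500 Hz, reaches microphone d with a d-sample delay.
        math.sin(2 * math.pi * 500 * (n - d) / fs)
        # Off-target tone: 2000 Hz, reaches all microphones simultaneously.
        + math.sin(2 * math.pi * 2000 * n / fs)
        for n in range(num_samples)
    ])

non_directional = channels[0]                          # one microphone, before the emphasis process
directional = delay_and_sum_averaged(channels, steer)  # after the emphasis process

# Level change caused by the emphasis process, in decibels.
change_db = 20 * math.log10(rms(directional) / rms(non_directional))
print(f"level change: {change_db:.2f} dB")
```

Under these assumptions the target tone keeps the level it has at one microphone while the off-target tone is removed, so the directional output is about 3 dB below the non-directional output.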
In particular, in the monitoring system described above, in a normal case (for example, when no event to be monitored occurs), a monitoring person (for example, a user of the monitoring system) listens to the sound of the entire monitoring area in a non-directivity state, that is, before the emphasis process forms directivity. When an abnormal sound is generated, or when abnormal behavior is confirmed in an image from the camera device, the person may instead listen to sound in a directivity state, that is, after directivity has been formed in a specific direction designated by the person. If the output of the audio collected by the microphone array device differs greatly when switching between the non-directivity state and the directivity state, the monitoring work of the monitoring person is hindered.