1. Field of the Invention
The present invention relates to an audio processing apparatus, an audio processing method, and an imaging apparatus.
2. Description of the Related Art
Some conventional digital cameras provide, in addition to still image capture, a moving image capture function that also records an audio signal. In such a camera, when a driver of, for example, a focus lens or a diaphragm mechanism operates during moving image capture, the driving sound generated by the driver is mixed into the recorded audio signal as noise.
Japanese Patent Laid-Open No. 2008-077707 relates to a technique for removing driving noise in a storage device of a video camera. This patent literature discloses processing of predicting a sound which contains no noise in a driving noise mixture interval from audio signals in intervals preceding and succeeding the driving noise mixture interval, and replacing the detected data with the predicted data. This processing uses a technique of interpolating an audio signal in a driving noise mixture interval by predicting succeeding audio signals from immediately preceding audio signals, based on the periodicity of the audio signals.
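The periodicity-based interpolation described above can be illustrated with a minimal sketch. It is not the implementation of the cited literature; it only assumes that a period is estimated by normalized autocorrelation, that the preceding interval is extended forward and the succeeding interval backward by one period, and that the two predictions are cross-faded. The names `estimate_period` and `interpolate_gap`, the lag range, and the context length are all hypothetical.

```python
import math

def estimate_period(x, min_lag, max_lag):
    """Return (lag, score) for the lag with the highest normalized autocorrelation.

    Assumes len(x) > max_lag.
    """
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

def interpolate_gap(signal, start, end, min_lag=20, max_lag=200, context=400):
    """Return a copy of signal with signal[start:end] replaced by predicted samples.

    Assumes at least max_lag + 1 samples exist on each side of the gap.
    """
    before = signal[max(0, start - context):start]
    after = signal[end:end + context]
    n = end - start
    p_fwd, _ = estimate_period(before, min_lag, max_lag)
    p_bwd, _ = estimate_period(after, min_lag, max_lag)
    out = list(signal)
    for i in range(n):
        # Forward prediction: repeat the last period of the preceding interval.
        fwd = before[len(before) - p_fwd + (i % p_fwd)]
        # Backward prediction: repeat the first period of the succeeding interval.
        bwd = after[(i - n) % p_bwd]
        # Cross-fade so the result joins both neighbors smoothly.
        w = (i + 1) / (n + 1)
        out[start + i] = (1 - w) * fwd + w * bwd
    return out

# Usage: a perfectly periodic tone with a 200-sample noise mixture interval.
clean = [math.sin(2 * math.pi * i / 50) for i in range(2000)]
noisy = list(clean)
for i in range(900, 1100):
    noisy[i] = 1.0 if i % 2 == 0 else -1.0  # stand-in for driving noise
restored = interpolate_gap(noisy, 900, 1100)
max_error = max(abs(r - c) for r, c in zip(restored, clean))
```

For a strongly periodic input such as this synthetic tone, `max_error` stays very small, which is the situation in which this kind of prediction works well.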
However, in the conventional technique, when the periodicity of the audio signals in the intervals preceding and succeeding the noise mixture interval is low, the audio prediction accuracy is poor.
FIG. 18A illustrates an example of an audio signal waveform generated when one adult woman utters the Japanese syllable “a,” and FIG. 18B illustrates an example of an audio signal waveform generated when driving noise has mixed in with the signal shown in FIG. 18A. Since the audio signal waveform shown in FIG. 18A has a very high periodicity, it can easily be predicted and interpolated from audio signals in intervals preceding and succeeding a noise mixture interval even if noise has mixed in with it, as in the case of FIG. 18B.
On the other hand, FIG. 19A illustrates an example of an audio signal waveform generated when the same adult woman utters the Japanese syllable “ka,” and FIG. 19B illustrates an example of an audio signal waveform generated when driving noise caused by lens driving has mixed in with the audio signal shown in FIG. 19A in the interval immediately succeeding the consonant interval of this audio signal. The consonant waveform immediately preceding the noise mixture interval occurs only once across the noise mixture interval and the interval immediately preceding it, and therefore has a very low periodicity. If prediction processing is performed in this case in the same way as in the conventional technique, the noise mixture interval may be interpolated with an audio signal representing the consonant portion, or with a signal representing a sound totally different from the sound the woman actually uttered in that interval.
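The contrast between a highly periodic vowel and a weakly periodic consonant can be made concrete with a rough periodicity measure. The following sketch assumes the peak of the normalized autocorrelation over a lag range as that measure; the function name `periodicity_score`, the lag range, and the synthetic "vowel-like" and "consonant-like" signals are illustrative assumptions, not data from the figures.

```python
import math
import random

def periodicity_score(x, min_lag=20, max_lag=200):
    """Peak normalized autocorrelation of x over lags min_lag..max_lag.

    Values near 1 indicate a strongly periodic signal; values near 0
    indicate little repetition. Assumes len(x) > max_lag.
    """
    best = -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = x[:-lag], x[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        best = max(best, num / den if den > 0 else 0.0)
    return best

random.seed(0)  # fixed seed so the illustration is repeatable

# Vowel-like signal: a fundamental plus one harmonic, period 50 samples.
vowel_like = [math.sin(2 * math.pi * i / 50) + 0.5 * math.sin(4 * math.pi * i / 50)
              for i in range(400)]
# Consonant-like signal: a noise burst with no repeating structure.
consonant_like = [random.uniform(-1.0, 1.0) for _ in range(400)]

vowel_score = periodicity_score(vowel_like)
consonant_score = periodicity_score(consonant_like)
```

Under these assumptions `vowel_score` is close to 1 while `consonant_score` is much lower, so a period estimated from the consonant-like interval gives the predictor little reliable structure to extend.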
Also, when a noise mixture interval is determined in accordance with the drive command timing of an imaging lens driver, and prediction processing is performed for this noise mixture interval, the following problem may arise.
FIG. 20 illustrates an example of an audio signal waveform in which friction noise is generated when the operator touches the imaging apparatus immediately before the driving noise of the imaging lens driver, which is the target of noise removal processing, is generated. Sounds other than the driving noise and friction noise are identical to those of the object sound shown in FIG. 18A. As shown in FIG. 20, when another noise is generated immediately before driving of the imaging lens driver, for example because the operator scratches the apparatus surface, the conventional technique uses the friction noise generated immediately before the noise mixture interval in predicting the audio signal in the noise mixture interval. Hence, a discordant sound is produced after noise removal processing.
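The effect of friction noise on the prediction can be sketched as follows. The example assumes a forward-only prediction that repeats the last period of the preceding interval with a known period of 50 samples; `extend_forward` and the alternating-sign stand-in for friction noise are hypothetical and serve only to show how contamination in the reference interval is copied into the noise mixture interval.

```python
import math

def extend_forward(before, period, n):
    """Predict n samples by repeating the last `period` samples of `before`."""
    base = len(before) - period
    return [before[base + (i % period)] for i in range(n)]

period = 50
before_clean = [math.sin(2 * math.pi * i / period) for i in range(400)]
truth = [math.sin(2 * math.pi * i / period) for i in range(400, 500)]

# Same preceding interval, but with friction noise in its last 30 samples.
before_dirty = list(before_clean)
for k in range(370, 400):
    before_dirty[k] += 0.8 if k % 2 == 0 else -0.8  # stand-in for friction noise

pred_clean = extend_forward(before_clean, period, len(truth))
pred_dirty = extend_forward(before_dirty, period, len(truth))

error_clean = max(abs(p - t) for p, t in zip(pred_clean, truth))
error_dirty = max(abs(p - t) for p, t in zip(pred_dirty, truth))
```

Here `error_clean` is negligible, whereas `error_dirty` is on the order of the friction noise amplitude, because the contaminated samples are repeated once per period throughout the predicted interval.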