In mobile communication, compression coding of digital information such as speech and images is indispensable for effective use of the transmission band. Among compression coding techniques, expectations are particularly high for speech codec (coding/decoding) techniques, which are widely used in mobile phones, and there is a growing demand for higher sound quality in conventional high-efficiency coding at high compression rates. Because speech codec techniques are used publicly, their standardization is indispensable, and because of the enormous impact of the intellectual property involved, companies worldwide are actively engaged in their research and development. In recent years, ITU-T (International Telecommunication Union-Telecommunication Standardization Sector) and MPEG (Moving Picture Experts Group) have been studying the standardization of codecs that can encode both speech and music, and more efficient, higher-quality speech codecs are required.
Speech recognition techniques are being put to practical use, mainly in mobile phones and car navigation systems. Through repeated mergers, many venture businesses worldwide are being consolidated into a small number of companies, and their speech recognition techniques are used in the products of a variety of companies.
Among these techniques, in the field of speech codecs, standard codecs (ITU-T G.729.1, G.718) that encode input signals of various speech bands are being standardized. In these codecs, a power spectrum is obtained using the fast Fourier transform (hereinafter described as “FFT”), band power is calculated from the power spectrum, and the band of the input signal is determined.
Furthermore, “environment noise” is a problem in both speech coding and speech recognition, and techniques for removing it are also being actively studied. In addition to noise cancellation, techniques that transform an input signal into a spectrum through FFT to detect noise or the presence or absence of speech are also being studied. With the increase in the processing speed of processors, methods that accurately analyze spectra through FFT have been adopted in recent years alongside the conventionally used filter banks, and noise is analyzed using the band power obtained from the spectra.
The technique of calculating the band power of a spectrum using FFT is used for noise cancellation (also referred to as a “noise canceller” or “noise suppressor”), determination of a speech band, detection of speech, speech recognition, and the like.
NPL 1 is known as an example in which such a technique is used to determine the band of an input signal in speech coding. In this example, the input signal is subjected to FFT to obtain power spectra, the power spectra are summed over a specified frequency range to calculate band power, and the band of the input signal is determined based on the value of the band power.
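The band power calculation described above can be illustrated with a minimal sketch. This is not the procedure of NPL 1 itself: the function names `dft`, `band_powers`, and `is_wideband`, the bin ranges, and the threshold ratio are all hypothetical and chosen only for illustration. A direct DFT stands in for the FFT to keep the example free of external libraries; it yields the same spectrum, only more slowly.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) DFT; stands in for an FFT to avoid external libraries."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def band_powers(frame, band_edges):
    """Sum the power spectrum |X[k]|^2 over each [lo, hi) range of bins."""
    spectrum = dft(frame)
    # For a real input, bins 0..N/2 carry all the information.
    power = [abs(x) ** 2 for x in spectrum[: len(frame) // 2 + 1]]
    return [sum(power[lo:hi]) for lo, hi in band_edges]


def is_wideband(frame, band_edges, threshold_ratio=0.01):
    """Hypothetical decision rule (an assumption, not taken from NPL 1):
    the upper band counts as present when its power exceeds a fixed
    fraction of the lower-band power."""
    low, high = band_powers(frame, band_edges)
    return high > threshold_ratio * low
```

For a 64-sample frame, `band_powers(frame, [(0, 16), (16, 33)])` would return the power in the lower and upper halves of the spectrum, and `is_wideband` compares the two. A real codec would define its bands in hertz and tune the decision rule to its own requirements.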
Furthermore, PTL 1 and PTL 2 are known as examples in which such a technique is used to remove noise. In these examples, the input signal is subjected to FFT, noise is removed in the spectral domain, and the resulting spectrum is transformed into an output signal using inverse FFT, thereby reducing noise. PTL 1 and PTL 2 are characterized by obtaining a spectrum using FFT, summing the power spectra to obtain band power, and analyzing noise using that band power. Band power is a parameter that can be used not only to analyze noise but also to analyze the presence or absence of noise and sound quality. Using FFT in this way allows the analysis to be performed accurately.
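The transform, modify, and inverse-transform flow described above can be sketched as follows. The sketch uses power spectral subtraction with a gain floor, a common textbook approach that is swapped in here purely for illustration; the source does not state that PTL 1 or PTL 2 uses this rule. The names `dft` and `suppress_noise`, the per-bin `noise_power` estimate, and the `floor` parameter are hypothetical, and a direct DFT again stands in for the FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) DFT or inverse DFT; stands in for FFT / inverse FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def suppress_noise(frame, noise_power, floor=0.05):
    """Attenuate each bin by a gain derived from power spectral subtraction.

    noise_power is an assumed per-bin noise power estimate (length N);
    how it is obtained is outside the scope of this sketch.
    """
    spectrum = dft(frame)
    cleaned = []
    for x, n_pow in zip(spectrum, noise_power):
        p = abs(x) ** 2
        # Fraction of the bin power left after subtracting the noise estimate.
        remaining = 1.0 - n_pow / p if p > 0.0 else 0.0
        # Clamp the gain to a floor so that no bin is zeroed completely.
        gain = max(math.sqrt(max(remaining, 0.0)), floor)
        cleaned.append(gain * x)  # scaling keeps the original phase
    # The input is real, so the imaginary part of the inverse is rounding error.
    return [v.real for v in dft(cleaned, inverse=True)]
```

Bins whose power is well above the noise estimate pass almost unchanged, while bins at or below the estimate are attenuated to the floor. In practice the frames would also be windowed and overlap-added, which this sketch omits.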