1. Field of the Invention
The present invention relates to a sound signal processing circuit for calculating the mask level of a sound sample to be quantized in a sub-band encoding circuit, which encodes a sound signal by dividing it into frequency bands.
2. Description of the Related Art
FIG. 7 shows the general schematic configuration of a conventional sound signal processing circuit used for ISO/IEC 11172-3 (hereinafter referred to as MPEG/Audio). Each section will be described below. For example, when 1024 input sound samples 61 are entered, an FFT circuit 62 performs a fast Fourier transform and outputs 512 power spectrum samples.
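The relationship between the 1024 input samples and the 512 power spectrum samples can be illustrated with a minimal sketch. It assumes a plain radix-2 FFT with no windowing and a power spectrum taken as the squared magnitude of the first half of the bins; the function names and the test tone are hypothetical and are not taken from FIG. 7.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def power_spectrum(samples):
    """Squared magnitude of the first len(samples) // 2 FFT bins."""
    bins = fft(samples)
    return [abs(b) ** 2 for b in bins[: len(samples) // 2]]

# Hypothetical input: a sine wave with exactly 64 cycles in 1024 samples.
samples = [math.sin(2 * math.pi * 64 * i / 1024) for i in range(1024)]
spectrum = power_spectrum(samples)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

With this input, the 1024 samples yield 512 power values and the largest value falls in bin 64, the frequency of the test tone.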
A classifying circuit 63, which classifies the samples into pure sound and noise, extracts from the entered power spectrum samples the local maxima (those larger than the power spectrum samples of adjacent frequencies) as the pure sound component and treats the others as the noise component, thereby classifying the entered power spectrum samples into the pure sound component and the noise component.
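The classification rule described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating the first and last samples as noise (because they have only one neighbor) is an assumption not stated in the description.

```python
def classify(spectrum):
    """Split sample indices into pure sound (local maxima) and noise (the rest)."""
    pure_sound, noise = [], []
    last = len(spectrum) - 1
    for k, power in enumerate(spectrum):
        # Assumption: edge samples lack one neighbor and are treated as noise.
        is_peak = 0 < k < last and power > spectrum[k - 1] and power > spectrum[k + 1]
        (pure_sound if is_peak else noise).append(k)
    return pure_sound, noise

# Hypothetical six-sample spectrum with peaks at indices 1 and 4.
pure_sound, noise = classify([1, 5, 2, 2, 7, 3])
```

Because the number of local maxima depends entirely on the input, the sizes of the two lists vary from one input to the next.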
A sub-sampling circuit 64 integrates a prescribed number of high-frequency power spectra into one power spectrum sample, utilizing the fact that human hearing becomes poorer at discriminating frequencies as the frequencies become higher. The number of power spectra to be integrated varies depending on whether the applicable power spectra belong to the pure sound component or the noise component.
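One possible reading of this integration step is sketched below. The cutoff index, the group size, and the use of a simple sum are all assumptions made for illustration; the description does not specify them, and in the conventional circuit the group size would further differ between the pure sound component and the noise component.

```python
def sub_sample(spectrum, cutoff, group):
    """Keep samples below `cutoff` as they are; sum each run of `group` samples above it."""
    out = list(spectrum[:cutoff])
    for start in range(cutoff, len(spectrum), group):
        out.append(sum(spectrum[start:start + group]))
    return out

# Hypothetical example: the upper four samples are merged in pairs.
reduced = sub_sample([1, 2, 3, 4, 5, 6, 7, 8], cutoff=4, group=2)
```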
A mask calculating circuit 65 determines a mask level from the sub-sampled power spectrum samples of the pure sound component and those of the noise component. The mask level is the minimum sound level a person can hear, and it varies gradually according to the frequency distribution of the sounds being heard at that time.
Conventional calculation of a mask level will be described with reference to FIG. 8. In human hearing, when there is a sound, namely a power spectrum 71, sounds at adjacent frequencies are hard to hear. In other words, a mask 72 is formed over the frequencies adjacent to the power spectrum.
Conventionally, the mask 72 has its contour (the mask's height and the inclination of its straight line) variable according to whether the power spectrum is a pure sound component or a noise component or according to the magnitude of the power spectrum.
This mask is calculated for every sub-sampled power spectrum, and the calculated results are summed. When there are (n) power spectra, determining the mask for one power spectrum requires a number of calculations on the order of the first power of (n). Because this calculation is repeated for all of the (n) power spectra and the results are summed, the number of calculations required as a whole is on the order of the second power of (n).
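The growth in the number of calculations can be seen in a minimal sketch. It assumes a triangular mask with a fixed offset and slope in decibels and sums the contributions in the power domain; the parameter names and values are hypothetical, and the conventional mask additionally varies its contour with the component type and magnitude.

```python
def total_mask(levels_db, offset_db=10.0, slope_db=6.0):
    """Sum the masks of all components at every position and count the evaluations."""
    n = len(levels_db)
    mask_power = [0.0] * n
    operations = 0
    for i, level in enumerate(levels_db):      # each power spectrum forms a mask
        for j in range(n):                     # evaluated at every position
            contribution_db = level - offset_db - slope_db * abs(i - j)
            mask_power[j] += 10 ** (contribution_db / 10)
            operations += 1
    return mask_power, operations

# Hypothetical inputs: doubling n from 8 to 16 quadruples the evaluations.
_, ops_8 = total_mask([60.0] * 8)
_, ops_16 = total_mask([60.0] * 16)
```

Here `ops_8` is 64 and `ops_16` is 256, which matches the second-power growth described above.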
Since a real-time MPEG/Audio encoder is required to complete such processing within a limited time, such a large change in the volume of processing depending on the entered sound is very inconvenient.
Since the number (n) of power spectra is arbitrary, an arithmetic unit fast enough to deal with a substantially large (n) must be used, but such a unit becomes very large in scale because the volume of operation increases in proportion to the second power of (n). When an entered sound yields an unexpectedly large value of (n), the processing cannot keep up and fails, resulting in noise.
As described above, conventionally, since the mask contour was variable depending on whether the power spectrum was a pure sound component or a noise component and also depending on the magnitude of the power spectrum, there was a disadvantage that a very large volume of calculation was required to determine a mask level. Thus, the arithmetic unit used became very large in scale.
Besides, the volume of arithmetic operation was greatly affected by the sound entered, because the input sound was classified into the pure sound component and the noise component. Thus, there was a disadvantage that the processing could not keep up with the incoming sound and failed, resulting in noise.