The present application claims priority under 35 U.S.C. §119 to Japanese Patent Application No. 2001-001409, filed on Jan. 9, 2001 and entitled “Method for extracting formants of a musical tone, recording medium and apparatus for extracting formants of a musical tone”, Japanese Patent Application No. 2001-375423, filed on Dec. 10, 2001 and entitled “Method for extracting formants of a musical tone, recording medium and apparatus for extracting formants of a musical tone”, and Japanese Patent Application No. 2001-392305, filed on Dec. 25, 2001 and entitled “Method for extracting formants of a musical tone, recording medium and apparatus for extracting formants of a musical tone”. The contents of these applications are incorporated herein by reference in their entirety.
1. Field of the Invention
The present invention relates to a method for extracting formants of waveform data of a sampled musical tone, a recording medium and an apparatus for extracting formants of a musical tone.
2. Discussion of Background
Frequency characteristics are one representation of the characteristics of a musical tone waveform. Usually, line spectra are found by FFT (Fast Fourier Transformation) and are evaluated as the frequency characteristics. However, line spectra contain so much detailed information that it is difficult to grasp the overall characteristics.
When the line spectra found by the FFT are smoothed to obtain formants, and when the formants are evaluated as the frequency characteristics of the musical tone waveform, it is easier to grasp the entire characteristics, and, e.g., treatment of the waveform becomes easier.
As a method for obtaining formants, it has been proposed to find formants by performing cepstral analysis.
The cepstrum is obtained by performing FFT on an input signal, taking logarithms of the amplitude spectra of the transformed input signal and then performing Inverse FFT on the logarithms. The independent variable of the cepstrum is called quefrency, which has the same dimension as time. The fine structure of spectra appears at a higher quefrency, and the spectral envelope (formants) appears at a lower quefrency.
In the cepstral analysis, only the parts of the cepstrum having a lower quefrency are extracted (hereinbelow, the maximum quefrency on extraction will be called the coefficient of the cepstral analysis), and FFT is performed on the extracted parts to extract formants of an input signal.
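The conventional cepstral analysis described above can be sketched as follows. This is a minimal illustration rather than the procedure of any particular implementation: the function name cepstral_envelope, the use of NumPy, the natural logarithm and the small offset that avoids log(0) are all assumptions made for the example.

```python
import numpy as np

def cepstral_envelope(signal, coefficient):
    """Smooth the log amplitude spectrum by keeping only low quefrencies.

    `coefficient` is the maximum quefrency (in samples) that is kept;
    it is assumed to be at most len(signal) // 2.
    """
    spectrum = np.fft.fft(signal)
    # Logarithm of the amplitude spectrum (offset is an assumption to avoid log(0)).
    log_amp = np.log(np.abs(spectrum) + 1e-12)
    # Inverse FFT of the log amplitudes gives the cepstrum (real for a real signal).
    cepstrum = np.fft.ifft(log_amp).real
    n = len(cepstrum)
    # Keep quefrencies 0..coefficient and their mirror images so the result stays real.
    lifter = np.zeros(n)
    lifter[:coefficient + 1] = 1.0
    lifter[n - coefficient:] = 1.0
    # FFT of the extracted low-quefrency part gives the smoothed envelope.
    return np.fft.fft(cepstrum * lifter).real
```

A smaller coefficient keeps fewer quefrency components and therefore yields a smoother envelope; with the coefficient set to half the signal length, nothing is removed and the log amplitude spectrum is returned unchanged.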
FIG. 12 shows a case wherein the coefficient of the cepstral analysis is 80, and FIG. 13 shows a case wherein the coefficient of the cepstral analysis is 40.
However, even after the cepstral analysis, minute fluctuations due to harmonic components remain, and the positions and the levels of peaks have changed in comparison with the original data as in the case shown in FIG. 12. When the coefficient is decreased to reduce fluctuations due to harmonic components, wide fluctuations are also lost, failing to show the characteristics of the original data, as shown in FIG. 13.
In the cepstral analysis, peaks are lowered under the influence of valleys between line spectra, and not only minute fluctuations at harmonic component levels but also wide fluctuations in the entirety are lost.
In the case of a normal musical tone, the fundamental tone level is the greatest and harmonic component levels become smaller as the frequency increases. Nevertheless, there are cases wherein levels in the vicinity of the fundamental tone level (in particular, frequency components not higher than the fundamental tone) become smaller under the influence of valleys that are not higher than the fundamental tone.
The present invention is provided in consideration of these problems and proposes a method for extracting formants that reflect the entire characteristics of the waveform data of an original musical tone with fidelity, a recording medium with a program capable of performing the extracting method saved thereto, and an apparatus for extracting formants of a musical tone.
From these viewpoints, the method for extracting formants of a musical tone according to a first aspect of the present invention is basically characterized in that the method carries out the steps of:
finding power line spectra of a waveform to be processed;
performing level interpolation control on the power line spectra at every unit of a certain frequency, which is up to and including half a sampling frequency;
performing Fast Fourier Transformation or Inverse Fast Fourier Transformation on the spectra obtained by connecting peaks of harmonic components by the level interpolation control;
performing level setting with a specified coefficient to smooth a spectral envelope to be obtained as formants later on; and
obtaining the spectral envelope by performing Inverse Fast Fourier Transformation or Fast Fourier Transformation on values that are found by performing the level setting with the specified coefficient.
The arrangement according to the first aspect can be free from minute fluctuations due to harmonic components, can prevent peaks from lowering by eliminating valleys between line spectra with the level interpolation control before the cepstral analysis, and can prevent levels in the vicinity of the fundamental tone (in particular, frequency components not higher than the fundamental tone) from becoming smaller under the influence of valleys that are not higher than the fundamental tone, thereby obtaining formants that represent the characteristics of the original data in terms of all respects, such as the positions and the levels of the peaks.
The reason why the level interpolation control is limited to the frequencies that are up to and including half a sampling frequency is that the frequency equal to half a sampling frequency is an upper limit according to the sampling theorem. The certain frequency for the level interpolation control may be arbitrarily set as long as it does not exceed the upper limit.
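The order of the steps of the first aspect can be illustrated with the following sketch. It is an assumption-laden outline, not the claimed implementation: the function name extract_formants, the decibel scale for the power line spectra, the choice of Inverse FFT for the first transform and FFT for the second, and the hypothetical callable interpolate_levels (which stands in for the level interpolation control) are all introduced only for this example.

```python
import numpy as np

def extract_formants(waveform, coefficient, interpolate_levels=None):
    """Return a smoothed spectral envelope for bins 0 .. n/2 (n assumed even)."""
    n = len(waveform)
    # Step 1: power line spectra up to half the sampling frequency (dB is an assumption).
    power_db = 10.0 * np.log10(np.abs(np.fft.rfft(waveform)) ** 2 + 1e-12)
    # Step 2: level interpolation control (hypothetical callable; skipped if omitted).
    if interpolate_levels is not None:
        power_db = interpolate_levels(power_db)
    # Step 3: Inverse FFT of the levels, treated as a symmetric real spectrum.
    cepstrum = np.fft.irfft(power_db, n)
    # Step 4: level setting with the specified coefficient (keep low quefrencies only).
    lifter = np.zeros(n)
    lifter[:coefficient + 1] = 1.0
    lifter[n - coefficient:] = 1.0
    # Step 5: FFT of the result gives the spectral envelope.
    return np.fft.rfft(cepstrum * lifter).real
```

Because only bins up to half the sampling frequency are produced by the real-input transform, the interpolation callable never sees frequencies above the limit given by the sampling theorem.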
The arrangement according to a second aspect of the present invention is directed to one of the ways of carrying out the level interpolation control in the level interpolation control step, which specifically comprises the steps of finding a frequency F1 and a level L1 of a spectrum having a maximum level before and after a fundamental tone of the waveform; bringing all levels of the power line spectra from 0 up to the frequency F1 to L1; repeating processing wherein, at every frequency that is an integral multiple of a frequency of the fundamental tone and is up to and including half the sampling frequency, a frequency Fn and a level Ln of a spectrum having a maximum level are found before and after the respective integral multiple frequencies, and the levels from the frequency having been subjected to the level control at the previous stage to the frequency Fn are controlled to have values interpolated from a level Ln−1 to the level Ln; finding a frequency FN and a level LN of a last harmonic component; and bringing all levels from the frequency FN up to the frequency of the last harmonic component set for the level interpolation control to LN, thereby performing level interpolation control to connect peaks of harmonic components with the result that valleys not higher than the fundamental tone or valleys between line spectra are eliminated.
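The level interpolation control of the second aspect might be sketched as below. The text does not specify the width of the search range "before and after" each multiple or the kind of interpolation, so the sketch assumes a range of half the fundamental on each side and linear interpolation; the names peak_near and interpolate_at_multiples, and the representation of frequencies as array bin indices, are likewise assumptions.

```python
import numpy as np

def peak_near(levels, center, half_width):
    """Return (bin, level) of the maximum within center +/- half_width."""
    lo = max(0, int(round(center - half_width)))
    hi = min(len(levels) - 1, int(round(center + half_width)))
    idx = lo + int(np.argmax(levels[lo:hi + 1]))
    return idx, levels[idx]

def interpolate_at_multiples(levels, f0_bin, half_width=None):
    """Connect harmonic peaks searched around integral multiples of f0_bin."""
    levels = np.asarray(levels, dtype=float)
    if half_width is None:
        half_width = f0_bin / 2.0          # assumed search range
    out = levels.copy()
    # F1, L1: peak around the fundamental; bins 0..F1 are brought to L1.
    prev_f, prev_l = peak_near(levels, f0_bin, half_width)
    out[:prev_f + 1] = prev_l
    n = 2
    # The last bin of `levels` is taken as half the sampling frequency.
    while n * f0_bin <= len(levels) - 1:
        f, l = peak_near(levels, n * f0_bin, half_width)
        if f > prev_f:                     # skip a peak already used
            # Levels between the previous peak and this one are interpolated.
            out[prev_f:f + 1] = np.linspace(prev_l, l, f - prev_f + 1)
            prev_f, prev_l = f, l
        n += 1
    # Bins from the last peak FN upward are brought to LN.
    out[prev_f:] = prev_l
    return out
```

For a spectrum with peaks exactly at multiples of the fundamental, the result passes through every peak and contains no valleys between them.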
The arrangement according to a third aspect of the present invention is directed to another example of the ways of carrying out the level interpolation control in the level interpolation control step, which specifically comprises the steps of finding a frequency F1 and a level L1 of a spectrum having a maximum level before and after a fundamental tone of the waveform; bringing all levels of the power line spectra from 0 up to the frequency F1 to L1; repeating processing wherein a frequency Fn and a level Ln of a spectrum having a maximum level are found before and after respective frequencies that are obtained by adding the frequency of the fundamental tone to a frequency Fn−1 found at the previous stage and are up to and including half the sampling frequency, and the levels from the frequency Fn−1 found at the previous stage to the frequency Fn are controlled to have values interpolated from a level Ln−1 to the level Ln; finding a frequency FN and a level LN of a last harmonic component; and bringing all levels from the frequency FN up to the frequency of the last harmonic component set for the level interpolation control to LN, thereby performing the level interpolation control to connect peaks of harmonic components with the result that valleys not higher than the fundamental tone or valleys between line spectra are eliminated.
The difference from the arrangement according to the second aspect is that the level interpolation control of the power line spectra, which is carried out at every unit of the certain frequency, is carried out at every frequency that is obtained by adding the frequency of the fundamental tone to the frequency subjected to the level interpolation control at the previous stage, not at every frequency that is an integral multiple of the frequency of the fundamental tone. The way according to the second aspect does not always bring the shape of the formants at high frequencies into alignment with the locations of the peaks of the harmonic components and cannot avoid misalignment in some cases. In contrast, the way according to the third aspect finds each subsequent peak point by adding the frequency of the fundamental tone to the peak point found at the previous stage, so that it can bring the shape of the formants into alignment with the peaks of the harmonic components, avoid misalignment, and represent the characteristics of the original data with more fidelity in terms of the locations and the levels of the peaks.
In other words, when a computer extracts, at every fixed section, the peaks of data wherein the distance between harmonic components expands as the order of the harmonic components increases, as in a piano, the extracting section is divided at an intermediate point between peaks of harmonic components as shown in the vicinity of 8 kHz in FIG. 14, which creates a problem in that it becomes impossible to extract the peak of a harmonic component successfully. Even in such a case, the level interpolation control by the arrangement according to the third aspect can bring the shape of the formants into alignment with the peaks of the harmonic components and can avoid misalignment, since each subsequent peak point is found by adding the frequency of the fundamental tone to the peak point found at the previous stage.
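The peak-tracking variant of the third aspect might be sketched as follows, again under stated assumptions: frequencies are array bin indices, the search range on each side is half the fundamental, the interpolation is linear, and the names peak_near and interpolate_by_tracking are hypothetical. The only change from searching at integral multiples is that each search is centered on the previously found peak plus the fundamental.

```python
import numpy as np

def peak_near(levels, center, half_width):
    """Return (bin, level) of the maximum within center +/- half_width."""
    lo = max(0, int(round(center - half_width)))
    hi = min(len(levels) - 1, int(round(center + half_width)))
    idx = lo + int(np.argmax(levels[lo:hi + 1]))
    return idx, levels[idx]

def interpolate_by_tracking(levels, f0_bin, half_width=None):
    """Connect harmonic peaks, searching each one around F(n-1) + f0_bin."""
    levels = np.asarray(levels, dtype=float)
    if half_width is None:
        half_width = f0_bin / 2.0          # assumed search range
    out = levels.copy()
    # F1, L1: peak around the fundamental; bins 0..F1 are brought to L1.
    prev_f, prev_l = peak_near(levels, f0_bin, half_width)
    out[:prev_f + 1] = prev_l
    # The last bin of `levels` is taken as half the sampling frequency.
    while prev_f + f0_bin <= len(levels) - 1:
        f, l = peak_near(levels, prev_f + f0_bin, half_width)
        if f <= prev_f:                    # safety stop; no progress possible
            break
        # Levels between F(n-1) and Fn are interpolated from L(n-1) to Ln.
        out[prev_f:f + 1] = np.linspace(prev_l, l, f - prev_f + 1)
        prev_f, prev_l = f, l
    # Bins from the last peak FN upward are brought to LN.
    out[prev_f:] = prev_l
    return out
```

With stretched harmonics, for example peaks at bins 10, 21, 33 and 46 for a fundamental of 10 bins, the search windows follow the drifting peaks, whereas windows centered on the multiples 10, 20, 30 and 40 would no longer contain the peak at bin 46.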
The arrangements according to fourth to sixth aspects of the present invention are directed to a recording medium, which saves a computer-executable program to cause a computer to execute the steps recited in each of the arrangements according to the first to third aspects. In other words, as the arrangement for solving the problems stated earlier, the present invention discloses a recording medium, which saves a program executable by a computer. The arrangements according to the fourth to sixth aspects may be provided not only as a recording medium but also as a program for attaining similar functions. In this case, the computer may be a dedicated machine directed to specific processing, besides a general-purpose computer with a central processing unit included therein, and there is no particular limitation on the computer as long as it includes a central processing unit.
When the program for causing a computer to execute the processing steps stated earlier is read out from the recording medium by the computer, processing steps similar to the processing steps recited in the first to third aspects are executed.
Among them, the arrangement of the fourth aspect corresponds to the arrangement according to the first aspect and is directed to a computer-readable recording medium, which specifically has a program saved thereto, the program causing a computer to perform the steps of:
finding power line spectra of a waveform to be processed;
performing level interpolation control on the power line spectra at every unit of a certain frequency, which is up to and including half a sampling frequency;
performing Fast Fourier Transformation or Inverse Fast Fourier Transformation on the power spectra subjected to the level interpolation control;
performing level setting with a specified coefficient; and
performing Inverse Fast Fourier Transformation or Fast Fourier Transformation on values found by performing the level setting with the specified coefficient.
The arrangement according to the fifth aspect corresponds to the arrangement according to the second aspect. Specifically, the arrangement is characterized in that the step of performing the level interpolation control comprises finding a frequency F1 and a level L1 of a spectrum having a maximum level before and after a fundamental tone of the waveform; bringing all levels of the power line spectra from 0 up to the frequency F1 to L1; repeating processing wherein, at every frequency that is an integral multiple of a frequency of the fundamental tone and is up to and including half a sampling frequency at a maximum, a frequency Fn and a level Ln of a spectrum having a maximum level are found before and after the respective integral multiple frequencies, and the levels from the frequency having been subjected to the level control at the previous stage to the frequency Fn are controlled to have values interpolated from a level Ln−1 to the level Ln; finding a frequency FN and a level LN of a last harmonic component; and performing level interpolation control to connect peaks of harmonic components by bringing all levels from the frequency FN up to the frequency of the last harmonic component set for the level interpolation control to LN.
The arrangement according to the sixth aspect corresponds to the arrangement according to the third aspect. Specifically, the arrangement is characterized in that the step of performing the level interpolation control comprises finding a frequency F1 and a level L1 of a spectrum having a maximum level before and after a fundamental tone of the waveform; bringing all levels of the power line spectra from 0 up to the frequency F1 to L1; repeating processing wherein a frequency Fn and a level Ln of a spectrum having a maximum level are found before and after respective frequencies that are obtained by adding the frequency of the fundamental tone to a frequency Fn−1 found at the previous stage and are up to and including half a sampling frequency, and the levels from the frequency Fn−1 found at the previous stage to the frequency Fn are controlled to have values interpolated from a level Ln−1 to the level Ln; finding a frequency FN and a level LN of a last harmonic component; and performing level interpolation control to connect peaks of harmonic components by bringing all levels from the frequency FN up to the frequency of the last harmonic component set for the level interpolation control to LN.
By providing a recording medium with any one of the arrangements stated earlier, the arrangement for executing the processing steps recited in any one of the first to third aspects can be distributed as a software product. By utilizing the software in an existing hardware resource, the arrangements according to the present invention can be easily implemented as a new application in the existing hardware resource. It is needless to say that besides that sort of recording medium, an internal storage, such as a RAM and a ROM, and an external storage, such as a hard disk, are included as the recording medium covered by the present invention as long as the program stated earlier is saved thereto.
One of the processing steps in any one of the fourth to sixth aspects may be implemented by a function incorporated in a computer (which may be a function incorporated as a part of a hardware in a computer, or a function realized by the operating system incorporated in a computer, another application system or the like), and the program saved to the recording medium may include a command to call or link to the function to be performed by the computer.
This is because a substantially similar arrangement can be provided as long as a part of the processing steps recited in the fourth to sixth aspects is taken over by a part of the functions attained by, e.g., an operating system for attaining the functions and as long as the part of the functions of the operating system is configured to be called or linked, though neither program nor module for realizing that function is directly saved to the recording medium.
Additionally, a seventh aspect of the present invention is directed to the structure of an apparatus with the arrangement stated earlier, which is characterized by specifically comprising:
a power line spectrum extractor, which extracts power line spectra of a waveform to be processed;
a level interpolation controller, which performs level interpolation control on the power line spectra at every unit of a certain frequency, which is up to and including half a sampling frequency; and
a cepstrum analyzer, which performs Fast Fourier Transformation or Inverse Fast Fourier Transformation on the power spectra subjected to the level interpolation control and performs Inverse Fast Fourier Transformation or Fast Fourier Transformation on values found by performing level setting with a specified coefficient.
As explained, the method for extracting formants of a musical tone, the recording medium and the apparatus for extracting formants of a musical tone according to the first to seventh aspects of the present invention can offer advantages that it becomes possible to obtain formant data wherein components in the vicinity of a fundamental tone (in particular frequency components not higher than the fundamental tone) can be prevented from getting smaller, the levels of harmonic components inherent in the waveform are reflected with fidelity, a rough shape of the frequency characteristics is represented, and the characteristics of the original power line spectra are clearly represented.