1. Field of the Invention
The present invention relates to a waveform reproduction apparatus and, more specifically, to a waveform reproduction apparatus in which the reproduction processing of an audio signal expressed as a waveform is improved by means of a phase vocoder format.
2. Prior Art
In general, as a technology for reproducing an audio signal expressed as a waveform, temporal axis compression and expansion technology (hereafter referred to as "time stretch technology" as the circumstances warrant), in which the reproduction time of a recorded audio signal, in other words, a waveform, is compressed or expanded on the temporal axis, has come to be utilized in the music production field.
With a tape recorder, for example, it has been possible to compress or expand on the temporal axis the reproduction time of an audio signal recorded on tape by making the speed of the tape at the time of playback different from the speed of the tape at the time of recording; however, undesirable changes in frequency have occurred at the same time.
Because of this, in the time stretch technologies of the past, the audio signal is temporarily stored in sequence in a medium such as digital memory, and the reproduction time is compressed or expanded on the temporal axis either by culling out a defined segment or by repeating a defined segment.
Incidentally, hereafter, the compression of the reproduction time on the temporal axis and the expansion of the reproduction time on the temporal axis will be abbreviated as "compression" and "expansion," respectively, as the circumstances warrant.
However, when a segment of an audio signal that is a continuous waveform is culled out or repeated, the waveform becomes discontinuous at the respective connection points, and a new problem has arisen in that noise is generated.
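By way of illustration only, the culling and repetition described above, and the discontinuity that results at the connection points, can be sketched as follows. The function names, the test tone, and the segment positions are hypothetical and are not taken from any of the figures.

```python
import math

def cull_segment(samples, start, length):
    """Compress: drop `length` samples beginning at index `start`."""
    return samples[:start] + samples[start + length:]

def repeat_segment(samples, start, length):
    """Expand: play the segment that begins at `start` twice in a row."""
    end = start + length
    return samples[:end] + samples[start:end] + samples[end:]

def largest_jump(samples):
    """Largest difference between neighbouring samples (a crude click detector)."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

# Hypothetical test tone: a sine wave with a period of 100 samples.
tone = [math.sin(2.0 * math.pi * n / 100.0) for n in range(400)]

shorter = cull_segment(tone, 25, 50)   # joins a crest directly to a trough
longer = repeat_segment(tone, 0, 25)   # jumps from a crest back to a zero crossing
```

In this sketch the largest step between neighbouring samples of the untouched tone is small, whereas both edited signals contain a step on the order of the full amplitude at the connection point, which is heard as noise.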
Because of this, a technique has been proposed in which the above-mentioned connection points are cross-faded so that the continuity at the connection points is preserved and the generation of noise is suppressed. However, it is not possible to completely prevent fluctuation and rippling of the audio signal, and this has not been a fundamental solution. (Here, "cross-fading" is a technique in which, when a plurality of consecutive waveforms are reproduced, the end section of a given waveform (hereafter referred to as the "first waveform") and the beginning section of the waveform that follows it (hereafter referred to as the "second waveform") are overlapped, and the volume of the overlapping section of the first waveform is gradually decreased while the volume of the overlapping section of the second waveform is gradually increased.) An example in which cross-fading is carried out when a defined segment of an audio signal expressed as a waveform is culled out and the signal is compressed is shown in FIG. 1(a). In addition, an example in which cross-fading is carried out when a defined segment of the audio signal is repeated and the signal is expanded is shown in FIG. 1(b).
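As a minimal sketch of the cross-fading defined above, assuming a linear fade and waveforms held as lists of samples (the function name and the choice of a linear gain curve are assumptions made for illustration):

```python
def crossfade_join(first, second, overlap):
    """Join two waveforms, overlapping `overlap` samples with a linear cross-fade."""
    if overlap < 1 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must fit inside both waveforms")
    head = first[:len(first) - overlap]   # part of the first waveform played alone
    tail = second[overlap:]               # part of the second waveform played alone
    mixed = []
    for i in range(overlap):
        fade_in = (i + 1) / (overlap + 1)  # volume of the second waveform rises
        fade_out = 1.0 - fade_in           # volume of the first waveform falls
        mixed.append(fade_out * first[len(first) - overlap + i]
                     + fade_in * second[i])
    return head + mixed + tail

# Hypothetical usage: a constant level of 1.0 handed over to silence.
joined = crossfade_join([1.0] * 4, [0.0] * 4, 2)
```

The joined signal is shorter than the two inputs laid end to end by the length of the overlap, and the level moves in steps across the overlapping section rather than jumping at a single connection point.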
At the present time, however, a format known as the phase vocoder has been proposed as a time stretch technology that solves each of the above-mentioned problems.
Here, a phase vocoder divides the original audio signal, in other words, the original waveform, into a plurality of frequency band signals, analyzes each of the divided signals to acquire the changes in frequency and the changes in amplitude that accompany the passage of time, compresses or expands those changes on the temporal axis, and synthesizes the resulting signals so that the original signal is obtained in compressed or expanded form.
Accordingly, with the phase vocoder format, the amount of signal processing increases; however, since no defined segment of the audio signal, in other words, the waveform, is culled out or repeated as in the cross-fade format discussed previously, the frequency does not change even though the reproduction time is compressed or expanded and, moreover, the reproduction time can be compressed and expanded smoothly without noise or fluctuation.
A block diagram of one example of a publicly known phase vocoder is shown in FIG. 2. In addition, a detailed block diagram of the analysis section for band k (band k analysis section) 400 of the phase vocoder shown in FIG. 2 is shown in FIG. 3 (in the example shown in FIG. 2, k is an integer from 0 to 99). The phase vocoder will be explained below with reference to FIG. 2 and FIG. 3.
The phase vocoder divides the audio signal, in other words, the waveform, into a plurality of frequency bands, each of which has a bandwidth roughly equal to the fundamental frequency (in the phase vocoder shown in FIG. 2, as shown in FIG. 4, the signal is divided into 100 bands, band 0 through band 99). In the analysis section for each of the divided frequency bands (in the phase vocoder shown in FIG. 2, the band 0 analysis section through the band 99 analysis section; as mentioned above, the details of the band k analysis section are shown in FIG. 3), the audio signal of that band is multiplied by a complex sinusoid at the center frequency of the band and is analyzed and resolved into amplitude values and instantaneous frequencies.
Here, w(n) in FIG. 3 is the impulse response of the analysis filter, and the operation of the band k analysis section is equivalent to the well-known Fourier transform of the short segment that is extracted by the window w(n).
Then, the amplitude values and the instantaneous frequencies obtained by the analysis of each of the divided frequency bands are stored in the storage section.
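The analysis of a single band can be sketched roughly as follows. This is an illustrative reading of the description above and not a reproduction of FIG. 3: the Hann window used for w(n), the factor of 2 that compensates for the real-valued input, the phase-difference frequency estimate, and all names and test values are assumptions.

```python
import cmath
import math

def hann_window(length):
    """Symmetric Hann window, used here as a stand-in for w(n)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / (length - 1))
            for m in range(length)]

def analyze_band(x, center_hz, sample_rate, window):
    """Return (amplitude values, instantaneous frequencies in Hz) for one band."""
    omega = 2.0 * math.pi * center_hz / sample_rate
    # Multiply by a complex sinusoid so the band centre moves to 0 Hz.
    shifted = [x[n] * cmath.exp(-1j * omega * n) for n in range(len(x))]
    norm = sum(window)
    half = len(window) // 2
    amps, freqs = [], []
    prev_phase = None
    for n in range(half, len(x) - half):
        # Low-pass filter with w(n), centred on sample n.
        acc = sum(window[m] * shifted[n - half + m]
                  for m in range(len(window))) / norm
        amps.append(2.0 * abs(acc))  # factor 2: a real input splits into +/- frequencies
        phase = cmath.phase(acc)
        if prev_phase is None:
            dphi = 0.0
        else:
            # Phase advance per sample, wrapped into [-pi, pi).
            dphi = (phase - prev_phase + math.pi) % (2.0 * math.pi) - math.pi
        freqs.append(center_hz + dphi * sample_rate / (2.0 * math.pi))
        prev_phase = phase
    return amps, freqs

# Hypothetical usage: a 445 Hz tone seen by a band centred on 440 Hz.
rate = 8000
tone = [0.5 * math.cos(2.0 * math.pi * 445.0 * n / rate) for n in range(800)]
amps, freqs = analyze_band(tone, 440.0, rate, hann_window(201))
```

For a steady tone close to the band centre, the amplitude values settle near the amplitude of the tone and the instantaneous frequencies settle near its frequency; a full analysis stage would run one such section per band.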
The audio signals of the divided frequency bands are synthesized in the combining section from the data stored in the storage section in this manner: a sine wave at the center frequency of each analyzed frequency band is modulated by the analyzed amplitude values and instantaneous frequencies to generate the audio signal of that frequency band, and when the generated audio signals of all of the frequency bands are mixed, the original audio signal is restored.
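A minimal sketch of this synthesis, assuming that each band is driven by a cosine oscillator whose phase is advanced by the instantaneous frequency at every sample (the names and the `start_phase` parameter are hypothetical):

```python
import math

def synthesize_band(amps, freqs_hz, sample_rate, start_phase=0.0):
    """Generate one band from its amplitude values and instantaneous frequencies."""
    phase = start_phase
    out = []
    for amp, freq in zip(amps, freqs_hz):
        out.append(amp * math.cos(phase))
        phase += 2.0 * math.pi * freq / sample_rate  # advance the oscillator
    return out

def mix_bands(bands):
    """Mix the generated bands sample by sample."""
    return [sum(samples) for samples in zip(*bands)]

# Hypothetical usage: a 2000 Hz band at a sample rate of 8000 Hz advances
# the oscillator by a quarter cycle per sample.
band = synthesize_band([1.0] * 4, [2000.0] * 4, 8000)
mixed = mix_bands([[1.0, 2.0], [3.0, 4.0]])
```

Note that the oscillator needs a starting phase; in this sketch it is simply a parameter that defaults to zero.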
Here, in the case in which the reproduction time of the audio signal is compressed or expanded, time and frequency conversion processing, in which interpolated values of the amplitude value and of the instantaneous frequency are sought, is carried out in the conversion section.
A block diagram of the band k conversion section, which executes the time and frequency conversion processing for band k, is shown in FIG. 5(a). The processing in the case where the reproduction time of the audio signal is compressed or expanded will be explained with reference to FIG. 5(a).
First, in the case in which the reproduction time of the audio signal is expanded, the conversion section interpolates the amplitude values between the sample points so that the amplitude envelope is stretched in accordance with the temporal expansion data and, in addition, interpolated values between the sample points are sought for the instantaneous frequency as well (refer to FIG. 5(b)). Then, from the amplitude values and the instantaneous frequencies obtained by interpolation in this way, the audio signal of each of the divided frequency bands is derived in the combining section in the same manner as described above, and the signals are mixed.
On the other hand, in the case in which the reproduction time of the audio signal is compressed, the amplitude values and the instantaneous frequencies are thinned out by interpolation and the envelope is compressed (refer to FIG. 5(c)). Then, from the amplitude values and the instantaneous frequencies obtained by interpolation in this way, the audio signal of each of the divided frequency bands is derived in the combining section in the same manner as described above, and the signals are mixed.
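The expansion and compression of an envelope described above can be sketched as a resampling by interpolation. Linear interpolation, the function name, and the `ratio` parameter are assumptions made for illustration; the same routine would be applied to the amplitude values and to the instantaneous frequencies of each band.

```python
def stretch_envelope(values, ratio):
    """Resample an envelope; ratio > 1 expands it, ratio < 1 compresses it."""
    if ratio <= 0 or len(values) < 2:
        raise ValueError("need a positive ratio and at least two sample points")
    last = len(values) - 1
    out_len = max(2, round(last * ratio) + 1)
    out = []
    for i in range(out_len):
        pos = i * last / (out_len - 1)   # position on the original time axis
        lo = min(int(pos), last - 1)     # sample point at or below that position
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[lo + 1] * frac)
    return out

# Hypothetical usage.
expanded = stretch_envelope([0.0, 1.0, 0.0], 2.0)               # twice as long
compressed = stretch_envelope([0.0, 1.0, 2.0, 3.0, 4.0], 0.5)   # half as long
```

Because only the envelopes are resampled and the oscillators continue to run at the analyzed frequencies, the duration changes while the pitch is left as it was.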
Incidentally, in the case where the pitch of the audio signal is changed, the frequency of each of the divided frequency bands (the center frequency of the band together with the analyzed instantaneous frequency) may be multiplied by the ratio of the change, and the above-mentioned interpolation operations may then be executed.
In addition, since the processing described above is executed by means of publicly known techniques, a flow chart as well as a detailed explanation will be omitted.
However, in the above-mentioned phase vocoder format, the compression and expansion of the audio signal, in other words, the waveform, is achieved simply by expanding or compressing the envelopes that denote the respective changes over time of the amplitude values, which are the amplitude data, and of the instantaneous frequencies, which are the frequency data; consequently, there has been a problem in that it is not possible to carry out the compression and expansion of an audio signal that has an abundance of changes.
In addition, with the phase vocoder, there are often cases in which the original tone (the original audio signal) is to be reproduced faithfully; in those cases, compression and expansion on the temporal axis are not carried out and the reproduction is done without making any changes in the pitch (hereafter, reproduction in which compression and expansion on the temporal axis are not carried out and no changes are made in the pitch is referred to as "one-to-one reproduction" as the circumstances warrant).
However, in the phase vocoder format mentioned above, there is no phase data and, in addition, no means has been established for setting the phase value of the cosine oscillator at the start of the reproduction; therefore, when the reproduction is carried out, an arbitrary phase value is set and the reproduction is begun.
Because of this, even if a one-to-one reproduction is carried out, the phase value is, in general, different from that of the original tone and there has been the problem that a sound is reproduced that differs from the original tone. In other words, even in those cases where a one-to-one reproduction is carried out, there has been a problem that it is not possible to faithfully reproduce the original tone.
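The effect of an arbitrary starting phase can be illustrated with a tone made of only two partials. The sketch below is a hypothetical example and is not taken from the figures; the partial frequencies and the phase offset were chosen only to make the difference easy to see.

```python
import math

def two_partial_tone(second_phase, length=200, period=100):
    """Fundamental plus second harmonic; only the harmonic's start phase varies."""
    out = []
    for n in range(length):
        theta = 2.0 * math.pi * n / period
        out.append(math.cos(theta) + math.cos(2.0 * theta + second_phase))
    return out

original = two_partial_tone(0.0)
restarted = two_partial_tone(math.pi / 2.0)  # arbitrary phase at oscillator start

peak_original = max(abs(v) for v in original)
peak_restarted = max(abs(v) for v in restarted)
```

Both tones contain the same two partials at the same amplitudes and frequencies, yet the waveform whose harmonic starts at a different phase has a visibly lower peak, so the reproduced waveform does not coincide with the original.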
3. Problem of Prior Art to be Addressed
The present invention was made in view of the problems inherent in the technology of the past such as those mentioned above. Its objective is to present a waveform reproduction apparatus in which, by utilizing a phase vocoder format, the smooth compression or expansion of an audio signal is possible without a particular segment of the audio signal, in other words, the waveform, being directly culled out or repeated and in which, in addition, it is possible to carry out the compression and expansion of an audio signal that has an abundance of changes.