1. Field of the Invention
The present invention relates to a method for shifting the pitch of acoustic signals, which are expressed as a series of digital signals, to another arbitrary pitch using a compaction and/or expansion process of the time axis and a cross-fade process, and to an apparatus for performing the shift.
2. Description of the Related Art
Acoustic pitch shifting has been employed in, for example, KARAOKE, etc. The acoustic pitch shifting apparatus shifts an acoustic pitch to an easy-to-sing pitch without changing the speed of a melody.
That is, a pitch shifting apparatus is equipment, like a key controller used in KARAOKE, which shifts the pitch of acoustic signals, that is, the frequency thereof, to a constant multiple of the original frequency.
Various types of pitch shifting methods have been proposed to date. The present invention relates to a technology that pitch-shifts a series of digital signals by a compaction and/or expansion process of the time axis and a cross-fade process.
Herein, the compaction and/or expansion of the time axis is a process that compacts and/or expands the time axis of an original signal to generate a series of signals whose frequency is a constant multiple of the original frequency.
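As a hedged illustration of this definition, let the scaling constant be written K (a symbol assumed here, chosen to match the ratio K1 used later in this description) and let the original signal x(t) be a single sinusoid of frequency f. Reading the signal K times faster gives:

```latex
x(t) = \sin(2\pi f t), \qquad
y(t) = x(Kt) = \sin\bigl(2\pi (K f)\, t\bigr)
```

The frequency therefore becomes Kf, while a segment of duration T is reproduced in T/K. K > 1 corresponds to compaction and K < 1 to expansion.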
Further, the cross-fade process overlaps a faded-in signal, which is a partial signal picked up from the original signals, with a faded-out signal, which is a partial signal picked up from a different position on the time axis.
(Prior Art 1: Pitch Shifting Technology without Compensating for a Phase Difference)
First, a description is given of the principle of pitch shift with reference to FIG. 9.
Compaction and/or Expansion of the Time Axis
In FIG. 9, the abscissa shows time while the ordinate shows the amplitude of signals.
FIG. 9(a) shows the waveform of the original signals.
Herein, through compaction and/or expansion of the time axis, it is possible to shift the pitch (frequency).
For example, if the original signals shown in FIG. 9(a) are compacted in terms of a time axis, the original signals are shifted to higher frequencies as shown in FIG. 9(b).
The time required for reproduction of signals (FIG. 9(b)) after the compaction of the time axis is shorter than the time required for reproduction of the original signals (FIG. 9(a)).
On the other hand, if the original signals shown in FIG. 9(a) are expanded in terms of a time axis, the original signals are shifted to lower frequencies as shown in FIG. 9(c).
The time required for reproduction of signals (FIG. 9(c)) after the expansion of the time axis is longer than the time for reproduction of the original signals (FIG. 9(a)).
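The relationship between FIG. 9(a), (b) and (c) can be sketched numerically. The following is only a minimal illustration, assuming linear interpolation as the resampling method (the description does not specify one); the function names `resample` and `period_in_samples` and the test tone are hypothetical.

```python
import math

def resample(signal, ratio):
    """Read `signal` in steps of `ratio` samples with linear interpolation.

    ratio > 1 compacts the time axis (higher pitch, shorter output);
    ratio < 1 expands it (lower pitch, longer output).
    """
    out = []
    pos = 0.0
    last = len(signal) - 1
    while pos <= last:
        i = int(pos)
        frac = pos - i
        nxt = signal[min(i + 1, last)]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
        pos += ratio
    return out

def period_in_samples(signal):
    """Distance between the first two rising zero crossings."""
    rising = [i + 1 for i in range(len(signal) - 1)
              if signal[i] < 0.0 <= signal[i + 1]]
    return rising[1] - rising[0]

# Test tone with a period of 80 samples. The 0.25-sample offset only keeps
# samples away from exact zeros so that zero crossings are unambiguous.
original = [math.sin(2 * math.pi * (n + 0.25) / 80) for n in range(800)]
compacted = resample(original, 2.0)   # corresponds to FIG. 9(b)
expanded = resample(original, 0.5)    # corresponds to FIG. 9(c)

for name, sig in (("original", original),
                  ("compacted", compacted),
                  ("expanded", expanded)):
    print(name, len(sig), period_in_samples(sig))
```

With a ratio of 2 the period halves (the pitch rises by one octave) and the number of samples is halved; with a ratio of 0.5 the period doubles and the number of samples roughly doubles.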
As described above, if compaction and/or expansion of the time axis is carried out, the time required for reproduction differs from the time required for reproduction of the original signals. As a result, the signals become discontinuous at the points where the time windows change over, which causes noise.
Therefore, in prior art 1, a cross-fade process is added to the compaction and/or expansion of the time axis so that the time of reproduction remains the same as that of the original signals.
Combinations of the Compaction and/or Expansion of the Time Axis and the Cross-Fade Process
FIG. 10(a) shows an example of compacting the time axis. FIG. 10(b) shows an example of expanding the time axis.
In FIG. 10(a), the upper section (1) thereof shows original signals expressed as a series of digital signals. The middle section (2) of FIG. 10(a) shows a compacting process of the time axis. The lower sections (3-1) and (3-2) show first and second examples of the cross-fade process, wherein the slashes (diagonal lines) in the lower sections (3-1) and (3-2) show the cross-faded points. In the first example, the cross-fade is made slightly longer, and in the second example, the cross-fade is made slightly shorter.
Components that are located below the diagonal lines are faded-in, and those that are located above the diagonal lines are faded-out.
A further detailed description is given of the respective processes with reference to FIG. 10(a).
Herein, processing is carried out in units of partial signals corresponding to a time (T1+T1) of the original signals. The time (T1+T1) is very short, for example, 0.1 seconds.
Also, it is assumed that the ratio K1 of the compaction of the time axis is a value greater than 1, and that T2=K1*T1 holds.
Partial signals corresponding to the time (T2+T2) are picked up from the original signals. The front half A1 of these signals is assigned to the fade-in side, and the rear half B2 thereof is assigned to the fade-out side.
The components A1 and B2 are subjected to the compaction in the time axis using the compaction ratio K1, so that, after compaction, the front half component A1H and the rear half component B2H are obtained.
Naturally, the time of reproduction of each of the components A1H and B2H is T1.
Subsequently, partial signals corresponding to the time (T2+T2) are picked up so as to coincide with the top of the after-compaction rear half component B2H. The front half component, A2, is assigned to the fade-in side. The rear half component, B3, is assigned to the fade-out side.
The components A2 and B3 are subjected to compaction in the time axis using the compaction ratio K1, wherein, after compaction, the front half component A2H and the rear half component B3H are obtained. The time for reproduction of components A2H and B3H is time T1 for each.
In the same manner, the after-compaction front half component A3H and the after-compaction rear half component B4H are obtained.
With respect to the after-compaction respective components thus obtained, B2H and A2H, B3H and A3H, and BnH and AnH (n: an integer) are subjected to the cross-fade process.
Also, as described above, BnH is faded-out while AnH is faded-in.
Herein, the cross-fade process may be carried out using all the sections of the blocks, as in the first example (3-1), or using only the part of the blocks in the vicinity of the center, as in the second example (3-2).
As shown in FIG. 10(b), where the pitch is lowered, the compaction of the time axis is changed to the expansion of the time axis wherein the process is identical to the case (FIG. 10(a)) of raising the pitch, except that the compaction ratio is smaller than 1.
Through the cross-fade process, noise caused by discontinuity at the changeover points of the time windows is prevented. In addition, reproduction of the pitch-shifted output signals takes the same time as reproduction of the original signals.
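The procedure described with reference to FIG. 10 can be sketched as follows. This is only an illustration under stated assumptions: linear interpolation stands in for the time-axis compaction and/or expansion, the cross-fade is linear over the whole block as in example (3-1), and the names `read_interp`, `pitch_shift` and `block` (T1 expressed in samples) are hypothetical.

```python
import math

def read_interp(signal, pos):
    """Linearly interpolated read; positions past the end return 0.0."""
    i = int(pos)
    if i + 1 >= len(signal):
        return signal[i] if i < len(signal) else 0.0
    frac = pos - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]

def pitch_shift(signal, ratio, block):
    """Shift the pitch by `ratio` while keeping the overall duration.

    Each output block of `block` samples (T1) is a cross-fade between
    the front half of the current window (fade-in side, AnH) and the
    rear half of the previous window (fade-out side, BnH). Every window
    covers 2 * ratio * block input samples (T2 + T2).
    """
    out = []
    for n in range(len(signal) // block):
        start_in = n * block           # current window starts here
        start_out = (n - 1) * block    # previous window started here
        for i in range(block):
            fade_in = read_interp(signal, start_in + ratio * i)
            if n == 0:
                out.append(fade_in)    # first block: nothing to fade out
                continue
            fade_out = read_interp(signal, start_out + ratio * (block + i))
            gain = i / block           # rises from 0 towards 1
            out.append(gain * fade_in + (1.0 - gain) * fade_out)
    return out

# Test tone with a period of 80 samples (offset avoids exact zeros).
tone = [math.sin(2 * math.pi * (n + 0.25) / 80) for n in range(1600)]
raised = pitch_shift(tone, 2.0, 160)    # compaction, as in FIG. 10(a)
lowered = pitch_shift(tone, 0.5, 160)   # expansion, as in FIG. 10(b)
print(len(tone), len(raised), len(lowered))   # all three lengths are 1600
```

In this sketch the block length of 160 samples was chosen deliberately so that the fade-in and fade-out series of this particular test tone happen to be in phase. For general input they are not, and no phase compensation is attempted here.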
Referring to FIG. 8, acoustic signals, expressed as a sequence of digital signals, entering through an acoustic input terminal 807, are stored temporarily in a memory 801. Addresses for the memory 801 are produced by a reading address generating means 804. The reading address generating means 804 designates addresses for two series of signals (fade-in side and fade-out side), and the two series read out on the basis of the designated addresses are fed to filter calculating means 802a and 802b. The filter calculating means 802a and 802b compact and/or expand the read series of signals in terms of time axes in order to shift the pitch (frequency) thereof.
A cross-fade means 803 cross-fades two series of signals, which have been compacted and/or expanded in terms of the time axes. The result of cross-fading is output through an acoustic output terminal 808.
The problem in prior art 1 is that, in the cross-fade process, a sense of tremolo results from phase differences between the series of signals at the fade-in side and the series of signals at the fade-out side. If the phases of the two constant-amplitude series of signals happen to coincide with each other as shown in FIG. 11, there is no change in the envelope curve (that is, a line connecting the peaks of amplitudes) of the output signals in the cross-fade process, and no sense of tremolo occurs. However, generally, the phases of the two series of signals are not coincident with each other.
In particular, as shown in FIG. 12, if the phases of the two constant-amplitude series of signals are completely inverted with respect to each other, the two series of signals cancel each other in the cross-fade sections. As a result, the amplitude of the output signals in the sections in which a cross-fade process is carried out is smaller than the amplitude in the sections in which no cross-fade process is carried out. This reinforcement and cancellation is repeated over time, whereby the output signals carry a sense of tremolo that is absent from the input signals.
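The two cases of FIG. 11 and FIG. 12 can be reproduced with a short sketch. A linear cross-fade and a pure test tone are assumed here for illustration; the names `crossfade` and `peak` are hypothetical.

```python
import math

def crossfade(fade_out, fade_in):
    """Linear cross-fade of two equal-length series."""
    n = len(fade_out)
    return [(1.0 - i / n) * fade_out[i] + (i / n) * fade_in[i]
            for i in range(n)]

def peak(samples):
    """Largest absolute sample value, used as a crude envelope measure."""
    return max(abs(s) for s in samples)

N = 800
PERIOD = 40
tone = [math.sin(2 * math.pi * n / PERIOD) for n in range(N)]
inverted = [-s for s in tone]

in_phase_mix = crossfade(tone, tone)       # phases coincide (FIG. 11)
inverted_mix = crossfade(tone, inverted)   # phases inverted (FIG. 12)

# Compare the envelope around the middle of the cross-fade section.
middle = slice(N // 2 - PERIOD, N // 2 + PERIOD)
print(peak(in_phase_mix[middle]))   # close to 1: envelope unchanged
print(peak(inverted_mix[middle]))   # at most 0.1: the series cancel
```

With coincident phases the two gains always sum to one and the envelope stays flat. With inverted phases the output equals the input scaled by the difference of the two gains, which passes through zero at the middle of the cross-fade section.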
(Prior Art 2: Pitch Shifting Technology for Compensating for a Phase Difference)
Japanese Unexamined Patent Application No. Hei-5-297891 discloses one method, referred to herein as prior art 2, that mitigates this problem.
This publication attributes the sense of tremolo to the phase difference between the two series of signals that are subjected to the cross-fade process. Therefore, the phase difference between the two series of signals is obtained, and one of the two series is shifted along the time axis by an amount equivalent to the phase difference. This matches the signal phases to eliminate or reduce the false tremolo.
In detail, the peaks of the two series of signals are obtained, and the series of signals are time shifted an amount equivalent to the difference between the peaks.
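A deliberately simplified sketch of peak-based alignment follows. It is not the procedure of the cited publication, whose details are not reproduced here; it only illustrates the idea of shifting one series by the difference between peak positions. The names `peak_index` and `peak_offset` and the test tone are hypothetical.

```python
import math

def peak_index(samples):
    """Index of the largest sample value (the first one if there are ties)."""
    return max(range(len(samples)), key=lambda i: samples[i])

def peak_offset(fade_out, fade_in):
    """Number of samples by which the peak of fade_in lags that of fade_out."""
    return peak_index(fade_in) - peak_index(fade_out)

PERIOD = 40
source = [math.sin(2 * math.pi * n / PERIOD) for n in range(400)]

fade_out = source[0:PERIOD]          # one period, read from sample 0
fade_in = source[15:15 + PERIOD]     # same tone, read 15 samples later

shift = peak_offset(fade_out, fade_in)
# Move the read position of the fade-in series by the peak difference.
aligned = source[15 + shift:15 + shift + PERIOD]

print(shift)   # 25: the peaks sit at indices 10 and 35
print(max(abs(a - b) for a, b in zip(aligned, fade_out)))   # near zero
```

A pure tone has exactly one maximum per period, so the peak positions identify the phase unambiguously in this example.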
Although the detailed reason is described later, the conclusion is that this process is satisfactory, for the time being, for simple voice signals, but in the case of complicated acoustic signals such as music (in particular, those including many strong harmonic overtone components), erroneous detection of the peaks frequently occurs. As a result, the phase matching is unsatisfactory.