1. Field of the Invention
This invention relates to a time-axis compression/expansion method and apparatus for performing time-axis compression/expansion on original digital signals at a desired compression/expansion rate without changing the pitch of the original digital signals, and more particularly to a time-axis compression/expansion method and apparatus of this kind which is suitable for processing multichannel signals.
2. Prior Art
The time-axis compression/expansion technique for time-axis compressing or time axis-expanding a digital audio signal without changing the pitch of the same is utilized e.g. for so-called “time length adjustment” for adjusting a total recording time period over which the digital audio signal is to be recorded to a predetermined time period, tempo conversion in a karaoke apparatus or the like, and so forth. Conventionally, this kind of time-axis compression/expansion technique includes a cut-and-splice method (as disclosed e.g. in Japanese Laid-Open Patent Publication (Kokai) No. 10-282963), an overlap-add method based on pointer shift amount control (Morita and Itakura, “Expansion/Compression of Sound in Time Product by Using Overlap-Add Method Based on Point Shift Amount Control and Its Evaluation”, Lectures at the Autumn Conference of the Acoustical Society of Japan Vol. 1-4-14, p. 149, October, 1986), etc.
Time-axis compression/expansion processing by a general cut-and-splice method is performed such that waveform segments are cut out without considering correlation between the waveform segments and then the cut-out waveform segments are spliced together to thereby effect compression/expansion based on a specified compression/expansion rate. According to this method, discontinuities can occur in spliced portions of the cut-out waveform segments, and therefore cross-fading is carried out to smooth the spliced portions of the cut-out waveform segments. The time interval of the waveform cutout is set to such a time period that the human ears cannot sense an echo or doubling of sounds, e.g. approximately 60 msec. Particularly, according to the method disclosed in Japanese Laid-Open Patent Publication (Kokai) No. 10-282963, the cutout length or length of the cutout waveform segment is determined in synchronism with sound timing information. This method is distinguished from other conventional methods in that spliced portions appear at the same repetition period as that of the rhythm of the original waveform, so that tone changes at the spliced portions cannot be easily perceived. Cross-fading between waveform segments which are largely different in phase from each other markedly degrades the tone quality. Therefore, the present assignee has proposed a phase-matching type cut-and-splice method in which cut-out waveform segments which are closest in phase to each other are detected and are then subjected to cross-fading.
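By way of illustration only, the cross-fading used to smooth a spliced portion can be sketched as follows. The function name `crossfade_splice`, the linear weighting, and the use of plain Python lists of samples are assumptions made for this sketch and are not taken from the cited publication.

```python
def crossfade_splice(preceding, following, fade_len):
    """Splice two waveform segments, linearly cross-fading the trailing
    `fade_len` samples of `preceding` into the leading `fade_len`
    samples of `following` (hypothetical helper, linear weights assumed)."""
    if fade_len <= 0 or fade_len > min(len(preceding), len(following)):
        raise ValueError("fade_len must be positive and fit inside both segments")
    head = preceding[:-fade_len]      # part of the preceding segment left untouched
    tail = preceding[-fade_len:]      # trailing end portion, faded out
    lead = following[:fade_len]       # leading end portion, faded in
    rest = following[fade_len:]       # part of the following segment left untouched
    faded = []
    for i in range(fade_len):
        w = (i + 1) / (fade_len + 1)  # fade-in weight, strictly between 0 and 1
        faded.append((1.0 - w) * tail[i] + w * lead[i])
    return head + faded + rest
```

The spliced result is shorter than the two segments laid end to end by `fade_len` samples, because the faded portions overlap.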
On the other hand, the overlap-add method based on pointer shift amount control is performed such that two adjacent segments of the original audio signal closely correlated in waveform and equal in length to each other are extracted, and the two signal segments are overlapped or added together. Then, the two original signal segments are replaced by a new signal segment obtained by the overlapping/addition, or the new signal segment is inserted between the two original signal segments, whereby the total time of the original audio signal is reduced or increased. This method enables smoother splicing of waveforms than the cut-and-splice method. Particularly, this method can achieve higher-quality time-axis compression/expansion of pitch-based sound source signals, such as voice signals and sound signals generated by monophonous musical instruments.
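The overlap-add operation described above can be sketched as follows, purely as an illustration. The mean squared difference used as the correlation measure, the linear weights, the direction of the fades, and all function names are assumptions of this sketch. The cited paper may define them differently.

```python
def mean_sq_diff(x, start, length):
    """Mean squared difference between two adjacent segments of equal length."""
    a = x[start:start + length]
    b = x[start + length:start + 2 * length]
    return sum((u - v) ** 2 for u, v in zip(a, b)) / length

def best_segment_length(x, start, min_len, max_len):
    """Length L for which x[start:start+L] and x[start+L:start+2L] differ least
    (assumed similarity criterion)."""
    candidates = [L for L in range(min_len, max_len + 1)
                  if start + 2 * L <= len(x)]
    if not candidates:
        raise ValueError("signal too short for the requested search range")
    return min(candidates, key=lambda L: mean_sq_diff(x, start, L))

def overlap_add(fade_out_seg, fade_in_seg):
    """Weighted sum of two equal-length segments, first fading out, second fading in."""
    n = len(fade_out_seg)
    return [((n - i) * fade_out_seg[i] + i * fade_in_seg[i]) / n for i in range(n)]

def compress_once(x, start, length):
    """Replace two adjacent segments by their overlap-added segment (shortens x)."""
    a = x[start:start + length]
    b = x[start + length:start + 2 * length]
    return x[:start] + overlap_add(a, b) + x[start + 2 * length:]

def expand_once(x, start, length):
    """Insert the overlap-added segment between the two segments (lengthens x)."""
    a = x[start:start + length]
    b = x[start + length:start + 2 * length]
    return x[:start] + a + overlap_add(b, a) + b + x[start + 2 * length:]
```

For a strictly periodic input, the search selects one period as the segment length, so that one period is removed by `compress_once` or added by `expand_once` without disturbing the waveform shape.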
However, the conventional phase-matching type cut-and-splice method and the overlap-add method based on pointer shift amount control deal only with monophonic signals. If these methods, which select signal segments identical in phase or signal segments closely correlated in waveform to each other for cross-fading, are directly applied to processing of stereo signals, the result may be an odd auditory localization for the listener, which is a serious problem. This results from the fact that the left-channel and right-channel signals are processed as separate monophonic signals independent of each other, so that a disagreement occurs between the cross-faded portions of the signals of the respective channels, causing a difference in phase between the tones sensed by the two ears, which determines the auditory localization of the stereo signal.
Aside from the time-axis compression/expansion apparatus, there have been proposed pitch conversion devices that perform processing for changing the readout ratio by using the cut-and-splice method (Japanese Laid-Open Patent Publication (Kokai) No. 5-297891). According to one of the devices, pitch conversion of left-channel and right-channel signals of a stereo signal is performed such that portions of the left-channel signal most closely correlated to each other are cut out and spliced together by cross-fading, and then portions of the right-channel signal close to the edited point of the left-channel signal and most closely correlated to each other are cut out and spliced together by cross-fading. According to another device, the pitch conversion is performed such that the editing method is switched, as required, according to the correlation between the left-channel signal and the right-channel signal in such a manner that if the correlation between the two channel signals is not high, portions of each channel signal which are most closely correlated to each other are edited on a channel-by-channel basis, while if the correlation between the two channel signals is high, portions of the left-channel signal (or right-channel signal) which are most closely correlated to each other and portions of the other channel signal corresponding to the portions of the left-channel signal (or right-channel signal) are both edited.
However, these proposed devices have the disadvantage that cross-fading is not fully synchronized between the left and right channel signals, which may cause a difference in phase between tones sensed by the two ears and hence provide an odd auditory localization for the listener. Such a transient odd auditory localization is generally more conspicuous to the ears than improper splicing of waveform segments by cross-fading, and is therefore a problem to be solved.
It is an object of the present invention to provide a time-axis compression/expansion method and apparatus for multichannel signals, which is capable of performing time-axis compression/expansion on a multichannel signal without causing a disagreement between cross-fading points of the channels of the multichannel signal, to thereby ensure that a normal auditory localization is provided for the listener.
To attain the above object, according to a first aspect of the present invention, there is provided a time-axis compression/expansion method for time-axis compressing/expanding a multichannel signal comprising a plurality of channel signals at a specified compression/expansion rate, which comprises the steps of sequentially cutting out waveform segments from each of the channel signals, determining a cutting starting point of a leading end portion of a waveform segment of the cut out waveform segments following each preceding waveform segment of the cut out waveform segments, commonly between the channel signals, based on two portions of a waveform of a synthesized signal formed by synthesizing the channel signals within a range of a predetermined search starting point to a predetermined search ending point of the waveform of the synthesized signal, the two portions corresponding to a time period over which cross-fading is to be carried out and being most similar to each other, and splicing together the preceding waveform segment and the following waveform segment cut from each of the channel signals based on the determined cutting starting point, by cross-fading a trailing end portion of the preceding waveform segment and the leading end portion of the following waveform segment.
According to this method, when cut-out waveform segments are to be spliced together by cross-fading, a cutting starting point of a waveform segment following each preceding waveform segment is determined based on a synthesized signal formed by synthesizing all the channel signals constituting the multichannel signal, and waveform segments are sequentially cut out from the respective channel signals based on the cutting starting point thus determined, and spliced together by cross-fading. Therefore, the cutting starting point can be made identical between all the channel signals, and at the same time, set to an averaged point of the optimum cutting starting points of all the channel signals (when one channel is dominant, it is set to a point mostly dependent on the dominant channel). Consequently, it is possible to carry out time-axis compression/expansion without degrading tone quality at the spliced portions of waveform segments while preventing displacement of cross-faded portions between the channel signals, thereby ensuring a natural auditory localization for the listener.
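The determination of a cutting starting point common to all channels can be sketched as follows, by way of example only. The sketch assumes that the synthesized signal is a sample-wise sum of the channels and that similarity is measured by a sum of absolute differences. Neither assumption is stated in this section, and the function names and parameters are hypothetical.

```python
def synthesize(channels):
    """Sample-wise sum of all channel signals (assumed form of synthesis)."""
    return [sum(samples) for samples in zip(*channels)]

def common_cut_start(channels, prev_end, fade_len, search_start, search_end):
    """Search the synthesized signal for the portion most similar to the
    trailing end portion of the preceding segment and return its start."""
    mix = synthesize(channels)
    ref = mix[prev_end - fade_len:prev_end]      # trailing end of preceding segment
    best_pos, best_err = None, None
    for pos in range(search_start, search_end + 1):
        cand = mix[pos:pos + fade_len]
        if len(cand) < fade_len:                 # ran past the end of the signal
            break
        err = sum(abs(r - c) for r, c in zip(ref, cand))  # assumed similarity measure
        if best_err is None or err < best_err:
            best_pos, best_err = pos, err
    if best_pos is None:
        raise ValueError("search range lies outside the signal")
    return best_pos

def splice_channels(channels, prev_end, fade_len, search_start, search_end):
    """Cross-fade every channel at the same cutting starting point."""
    cut = common_cut_start(channels, prev_end, fade_len, search_start, search_end)
    spliced = []
    for ch in channels:
        faded = [((fade_len - i) * ch[prev_end - fade_len + i] + i * ch[cut + i]) / fade_len
                 for i in range(fade_len)]
        spliced.append(ch[:prev_end - fade_len] + faded + ch[cut + fade_len:])
    return cut, spliced
```

Because the search runs once on the synthesized signal and the resulting point is reused for every channel, the cross-faded portions occupy the same sample positions in all channels of the output.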
Preferably, the cutting starting point of each of the channel signals corresponds to a starting point of a following one of the two portions of the waveform of the synthesized signal which are most similar to each other.
Preferably, the length of each of the waveform segments to be cut out from each of the channel signals is set according to the specified compression/expansion rate.
Preferably, as the specified compression/expansion rate is farther from a value of “1”, the time period over which the cross-fading is to be carried out is set to a longer time period.
Preferably, a frequency of calculating a degree of similarity of the two portions of the waveform of the synthesized signal is set according to the time period over which the cross-fading is to be carried out.
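The two preferences concerning the cross-fading time period and the frequency of the similarity calculation can be illustrated by the following sketch. No formula is given in this section, so the linear mapping, every constant, and the choice of a coarser search step for a longer cross-fade are hypothetical and serve only to show the kind of dependency described.

```python
def fade_length_samples(rate, sample_rate, base_ms=10.0, gain_ms=40.0):
    """Cross-fade length that grows as `rate` moves away from 1.
    `base_ms` and `gain_ms` are made-up constants for illustration."""
    fade_ms = base_ms + gain_ms * abs(rate - 1.0)
    return int(round(fade_ms * sample_rate / 1000.0))

def similarity_step(fade_len, evals_per_fade=64):
    """Interval, in samples, between successive similarity calculations.
    Assumed here to scale with the cross-fade length; the direction of the
    dependency is a choice of this sketch."""
    return max(1, fade_len // evals_per_fade)
```

With these placeholder constants, a rate of 1 yields the shortest cross-fade, and rates of 0.5 and 1.5 yield the same, longer cross-fade, because only the distance from 1 enters the mapping.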
To attain the above object, according to a second aspect of the present invention, there is provided a time-axis compression/expansion apparatus for time-axis compressing/expanding a multichannel signal formed of a plurality of channel signals at a specified compression/expansion rate, which comprises a plurality of waveform segment-cutting sections that each sequentially cut out waveform segments from each of the channel signals, a cutting starting point-determining section that determines a cutting starting point of a leading end portion of a waveform segment of the cut out waveform segments following each preceding waveform segment of the cut out waveform segments, commonly between the channel signals, based on two portions of a waveform of a synthesized signal formed by synthesizing the channel signals within a range of a predetermined search starting point to a predetermined search ending point of the waveform of the synthesized signal, the two portions corresponding to a time period over which cross-fading is to be carried out and being most similar to each other, and a splicing section that splices together the preceding waveform segment and the following waveform segment cut from each of the channel signals based on the determined cutting starting point, by cross-fading a trailing end portion of the preceding waveform segment and the leading end portion of the following waveform segment.
The time-axis compression/expansion apparatus according to the second aspect of the invention can provide substantially the same effects as described above as to the time-axis compression/expansion method according to the first aspect of the invention.
To attain the above object, according to a third aspect of the invention, there is provided a storage medium storing a program which can be executed by a computer, for realizing a time-axis compression/expansion method for time-axis compressing/expanding a multichannel signal formed of a plurality of channel signals at a specified compression/expansion rate, the program comprising a waveform segment-cutting module that sequentially cuts out waveform segments from each of the channel signals, a cutting starting point-determining module that determines a cutting starting point of a leading end portion of a waveform segment of the cut out waveform segments following each preceding waveform segment of the cut out waveform segments, commonly between the channel signals, based on two portions of a waveform of a synthesized signal formed by synthesizing the channel signals within a range of a predetermined search starting point to a predetermined search ending point of the waveform of the synthesized signal, the two portions corresponding to a time period over which cross-fading is to be carried out and being most similar to each other, and a splicing module that splices together the preceding waveform segment and the following waveform segment cut from each of the channel signals based on the determined cutting starting point, by cross-fading a trailing end portion of the preceding waveform segment and the leading end portion of the following waveform segment.
The storage medium according to the third aspect of the invention can provide substantially the same effects as described above as to the time-axis compression/expansion method according to the first aspect of the invention.
The above and other objects, features, and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings.