The present invention relates to an audio reproducing apparatus which is capable of converting a value of an audio playback speed into a desired value and obtaining the resulting audio.
In recent years, techniques for coding audio data with high efficiency, storing coded audio data in a storage medium, or transmitting the coded audio data over communication networks, have been put into practical use and widely utilized.
As one such technique, an apparatus for reproducing audio according to MPEG (Moving Picture Experts Group), an international standard, is disclosed in Japanese Published Patent Application No. Hei. 9-73299. FIG. 19 is a block diagram showing this MPEG audio reproducing apparatus. Hereinafter, a description is given of a prior art audio reproducing apparatus with reference to FIG. 19.
Referring now to FIG. 19, an MPEG audio reproducing apparatus 1 comprises a reproducing speed detecting circuit 2, an MPEG audio decoder 3, a time-scale modification circuit 4, a D/A converter 5, and an audio amplifier 6. The time-scale modification circuit 4 comprises a frame memory 34, a time-scale modification unit 35, a ring memory 32, an up down counter 33, and a read clock generating circuit 36.
An MPEG audio stream which has been coded by the MPEG audio method is input to the MPEG audio reproducing apparatus 1. The MPEG audio decoder 3 decodes the MPEG audio stream into an audio output of a digital signal. The MPEG audio method and formats are described in various kinds of references, including “ISO/IEC IS 11172 Part 3: Audio”.
Meanwhile, speed information such as double speed and 0.5 multiple speed is input to the reproducing speed detecting circuit 2, which detects the speed information (reproducing speed) and generates a decoding clock. The decoding clock is supplied to the time-scale modification circuit 4 and the MPEG audio decoder 3. An audio signal which has been decoded by the MPEG audio decoder 3 is input to the circuit 4, where it is subjected to time-scale compression/expansion or unvoiced sound deletion/insertion based on the given speed information, whereby time-scale modification process is performed, and the resulting output is reproduced through a speaker 23.
However, in the MPEG audio coding method which performs decoding frame by frame of a prescribed time length, data processing of plural frames requires numerous buffer memories and increases complexity, which causes a large-scale hardware structure.
Another apparatus for reproducing audio according to the MPEG is disclosed in Japanese Published Patent Application No. Hei 9-81189. FIG. 20 is a block diagram showing this MPEG audio reproducing apparatus. Hereinafter, a description is given of another prior art audio reproducing apparatus with reference to FIG. 20.
Referring to FIG. 20, reference numeral 1701 designates a first frame dividing unit for dividing an input subband signal 1 and holding a signal of one frame of a Tf sample length, reference numeral 1702 designates a second frame dividing unit for dividing an input subband signal 2 and holding a signal of one frame of a Tf sample length, reference numeral 1703 designates a third frame dividing unit for dividing an input subband signal 3 and holding a signal of one frame of a Tf sample length, and reference numeral 1704 designates a fourth frame dividing unit for dividing an input subband signal 4 and holding a signal of one frame of a Tf sample length.
The input subband signals 1-4 are subband signals of four subbands divided by a filter bank which divides a normal time-scale signal into four subband signals by 1/4 downsampling. Assume that the subband signal 1 is the lowest subband signal and the subband signal 4 is the highest subband signal.
Reference numeral 1710 designates a correlation function calculating unit which calculates correlation values S(n) in an overlapping portion of n samples of first half and second half signals of a subband signal of a subband containing audio pitch components, and which detects the value of n at which the correlation value S(n) is maximum as “Tc”. Reference numeral 1711 designates a reproducing speed detecting unit which detects specification of a reproducing speed F by an auditor. Reference numeral 1712 designates a correlation function detection range control unit which limits a correlation function detection range. Reference numeral 1705 designates a first cross fading unit which performs cross fading process to overlapped Tc samples of the first half and second half signals of the subband signal divided and held by the first frame dividing unit 1701. Reference numeral 1706 designates a second cross fading unit which performs cross fading process to overlapped Tc samples of the first half and second half signals of the subband signal divided and held by the second frame dividing unit 1702. Reference numeral 1707 designates a third cross fading unit which performs cross fading process to overlapped Tc samples of the first half and second half signals of the subband signal divided and held by the third frame dividing unit 1703. Reference numeral 1708 designates a fourth cross fading unit which performs cross fading process to overlapped Tc samples of the first half and second half signals of the subband signal divided and held by the fourth frame dividing unit 1704. Reference numeral 1709 designates a synthesizing filterbank which synthesizes subband signals of four subbands which have been subjected to cross fading process.
FIG. 21 is a diagram showing time-scale waveform of one frame of a frequency band which contains main pitch components of an audio signal.
FIG. 22 is a diagram showing two segments of the first half and second half signals into which one frame signal in FIG. 21 has been divided, as upper and lower segments.
FIG. 23 is a graph showing values of a correlation function between the two segments in FIG. 22.
FIG. 24 is a diagram qualitatively showing a state in which the segment of the second half signal component is shifted to a time when the correlation function takes the maximum value.
FIGS. 25(a)-25(c) are diagrams showing a case where cross fading process is performed with two segments overlapped for a Tc time period.
Subsequently, a description is given of operation of the reproducing apparatus so constructed with reference to FIGS. 21 through 25(a)-25(c).
First of all, suppose that data of one frame (Tf sample length) of the input subband signal 1 includes main pitch components of the audio signal as shown in FIG. 21. The one frame data is divided into two segments which are equal in the number of data as shown in FIG. 22 and held by the first frame dividing unit 1701. In a like manner, the subband signals 2, 3, and 4 are respectively divided into two segments and held by the second, third, and fourth frame dividing units 1702, 1703, and 1704, respectively.
Then, from a target speed rate F obtained by the reproducing speed detecting unit 1711, a data length of an overlapping portion of the two segments, i.e., a target overlapping value Tb, is found according to the following equation:
Tb = Tf·(1 − 1/F)
Considering a correction parameter B (initialization value=0) for correcting deviation from the target speed rate F due to phase adjustment mentioned later, the correlation function calculating unit 1710 calculates correlation in a range of m samples before and m samples after an overlapping interval data length (Tb+B) of two segments in the first frame dividing unit 1701, to find an overlapping interval length Tc where the correlation function takes the maximum value. Then, to correct the error between the target speed rate F and an actual speed rate resulting from the difference between Tc and Tb, a value of the correction parameter B is updated as follows:
B ← B + Tb − Tc
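For illustration only, the overlap search and the update of the correction parameter B described above can be sketched in Python as follows. The function names, the use of an unnormalized sum of products as the correlation S(n), and the clamping of the search range to one segment length are assumptions made for this sketch and are not taken from the cited publication.

```python
def target_overlap(tf, speed):
    """Target overlapping value Tb = Tf * (1 - 1/F), truncated to whole samples."""
    return int(tf * (1.0 - 1.0 / speed))


def correlation(first, second, n):
    """Assumed S(n): sum of products over the n overlapping samples, i.e. the
    last n samples of the first half against the first n of the second half."""
    return sum(a * b for a, b in zip(first[len(first) - n:], second[:n]))


def find_overlap(frame, speed, b, m):
    """Return (Tc, updated B) for one frame of Tf samples."""
    half = len(frame) // 2
    first, second = frame[:half], frame[half:]
    tb = target_overlap(len(frame), speed)
    # Search m samples before and after (Tb + B); clamping the range to
    # 1..half is an assumption of this sketch.
    centre = min(max(tb + b, 1), half)
    lo = max(1, centre - m)
    hi = min(half, centre + m)
    tc = max(range(lo, hi + 1), key=lambda n: correlation(first, second, n))
    # B <- B + Tb - Tc carries the deviation from the target to the next frame.
    return tc, b + tb - tc
```

With Tf = 36, F = 2.0, B = 0 and m = 4, the target is Tb = 18, and under the clamping assumed here the search covers overlaps 14 through 18 only, since an overlap cannot exceed one segment length.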
In FIG. 22, there is shown a case where two upper and lower segments are disposed separately, by setting the target speed rate F to be 2.0 and the target overlapping value to be Tb (=Tf/2). Shown in FIG. 23 is a correlation function of these two segments. As can be seen from the graph, in the example shown, the correlation function takes the maximum value at time “4”. In FIGS. 24(a) and 24(b), two segments are shown with the overlapping length “Tc”, according to the correlation function. More specifically, a degree of similarity between the first half and second half segments is found by the use of the correlation function, and then the second half segment is shifted to the high correlation position, resulting in a match between phases of the two segments. In this case, the overlapping interval length is “Tc”.
Subsequently, the first cross fading unit 1705 performs cross fading to the subband signals of two segments divided and held by the first frame dividing unit 1701 with the “Tc” overlapped. In a like manner, the second cross fading unit 1706, the third cross fading unit 1707, and the fourth cross fading unit 1708 perform cross fading to the subband signals of two segments divided and held by the second frame dividing unit 1702, the third frame dividing unit 1703, and the fourth frame dividing unit 1704, respectively, with the “Tc” overlapped. FIGS. 25(a)-25(c) show an example of this cross fading process. In this cross fading process, addition by complementary weighting is performed on the overlapping portion of two segments. Shown in FIG. 25(a) is a signal in which the first half segment has been subjected to fading-out process. Shown in FIG. 25(b) is a signal in which the second half segment has been subjected to fading-in process. The signals in FIG. 25(a) and in FIG. 25(b) are added, resulting in a waveform shown in FIG. 25(c).
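The cross fading of two segments overlapped by Tc samples may be sketched, by way of example, as follows. The text specifies only that the weights are complementary; the linear ramp and the function name cross_fade used here are assumptions of this sketch.

```python
def cross_fade(first, second, tc):
    """Overlap-add two segments over tc samples with complementary weights.

    The first segment fades out and the second fades in across the overlap;
    the two weights always sum to 1. A linear ramp is assumed.
    """
    if not 0 <= tc <= min(len(first), len(second)):
        raise ValueError("overlap must fit inside both segments")
    start = len(first) - tc
    out = list(first[:start])              # part of the first half before the overlap
    for i in range(tc):
        w_in = (i + 1) / (tc + 1)          # fade-in weight for the second half
        w_out = 1.0 - w_in                 # complementary fade-out weight
        out.append(w_out * first[start + i] + w_in * second[i])
    out.extend(second[tc:])                # part of the second half after the overlap
    return out
```

The output length is len(first) + len(second) − tc, so a larger overlap Tc yields a shorter frame and hence a higher reproducing speed.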
Thereafter, the synthesizing filterbank 1709 synthesizes respective subband signals so cross-faded, to produce the normal time-scale signal.
The above process is serially performed to signals of respective subbands for all the frames each comprising Tf samples, thereby performing high-speed reproduction which is completed by processing data in one frame.
However, there have been problems associated with the reproducing apparatus so constructed, which will be described below.
Here, it is assumed that a standard MPEG1 audio coding method is employed, and the number of divided subbands, the number of data of one frame of each subband, an initialization value of the correction parameter B, and a correction search width m as a reference are “32”, “36”, “0”, and “4”, respectively. Actual overlapping values and the points of correlation search are found by the method illustrated in the prior art example. The calculation results are shown, in which fractional parts are truncated.
First, a case where the speed rate is close to “1.0” will be discussed. Since the target overlapping value is small, the overlapping value falls in a small range. In this case, the problem is that the cross fading length is too small. Even when high correlation is found and cross fading process is carried out, if the transition period in the two segments including the cross fading interval is too short, cross fading has little effect on improvement of continuity, so that the waveform of a low-frequency signal in the segments changes rapidly. As a result, reproduced audio with discontinuity is obtained. Evaluation experiments on the cross fading interval length, the correlation retrieval width, and audio quality are described, for example, in “Institute of Electronics, Information and Communication Engineers (SP90-34, 1990.8)” by Suzuki and Misaki, which illustrates an optimum value for PCM (pulse code modulation) audio.
Next, a case where the speed rate is close to “2.0” will be discussed. As can be seen from Table 1, the target overlapping value is approximately 18, i.e., an upper limit, the upper limit of the overlapping value does not exceed one segment length, and the number of points of correlation search appears satisfactory. In the case of the speed rate “2.0”, if the overlapping value takes a value smaller than the target value “18”, there is no possibility that this will be corrected later, so a fixed overlapping value must be taken without correlation search in order to achieve the target speed. In addition, if the search width m takes a larger value so as to increase the points of correlation search, the correction parameter B takes a positive value when an overlapping value is smaller than the target overlapping value, and therefore an overlapping value (Tb+B) of subsequent correlation search exceeds one segment length ((Tb+B) > Tf/2), which makes it difficult to correct the speed rate. For this reason, the search width m is required to take a smaller value, and correspondingly the points of correlation retrieval become fewer. Therefore, cross fading process is performed without satisfactorily improved phase matching. As a result, a hoarse voice due to phase mismatching is obtained.
Thus, use of this algorithm leads to operation under unsatisfactory conditions for phase adjustment according to the correlation function, in which case, high performance is not obtained.
Further, even in a range of speed rates around 1.5, since all the given frames are subjected to cross fading process, distortion due to processing occurs in all the frames, so that considerable degradation is felt by auditors.
From the foregoing description, one disadvantage of the illustrated example is that a method for improving phase matching according to the correlation function does not work satisfactorily and has difficulty in converging into the target speed rate.
Another disadvantage of the illustrated example is that it provides high-speed reproduction, but does not provide low-speed reproducing function.
It is an object of the present invention to provide an audio reproducing apparatus which realizes time-scale modified audio with high/low speed and of high quality, with a simple construction based on time-scale compression/expansion at a prescribed speed rate which is completed by processing data in frames.
Other objects and advantages of the invention will become apparent from the detailed description that follows. The detailed description and specific embodiments described are provided only for illustration since various additions and modifications within the spirit and scope of the invention will be apparent to those skilled in the art from the detailed description.
According to a first aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; a frame sequence table which contains a sequence determined according to a given speed rate in which respective frames are to be expanded/compressed; frame counting means for counting the number of frames of the input audio signal; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, with reference to the frame sequence table based on a count value output from the frame counting means, the data expanding/compressing means subjecting the audio signal to time-scale modification process in accordance with an instruction signal from the data expansion/compression control means. Therefore, it is possible to provide an audio reproducing apparatus which realizes time-scale modification process of high quality at a desired speed rate (reproducing rate), with a simple construction in which time-scale compression/expansion process at a fixed speed rate which is completed by processing data in frames is performed.
According to a second aspect of the present invention, in the audio reproducing apparatus of the first aspect, the data expanding/compressing means includes cross fading means for dividing each frame of the input audio signal into at least two segments and performing weighting addition to waveform data of each segment. Therefore, it is possible to provide an audio reproducing apparatus which realizes time-scale modification process of high quality at a desired speed rate (reproducing rate), with a simple construction in which time-scale compression/expansion process at a fixed speed rate which is completed by processing data in frames is performed.
According to a third aspect of the present invention, in the audio reproducing apparatus of the first aspect, the data expanding/compressing means subjects a frame to time-scale compression/expansion process in a prescribed ratio, and the data expansion/compression control means controls frequency at which frames to be subjected to time-scale compression/expansion process and frames to be output without time-scale modification process appear, to reproduce audio at the given speed rate. Therefore, it is possible to provide an audio reproducing apparatus which realizes time-scale modification process of high quality at a desired speed rate (reproducing rate), with a simple construction in which time-scale compression/expansion process at a fixed speed rate which is completed by processing data in frames is performed.
According to a fourth aspect of the present invention, in the audio reproducing apparatus of the third aspect, the data expanding/compressing means subjects the frame to time-scale compression/expansion process in a prescribed ratio, and the frame sequence table contains the sequence in which frames to be subjected to time-scale compression/expansion process in the frame cycle in which a time-scale compression/expansion sequence is repeated are disposed as uniformly as possible, to reproduce audio at the given speed rate. Therefore, it is possible to provide an audio reproducing apparatus which realizes time-scale modification process of high quality at a desired speed rate (reproducing rate), with a simple construction in which time-scale compression/expansion process at a fixed speed rate which is completed by processing data in frames is performed.
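As a non-limiting illustration of the third and fourth aspects, a frame sequence table in which the frames to be compressed or expanded are disposed as uniformly as possible within one frame cycle might be generated as sketched below. The accumulator scheme, the names build_sequence and effective_speed, and the parameter frame_rate (the fixed speed rate applied to a modified frame) are assumptions of this sketch and do not appear in the description.

```python
def build_sequence(cycle, modified):
    """Return a list of `cycle` flags; True marks a frame to be compressed or
    expanded. The `modified` True flags are spread evenly over the cycle."""
    if not 0 <= modified <= cycle:
        raise ValueError("modified must be between 0 and cycle")
    table, acc = [], 0
    for _ in range(cycle):
        acc += modified
        if acc >= cycle:        # enough credit accumulated: modify this frame
            acc -= cycle
            table.append(True)
        else:                   # output this frame without modification
            table.append(False)
    return table


def effective_speed(table, frame_rate):
    """Overall speed rate when each flagged frame is played at `frame_rate`
    (e.g. 2.0 for compression to half length) and the others at 1.0."""
    duration = sum(1.0 / frame_rate if flag else 1.0 for flag in table)
    return len(table) / duration
```

For example, a cycle of 4 frames with 2 frames compressed at a fixed rate of 2.0 gives an output duration of 3 frame lengths, i.e. an overall speed rate of 4/3, while every individual frame is either untouched or processed at the one fixed ratio.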
According to a fifth aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; expansion/compression frequency control means for setting a frame cycle number and the number of frames to be expanded/compressed in the frame cycle according to a given speed rate; energy calculating means for calculating energies of audio signals in respective frames; frame selecting means for selecting frames to be expanded/compressed according to an output of the energy calculating means and an output of the expansion/compression frequency control means; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, the frame selecting means selecting low-energy frames with priority. Since distortion resulting from performing time-scale compression/expansion process to low-energy frames is hardly detected, time-scale modified audio of high quality is obtained.
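The selection of low-energy frames with priority in the fifth aspect can be pictured, purely as a sketch, by the following Python fragment. Computing the energy as a sum of squared samples and the name select_low_energy are assumptions made for illustration; the description leaves the energy measure to the energy calculating means.

```python
def select_low_energy(frames, count):
    """Return the indices (in playback order) of the `count` frames of one
    frame cycle that have the lowest energy."""
    # Assumed energy measure: sum of squared sample values per frame.
    energies = [sum(x * x for x in frame) for frame in frames]
    by_energy = sorted(range(len(frames)), key=lambda i: energies[i])
    return sorted(by_energy[:count])
```

Here `count` corresponds to the number of frames to be expanded/compressed in the frame cycle, as set by the expansion/compression frequency control means.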
According to a sixth aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; expansion/compression frequency control means for setting a frame cycle number and the number of frames to be expanded/compressed in the frame cycle according to a given speed rate; means for calculating probabilities that respective frames contain human voice; frame selecting means for selecting frames to be expanded/compressed according to an output of the calculating means and an output of the expansion/compression frequency control means; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, the frame selecting means selecting low-probability frames with priority. Since distortion resulting from time-scale expansion/compression process to frames which contain no voice information is hardly detected, time-scale modified audio of high quality is obtained.
According to a seventh aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; expansion/compression frequency control means for setting a frame cycle number and the number of frames to be expanded/compressed in the frame cycle according to a given speed rate; stationarity calculating means for calculating stationarities of audio signals in respective frames; frame selecting means for selecting frames to be expanded/compressed according to an output of the stationarity calculating means and an output of the expansion/compression frequency control means; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, the frame selecting means selecting high-stationarity frames with priority. Since distortion resulting from weighting addition to high-stationarity frames is hardly detected, time-scale modified audio of high quality is obtained.
According to an eighth aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; expansion/compression frequency control means for setting a frame cycle number and the number of frames to be expanded/compressed in the frame cycle, according to a given speed rate; means for calculating degrees of energy change of audio signals in respective frames; frame selecting means for selecting frames to be expanded/compressed according to an output of the calculating means and an output of the expansion/compression frequency control means; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, the frame selecting means selecting frames with priority in which distortion is hardly detected because of masking effects, according to the degrees of energy change. Since distortion is hardly detected because of masking effects, time-scale modified audio of high quality is obtained.
According to a ninth aspect of the present invention, an audio reproducing apparatus comprises audio decoding means for decoding an input audio signal frame by frame; data expanding/compressing means for subjecting data in a decoded frame to time-scale modification process; expansion/compression frequency control means for setting a frame cycle number and the number of frames to be expanded/compressed in the frame cycle, according to a given speed rate; at least two of energy calculating means for calculating energies of audio signals in respective frames, means for calculating probabilities that respective frames contain human voice, stationarity calculating means for calculating stationarities of audio signals in respective frames, and means for calculating degrees of energy change of audio signals in respective frames; frame selecting means for selecting frames to be expanded/compressed according to outputs of plural calculating means and an output of the expansion/compression frequency control means; and data expansion/compression control means for instructing the data expanding/compressing means to subject the frame to one of time-scale compression process, time-scale expansion process, and process without time-scale modification process, the frame selecting means deciding frames to be selected according to the outputs of the plural calculating means. Therefore, users can select a reproducing method which considers naturalness or a reproducing method which considers intelligibility. As a result, time-scale modified audio of high quality is obtained on demand.
According to a tenth aspect of the present invention, in the audio reproducing apparatus of the first to ninth aspects, the audio decoding means for performing decoding frame by frame divides an audio signal into plural subband signals and performs decoding for each of the divided subbands. Therefore, the effects as provided by one of the first to ninth aspects are obtained.
According to an eleventh aspect of the present invention, in the audio reproducing apparatus of the first to tenth aspects, the audio decoding means for performing decoding frame by frame decodes data coded by an MPEG1 audio layer 2 coding method. Therefore, the data coded by the MPEG audio coding method is time-scale modified with less distortion.
According to a twelfth aspect of the present invention, in the audio reproducing apparatus of the fifth aspect, the audio decoding means for performing decoding frame by frame decodes data coded by an MPEG1 audio layer 2 coding method, and the energy calculating means estimates an energy of an audio signal based on a scalefactor index indicating a scalefactor at reproduction. Therefore, the data coded by the MPEG audio coding method is time-scale modified with less distortion.
According to a thirteenth aspect of the present invention, in the audio reproducing apparatus of the seventh aspect, the audio decoding means for performing decoding frame by frame decodes data coded by an MPEG1 audio layer 2 coding method, and the stationarity calculating means estimates a stationarity of an audio signal based on scalefactor selection information indicating waveform stationarity. Therefore, the data coded by the MPEG audio coding method is time-scale modified with less distortion.
According to a fourteenth aspect of the present invention, in the audio reproducing apparatus of the eighth aspect, the audio decoding means for performing decoding frame by frame decodes data coded by an MPEG1 audio layer 2 coding method, and the means for calculating degrees of energy change estimates a degree of energy change of an audio signal based on a scalefactor index indicating a scalefactor at reproduction. Therefore, the data coded by the MPEG audio coding method is time-scale modified with less distortion.
According to a fifteenth aspect of the present invention, in the audio reproducing apparatus of the ninth aspect, the audio decoding means for performing decoding frame by frame decodes data coded by an MPEG1 audio layer 2 coding method, and the apparatus further comprises at least two of the energy calculating means, the stationarity calculating means, and the means for calculating degrees of energy change, wherein the energy calculating means estimates an energy of an audio signal based on a scalefactor index indicating a scalefactor at reproduction, the stationarity calculating means estimates a stationarity of an audio signal based on scalefactor selection information indicating waveform stationarity, and the means for calculating degrees of energy change estimates a degree of energy change of an audio signal based on a scalefactor index indicating a scalefactor at reproduction. Therefore, the data coded by the MPEG audio coding method is time-scale modified with less distortion.
According to a sixteenth aspect of the present invention, in the audio reproducing apparatus of the first to fifteenth aspects, the data expanding/compressing means includes correlation calculating means for calculating correlation between segments in each frame, and a position at which the correlation is high, and sending shift amount by which waveform data of a segment is shifted to the position, the cross fading means shifts the waveform data of the segment according to the shift amount, and performs weighting addition to each segment data, and for a subsequent frame to be subjected to time-scale compression/expansion, segment data is shifted and subjected to weighting addition, considering the shift amount of a frame which has been previously subjected to time-scale compression/expansion. Therefore, waveform data is shifted to high correlation position, and time-scale compression/expansion is performed considering the shift amount.
According to a seventeenth aspect of the present invention, in the audio reproducing apparatus of the first aspect, the data expanding/compressing means includes correlation calculating means for finding correlation between segments in each frame, the audio decoding means for performing decoding frame by frame divides an audio signal into plural subband signals and performs decoding for each subband, and the correlation calculating means finds correlation between the segments by the use of data of a subband which contains pitch frequency of an audio signal. Since data of a subband which contains a pitch frequency of an audio signal is found, an audio signal is time-scale modified with less distortion.
According to an eighteenth aspect of the present invention, in the audio reproducing apparatus of the sixteenth aspect, the audio decoding means for performing decoding frame by frame divides an audio signal into plural subband signals and performs decoding for each of the divided subbands, and the correlation calculating means calculates a correlation value for each subband, and weighting addition is performed by using the shift amount of a subband which has the largest correlation value. This correlation operation allows time-scale modification process with less distortion.
According to a nineteenth aspect of the present invention, in the audio reproducing apparatus of the sixteenth aspect, the audio decoding means for performing decoding frame by frame divides an audio signal into plural subband signals and performs decoding for each of the divided subbands, and the correlation calculating means calculates correlation for a subband of the divided subbands which has the highest energy. This correlation operation allows time-scale modification process with less distortion.
According to a twentieth aspect of the present invention, in the audio reproducing apparatus as defined in one of the first to fourth aspects or as defined in one of the sixteenth to nineteenth aspects, the frame sequence table includes plural sequence tables having different patterns per one speed rate, the data expanding/compressing means finds an average of correlation values between segments in respective frames to be expanded/compressed for each sequence table, and performs processing with reference to a sequence table in which the average is the largest. Therefore, this expansion/compression process allows time-scale modification process with less distortion.