1. Field of the Invention
The present invention relates to a signal processing method and apparatus in which a coded signal is decoded and its pitch is shifted, and an information-serving medium for serving a program which implements the signal decoding and pitch shifting.
2. Description of the Related Art
A technique is known for shifting the interval (pitch) of a sound signal by re-sampling the sound signal, recorded in a pulse code modulated (PCM) state, at intervals different from those at which the sound signal was originally sampled for pulse code modulation (PCM). For example, a sound one octave lower than an original sound signal can be reproduced by interpolating between the original sample values so as to acquire twice as many sample values within the same unit time, as if the signal had been sampled at twice the original sampling rate, and then reproducing these sample values at the original sampling rate. Conversely, a sound one octave higher can be reproduced by re-sampling so that the number of original sound signal samples is halved and reproducing each of the resulting samples at the original sampling rate. However, when a sound having a higher pitch than the original sound is reproduced (namely, when the pitch is raised), so-called aliasing will take place. To avoid this, the signal must be passed through a low-pass filter, for example, before it is re-sampled. In the above examples, some of the samples after re-sampling coincide with the original samples, but this is not always necessary. Generally, by re-sampling the sound signal at an arbitrary rate while interpolating between samples, it is possible to shift the interval (namely, to control the pitch).
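The re-sampling with interpolation described above can be illustrated by the following minimal sketch. It assumes linear interpolation between neighboring samples; the function name `resample_linear` and its `ratio` parameter are hypothetical and are not taken from any particular implementation. The low-pass filtering needed before raising the pitch is deliberately omitted.

```python
def resample_linear(samples, ratio):
    """Read `samples` with a step of `ratio` input samples per output sample.

    Played back at the original sampling rate, ratio 0.5 lowers the pitch
    by one octave and ratio 2.0 raises it by one octave.
    Assumption: linear interpolation; no anti-aliasing filter is applied.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    count = int(last / ratio) + 1  # number of read positions within the input
    out = []
    for n in range(count):
        pos = n * ratio            # fractional read position
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


octave_down = resample_linear([0.0, 1.0, 2.0, 3.0], 0.5)  # about twice as many samples
octave_up = resample_linear([0.0, 1.0, 2.0, 3.0], 2.0)    # about half as many samples
```

With a ratio of 2.0 every second output sample coincides with an original sample, while an arbitrary ratio generally yields only interpolated values.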
On the other hand, highly efficient coding methods have been proposed to compress audio or sound data with little audible degradation in sound quality. An audio signal can be coded with a high efficiency in various manners. The highly efficient audio data coding methods include, for example, so-called transform coding, which is a blocked frequency band division method in which an audio signal on a time base is blocked in predetermined time units, the time base signal in each block is transformed (spectrum-transformed) to a signal on a frequency base, the signal thus acquired is divided into a plurality of frequency bands, and the signal in each subband is coded; and so-called subband coding (SBC), which is a non-blocked frequency band division method in which an audio signal on a time base is divided into a plurality of frequency bands without blocking it, and the signal in each subband is coded.
The subband coding (SBC) uses a subband filter which is a so-called quadrature mirror filter (QMF) or the like. The QMF is known from the publication "Digital Coding of Speech in Subbands" (R. E. Crochiere, Bell Syst. Tech. J., Vol. 55, No. 8, 1976). The QMF is characterized in that when two bands having the same bandwidth are recombined later, no aliasing will take place. More specifically, the aliasing that takes place when a signal is thinned to half for the band division and the aliasing that takes place when the two half signals are recombined cancel each other. Therefore, if the signal of each subband is coded with a sufficiently high accuracy, the QMF can eliminate almost perfectly the loss caused by the signal coding.
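The cancellation of aliasing on recombination can be illustrated with the shortest possible mirror filter pair. The sketch below assumes a two-tap (Haar) low-pass and high-pass pair, which is not the filter design of the cited publication; practical QMF banks use much longer filters. The names `qmf_split` and `qmf_merge` are hypothetical.

```python
import math

_S = math.sqrt(0.5)  # tap value of the two-tap filters (assumed Haar pair)


def qmf_split(x):
    """Divide x into a low band and a high band, each thinned to half."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    low = [_S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    high = [_S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return low, high


def qmf_merge(low, high):
    """Recombine the two half-rate bands; the aliasing terms cancel."""
    out = []
    for l, h in zip(low, high):
        out.append(_S * (l + h))
        out.append(_S * (l - h))
    return out


signal = [0.2, -0.4, 0.9, 0.1, -0.7, 0.3]
low, high = qmf_split(signal)
rebuilt = qmf_merge(low, high)  # equals `signal` up to rounding error
```

Each band alone contains aliasing because it keeps only half the samples, yet the recombined output matches the input as long as the band signals are not degraded by coarse quantization.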
Also, the publication "Polyphase Quadrature Filters - A New Subband Coding Technique" (Joseph H. Rothweiler, ICASSP 83, Boston) describes polyphase quadrature filters (PQF), which divide a signal into subbands of equal bandwidth. The PQF is characterized in that a signal can be divided into a plurality of equal-width subbands at a time and no aliasing takes place when the signals of the subbands are recombined later. More particularly, the aliasing that takes place between adjoining subbands when a signal is thinned at a rate corresponding to each bandwidth and the aliasing that takes place between adjoining subbands when they are recombined later cancel each other. Therefore, if the signal of each subband is coded with a sufficiently high accuracy, the PQF can eliminate almost perfectly the loss caused by the signal coding.
Further, the spectrum transform can be effected by blocking an input audio signal into predetermined unit times (frames) and transforming each block from a time base to a frequency base by the discrete Fourier transform (DFT), discrete cosine transform (DCT), modified discrete cosine transform (MDCT) or the like. The MDCT is further described in the publication "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation" (J. P. Princen, A. B. Bradley, Univ. of Surrey, Royal Melbourne Inst. of Tech., ICASSP, 1987).
When the DFT or DCT is used for spectrum transform of a waveform signal, M pieces of independent real data can be acquired by transforming the waveform signal in time blocks each of M pieces of sample data (referred to as a "transform block" hereinafter). Normally, to reduce the distortion of connection between transform blocks, M1 pieces of sample data of one of two adjoining transform blocks are arranged to overlap M1 pieces of sample data of the other transform block. Thus, the DFT or DCT provides M pieces of real data from a mean number (M-M1) of sample data. These M pieces of real data will subsequently be quantized and coded.
On the other hand, when the MDCT is used for spectrum transform, M pieces of independent real data can be acquired from 2M pieces of samples, of which M pieces at each end overlap the adjoining transform block on that side. More specifically, when the MDCT is employed for the spectrum transform, M pieces of real data can be acquired from a mean number M of sample data, and the M pieces of real data will subsequently be quantized and coded. In the decoder, the waveform elements acquired by inverse-transforming the MDCT codes in each block are added together while interfering with each other to reconstruct a waveform signal.
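The relationship between 2M input samples, M coefficients and the overlapped addition in the decoder can be sketched as follows. The sketch assumes the textbook MDCT definition with a sine window and a direct (slow) summation; the function names are hypothetical, and no particular coder's block length or window is implied.

```python
import math


def sine_window(M):
    """Window of length 2M satisfying w[n]^2 + w[n+M]^2 = 1 (assumed choice)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * M)) for n in range(2 * M)]


def _basis(n, k, M):
    return math.cos(math.pi / M * (n + 0.5 + M / 2) * (k + 0.5))


def mdct(block, M):
    """2M windowed samples in, M real coefficients out."""
    w = sine_window(M)
    return [sum(w[n] * block[n] * _basis(n, k, M) for n in range(2 * M))
            for k in range(M)]


def imdct(coeffs, M):
    """M coefficients in, 2M windowed (still aliased) samples out."""
    w = sine_window(M)
    return [(2.0 / M) * w[n] * sum(coeffs[k] * _basis(n, k, M) for k in range(M))
            for n in range(2 * M)]


def analyze_synthesize(signal, M):
    """Transform half-overlapping blocks and overlap-add the inverse transforms."""
    if len(signal) % M:
        raise ValueError("signal length must be a multiple of M")
    padded = [0.0] * M + list(signal) + [0.0] * M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - M, M):      # blocks advance by M samples
        piece = imdct(mdct(padded[start:start + 2 * M], M), M)
        for n, v in enumerate(piece):
            out[start + n] += v                     # overlapped halves interfere
    return out[M:-M]


M = 4
signal = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
rebuilt = analyze_synthesize(signal, M)  # matches `signal` up to rounding error
```

Each block advances by only M samples, so on average M coefficients are produced per M new samples, and the aliasing in each inverse-transformed block is cancelled by the neighboring block during the addition.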
Generally, when a transform block intended for spectrum transform is made longer, the frequency resolution becomes higher and the energy concentrates in particular spectrum signal components. Therefore, by making a spectrum transform with long transform blocks, with half of the sample data in one transform block overlapping half of the sample data in the adjoining transform block, and by using the MDCT so that the number of spectrum signal components thus acquired does not exceed the number of sample data on the original time base, it is possible to code an audio signal with a higher efficiency than when the DFT or DCT is used for the same purpose. Also, by arranging adjoining transform blocks to overlap each other over a sufficiently large length, it is possible to reduce the distortion of connection between transform blocks of a waveform signal. However, since long transform blocks require more work area for the transform, the increased length of transform blocks is an obstacle to a more compact design of the reading means, etc. In particular, longer transform blocks increase manufacturing costs when it is difficult to raise the degree of semiconductor integration.
As mentioned above, quantizing signal components that have been divided into subbands by filtering or spectrum transform makes it possible to control the band in which quantization noise takes place. Therefore, by using the so-called masking effect, coding that is highly efficient in terms of hearing can be attained.
The above-mentioned "masking effect" refers to a phenomenon in which a loud sound acoustically masks a quiet one. With this effect, it is possible to acoustically conceal quantization noise behind an original signal sound. Thus, even when the signal sound is compressed, a reproduced sound can be heard with almost the same sound quality as the original signal. In order to utilize the masking effect effectively, however, it is essential to control the occurrence of the quantization noise in the time and frequency domains. For example, when a signal including an attack portion, in which the signal level rises abruptly after a low-level portion, is blocked for coding and decoding, quantization noise caused by coding and decoding the signal block including the attack portion will also appear in the low-level signal portion preceding the attack. If that low-level portion is short, the noise in it will be acoustically concealed under the masking effect of the attack portion. If, however, the low-level portion lasts for more than a few milliseconds within the signal block, it falls outside the range of the masking effect of the attack portion, so the noise in it will not be concealed. A sound quality degradation known as "pre-echo" then takes place, making the sound signal unpleasant to hear. To prevent pre-echo, the length of a block for transform to a spectrum signal is in some cases changed depending upon the property of the signal in the block. Note that by normalizing each sample data with the maximum of the absolute values of the signal components in each subband before quantizing it, a higher coding efficiency can be attained.
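The normalization mentioned at the end of the preceding paragraph can be sketched as follows. The sketch assumes a uniform, symmetric quantizer and an unquantized normalization factor; the names `quantize_band` and `dequantize_band` are hypothetical, and an actual coder would also code the normalization factor itself with a limited number of bits.

```python
def quantize_band(values, bits):
    """Normalize a subband by its largest magnitude, then quantize uniformly."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    scale = max(abs(v) for v in values) or 1.0  # normalization factor
    levels = (1 << (bits - 1)) - 1              # largest code magnitude
    codes = [round(v / scale * levels) for v in values]
    return scale, codes


def dequantize_band(scale, codes, bits):
    """Restore approximate sample values from codes and normalization factor."""
    levels = (1 << (bits - 1)) - 1
    return [c / levels * scale for c in codes]


band = [0.013, -0.004, 0.0007, -0.011]   # low-level subband (illustrative values)
scale, codes = quantize_band(band, 4)
restored = dequantize_band(scale, codes, 4)
```

Because the codes always span the full range of the quantizer, a low-level subband is represented with the same relative accuracy as a high-level one for a given number of bits.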
Also, a bandwidth suited to the human auditory characteristics should preferably be used as the frequency division width for quantizing each signal component acquired by dividing the frequency band of an audio signal. That is, the audio signal should preferably be divided into a plurality of subbands (25 bands), generally called "critical bands", each having a bandwidth that is wider as the band frequency is higher. For coding the data of each subband at this time, either a predetermined bit distribution or an adaptive bit allocation is effected for each subband. For example, to code coefficient data acquired by the MDCT using the above-mentioned adaptive bit allocation, the MDCT coefficient data for each subband, acquired by the MDCT for each transform block, is coded with an adaptive number of allocated bits. The bit allocation is effected by either of the two methods described below.
One method is disclosed in the publication "Adaptive Transform Coding of Speech Signals" (R. Zelinski and P. Noll, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-25, No. 4, August, 1977). In this method, the bit allocation is done based on the magnitude of the signal of each subband. The quantization noise spectrum is flat and the noise energy is minimum. However, since no acoustic masking effect is utilized in this method, the noise actually perceived is not optimal.
The other method is described in the publication "The Critical Band Coder - Digital Encoding of the Perceptual Requirements of the Auditory System" (M. A. Krasner, MIT, ICASSP, 1980). This method uses acoustic masking to acquire a necessary signal to noise ratio for each subband and makes a fixed bit allocation. Since the bit allocation is a fixed one, however, the characteristic measured with a sine wave input is not very good.
To solve the above problems, a highly efficient coding has been proposed in which all bits usable for the bit allocation are divided into two groups, one for a fixed bit allocation pattern predetermined for each small block and the other for a bit distribution depending upon the signal magnitude in each block, at a division ratio that depends upon a signal related to the input signal, and in which the number of bits for the fixed bit allocation pattern is increased as the pattern of the signal spectrum is smoother.
If the energy concentrates in a certain spectrum signal component, as with a sine wave input, this method can remarkably improve the overall signal to noise ratio by allocating more bits to the block including that spectrum signal component. Generally, since the human auditory sense is extremely sensitive to a signal having a steep spectrum signal component, improving the signal to noise ratio characteristic by this method leads not only to a better measured S/N value but also to an improved sound quality.
Many other bit allocation methods have been proposed. If a more elaborately designed auditory sense model is available and the encoder's ability allows, a more highly efficient coding is possible.
Generally, in these methods, a real reference value for the bit allocation is determined which realizes a signal to noise ratio determined by calculation with a fidelity as high as possible, and an integral value approximate to the reference value is taken as a number of allocated bits.
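The step from a real reference value to an integral number of allocated bits can be illustrated by the following minimal sketch. It assumes the textbook energy-based rule, in which each subband receives the average number of bits plus half the base-2 logarithm of its energy relative to the geometric mean energy; this rule and the function names are illustrative assumptions, not the specific computation of any method cited above.

```python
import math


def real_bit_allocation(energies, total_bits):
    """Real-valued reference allocation (assumed textbook log-energy rule).

    All energies must be positive.
    """
    n = len(energies)
    logs = [math.log2(e) for e in energies]
    mean_log = sum(logs) / n                  # log2 of the geometric mean energy
    return [total_bits / n + 0.5 * (lg - mean_log) for lg in logs]


def integer_bit_allocation(energies, total_bits):
    """Take the nearest non-negative integer to each real reference value."""
    return [max(0, round(b)) for b in real_bit_allocation(energies, total_bits)]


energies = [256.0, 16.0, 1.0, 0.0625]         # illustrative subband energies
reference = real_bit_allocation(energies, 16)
allocated = integer_bit_allocation(energies, 16)
```

In this example the reference values happen to be whole numbers. In general the rounded values need not add up to the available total, so a practical encoder would follow the rounding with an adjustment step.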
For actual code string configuration, first, quantizing accuracy information and normalization factor information should be coded with a predetermined number of bits for each subband to be normalized and quantized, and then normalized and quantized spectrum signal components should be coded. The ISO standard (ISO/IEC 11172-3:1993 (E), 1993) prescribes a highly efficient coding method in which the number of bits indicative of quantizing accuracy information is set different from one subband to another and the number of bits representing the quantizing accuracy information is set smaller for subbands of higher frequencies.
Instead of directly coding the quantizing accuracy information, quantizing accuracy information may be determined from normalization factor information, for example, in the decoder. However, this method will not be compatible with a control of the quantizing accuracy based on a more highly sophisticated auditory sense model which will be introduced in the future, since the relation between the normalization factor information and quantizing accuracy information is determined when the standard is set. Also when a compression rate has to be determined in a certain range, it is necessary to determine the relation between the normalization factor information and quantizing accuracy information for each compression rate.
Also, a method for efficiently coding quantized spectrum signal components using a variable-length code is known from the publication "A Method for Construction of Minimum Redundancy Codes" (D. A. Huffman, Proc. I.R.E., 40, p. 1098, 1952).
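The benefit of a variable-length code for quantized spectrum signal components can be illustrated by the following minimal sketch, which builds a minimum redundancy (Huffman) code from symbol counts. The function name `huffman_code` and the sample data are hypothetical; actual coders typically use predetermined code tables rather than building a code per signal.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate case: single symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c0, _, t0 = heapq.heappop(heap)      # two least frequent subtrees
        c1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t0.items()}
        merged.update({s: "1" + code for s, code in t1.items()})
        heapq.heappush(heap, (c0 + c1, tie, merged))
        tie += 1
    return heap[0][2]


# Quantized components cluster around zero, so zero gets the shortest code.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [2] + [-2]
table = huffman_code(quantized)
total_bits = sum(len(table[q]) for q in quantized)
```

For this sample the 16 values with five distinct symbols would need 3 bits each with a fixed-length code, whereas the variable-length code needs fewer bits in total.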
Further, the specification and drawings of international publication No. WO94/28633 of the Applicant's international patent application propose an audio signal coding method in which acoustically most important tone components are separated from the spectrum signal components and coded separately from the other spectrum signal components. By this method, an audio signal or the like can be coded efficiently at a high compression rate with little degradation of the sound quality.
Note that each of the aforementioned coding methods is applicable to each channel of an acoustic signal composed of a plurality of channels. For example, by applying the method to each of an L channel corresponding to a left-hand speaker and an R channel corresponding to a right-hand speaker, a stereo audio signal can be coded with a high efficiency. Also, the coding method may be applied to a (L+R)/2 signal acquired by adding together the signals of the L and R channels. Further, from the signals of the same two channels, a (L+R)/2 signal and a (L-R)/2 signal may be coded efficiently by the above method. Furthermore, the Applicant of the present invention suggested, in the specification and drawings of Japanese Patent Application No. 97-81208, a signal coding method in which the band of the (L-R)/2 signal is made narrower than that of the (L+R)/2 signal to code an audio signal efficiently with a smaller number of bits while maintaining the audible stereophony of the reproduced sound. This method is based on the fact that the stereophony of a sound is predominantly influenced by a low frequency portion of the sound.
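The relationship between the L and R channels and the (L+R)/2 and (L-R)/2 signals can be sketched as follows; the function names are hypothetical. The band limitation of the (L-R)/2 signal is not shown.

```python
def to_sum_difference(left, right):
    """Form the (L+R)/2 and (L-R)/2 signals from the two channels."""
    half_sum = [(l + r) / 2 for l, r in zip(left, right)]
    half_diff = [(l - r) / 2 for l, r in zip(left, right)]
    return half_sum, half_diff


def from_sum_difference(half_sum, half_diff):
    """Recover L = sum + difference and R = sum - difference."""
    left = [s + d for s, d in zip(half_sum, half_diff)]
    right = [s - d for s, d in zip(half_sum, half_diff)]
    return left, right


left = [1.0, 0.5, -0.25]
right = [0.5, 0.5, 0.75]
half_sum, half_diff = to_sum_difference(left, right)
restored_left, restored_right = from_sum_difference(half_sum, half_diff)
```

When the two channels are similar, the (L-R)/2 signal is small, which is why it can be coded with fewer bits than the (L+R)/2 signal.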
As described above, coding methods of ever higher efficiency have been developed one after another. By adopting a standard covering a newly developed method, it is possible to record data for a longer time, or to record an audio signal with a higher quality than before for the same length of recording time.
To map a time-series audio signal in the time and frequency domains for coding the signal, a highly efficient coding method has been proposed which is a combination of the previously described subband coding and transform coding. In this method, after the frequency band of an audio signal is divided into subbands by the subband coding for example, the signal of each subband is transformed in spectrum to a signal on the frequency base and each of the subbands thus spectrum-transformed is coded.
The coding by the division of signal frequency band by the subband filter, followed by the transform to spectrum signal by the MDCT or the like is advantageous as will be described below:
First, since the transform block length and the like can be set to an optimum for each subband, the occurrence of the quantization noise in the time and frequency domains can be controlled optimally for hearing to improve the sound quality.
Generally, the spectrum transform by the MDCT is in many cases effected using a high-speed computation such as the fast Fourier transform (FFT), which requires a memory area whose size is proportional to the length of a block. However, when a signal is first divided into subbands, thinned in proportion to the bandwidth of each subband, and then spectrum-transformed, the number of samples for spectrum transform can be reduced for the same frequency resolution, so the memory area necessary for the spectrum transform can be reduced.
Further, when a decoded signal need not have a high sound quality, the audio signal can be reproduced by a decoder of the smallest possible hardware scale by processing only the signal data of low frequencies. Thus, this method is very convenient and usable.
Since the compression method that transforms a signal to a spectrum signal by a combination of a subband filter and spectrum transform by the MDCT can be implemented by relatively small-scale hardware, it is very convenient as a compression method for a portable recorder, for example. However, since many product-sum calculations are required to implement the subband filter, the amount of computation increases.
When a read signal is acquired by decoding a coded signal as described above, equipment such as a computer game machine or editing equipment is in some cases required to decode the coded signal while shifting the pitch of the signal.
To reproduce a sound one octave higher, for example, than the original audio signal actually coded, the coded signals of all frequency bands have to be decoded at twice the speed. To reproduce a sound two octaves higher, the coded signals of all frequency bands have to be decoded at four times the speed. Therefore, to acquire a sound of higher pitch than an original sound using the pitch shifting method, the processing speed and throughput of the decoder must be made sufficiently high in accordance with the sound pitch, which results in increased manufacturing costs of the decoder.
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art by providing a signal processing method and apparatus capable of reproducing a coded audio signal by decoding it while shifting its pitch, and of reproducing, from an original sound, a sound having a desired sufficiently higher pitch than the original sound with a reduced number of operations and with decreased costs for the decoder used in the signal processing apparatus, and an information serving medium for serving a program which implements the signal decoding and pitch shifting.
The above object can be attained in one embodiment consistent with the present invention, by providing a signal processing method for decoding a coded signal for reading, including, setting a pitch for a decoded read signal; decoding only a low frequency portion of the coded signal according to the set pitch; and shifting the pitch of the decoded read signal based on the set pitch.
The above object can also be attained in another embodiment consistent with the present invention by providing an information processing method for decoding a coded signal for reading, including, setting a pitch for a decoded read signal; decoding the coded signal with zero inserted at a high frequency portion, corresponding to the set pitch, of the coded signal; and generating a read signal having a pitch corresponding to the set pitch.
The above object can also be attained in another embodiment consistent with the present invention, by providing a signal processing apparatus for decoding a coded signal for reading, including, means for setting a pitch for a decoded read signal; means for decoding only a low frequency portion of the coded signal according to the set pitch; and means for transforming the pitch of the decoded read signal based on the set pitch.
The above object can also be attained in another embodiment consistent with the present invention, by providing a signal processing apparatus for decoding a coded signal for reading, including, means for setting a pitch for a decoded read signal; means for decoding the coded signal with zero inserted at a high frequency of the coded signal according to the set pitch; and means for generating a read signal having a pitch corresponding to the set pitch.
In the above signal processing methods and apparatuses according to another embodiment consistent with the present invention, when the coded signal is one acquired by dividing the frequency band of a signal, only the subband of a low frequency portion of the signal whose frequency band has been divided into subbands is decoded according to the set pitch. When the coded signal is one acquired by transforming a signal to frequency components and then coding it, only the low frequency one of the transformed frequency components is decoded according to the set pitch. Also in the signal processing methods and apparatuses according to one embodiment consistent with the present invention, the digital read signal whose pitch has been shifted according to the set pitch is converted to an analog read signal with a clock corresponding to the set pitch. Further, during the pitch shifting, a sampling-transformation can be done by sampling-transforming only the low frequency portion of the decoded read signal which can be sample-transformed according to the set pitch or with zero inserted at the high frequency portion of the decoded read signal. Thus, a sound having a desired sufficiently higher pitch than an original sound can be reproduced from the original sound with not many operations and with no increase of the manufacturing costs. And, a sound whose pitch has been shifted can be produced without any aliasing.
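The selection of the low frequency portion according to the set pitch can be illustrated by the following minimal sketch. It assumes equal-width subbands, a pitch expressed in octaves, and a rule that keeps only the subbands lying entirely below the original band edge divided by the pitch ratio (with at least one subband kept); these assumptions and all names are hypothetical and are not a statement of how the embodiments compute the portion to be decoded.

```python
import math


def pitch_ratio(octaves):
    """Frequency ratio for a pitch shift given in octaves (1 octave = 2x)."""
    return 2.0 ** octaves


def subbands_to_decode(num_subbands, ratio):
    """Number of low-frequency subbands worth decoding (assumed floor rule)."""
    if ratio <= 1.0:
        return num_subbands          # pitch not raised: decode everything
    return max(1, math.floor(num_subbands / ratio))


def zero_high_subbands(subbands, ratio):
    """Alternative: keep all subbands but insert zero at the high ones."""
    keep = subbands_to_decode(len(subbands), ratio)
    return [list(band) if i < keep else [0.0] * len(band)
            for i, band in enumerate(subbands)]


one_octave_up = subbands_to_decode(4, pitch_ratio(1))    # 2 of 4 subbands
two_octaves_up = subbands_to_decode(4, pitch_ratio(2))   # 1 of 4 subbands
```

Under these assumptions, raising the pitch by one octave halves the number of subbands to be decoded, which offsets the doubled decoding speed that would otherwise be required.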
The above object can also be attained in another embodiment consistent with the present invention, by providing an information serving medium for serving a program according to which a coded signal is decoded and read, the program including, setting a pitch for a decoded read signal; decoding only a low frequency portion of the coded signal according to the set pitch; and shifting the pitch of the decoded read signal based on the set pitch.
The above object can also be attained in another embodiment consistent with the present invention, by providing an information serving medium for serving a program under which a coded signal is decoded and read, the program including, setting a pitch for a decoded read signal; decoding the coded signal with zero inserted at a high frequency portion of the coded signal according to the set pitch; and generating a read signal having a pitch corresponding to the set pitch.
With the above-mentioned information serving media according to the above mentioned embodiment consistent with the present invention, a sound having a desired sufficiently higher pitch than an original sound can be reproduced from the original sound with not many operations and with no increase of the manufacturing costs.
These objects and other objects, features and advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention when taken in conjunction with the accompanying drawings.