1. Field of the Invention
The present invention relates to a surround generation apparatus for generating a multi-channel surround signal from a two-channel stereo signal. In particular, the invention relates to a surround system for providing a favorable surround space inside a vehicle.
2. Description of the Related Art
5-ch or 5.1-ch surround systems for providing a sound field bringing a sense of realism or a surround effect in a home theater, an in-vehicle space, or the like have been widely used. Among such surround systems, relatively low-cost systems use a method for expanding a two-channel stereo signal into a multi-channel surround signal.
For example, Japanese Patent No. 3682032 discloses a technology for generating a surround signal from a two-channel stereo signal. FIG. 18 is a diagram showing a configuration of an adaptive decorrelation apparatus using an FIR filter described in Japanese Patent No. 3682032. The adaptive decorrelation apparatus includes a decorrelation filter that extracts, from an input signal X of a first channel, a signal component having a strong correlation with an input signal Y of a second channel. The filter does so by passing the input signal X of the first channel through multi-stage delay processors Z−1, applying predetermined coefficients to the outputs of the delay processors using coefficient processors W0, W1, . . . Wk, and summing the outputs of these coefficient processors in an adder Σ. The adaptive decorrelation apparatus also includes a coefficient update processor 5 that changes a characteristic of the decorrelation filter with time on the basis of an error signal e obtained from an output signal RES of the decorrelation filter and the input signal Y of the second channel, the input signal X of the first channel, and a step-size parameter for controlling the update speed of the filter coefficients. A calculator 4 generates a surround signal from the difference between the output RES of the decorrelation filter and the input signal Y of the second channel.
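The filter structure described above can be sketched as follows. This is a minimal illustration and not the implementation of Japanese Patent No. 3682032: the coefficient update uses a plain LMS rule, which is an assumption (the text states only that the update depends on the error signal e, the input signal X, and a step-size parameter), and the function name, tap count, and step-size value are hypothetical.

```python
def lms_decorrelate(x, y, num_taps=8, step_size=0.01):
    """Sketch of an adaptive FIR decorrelator (LMS update is an assumption).

    x: samples of the first-channel input signal X
    y: samples of the second-channel input signal Y
    Returns (res, e): the filter output RES and the error e = Y - RES,
    which corresponds to the low-correlation (surround) component.
    """
    w = [0.0] * num_taps      # coefficients W0..Wk, adapted over time
    delay = [0.0] * num_taps  # delay line holding x[n], x[n-1], ...
    res_out, err_out = [], []
    for xn, yn in zip(x, y):
        # Shift the new sample into the delay line (the Z^-1 stages).
        delay = [xn] + delay[:-1]
        # Weighted sum of the delayed samples (the adder).
        res = sum(wi * di for wi, di in zip(w, delay))
        # Error between the second channel and the filter output.
        e = yn - res
        # Assumed LMS update: the step size controls the learning speed.
        w = [wi + step_size * e * di for wi, di in zip(w, delay)]
        res_out.append(res)
        err_out.append(e)
    return res_out, err_out
```

With two fully correlated inputs (for example, Y equal to a scaled copy of X), the error output of this sketch decays toward zero as the coefficients adapt, which mirrors the learning period visible at the start of an observation.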
It is known that a cross-correlation coefficient is used as one of the indexes that numerically indicate a sense of expansion of a surround sound. Here, the cross-correlation coefficient will be observed using the correlation between two signals as an example.
Specifically, imagine a surround sound field as shown in FIG. 19. FIG. 19 is a drawing showing an example disposition of speakers in an in-vehicle surround space. Disposed at the left and right of the front seats are front speakers FL and FR for outputting stereo signals L and R. Disposed at the left and right of the rear seats are rear speakers RL and RR for outputting surround signals SL and SR. Disposed at the midpoint of the front seats is a speaker CT for outputting a center signal C. Also, disposed at the midpoint of the rear seats is a sub-woofer (not shown) for outputting a bass signal LFE. Use of a surround system according to this embodiment allows providing improved surround sound quality inside the vehicle.
The cross-correlation coefficient between the FL and RR, which are diagonally-disposed speakers, is observed. Here, it is assumed that the cross-correlation coefficient is a numerical value in a range from −1 to 1 and that a cross-correlation coefficient “1” indicates that two signals are identical (identical phase) and a cross-correlation coefficient “0” indicates that the two signals have no relation (no correlation), and a cross-correlation coefficient “−1” indicates that the two signals have an opposite relation (opposite phase).
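The coefficient described above can be illustrated with a short sketch. It assumes the zero-lag normalized form without mean removal, which the text does not specify; the function name is hypothetical.

```python
import math


def cross_correlation_coefficient(a, b):
    """Zero-lag normalized cross-correlation of two equal-length signals.

    Returns a value in the range -1 to 1: 1 for identical phase,
    0 for no correlation, -1 for opposite phase.
    Assumption: no mean removal (audio signals are taken as zero-mean).
    """
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    # Silent input has no defined correlation; report 0 in that case.
    return num / den if den > 0.0 else 0.0
```

Evaluating this function over successive short blocks of two signals would give a time series of the kind plotted against time in a correlation graph.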
FIG. 20 is a graph showing a distribution of a cross-correlation coefficient observed over approximately two minutes of a piece of music. The horizontal axis represents the time (sec.) and the vertical axis represents the cross-correlation coefficient. In the distribution shown in FIG. 20, a-1 indicates the cross-correlation coefficient between received stereo signals L and R, and a-2 indicates the cross-correlation coefficient between the stereo signal L and an error signal eR of an ADF (adaptive filter), that is, the decorrelated surround signal SR. a-1 may also be considered as the cross-correlation coefficient between the original signals of the stereo signals L and R and may be used as a reference for comparison.
In FIG. 20, it is observed that, until about 10 seconds elapse, the cross-correlation coefficient changes from 0.4 to 0, owing in part to the learning speed of the adaptive filter. During a period from 10 to 30 seconds, a-1 indicates that the stereo signals L and R have a high correlation of approximately 1. On the other hand, during the same period, a-2 indicates a cross-correlation coefficient of approximately −0.3. That is, even if the original signals have a high correlation, performing decorrelation reduces the cross-correlation coefficient to a smaller value.
In this music, the correlation has been changed every 30 seconds. Specifically, during a period from 10 to 30 seconds, bass of an instrument has been dominant; during a period from 30 to 60 seconds, a chorus has been dominant, that is, there has been an expanding sound; during a period from 60 to 90 seconds, a vocal has been dominant; and during a period of 90 seconds and later, there has been an interaction between a vocal and a chorus, that is, the cross-correlation coefficient has significantly varied.
During a period of 30 seconds and later, the cross-correlation coefficient of a-2 has been around zero in contrast to the correlation change shown by a-1, although a slight variation is observed. That is, if surround signals SL and SR are generated from stereo signals using the technology disclosed in Japanese Patent No. 3682032, the surround signals SL and SR having a low-correlation component can be extracted stably. Also, in terms of surround, the fact that the cross-correlation coefficient has been around zero favorably indicates that a sense of expansion is always kept at the maximum in a playback sound field.
Audio coding schemes such as MP3 (MPEG-1) and AAC (MPEG-2/4) each have the stereo method and the joint stereo method. A significant difference between the two methods is whether highly correlated components of the stereo signals L and R are taken into account. Specifically, in the stereo coding, the stereo signals L and R are compression-coded independently. On the other hand, in the joint stereo coding, highly correlated components of the stereo signals L and R are extracted and then compression-coded as joint signals. With the stereo coding, a sense of expansion is obtained, since the signals L and R are compression-coded independently. However, the independence between the channels is increased. Therefore, there are pieces of music where the signals L and R cannot match each other's correlation change.
In such a background, there is a problem that if a decorrelation process as shown in FIG. 18 is performed after an encoded audio signal is decoded by a playback apparatus, an extracted surround signal having a low-correlation component is significantly influenced by compression caused by coding and thus becomes a distorted signal or a signal including many artifacts (artificial sounds) and having poor sound quality.
FIG. 21 is a graph showing a frequency characteristic of a surround signal having a low-correlation component extracted in the decorrelation process shown in FIG. 18. This graph is obtained by performing an FFT (fast Fourier transform) process with an FFT length of 1024 points on each of the signals, averaging the data 32 times (32,768 samples, or approximately 743 ms at a sampling frequency of 44.1 kHz), and then plotting the data. The data is taken starting at the 10-second point of the period from 10 to 30 seconds shown in FIG. 20, an area where artifacts are particularly noticeable. In FIG. 21, b-1 shows a characteristic of a surround signal having a linear PCM signal as an input. b-2 shows a characteristic of a surround signal having, as an input, a signal coded using MP3 format and the joint stereo method and having a bit rate of 128 kbps. b-3 shows a characteristic of a surround signal having, as an input, a signal coded using MP3 format and the stereo method and having a bit rate of 128 kbps.
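The measurement described above can be sketched as follows. Several details are assumptions not stated in the text: the 32 frames are taken as consecutive and non-overlapping, no window function is applied, and the power spectra (rather than the time-domain data) are averaged. All names are hypothetical, and a small radix-2 FFT is included only to keep the sketch self-contained.

```python
import cmath
import math

FS = 44100        # sampling frequency in Hz
FFT_LEN = 1024    # FFT length in points
NUM_FRAMES = 32   # number of averaged frames


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def averaged_spectrum_db(signal, start=0):
    """Average the power spectra of NUM_FRAMES frames, returned in dB."""
    needed = start + FFT_LEN * NUM_FRAMES
    if len(signal) < needed:
        raise ValueError("signal is shorter than the observation window")
    acc = [0.0] * (FFT_LEN // 2)
    for f in range(NUM_FRAMES):
        begin = start + f * FFT_LEN
        spec = fft(signal[begin:begin + FFT_LEN])
        for k in range(FFT_LEN // 2):
            acc[k] += abs(spec[k]) ** 2
    # A small floor avoids log10(0) for empty bins.
    return [10.0 * math.log10(a / NUM_FRAMES + 1e-20) for a in acc]


def observation_ms():
    """Length of the observation window in milliseconds."""
    return 1000.0 * FFT_LEN * NUM_FRAMES / FS
```

Here `observation_ms()` evaluates 1024 × 32 / 44,100 s, which is approximately 743 ms, and bin k of the returned spectrum corresponds to a frequency of k × 44,100 / 1024 Hz.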
Pay attention to a frequency range of 200 Hz to 1 kHz including a large amount of music information. From FIG. 21, it is understood that b-2 using the joint stereo method maintains a characteristic very similar to the characteristic of the uncompressed linear PCM shown by b-1. On the other hand, deviations are observed in b-3 using the stereo method compared with the linear PCM shown by b-1, and it can be concluded that these deviations are artifacts caused by compression.
FIG. 22 compares, for an input coded using the stereo method, a result of a surround algorithm based on the stereo difference signals L-R and R-L with a result of the decorrelation process. The horizontal axis represents the frequency and the vertical axis represents the amplitude (dB). The observation periods are the same as those shown in FIG. 21. In FIG. 22, c-1 shows a characteristic of a surround signal that has, as an input, a signal coded using MP3 format and the stereo method and having a bit rate of 128 kbps and that has undergone the decorrelation process shown in FIG. 18. c-2 shows a characteristic of a surround signal that has, as an input, a signal coded using MP3 format and the stereo method and having a bit rate of 128 kbps and that has undergone the L-R difference process. For convenience, an HPF having a cutoff frequency of 200 Hz is used in c-1. Apart from the low frequencies affected by this filter, c-1 and c-2 have similar characteristics. Accordingly, even if the surround algorithm based on L-R and R-L is used, artifacts caused by compression-coding are remarkable as well. As described in Japanese Patent No. 3682032, the decorrelation method shown in FIG. 18 is better as a surround algorithm; however, artifacts caused by compression remain.
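For reference, the difference-based surround algorithm mentioned above can be sketched in a few lines. The function name is hypothetical, and the sketch omits any filtering or gain stages that a practical system might add.

```python
def difference_surround(left, right):
    """Derive surround signals from the stereo differences L-R and R-L."""
    sl = [l - r for l, r in zip(left, right)]  # SL = L - R
    sr = [r - l for l, r in zip(left, right)]  # SR = R - L
    return sl, sr
```

In this sketch, content that is identical in both channels cancels completely, and the two outputs are always exact opposites of each other (SR = −SL).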
Also, the existence of artifacts will be examined from another point of view. Specifically, an increase or a reduction in the number of artifacts made by changing the bit rate for compression will be observed. FIG. 23 is a graph showing a frequency characteristic of a surround signal having a low-correlation component extracted in the decorrelation process shown in FIG. 18. The observation periods are the same as those shown in FIG. 21. In FIG. 23, d-1 shows a characteristic of a surround signal having a linear PCM signal as an input (same as b-1 shown in FIG. 21). d-2 shows a characteristic of a surround signal having, as an input, a signal coded using AAC format and the stereo coding and having a bit rate of 256 kbps. d-3 shows a characteristic of a surround signal having, as an input, a signal coded using AAC format and the stereo coding and having a bit rate of 128 kbps.
Pay attention to a frequency range of 200 Hz to 1 kHz including a large amount of music information. From FIG. 23, it is understood that d-2 and d-3 have more deviations than d-1, since d-2 and d-3 use the stereo method. Observe the graph more precisely. No significant difference can be seen between the bit rates in a range from 200 to 500 Hz. On the other hand, it is understood that the surround signal having the higher bit rate is closer to d-1 in a range from 500 Hz to 1 kHz.