1. Field of the Invention
The present invention relates to a method of processing audio signals that are encoded by a sub-band coding system in order to transmit voice sound efficiently. The method is suitable for use in general communication apparatuses, such as a video telephone conference system or the like, which can be realized by using this processing.
2. Prior Art
A 64K bit/s PCM coding system, which is at present the most commonly used voice coding system, is designed to encode voice sounds or acoustic signals in the frequency band between 300 Hz and 3.4 kHz. With the implementation of this voice coding technology, component technologies, such as a voice detector, a voice level controller, and an echo canceller, and voice communication apparatuses in which those technologies are used, such as an audio teleconference apparatus or the like, have come to be realized by using digital signal processing. As an example of such processing, digital audio signal processing in an audio teleconference apparatus will be explained below.
In voice conference apparatuses, generally, an N-1 addition system is adopted in which the voices of all the subscribers who participate in a conference are connected to a conference trunk apparatus, and then one subscriber receives the added voices of the remaining subscribers.
The N-1 addition system is a system in which after all the signals sent from each subscriber have been added, the signal sent from a particular subscriber is subtracted from it, and the resulting signal is distributed, as a reception signal, to such a subscriber. The name of the system is derived from the fact that when the total number of subscribers is N, the number of signals added as reception signals to each subscriber becomes N-1.
FIG. 1 is a block diagram showing an example of an audio teleconference apparatus which uses the N-1 addition system. This figure is disclosed, as a known technology, in Japanese Patent Laid-Open No. 63-123257 and shows a case of four (4) subscribers, i.e., N=4. The N-1 addition system will now be explained in detail with reference to FIG. 1.
Input codes A, B, C, and D from four subscribers are sequentially converted by an expander 1 into linear codes a, b, c, and d and added by the adder 2 to the output from a shift register 3. The codes from the expander 1 are also supplied to a shift register 4. The addition output from the adder 2 is supplied to the shift register 3 and also to a subtracter 5 where the output from the shift register 4 is subtracted. The subtraction output is supplied to a compressor 7 via an attenuator 6, and converted into PCM codes and sent out.
In the case of a conversation involving 4 subscribers (A, B, C, and D), input code A is first converted into code a by the expander 1 and stacked in the shift registers 3 and 4. The next input code B is similarly converted into code b by the expander 1. It is then added by the adder 2 to the data a stacked in the shift register 3, and the sum is transferred to the shift register 3 again. Concurrently, the code-converted output b is also transferred from the expander 1 to the shift register 4, where it is stacked after the data a which has already been stacked.
When operations described above are performed on input codes C and D, the sum of A, B, C, and D (a+b+c+d) is stacked in the shift register 3. The codes (a, b, c, and d) are successively stacked in this order in the shift register 4.
Lastly, when data of the content (a, b, c, and d) of the shift register 4 is sequentially subtracted from the content (a+b+c+d) of the shift register 3 by the subtracter 5, appropriate three-way talk is synthesized. Since these data take the form of simple addition, they may go out of range. Therefore, to prevent this and confine the data to a predetermined range, the data is compressed by a compressor 7 to change it again into PCM codes (non-linear) after it is attenuated by the attenuator 6.
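The N-1 addition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the circuit of FIG. 1: the function name n_minus_1_add and the variables total and history are hypothetical stand-ins for the shift register 3 (running sum) and the shift register 4 (individual codes), and the attenuator 6 and the compressor 7 are omitted.

```python
def n_minus_1_add(samples):
    """Return, for each subscriber, the sum of all OTHER subscribers' samples.

    `samples` holds one linear PCM value per subscriber for the same sampling
    instant (the codes a, b, c, d after expansion).
    """
    total = 0        # plays the role of shift register 3 (running sum)
    history = []     # plays the role of shift register 4 (individual codes)
    for s in samples:
        total += s           # adder 2: accumulate a, a+b, a+b+c, ...
        history.append(s)    # keep each code for the later subtraction
    # subtracter 5: (a+b+c+d) minus each subscriber's own code
    return [total - s for s in history]


if __name__ == "__main__":
    # Four subscribers with sample values a=1, b=2, c=3, d=4
    print(n_minus_1_add([1, 2, 3, 4]))  # [9, 8, 7, 6]
```

With N subscribers, each output is a sum of N-1 inputs, which is why the result may exceed the range of a single input and must be attenuated before compression.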
Next, a conventional example is shown in FIG. 2 in which 64K bit/s PCM codes are input to an audio teleconference apparatus using the N-1 addition system in order that the output level is automatically controlled. This figure is also disclosed in Japanese Patent Laid-Open No. 63-123257 showing a case of 4 subscribers. An explanation will now be provided of the operation of FIG. 2. In FIG. 2, the expander 1 and the compressor 7 play the same role as those shown in FIG. 1. Reference numeral 8 denotes an N-1 addition circuit which consists of the shift registers 3 and 4, the adder 2 and the subtracter 5, all of which are shown in FIG. 1.
Expanded audio signals a, b, c, and d of 4 subscribers A, B, C, and D are input to the N-1 addition circuit 8. The sum thereof (a+b+c+d) is output after a predetermined process is performed thereon and then input to an absolute value circuit 9. The output |a+b+c+d| of the absolute value circuit 9 is supplied to a peak hold circuit 10 where the peak value P is maintained for a preset amount of time. The peak value P is supplied to the level control input terminal of a level controller 11, which is in the form of a variable gain amplifier or the like. Addition output from the N-1 addition circuit 8, i.e., data a'=b+c+d for subscriber A, data b'=a+c+d for subscriber B, data c'=a+b+d for subscriber C, and data d'=a+b+c for subscriber D, are input to the level controller 11 sequentially. The output level is controlled, for example, by a gain coefficient G determined by the following equation (1), using the peak value P. ##EQU1## where the maximum amplitude level per subscriber is assumed to be 1.
As a result, the automatically level-controlled output signals a", b", c", and d" for subscribers A, B, C, and D, respectively, become the following:
a" = G × a' = G × (b+c+d)
b" = G × b' = G × (a+c+d)
c" = G × c' = G × (a+b+d)
d" = G × d' = G × (a+b+c)
In this way, automatically level-controlled signals are sent out to each subscriber.
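The level control path of FIG. 2 can be illustrated with the following Python sketch. Equation (1) is not reproduced in this text, so the gain rule assumed_gain below is purely a placeholder assumption (unity gain until the held peak exceeds the per-subscriber maximum of 1, then 1/P) and should not be read as the equation of the cited publication. The class PeakHold and the function level_controlled_outputs are likewise hypothetical names chosen for illustration.

```python
class PeakHold:
    """Hypothetical model of the peak hold circuit 10.

    A stored peak is kept for `hold` further updates unless a value at least
    as large arrives, in which case the peak is replaced and the hold restarts.
    """

    def __init__(self, hold):
        self.hold = hold
        self.peak = 0.0
        self.age = 0

    def update(self, x):
        mag = abs(x)                      # absolute value circuit 9
        if mag >= self.peak or self.age >= self.hold:
            self.peak = mag
            self.age = 0
        else:
            self.age += 1
        return self.peak


def assumed_gain(peak):
    # ASSUMPTION: stand-in for equation (1), which is not available here.
    return 1.0 if peak <= 1.0 else 1.0 / peak


def level_controlled_outputs(samples, peak_hold, gain_fn=assumed_gain):
    """Return a", b", ... = G x (sum of the other subscribers' samples)."""
    total = sum(samples)                  # a+b+c+d from the N-1 addition circuit 8
    g = gain_fn(peak_hold.update(total))  # gain coefficient G from the peak value P
    return [g * (total - s) for s in samples]   # level controller 11


if __name__ == "__main__":
    ph = PeakHold(hold=3)
    print(level_controlled_outputs([0.5, 0.5, 0.5, 0.5], ph))
```

Because the same G multiplies every subscriber's N-1 sum, the relative balance between talkers is preserved while the overall level is kept in range.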
In a situation, such as a conference call, where it cannot be determined how many participants are speaking at the same time, an output level which is automatically controlled is highly favored by users. This feature avoids a situation where there is an insufficient signal regardless of the number of participants, and is recognized as one of the functions of the conference call system. As a consequence, when a conference call apparatus is realized by using the conventional 64K bit/s PCM coding system, it is constructed as described above.
Now, there is a great demand for high-quality voice sound transmission, and application of a wide-band voice coding system, which encodes audio signals in a frequency band wider than that of the conventional system, has been desired. However, when the encoding/decoding performed in each frequency band by such a wide-band voice coding system is used just as it is, the objectives of audio signal processing sometimes cannot be achieved. When such a system is applied to an audio teleconference apparatus, the following problems arise:
Consideration will now be given to an audio signal processing in a case where 64K bit/s sub-band adaptive differential pulse code modulation (hereinafter abbreviated as SB-ADPCM) which complies with CCITT recommendation G.722 is applied to a conference call apparatus as an example of an audio call apparatus in which a sub-band coding system is applied. In the SB-ADPCM coding system, performing an N-1 addition process by using linear PCM signals of the whole band (50 Hz to 7 kHz) obtained as normal outputs of an encoder is first considered. That is, when N-1 addition is realized concerning the linear PCM signals of the whole frequency band (50 Hz to 7 kHz), a decoder and an encoder are constructed as shown in FIGS. 3 and 4. An N-1 addition circuit is connected between points P1 and P3 shown in the figure and constructed as shown in FIG. 5.
However, CCITT recommendation G.722 that prescribes the specification of the coding system recommends that this voice addition be performed by using linear PCM signals of different bands (50 Hz to 4 kHz, and 4 to 7 kHz) for the following reasons:
(1) Conference call apparatuses can be miniaturized because no sub-band and band synthesizing filters are required.
(2) Signal distortion and increased delay can be prevented because no filters are required.
(3) Easier echo control is possible by being performed in each band.
Therefore, N-1 addition circuits are respectively connected between points P2 and P4 of each band in a case where an N-1 addition circuit is used for linear PCM signals of different bands to utilize the advantages of (1) to (3) above.
FIG. 6 shows an example of a conference call apparatus which is recommended by the above-mentioned CCITT recommendation G.722. The operation of FIG. 6 will be explained below. When 64K bit/s SB-ADPCM codes are applied to the separation section of an SB-ADPCM decoder 12a placed in the transmission line, they are separated into signals of 16K bit/s in which high-band components of audio signals are encoded and signals of 48K bit/s in which low-band components of audio signals are encoded. These are input to a high-band SB-ADPCM decoder unit and a low-band SB-ADPCM decoder unit, respectively. Linear PCM signals are output from the decoder unit of each band and input to the N-1 addition circuits 81 and 82 provided for the respective bands. The linear PCM signals on which N-1 addition processing has been performed are again distributed, band by band, to SB-ADPCM encoder units 13a to 13n for the respective subscribers. The signals are encoded by the high-band SB-ADPCM encoder unit into SB-ADPCM codes of 16K bit/s and by the low-band SB-ADPCM encoder unit into SB-ADPCM codes of 48K bit/s. These codes are combined in the multiplexing section and become SB-ADPCM codes of 64K bit/s. By performing the N-1 addition process in each band in this manner, the apparatus can be miniaturized, achieves higher quality, and requires simpler processing than in the case where processing is performed on linear PCM signals of the whole band.
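The per-band arrangement of FIG. 6 can be sketched as follows in Python. The sketch assumes that the SB-ADPCM decoder units have already produced one low-band and one high-band linear PCM sample per subscriber; the decoding and encoding themselves are not modeled. The names mix_per_band, LOW_BAND_KBPS, and HIGH_BAND_KBPS are hypothetical and chosen only for illustration.

```python
LOW_BAND_KBPS = 48    # low-band SB-ADPCM code rate stated in the text
HIGH_BAND_KBPS = 16   # high-band SB-ADPCM code rate stated in the text


def n_minus_1_add(samples):
    """Sum of all other subscribers' samples, for each subscriber."""
    total = sum(samples)
    return [total - s for s in samples]


def mix_per_band(decoded):
    """Apply N-1 addition separately to each band.

    `decoded` is a list of (low, high) linear PCM sample pairs, one pair per
    subscriber, as they would leave the per-band decoder units.
    Returns a list of (low, high) pairs to feed each subscriber's per-band
    encoder units. No band synthesis filter is involved.
    """
    lows = n_minus_1_add([low for low, _ in decoded])     # N-1 addition circuit for the low band
    highs = n_minus_1_add([high for _, high in decoded])  # N-1 addition circuit for the high band
    return list(zip(lows, highs))


if __name__ == "__main__":
    # Three subscribers, each with one low-band and one high-band sample
    print(mix_per_band([(10, 1), (20, 2), (30, 3)]))
    # The two band rates add up to the 64K bit/s channel rate
    print(LOW_BAND_KBPS + HIGH_BAND_KBPS)
```

Keeping the two bands separate from decoder output to encoder input is what removes the need for the sub-band and band synthesizing filters in the conference apparatus.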
As described above, in a conference call apparatus which uses an SB-ADPCM coding system as an example of audio signal processing utilizing a sub-band coding system, audio signals of 50 Hz to 7 kHz are separated into low-band (50 Hz to 4 kHz) and high-band (4 to 7 kHz) signals. N-1 addition processing is performed in each band, resulting in the following advantages: "miniaturization of an apparatus"; "maintenance of high quality"; "elimination of extra delay"; and "easy echo control". However, when automatic level control is performed on the result of the addition process, if a technique is used in which the audio signal power of the whole band is determined and a gain coefficient is computed on the basis of that power, a synthesis of the bands is required. In this case, the above-described advantages cannot be obtained. A voice detection process for detecting a silent condition unique to audio signals is also normally performed by referring to the audio signal power of the whole band. If the bands are synthesized solely for the purpose of this voice detection process, a problem arises in that the amount of hardware of the apparatus increases.