1. Field of the Invention
The present invention relates to a voice coding-and-transmission system for compressing and transmitting a voice signal at a high efficiency, with particularly improved voice quality.
2. Description of the Prior Art
In today's age of multimedia communication, communication networks are used not only for voice, as exemplified by the telephone, but also for transmission of images and computer data. Transmission of large amounts of information such as images and computer data is realized by the digital art. That is, information to be transmitted is digital-coded and the switching system is also improved from circuit switching to packet switching. In the future, communication by ATM (Asynchronous Transfer Mode) will be the mainstream technology used to efficiently transmit such varied information.
To more efficiently perform transmission and correspondingly increase the transmitted information content, data to be transmitted is divided into units such as packets or cells which are transmitted by time division multiplexing. Voice transmission has hitherto used a high-efficiency voice coding art for efficiently coding a voice signal by removing redundant components from the signal by differential coding or a similar art.
High-efficiency voice coding systems that code a difference include predictive differential coding systems such as the ADPCM (Adaptive Differential Pulse Code Modulation) coding system. A predictive differential coding system predicts the present signal from past signals and quantizes the difference between the predicted value and the actual value. Because a difference generally has a smaller value than the original data, a code obtained by quantizing the difference needs fewer bits than a code that does not depend on a difference. The coding part and the decoding part of such a system each have an internal state, which is used as a reference value for the differential processing. The internal state consists of a set of parameters which represent the past voice signal.
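The role of the internal state can be illustrated by a minimal sketch of predictive differential coding. It is not ADPCM as standardized: it assumes a first-order predictor (the previous reconstructed sample), a fixed quantizer step, and a 4-bit code range, and the names `CodecState`, `STEP`, `encode`, and `decode` are hypothetical.

```python
from dataclasses import dataclass

STEP = 4                     # hypothetical fixed quantizer step size
MIN_CODE, MAX_CODE = -8, 7   # assumed 4-bit code range


@dataclass
class CodecState:
    """Internal state: the last reconstructed sample, used as the prediction."""
    predicted: int = 0


def encode(samples, state):
    """Quantize the difference between each sample and the prediction."""
    codes = []
    for x in samples:
        diff = x - state.predicted
        code = (diff + STEP // 2) // STEP            # round to nearest step
        code = max(MIN_CODE, min(MAX_CODE, code))    # clamp to the code range
        codes.append(code)
        state.predicted += code * STEP               # track the decoder's output
    return codes


def decode(codes, state):
    """Rebuild the signal by accumulating dequantized differences."""
    out = []
    for code in codes:
        state.predicted += code * STEP
        out.append(state.predicted)
    return out


samples = [0, 10, 25, 30, 28, 15, 3]
codes = encode(samples, CodecState())
decoded = decode(codes, CodecState())
print(codes)
print(decoded)
```

Because both sides start from the same state and update it in the same way, the decoded samples in this sketch stay within half a step of the originals as long as no code is clamped.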
In a transmission by an ATM network, multiple transmission lines are used by digital-coding information sources such as voice, image, and computer data, dividing the sources into a unit, called a cell, and transmitting asynchronously in a burst mode to improve an efficiency of utilizing the transmission lines. In communication with the ATM network, the above-mentioned high efficiency voice coding technology can be used in combination therewith. As the majority of traffic is due to voice information, applying high efficiency voice coding technology to voice information will reduce transmission amount and achieve higher efficiency transmission.
Moreover, in addition to the above ADPCM, voice coding systems include the ITU (International Telecommunication Union) Recommendation G.728 coding system (LD-CELP: Low-Delay Code-Excited Linear Prediction), whose block diagram is shown in FIG. 28. This coding system is described in detail in Draft CCITT Recommendation G.728 "Coding of Speech at 16 Kbits/s using Code Excited Linear Prediction (LD-CELP)". This coding system is based on backward adaptation, in which the synthesis filter and the excitation gain are adapted in accordance with past voice signals. This system also has a set of parameters of the past voice signal as an internal state, which is used as a reference for the differential processing of a synthesis filter coefficient, an adaptive gain coefficient, or the like.
Recently, because of the demand for higher efficiency described above, the silent-period elimination art of excluding silent parts when transmitting a voice signal has been used. It is known that the silent-period elimination art can decrease the total quantity of voice signals sent to a transmission line with only a small voice-quality degradation, and realizes higher-efficiency voice transmission through a statistical multiplexing effect. In the case of the silent-period-eliminated voice transmission system, however, the operation of a decoding part that receives and decodes a differential-coded voice signal becomes indefinite, because no voice information is transmitted during silent periods. That is, when a silent state (this may be referred to as a state with no talk spurt) changes to a voiceful state (this may be referred to as a state with a talk spurt), the internal state of the coding part that generates a voice code does not coincide with that of the decoding part. Therefore, the decoding part is not always able to decode a correct voice signal, even if it is given a correct high-efficiency code with no transmission line error. This phenomenon frequently appears as uncomfortable abnormal sounds, such as a click or an oscillation sound, in the regenerated sound at a reception node.
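The mismatch described above can be reproduced with a toy differential codec. The sketch below assumes a first-order predictor with a fixed step (hypothetical names `STEP`, `encode`, `decode`), an encoder that keeps running through the silent period, and a line that simply drops the codes generated during silence.

```python
STEP = 4  # hypothetical fixed quantizer step size


def encode(samples, predicted=0):
    """Differentially code samples; `predicted` is the encoder's internal state."""
    codes = []
    for x in samples:
        code = (x - predicted + STEP // 2) // STEP
        codes.append(code)
        predicted += code * STEP
    return codes


def decode(codes, predicted=0):
    """Accumulate differences; `predicted` is the decoder's internal state."""
    out = []
    for code in codes:
        predicted += code * STEP
        out.append(predicted)
    return out


# Talk spurt, silent period, talk spurt.
signal = [40, 80, 40, 0, 0, 0, 40, 80, 40]
codes = encode(signal)

# Without silent-period elimination the decoder sees every code.
full = decode(codes)

# With silent-period elimination the three codes of the silent period
# never reach the decoder, so its state misses the encoder's updates.
received = codes[:3] + codes[6:]
gapped = decode(received)

print(full[6:])    # second talk spurt, decoder state in step with encoder
print(gapped[3:])  # second talk spurt, decoder state out of step
```

In this sketch the second talk spurt is reproduced with a constant offset, because the decoder never saw the differences that brought the encoder back to zero. In an adaptive codec the state also holds step sizes and predictor coefficients, so the effect of a mismatch is less benign than a plain offset.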
FIG. 45 is a block diagram of a conventional voice coding-and-transmission system for solving the above problem. This diagram is based on the block diagram shown in Japanese Patent Laid-Open No. Hei 2-181552.
This voice transmitting system forms a set of structures by a transmission node 2 and a reception node 4. Under a state with a talk spurt, that is, at a voicefilled period, the transmission node 2 codes a voice signal using a high-efficiency voice encoder 6 and transmits the signal to a transmission line 10 via a changeover switch 8. Because the changeover switch 8 of the transmission node 2 is switched so as to transmit no data to the transmission line 10 with no talk spurt, that is, at a silent time, a silent-period-eliminated voice code is transmitted from the transmission node 2. A voice detector 12 detects a voice or silence of a voice signal and switches the changeover switch 8.
The reception node 4 decodes a voice code sent from the transmission line 10 into a voice signal by a decoder 14 and outputs the signal. While silent period elimination is performed, a changeover switch 16 is switched to the side of a pseudo-background-noise signal generator 18 and artificial noise is output from the reception node 4. A voice/silence information extractor 20 detects voice or silence in accordance with the voice code and switches the changeover switch 16. In this system, the transmission node 2 is provided with a memory 22 storing a predetermined internal state of the encoder 6, while the reception node 4 is provided with a memory 24 storing the same content as the memory 22. At the transition at which a voice signal changes from a silent state to a voiceful state, which causes the above problem, the voice detector 12 and the voice/silence information extractor 20 synchronously detect the transition. In the transmission node 2, a reference value for differential processing is set from the memory 22 to the encoder 6 as its internal state, and in the reception node 4, the same reference value is set from the memory 24 to the decoder 14 as its internal state. Thus, the timing at which a talk spurt is detected is synchronized between the transmission node 2 and the reception node 4 and, at this point, both internal states are reset to the same state. Therefore, the internal state of the encoder 6 always coincides with that of the decoder 14 during a voice period, and it is thereby possible to avoid abnormal sound at the head of a talk spurt.
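The synchronous resetting described for FIG. 45 can be sketched as follows. This is a simplified model under stated assumptions: the codec is a toy first-order differential coder, the voice detector is a per-sample amplitude threshold, silence on the line is represented by `None`, and `RESET_STATE` stands in for the content of the memories 22 and 24. All names are hypothetical.

```python
STEP = 4          # hypothetical fixed quantizer step size
RESET_STATE = 0   # assumed preset held in both memory 22 and memory 24


def is_voice(x, threshold=8):
    """Toy voice detector: amplitude threshold on a single sample."""
    return abs(x) >= threshold


def transmit(samples):
    """Transmission node: code talk spurts, send nothing (None) in silence."""
    predicted = RESET_STATE
    was_voice = False
    line = []
    for x in samples:
        voiced = is_voice(x)
        if voiced and not was_voice:
            predicted = RESET_STATE          # reset encoder at talk-spurt head
        if voiced:
            code = (x - predicted + STEP // 2) // STEP
            predicted += code * STEP
            line.append(code)
        else:
            line.append(None)                # changeover switch open
        was_voice = voiced
    return line


def receive(line, noise=0):
    """Reception node: decode talk spurts, output pseudo noise in silence."""
    predicted = RESET_STATE
    was_voice = False
    out = []
    for code in line:
        voiced = code is not None
        if voiced and not was_voice:
            predicted = RESET_STATE          # reset decoder at talk-spurt head
        if voiced:
            predicted += code * STEP
            out.append(predicted)
        else:
            out.append(noise)                # pseudo-background-noise output
        was_voice = voiced
    return out


signal = [40, 80, 40, 0, 0, 0, 40, 80, 40]
line = transmit(signal)
print(line)
print(receive(line))
```

Because both sides load the same preset at the head of every talk spurt, the decoder output in this sketch matches the input during voiced periods regardless of what happened during the silence.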
In the future, as described above, a silent-period-eliminating transmission network or an ATM network will mainly be constructed using the above arts.
However, transmission networks that do not eliminate silent periods and STM (Synchronous Transfer Mode) networks have already been constructed. These transmission networks were constructed as an infrastructure, in many cases using a great deal of capital. Therefore, it is economically difficult to immediately replace them with silent-period-eliminating transmission networks or ATM networks, or otherwise improve them. Therefore, to construct a large network including a range covered by these conventional transmission networks, it is necessary to allow networks eliminating silent periods and networks not eliminating silent periods, or ATM network and STM networks to coexist respectively.
For the time being, it is possible to realize coexistence of both networks by connecting two types of networks with a relay node.
There are two methods for connecting the silent-period-eliminating network and the silent period network, as shown in FIGS. 47 and 48. These figures illustrate a transmission from the silent-period-eliminating network to the silent period network. In addition, there are two methods for connecting the ATM network and the STM network, as shown in FIGS. 49 and 50. These figures illustrate a transmission from the ATM network to the STM network.
FIG. 47 is a block diagram of a transmission system constituted by tandem-connecting a network eliminating silent periods and a network not eliminating silent periods through a relay node. In FIG. 47, components having functions corresponding to those in FIG. 45 are given the same symbols, and their description is omitted. An encoder 32 of a transmission node 30 of this system performs coding without eliminating silent periods, and transmits the generated voice code to a transmission line 34 (transmission line B). A relay node 36 receives the voice code from the transmission line B, eliminates the silent periods from it, and transmits the silent-period-eliminated voice code to the reception node 4 through a transmission line A. To do so, the relay node 36 first decodes the voice code from the transmission node 30 into a voice signal by a decoder 38 and thereafter codes the voice signal again as a silent-period-eliminated voice code, which it transmits to the reception node 4. The processing after decoding by the decoder 38 uses the silent-period-eliminated transmission system with the synchronous resetting described for FIG. 45. Because the relay node 36 performs decoding once and then coding again, the transmission lines A and B are highly independent of each other from the viewpoint of coding, and this system is therefore referred to as a tandem connection.
FIG. 48 is a block diagram of a transmission system constituted by connecting a network eliminating silent periods and a network not eliminating silent periods by digital-one-link through a relay node. In FIG. 48, components having functions corresponding to those in FIG. 47 are given the same symbols and their description is omitted. A voice code with no silent period eliminated, transmitted to the transmission line 34 from the transmission node 30, is silent-period-eliminated by a relay node 50 and transmitted to a reception node 54 through a transmission line 52 (transmission line A).
In the relay node 50, a decoder 56 decodes a voice code sent from the transmission line B to restore a voice signal. A voice detector 58 detects voice or silence (presence or absence of a talk spurt) in accordance with the voice signal and controls a changeover switch 60. The changeover switch 60 connects the transmission line B to the transmission line A only when the voice code with no silent period eliminated from the transmission line B has a talk spurt. When the voice code has no talk spurt, it is discarded and no data is output to the transmission line A. Thereby, a silent-period-eliminated voice code is transmitted to the transmission line A. In this connection, a processing delay unit 62 delays the voice code from the transmission line B by the processing time of the decoder 56 and the voice detector 58, and thereby synchronizes the operation of the changeover switch 60 with the voice code.
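The relay node 50 can be pictured with the following sketch, which forwards the incoming codes unchanged and only decides whether each one passes. It assumes a toy first-order differential decoder, an amplitude-threshold voice detector, a detection latency of `PROCESSING_DELAY` code periods, and `None` as the representation of "no data on the transmission line A". All names are hypothetical.

```python
from collections import deque

STEP = 4               # hypothetical fixed quantizer step size
PROCESSING_DELAY = 2   # assumed latency of decoder 56 plus voice detector 58


def relay(line_b_codes, threshold=8):
    """Pass codes from line B to line A only while a talk spurt is detected."""
    predicted = 0  # internal state of decoder 56
    # Models the time the decode-and-detect path needs to produce a decision.
    detector_pipeline = deque([False] * PROCESSING_DELAY)
    # Processing delay unit 62: holds each code for the same amount of time.
    delay_unit = deque([None] * PROCESSING_DELAY)
    line_a = []
    for code in line_b_codes:
        predicted += code * STEP                       # decoder 56
        detector_pipeline.append(abs(predicted) >= threshold)  # detector 58
        delay_unit.append(code)
        talk_spurt = detector_pipeline.popleft()       # decision, now ready
        delayed_code = delay_unit.popleft()            # code it refers to
        # Changeover switch 60: forward the original code or send nothing.
        line_a.append(delayed_code if talk_spurt else None)
    return line_a


codes_from_line_b = [10, 10, -10, -10, 0, 0, 10, 10, -10]
print(relay(codes_from_line_b))
```

The delay unit keeps each switch decision aligned with the code it was derived from. In this sketch the last `PROCESSING_DELAY` codes are still inside the delay unit when the input ends, so they do not appear in the output.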
The reception node 54 decodes the silent-period-eliminated voice code transmitted from the relay node 50 through the transmission line A into a voice signal by a decoder 64 corresponding to the encoder 32 of the transmission node 30, and outputs the decoded voice signal. When no voice code is input from the transmission line A, that is, while silent period elimination is performed, a voice/silence information extractor 66 switches a changeover switch 68 toward a pseudo-background-noise signal generator 70 to output artificial noise from the reception node 54.
Thus, the relay node 50 only performs switching. Therefore, though the voice code transmitted to the reception node 54 is silent-period-eliminated, the voice code itself is the one generated by the transmission node 30. In this transmission system, the transmission lines A and B are therefore closely integrated with each other from the viewpoint of coding, and the system is thus referred to as a digital-one-link.
FIG. 49 is a block diagram of a conventional transmission system constituted by tandem-connecting the ATM network and the STM network through a relay node. An encoder 73 of a transmission node 72 in the system digitizes a voice signal and codes it at a high compression rate. A cell composer 74 assembles the sequential voice code produced by the encoder 73 into cells and transmits the cells to a transmission line A. The transmission line A is the ATM network. The voice code is transmitted through the transmission line A in cell units in a burst mode.
In the relay node 75, a buffer 76 absorbs transmission fluctuation of the cells, and then a cell decomposer 77 decomposes each received cell to reproduce the sequential voice code. A vanished cell detector 78 detects a cell that has vanished because of disuse or delay in the ATM network, and controls the operation of each portion of the relay node 75. A decoder 79 decodes the voice code extracted from the cell into the original digitally sampled voice signal, for example a PCM (Pulse Code Modulation) voice signal. A synchronous incoming unit 80 matches the operation timing of the encoder 73 and the decoder 79. A vanished cell compensator 81 compensates the voice signal for the vanished cell. A memory 82 stores the latest voice signal for use in compensating for the cell. A selector switch 83 selects either the voice signal decoded by the decoder 79 or the voice signal compensated for the vanished cell. An encoder 84 is the same as the encoder 73. A transmission line B is the STM network. A reception node 85 has a decoder 86 corresponding to the decoder 79.
For voice communication, real-time operation is required. Therefore, the retransmission procedure used in data communication cannot be applied when cell disuse, a cause of degradation specific to the ATM network, occurs. In particular, in ATM voice communication combined with high-efficiency coding, the cell size is fixed at 53 bytes. The more efficient the coding method, the more information is accommodated in one cell, and the greater the damage to the regenerated voice caused by the disuse of a cell. Consequently, to realize high-quality voice transmission over ATM, processing that interpolates or estimates the information included in a vanished cell is necessary in order to regenerate a natural voice.
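The relation between coding efficiency and the damage caused by one lost cell can be made concrete with a short calculation. The sketch assumes the standard ATM cell layout of a 5-byte header and a 48-byte payload, and assumes that the whole payload carries voice code, ignoring any adaptation-layer overhead or sequence index. The bit rates are the nominal rates of the listed coding systems, and the function name is hypothetical.

```python
CELL_BYTES = 53
HEADER_BYTES = 5                                  # standard ATM cell header
PAYLOAD_BITS = (CELL_BYTES - HEADER_BYTES) * 8    # 384 bits per cell


def speech_ms_per_cell(bit_rate_bps):
    """Duration of speech carried by one cell, in milliseconds."""
    return PAYLOAD_BITS * 1000 / bit_rate_bps


for name, rate in [
    ("64 kbit/s PCM", 64_000),
    ("32 kbit/s ADPCM", 32_000),
    ("16 kbit/s LD-CELP", 16_000),
    ("8 kbit/s CS-ACELP", 8_000),
]:
    print(f"{name}: {speech_ms_per_cell(rate):.0f} ms of speech per cell")
```

Under these assumptions one cell carries 6 ms of speech at 64 kbit/s but 48 ms at 8 kbit/s, so a single vanished cell removes eight times as much speech from the stream coded at the lower rate.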
The system shown in FIG. 49 utilizes the following method as one countermeasure against cell vanishing. The vanished cell detector 78 monitors the cells reaching the relay node 75, detects cells that have vanished in the ATM network or have not reached the relay node 75 within a predetermined period, and sends a control signal based on the detection result to the vanished cell compensator 81 and the selector switch 83. As a method for detecting a vanished cell, the cell composer 74, for example, adds an index representing the sending order to the payload portion of each cell, and the vanished cell detector 78 monitors whether or not any index is missing.
Once the vanished cell detector 78 notifies the vanished cell compensator 81 that a cell has vanished, the vanished cell compensator 81 interpolates, extrapolates, or mutes the lost voice signal based on the past voice signal stored in the memory 82. In addition, the selector switch 83 chooses between the output of the decoder 79 and the output signal of the vanished cell compensator 81 based on the control signal from the vanished cell detector 78. The chosen signal is high-efficiency coded again by the encoder 84 and is sent to the transmission line B (STM network). Thereby, a voice code with reduced cell vanishing damage is sent from the relay node 75.
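The detection and compensation steps can be sketched as below. The sketch makes several assumptions that are not taken from the system of FIG. 49: each cell is modelled as an `(index, payload)` pair, the index does not wrap around, a vanished cell is detected only by a gap in the indices (not by a timeout), and compensation simply repeats the last decoded frame. The function and parameter names are hypothetical.

```python
def relay_decode(cells, decode_frame, frame_len, first_index=0):
    """Decode arriving cells and fill index gaps with the last good frame.

    cells        -- (index, payload) pairs in arrival order, lost cells absent
    decode_frame -- callable turning a payload into a list of samples
    Returns the compensated sample stream and the indices found missing.
    """
    memory = [0] * frame_len      # memory 82: latest voice signal (mute at start)
    expected = first_index
    out, lost = [], []
    for index, payload in cells:
        # Vanished cell detector 78: a jump in the index means lost cells.
        while expected < index:
            lost.append(expected)
            out.extend(memory)    # compensator 81 output selected by switch 83
            expected += 1
        frame = decode_frame(payload)   # decoder 79 output selected by switch 83
        memory = frame
        out.extend(frame)
        expected = index + 1
    return out, lost


# Usage with an identity "decoder": cell 2 never arrives.
cells = [(0, [1, 2]), (1, [3, 4]), (3, [7, 8])]
samples, lost = relay_decode(cells, decode_frame=list, frame_len=2)
print(samples)
print(lost)
```

Frame repetition is only one of the options named in the text. Muting would replace `memory` with zeros, and interpolation would need the following frame as well, at the cost of extra delay.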
In the relay node 75, coding is performed again after the voice code is decoded. Therefore, the transmission system has mutually highly independent transmission lines A and B in view of coding. For this reason the system is called the tandem connection system.
As high-efficiency voice coding algorithms used in the encoders 73, 84 and the decoders 79, 86, ITU-T Recommendation G.726/727 ADPCM (Adaptive Differential Pulse Code Modulation), ITU-T Recommendation G.728 LD-CELP (Low-Delay Code-Excited Linear Prediction), ITU-T Recommendation G.729 CS-ACELP (Conjugate Structure Algebraic Code Excited Linear Prediction), and the like are well known.
FIG. 50 is a block diagram of a conventional transmission system consisting of digital-one-linking the ATM network and the STM network through a relay node. Components in FIG. 50 having corresponding functions as those in FIG. 49 are provided with the same symbol and their description is omitted. A cell including high efficiency voice code which is sent from the transmission node 72 to the transmission line A (ATM network) is decomposed by the relay node 90, remounted to a synchronous frame, and then transmitted to the reception node 85 through the transmission line B (STM network).
The reception node 85 decodes the voice code, which is transmitted from the relay node 90 through the transmission line B, using the decoder 86 corresponding to the encoder 73 of the transmission node 72, and outputs the decoded voice signal. Thus, the relay node 90 only performs switching. The voice code transmitted to the reception node 85 is the very code sent from the transmission node 72. Therefore, the transmission lines A and B of this transmission system are highly integrated with each other from the viewpoint of coding. For this reason, the system is called the digital-one-link system.
Connecting the transmission lines A and B according to a tandem connection or digital-one-link has the following problems. In the case of tandem-connecting a network eliminating silent periods and a network not eliminating silent periods as shown in FIG. 47, a voice code from the transmission node 30 is once decoded into a voice signal and then transmitted in accordance with the silent period elimination using synchronous resetting. Therefore, the internal state of the encoder 6 of the relay node 36 coincides with that of the decoder 14 of the reception node 4, and abnormal sound is avoided as described above. However, because the voice code is decoded and coded again in the relay node, a voice signal input to the transmission node is coded and decoded twice before it is output from the reception node. Therefore, a problem occurs in that quantization errors accumulate and the quality of the voice signal output from the reception node 4 deteriorates. It is known that this quality degradation becomes more remarkable as the compression rate increases, though it is almost inconsequential at a high bit rate (16 Kbit/s or more). Because a voice transmission system uses a low bit rate, it is impossible to ignore this voice quality degradation. The same applies entirely to the transmission system combined with high-efficiency coding in which the ATM network and the STM network are tandem-connected as shown in FIG. 49.
In the case of connecting a network eliminating silent periods and a network not eliminating silent periods according to digital-one-link as shown in FIG. 48, on the other hand, the conditions are completely reversed. In this case, because the voice code corresponding to a talk spurt that is transmitted to the reception node 54 is the same as the voice code generated in the transmission node 30, voice-signal quality degradation due to accumulation of quantization errors is prevented. However, the internal state of the encoder 32 of the transmission node 30 does not generally coincide with that of the decoder 64 of the reception node 54 at the timing of the change from a silent state to a voiceful state. That is, because the reference values for the differences used in coding and decoding are different even though the voice codes are the same, the problem of abnormal sound again occurs. This abnormal sound is not only unpleasant to a user, but also extremely degrades the clarity of the speech content, because it is generally produced at the head of a talk spurt.
For a transmission system combined with high-efficiency coding technology in which the ATM network and the STM network are connected in digital-one-link as shown in FIG. 50, the voice code transmitted to the reception node 85 is the same as the voice code generated at the transmission node 72. Therefore, voice-signal quality degradation due to accumulation of quantization errors is prevented. However, the relay node only performs switching and does not extract voice information from the voice code. Normally, it is difficult to directly compensate for a vanished voice code by a simple method such as interpolation, extrapolation, or estimation without decoding the high-efficiency-coded voice code.
Accordingly, it is extremely difficult to remove the impact of cell vanishing in the relay node of this transmission system, although the cell vanishing itself can be detected. As a result, the voice information transmitted to the reception node 85 is discontinuous, which induces an abnormal sound at the reception node 85 and makes the listener uncomfortable. In addition, a missing phoneme remarkably lowers speech comprehension. To remove the impact of cell vanishing at the reception node 85 in the digital-one-link connection nevertheless, the information about the cell vanishing detected in the relay node could be transmitted to, for example, the STM network by providing a separate signal line, and a further mechanism for countering cell vanishing could be provided at the reception node 85. However, as described above, the ATM network and the STM network need to be connected precisely in the case where the STM network and the reception node 85 are existing systems. Consequently, the solution of removing the impact of cell vanishing at the reception node 85 requires improvement or alteration of the existing system, and is impractical.
As described above, conventionally, there have been problems in accommodating an existing transmission network that does not eliminate silent periods in a silent-period-eliminating transmission network, or an existing STM network in an ATM network, without modifying the voice communication system on the side of the existing transmission network or the existing STM network.