The present invention relates to speech packet communication systems for communicating encoded speech signals in the form of packets, and more particularly to such systems in which a plurality of terminals are interconnected via a communication network, and in which each terminal prepares packets from encoded speech signals and the terminals communicate with one another using the packets.
In a conventional speech packet communication system, a transmitter (speech terminal) encodes an input speech signal detected at predetermined periods, prepares packets from the encoded speech signal and a code indicative of a priority assigned in a predetermined pattern (for example, high, low, high . . . ), and transmits the speech packets sequentially. A transit node which has received the packets transmits the received packets of higher priority to a receiver if the node is in a high traffic state.
Another conventional speech communication system is proposed in which the transmitter divides a coded speech signal detected during one sample time into the most significant bits indicative of an essential speech characteristic signal and the least significant bits indicative of an additive speech characteristic signal, prepares the most significant packet from a train of the most significant bits having a high priority, and the least significant packet from a train of the least significant bits having a low priority, contained in a predetermined time interval, and transmits the most and least significant packets. A transit node which has received these packets transmits to the receiver the packets sequentially, starting with a packet of higher priority if the communication state at the node indicates high traffic. These speech communication systems are described, for example, in Proc. Globecom '87 (1987) pp. 45.3.1-45.3.5.
Since, generally, the correlation between adjacent samples is high in a speech signal, it is recommended to linearly predict an input speech signal, to subtract the predicted value from the input signal, and to quantize differentials from which correlation between samples is greatly reduced rather than to directly quantize the input speech signal, because the former provides substantially the same speech quality using a smaller number of quantization bits than the latter. An encoding system employing this principle is referred to as Differential Pulse Code Modulation (hereinafter referred to as DPCM briefly).
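The DPCM principle described above can be illustrated with a minimal sketch. The fixed first-order predictor, the coefficient 0.9, the uniform quantizer and the step size 4.0 are assumptions chosen only for illustration; they are not taken from the cited systems.

```python
import math

def quantize(value, step):
    """Uniform quantizer: map a value to an integer code."""
    return int(round(value / step))

def dpcm_encode(samples, coeff=0.9, step=4.0):
    """Quantize the difference between each sample and its prediction."""
    codes = []
    recon_prev = 0.0  # previous reconstructed sample, as the decoder sees it
    for x in samples:
        pred = coeff * recon_prev          # linear prediction
        code = quantize(x - pred, step)    # differential code
        codes.append(code)
        recon_prev = pred + code * step    # track the decoder's reconstruction
    return codes

def dpcm_decode(codes, coeff=0.9, step=4.0):
    """Rebuild the speech samples from the differential codes."""
    out = []
    recon_prev = 0.0
    for code in codes:
        recon_prev = coeff * recon_prev + code * step
        out.append(recon_prev)
    return out

# A slowly varying (highly correlated) test signal.
samples = [100.0 * math.sin(2 * math.pi * n / 50) for n in range(200)]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
direct_codes = [quantize(x, 4.0) for x in samples]
```

For this correlated test signal the differential codes span a narrower range than the codes obtained by quantizing the samples directly with the same step, which is why fewer quantization bits suffice for a comparable reconstruction error.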
Since the correlation between adjacent samples of a speech signal changes with phoneme, it is necessary to adapt a prediction coefficient, used in linear prediction, to the input speech. Generally, the prediction error, that is, the magnitude of the differential, is observed and the prediction coefficient is adapted such that the error is reduced. At the decoding end, the differential code or signal output from the transmitter is inversely quantized with a predetermined accuracy, a predicted value is calculated using the result, and the output speech signal is obtained using the predicted value. Therefore, both the transmitter and the receiver are able to perform their encoding and decoding functions on the basis of the same reference. In that case, even if the prediction coefficient itself is not actually transmitted, both the transmitter and the receiver can determine the same prediction coefficient, so that the transmission capacity is used effectively. This system is referred to as Backward Adaptive Differential Pulse Code Modulation (hereinafter referred to as ADPCM-b).
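The backward adaptation described above can be sketched as follows. The sign-sign LMS update rule, the step size, the adaptation constant `mu` and the class name `BackwardPredictor` are all assumptions made for illustration; the cited ADPCM systems may adapt their predictors differently. The point of the sketch is only that the update uses quantities available at both ends (the inverse-quantized differential and the previous reconstructed sample), so no coefficient needs to be transmitted.

```python
import math

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

class BackwardPredictor:
    """First-order predictor adapted only from decoder-visible values."""

    def __init__(self, step=4.0, mu=0.01):
        self.step = step    # quantizer step (assumed fixed)
        self.mu = mu        # adaptation constant (assumed)
        self.coeff = 0.0    # prediction coefficient
        self.prev = 0.0     # previous reconstructed sample

    def predict(self):
        return self.coeff * self.prev

    def update(self, code):
        dq = code * self.step                  # inverse-quantized differential
        recon = self.predict() + dq            # reconstructed sample
        # Sign-sign LMS step: move the coefficient so the error shrinks.
        self.coeff += self.mu * sign(dq) * sign(self.prev)
        self.coeff = max(-0.99, min(0.99, self.coeff))
        self.prev = recon
        return recon

def encode(samples, step=4.0, mu=0.01):
    p = BackwardPredictor(step, mu)
    codes, coeffs = [], []
    for x in samples:
        code = int(round((x - p.predict()) / step))
        p.update(code)
        codes.append(code)
        coeffs.append(p.coeff)
    return codes, coeffs

def decode(codes, step=4.0, mu=0.01):
    p = BackwardPredictor(step, mu)
    out, coeffs = [], []
    for code in codes:
        out.append(p.update(code))
        coeffs.append(p.coeff)
    return out, coeffs

samples = [100.0 * math.sin(2 * math.pi * n / 50) for n in range(200)]
codes, enc_coeffs = encode(samples)
decoded, dec_coeffs = decode(codes)
```

Because the encoder and the decoder run the same update on the same codes, `enc_coeffs` and `dec_coeffs` follow the same trajectory as long as every code arrives intact.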
In a transit node in a communication network (for example, an Asynchronous Transfer Mode (hereinafter referred to as ATM) network), part of the packets can be discarded in accordance with the traffic state. Whether a packet is discarded is determined in accordance with a priority assigned to the packet. Such an operation of a transit node is hereinafter referred to as traffic control. Thus, a first packet is prepared from the most significant bits of a train of bits of a speech signal inputted and encoded for a predetermined interval of time; since these bits greatly influence the speech quality, the packet is assigned high priority. A second packet is prepared from the least significant bits, which influence the speech quality less, and is therefore assigned lower priority. The resulting first and second packets are transmitted. Assigning such predetermined priorities to the respective packets serves to deliver at least the most significant bits to the receiver with high probability even if a high traffic state occurs in the transit node. If a prediction coefficient used in the ADPCM-b is calculated using only the most significant bits in both the transmitter and receiver, both the transmitter and the receiver will obtain the same prediction coefficient. Such an encoding system is referred to as Embedded Adaptive Differential Pulse Code Modulation (hereinafter referred to as Embedded ADPCM). The Embedded ADPCM includes a system which calculates the power value and prediction gain of a speech signal received at predetermined periods in order to change the number of bits used to encode the input speech signal. The ADPCM is described, for example, in AT & T Technical Journal, Vol. 65, No. 5 (September/October 1986), pp. 12-22. The Embedded ADPCM is described, for example, in IEEE Transactions On Communications, Vol. COM-28, No. 7 (July, 1980) pp. 1040-1046, "Embedded DPCM for Variable Bit Rate Transmission" or "Variable Rate Embedded ADPCM with Perceptionally Appropriate Criteria", INSTITUTE OF ELECTRONIC AND COMMUNICATION ENGINEERS AUTUMN NATIONAL MEETING A-4, 1988.
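The splitting of codes into a high-priority packet and a low-priority packet, and the discarding performed by a transit node, can be sketched as follows. The 4-bit code width, the 2-bit core, the dictionary packet layout and the function names are hypothetical choices made for illustration, and the transit node is simplified to drop every low-priority packet when congested.

```python
HIGH, LOW = 1, 0  # priority labels (assumed encoding)

def split_codes(codes, total_bits=4, core_bits=2):
    """Split unsigned codes into a most significant and a least significant packet."""
    low_bits = total_bits - core_bits
    mask = (1 << low_bits) - 1
    first = {"priority": HIGH, "bits": [c >> low_bits for c in codes]}
    second = {"priority": LOW, "bits": [c & mask for c in codes]}
    return first, second

def transit_node(packets, congested):
    """Forward everything in low traffic; keep only high priority in high traffic."""
    if not congested:
        return list(packets)
    return [p for p in packets if p["priority"] == HIGH]

def merge_codes(packets, total_bits=4, core_bits=2):
    """Rebuild codes at the receiver; missing least significant bits become 0."""
    low_bits = total_bits - core_bits
    core = next(p["bits"] for p in packets if p["priority"] == HIGH)
    enh = next((p["bits"] for p in packets if p["priority"] == LOW),
               [0] * len(core))
    return [(c << low_bits) | e for c, e in zip(core, enh)]

codes = [13, 6, 0, 15]
packets = split_codes(codes)
full = merge_codes(transit_node(packets, congested=False))
degraded = merge_codes(transit_node(packets, congested=True))
```

In low traffic the receiver recovers the original codes. In high traffic it still recovers the most significant bits of every code, so the decoded speech is coarser but not interrupted.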
In the former conventional technique (Proc. Globecom), the transmitter determines the priorities beforehand so that they are, for example, alternately high, low, high, low . . . , prepares the packets sequentially from the priorities and the speech signals encoded for a predetermined time interval, and transmits the packets. Therefore, even if a transit node is in high traffic, at least the alternate packets having high priority are transmitted, so that the speech signal reproduced by the receiver is unlikely to be absent continuously. Similarly, in the other speech communication systems, at least the most significant bits have a high probability of arriving at the receiver, so that the decoded speech signal is unlikely to be interrupted.
However, packets having excessively high priorities are used for speech information present in the front end period of a talkspurt or in a hangover, where the loss of a packet has relatively little influence on the speech quality reproduced by the receiver. Conversely, for speech information whose reproduced quality deteriorates with even a small amount of packet loss, a slight deterioration in the decoded speech quality is inevitable unless all the transit nodes through which the speech information passes to the receiver are in the lowest traffic state, because a packet of low priority carrying such information is discarded at a transit node with the same probability as a packet carrying speech information present in the hangover interval.
In the latter conventional technique (for example, Embedded ADPCM), the prediction coefficient is calculated using only the most significant bits of the encoded speech signal, or the accuracy of the differential code used for calculation of the prediction coefficient is lowered in accordance with the prediction gain.
Therefore, the prediction accuracy is deteriorated and the differential signal increases depending on the kind of the input speech. As a result, the error caused by quantizing the differential signal would increase and the signal to noise ratio of the reproduced speech signal would be lowered.
Especially, in the Embedded ADPCM which dynamically changes the number of coded bits of an input speech signal in accordance with the prediction gain of the input speech signal, the following problems would arise if, for example, an ATM (Asynchronous Transfer Mode) which would be a powerful candidate for the next generation communication networks is used as a communication network.
Generally, in the ATM network, a packet-like transmission unit of fixed length, called a cell, is used. If a speech code in which the number of code bits in one period changes is stored in such cells, the number of packets transmitted during one period also changes dynamically. As the magnitude of this change increases, time intervals will occur in which the number of packets produced temporarily increases, and the transmission time for the packets increases during those intervals. Thus it is necessary to provide an increased quantity of buffer memory in which the many packets produced can wait temporarily.
If the number of code bits in one period is not an integer multiple of the length of a packet, one of the following two processes must be employed:
(1) The next frame period is awaited and the packet is filled with code bits for the next period and then transmitted; and
(2) Bits indicative of an empty space (for example, of "0") are inserted into unused portions of the packet and the resulting packet is immediately transmitted.
According to the process (1), the time required for transmitting the code bits for one period would increase and the transmission times would vary. In the process (2), the efficiency of use of the transmission capacity would decrease.
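The trade-off between processes (1) and (2) can be made concrete with a small calculation. The payload of 384 bits per packet (the 48-octet payload of an ATM cell, ignoring any adaptation-layer overhead), the per-period bit counts and the function names are assumptions chosen for illustration.

```python
def wait_policy(frame_bits, payload_bits=384):
    """Process (1): hold leftover bits until later periods fill a packet."""
    held = 0
    sent_per_period = []
    for bits in frame_bits:
        held += bits
        full_packets, held = divmod(held, payload_bits)
        sent_per_period.append(full_packets)
    return sent_per_period, held  # held = bits still waiting at the end

def pad_policy(frame_bits, payload_bits=384):
    """Process (2): pad the last packet of each period and send at once."""
    packets = 0
    padding = 0
    for bits in frame_bits:
        n = -(-bits // payload_bits)        # ceiling division
        packets += n
        padding += n * payload_bits - bits  # filler bits inserted
    return packets, padding

frame_bits = [500, 300, 352]  # code bits produced in three periods (assumed)
sent, held = wait_policy(frame_bits)
packets, padding = pad_policy(frame_bits)
efficiency = sum(frame_bits) / (packets * 384)
```

With these assumed figures, process (1) sends three full packets, but 116 bits of the first period are delayed until the second period, while process (2) sends four packets of which 384 bits are filler, an efficiency of 0.75.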