In discontinuous speech coding according to the VOX principle (VOX=Voice Operated Transmission), a unit which detects voice activity, a so-called VAD-unit (VAD=Voice Activity Detector), decides for each received sound sequence whether the sound information represents human speech or not. The VAD-unit can assume two different conditions. The first condition means that the current sound is classified as human speech, and the second condition means that the current sound is classified as non-speech.
If the VAD-unit detects that a given sound sequence represents speech, then the VAD-unit generates a first condition signal and a speech coder unit is controlled to deliver a so-called speech frame, which contains coded speech information. If, on the other hand, the VAD-unit determines that a given sound sequence is not human speech, then the VAD-unit generates a second condition signal and an SID-frame generator is controlled to deliver a so-called SID-frame (SID=Silence Descriptor) every N'th frame. During the N-1 intermediate opportunities to send data, neither the SID-frame generator nor the speech frame generator transmits any information, and the transmitter is silent.
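The frame selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `schedule_frames` and the string labels are hypothetical, the VAD decisions are taken as given, and the sketch assumes that the first non-speech slot of a pause carries an SID-frame, which the text does not specify. Hangover is deliberately left out.

```python
from typing import Iterable, List, Optional


def schedule_frames(vad_decisions: Iterable[bool], n: int) -> List[Optional[str]]:
    """Return one entry per frame slot: "SPEECH", "SID", or None (transmitter silent).

    vad_decisions: True where the VAD-unit produces the first condition signal
                   (speech), False for the second (non-speech).
    n:             an SID-frame is delivered every n'th non-speech slot.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    slots: List[Optional[str]] = []
    pause_slots = 0  # non-speech slots seen since the last speech frame
    for is_speech in vad_decisions:
        if is_speech:
            slots.append("SPEECH")
            pause_slots = 0
        else:
            # Assumption: the SID-frame occupies the first slot of each group of n;
            # the remaining n-1 slots are left empty (no transmission).
            slots.append("SID" if pause_slots % n == 0 else None)
            pause_slots += 1
    return slots
```

With N = 3, a pause of four slots between two speech frames would yield an SID-frame, two silent slots, and a second SID-frame.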
An SID-frame includes information on the estimated background noise level and the estimated noise spectrum on the transmitter side.
The above method is used, for example, in mobile radio communication systems in order to save battery energy in the mobile terminals and to manage the radio bandwidth economically, i.e. to minimize the transmission of radio energy when a given radio channel does not need to be used for the transmission of speech information. The method is, however, also applicable in other types of telecommunication systems in which the bandwidth used per speech connection must be minimized.
It is known in the prior art in discontinuous speech coding to let a speech coder unit send an SID-frame every N'th frame when the VAD-unit detects non-speech. In known applications, such as for example in the GSM-system (GSM=Global System for Mobile Communication), approximately two SID-frames are sent per second.
The parameters included in the SID-frames, namely the estimated background noise level and the estimated noise spectrum, are calculated as an average of a current estimate and the estimates from a number of previous frames. Furthermore, the receiver interpolates between the received parameter values for the N-1 intermediate data positions in order to obtain, on the receiver side, an evenly varying representation of the background noise on the transmitter side.
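The two calculations in the preceding paragraph can be illustrated with a minimal sketch. The function names are hypothetical, a single scalar stands in for the noise level (a spectrum would be handled per component), and the interpolation is assumed to be linear, which the text does not state.

```python
from typing import List, Sequence


def sid_parameter(current: float, previous: Sequence[float]) -> float:
    """Transmitter side: average of the current estimate and earlier estimates."""
    values = [current, *previous]
    return sum(values) / len(values)


def interpolate(prev_value: float, next_value: float, n: int) -> List[float]:
    """Receiver side: values for the n-1 positions between two received SID values.

    Assumption: linear interpolation; the text only says that the receiver
    interpolates to obtain an evenly varying representation.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    step = (next_value - prev_value) / n
    return [prev_value + step * k for k in range(1, n)]
```

For example, with N = 4 and received values 0.0 and 8.0, the three intermediate positions would take the values 2.0, 4.0 and 6.0.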
When the VAD-unit changes from producing the first to producing the second condition signal, i.e. from detecting speech to detecting non-speech, a time interval of a given length T1, the so-called hangover, is normally applied, during which the speech coder unit continues to deliver speech frames as if the received sound information had been human speech. If the VAD-unit continues to register non-speech after the hangover time T1, then an SID-frame is generated.
One reason for this method is that short pauses in speech within sentences should not be treated as non-speech; instead, the speech frame generator should remain activated in this situation. The application of hangover does not, however, solve the problem caused by noise transients with high energy content. There is a risk that the VAD-unit interprets these noise transients as speech, and if this occurs, the parameters of the speech frame generator will be adapted to the spectral characteristics of the noise transients, which leads to a large degradation of the condition of the speech frame generator. A precondition for the application of hangover is therefore that the preceding speech sequence be longer than a second predetermined time T2.
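The hangover rule and its precondition can be sketched as a small per-frame state machine. This is an illustration only: the function name `apply_hangover` is hypothetical, the parameters `t1_frames` and `t2_frames` stand in for the two predetermined times expressed as whole frames, and the length of a speech sequence is assumed to be the number of consecutive frames the VAD-unit classified as speech.

```python
from typing import Iterable, List


def apply_hangover(vad_decisions: Iterable[bool], t1_frames: int, t2_frames: int) -> List[bool]:
    """Return, per frame, whether the speech coder delivers a speech frame.

    t1_frames: hangover length in frames.
    t2_frames: a speech sequence must be longer than this to earn hangover.
    """
    deliver: List[bool] = []
    burst_len = 0      # consecutive frames currently classified as speech
    hangover_left = 0  # hangover frames still available after the burst ends
    for is_speech in vad_decisions:
        if is_speech:
            burst_len += 1
            # Hangover is armed only once the burst exceeds the minimum length,
            # so that short noise transients do not prolong speech coding.
            hangover_left = t1_frames if burst_len > t2_frames else 0
            deliver.append(True)
        else:
            burst_len = 0
            if hangover_left > 0:
                hangover_left -= 1
                deliver.append(True)   # keep coding as if it were speech
            else:
                deliver.append(False)  # SID-frame generation may take over
    return deliver
```

In this sketch a one-frame transient with `t2_frames = 2` produces a single speech frame and no hangover, whereas a three-frame burst is followed by `t1_frames` further speech frames.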
When the VAD-unit changes from producing the second to producing the first condition signal, i.e. from non-speech to speech, normally no corresponding measure is taken; instead, the speech frame generator is started immediately.
The European patent application EP-A1-0 544 101 gives an example of how a background noise level can be reconstituted on the receiver side from received frames which describe the background noise between transmitted speech sequences. The patent document WO-A1-95/15550 describes a method for calculating the average background noise level over a number of historic frames, the current frame and up to two expected future frames from among the so-called noise-only frames. The calculated background noise level is subsequently removed from the received speech signal in order to form a resulting signal with minimal noise content.
When the VAD-unit changes from producing the first to producing the second condition signal, i.e. from speech to non-speech, there is a risk that the parameters of the most recent SID-frame or frames have been influenced by the speech sequence that has just finished, since these parameters are determined as an average value of the current frame and a number of previous frames. In the GSM-standard this problem is solved by not sending a new SID-frame if the previous speech sequence was so short that the hangover was not activated, that is to say if the speech sequence was shorter than the time T2. Instead, in this situation, a copy of the SID-frame which was sent immediately before said speech sequence is transmitted. See ETSI, TCH-HS, GSM Recommendation 6.41, "Discontinuous Transmission DTX for Half Rate Speech Traffic Channels".
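The choice between a new SID-frame and a copy of the saved one, as described in the preceding paragraph, can be sketched as follows. The names `SidFrame`, `noise_level` and `select_sid` are hypothetical and not taken from the GSM recommendation; the noise spectrum is omitted for brevity, and a speech sequence is treated as too short when its length does not exceed `t2_frames`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SidFrame:
    """Hypothetical stand-in for an SID-frame; only the noise level is modelled."""
    noise_level: float


def select_sid(burst_frames: int,
               t2_frames: int,
               saved_sid: Optional[SidFrame],
               fresh_sid: SidFrame) -> SidFrame:
    """Pick the SID-frame to transmit after a speech sequence has ended.

    burst_frames: length of the speech sequence that just finished, in frames.
    saved_sid:    the SID-frame sent immediately before that speech sequence.
    fresh_sid:    an SID-frame computed from the current averaged estimates.
    """
    hangover_was_activated = burst_frames > t2_frames
    if not hangover_was_activated and saved_sid is not None:
        # Short burst: the fresh averages may still contain speech energy,
        # so the earlier frame is repeated instead.
        return saved_sid
    return fresh_sid
```

The sketch falls back to the fresh frame when no earlier SID-frame has been saved, which is an assumption made here so that the function always returns a frame.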
According to the GSM-standard, the last sent SID-frame is saved on the transmitter side when the VAD-unit changes from the second to the first condition, i.e. from non-speech to speech, so that the SID-frame can be used as stated above if required. The parameters in this SID-frame can, however, also be misleading, as they may have been influenced by sound from the speech sequence which is beginning. This risk is especially large if the condition signal of the VAD-unit changes immediately after an SID-frame has been delivered. If the background noise level is high, the VAD-unit will probably change the condition signal more frequently than is warranted by the speech information on the transmitter side, because certain speech sounds can sometimes be misinterpreted as non-speech under these conditions.