In a digital telephone system, a speech signal is coded in some manner before it is channel coded and transmitted over the radio path. In speech coding, digitized speech is processed frame by frame, in periods of about 20 ms, using different methods, so that the result is a group of parameters representing the speech in each frame. This information, i.e. the parameter group, is channel coded and transmitted over the transmission path. In channel coding, the information is protected by different error correcting codes.
The speech coding method used in the GSM cellular system is the so-called RPE-LTP (Regular Pulse Excitation LPC with Long Term Prediction). The basic parts of the algorithm are the linear predictive coding (LPC) filter and, as the last stage, coding of the residual signal as a regular pulse sequence. The operation is completed by pitch estimation for long term prediction. The coder thus produces short term filter parameters, long term prediction (LTP) parameters, and RPE parameters. In the decoder, the RPE parameters act as the filter excitation signal, and the received short and long term parameters act as filter parameters. The speech coding algorithm employed by the digital cellular system in the USA belongs to the category of code excited coders, CELP (Code Excited Linear Prediction), and the coder is referred to as Vector-Sum Excited Linear Predictive Coding (VSELP). The result of the speech coding is a group of parameters from which, together with code books having a predetermined structure, the decoder of the receiver synthesizes the speech signal. Unlike in an RPE-LTP coder, the speech signal residual is not transmitted at all. The coders of both systems have in common that they produce speech frames with a duration of 20 ms, each consisting of four 5 ms subframes, each of which contains a speech parameter group.
In addition to the actual coding, the following functions are also built into digital speech processing: a) on the transmitter side, voice activity detection (VAD), by means of which the transmitter can be activated only when there is speech to be transmitted (Discontinuous Transmission, DTX); b) on the transmitter side, background noise evaluation and generation of parameters describing the noise, and on the receiver side, generation of comfort noise in the decoder from these parameters, the comfort noise making an interruption in the connection sound more pleasant than absolute silence; and c) acoustic echo cancellation.
As an example of speech processing, the speech processing arrangement used in the known GSM mobile telephone system is described with reference to FIG. 1, which shows the transmitter side. The input of the speech coder 1 is either a 13-bit PCM signal arriving from the network, obtained by sampling an audio signal at 8000 samples per second, or A/D converted 13-bit PCM arriving from the audio part of the mobile station. The speech frame obtained from the output of the coder has a duration of 20 ms and comprises 260 bits, which are generated by coding 160 PCM speech samples.
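The figures above fix the rates of the coder. The following Python sketch merely works through that arithmetic (frame duration, samples per 5 ms subframe, coded bit rate, and compression relative to the 13-bit PCM input); the constant and function names are illustrative and not taken from the GSM specifications.

```python
# Worked arithmetic for the GSM speech frame described above.
# Constant names are illustrative, not taken from the GSM specifications.
SAMPLE_RATE_HZ = 8000        # PCM samples per second
PCM_BITS_PER_SAMPLE = 13     # linear PCM word length at the coder input
SAMPLES_PER_FRAME = 160      # PCM samples coded into one speech frame
BITS_PER_FRAME = 260         # coded bits per speech frame
SUBFRAMES_PER_FRAME = 4      # 20 ms frame divided into 5 ms subframes


def frame_duration_ms() -> int:
    """Duration of one speech frame: 160 samples at 8000 samples/s."""
    return 1000 * SAMPLES_PER_FRAME // SAMPLE_RATE_HZ


def subframe_samples() -> int:
    """PCM samples covered by one 5 ms subframe."""
    return SAMPLES_PER_FRAME // SUBFRAMES_PER_FRAME


def coded_bit_rate_bps() -> int:
    """Output bit rate of the speech coder: 260 bits every 20 ms."""
    return BITS_PER_FRAME * SAMPLE_RATE_HZ // SAMPLES_PER_FRAME


def pcm_bit_rate_bps() -> int:
    """Bit rate of the 13-bit PCM input signal."""
    return SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE


if __name__ == "__main__":
    print(frame_duration_ms(), "ms per frame")          # 20 ms per frame
    print(subframe_samples(), "samples per subframe")   # 40 samples per subframe
    print(coded_bit_rate_bps(), "bit/s coded")          # 13000 bit/s coded
    print(pcm_bit_rate_bps() // coded_bit_rate_bps(), "x compression")  # 8 x compression
```

In other words, the coder reduces the 104 kbit/s PCM stream to 13 kbit/s, a factor of eight.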
The speech coder 1 produces the parameters mentioned above for each 20 ms speech frame, and on the basis of these parameters the voice activity detector (VAD) 2 determines whether the frame contains speech or not. According to the information content of the frame, the VAD detector sets a flag controlling the operation of a DTX control and operation block 4. When the flag has the value VAD=1, the frames applied to a channel coder 5, and from there to the radio path as so-called traffic frames, are speech frames produced by the speech coder. For each frame applied to the channel coder 5, the DTX control and operation block 4 sets a flag SP that controls the channel coding.
The transmitted speech also contains background noise. With discontinuous transmission DTX, this background noise would be interrupted together with the speech, which would cause disturbing interruptions at the receiving end. Therefore, SID (Silence Descriptor) frames containing noise parameters are transmitted after a speech burst and, at certain intervals, also during the speech pauses indicated by the VAD 2, so that the receiver can generate noise resembling the original noise from these parameters during the pauses as well. A SID frame has the same duration and the same number of bits as a speech frame. The noise parameters are determined by a noise TX function block 3 on the basis of the parameters obtained from the speech coder 1.
As FIG. 2, showing the fields of a SID frame, illustrates, only part of the 260 bits of the SID frame is needed for coding the noise parameters. Background noise spectrum information is coded in field B, and the background noise level is coded in field C. Of the other bits, 95 bits form the SID code word, field A, in which all bits have the value zero. The remaining bits of the SID frame, field I, also have the value zero. When a pause occurs in the speech, i.e. the VAD flag is zero, the frames transmitted from the DTX control and operation block 4 to the channel coder and further to the radio path as so-called traffic frames are SID frames containing noise parameters. The value of the SP flag adapts the channel coding to these frames.
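The transmitter-side choice between a speech frame and a SID frame can be sketched as follows. This is a simplified illustration, not the GSM implementation: it assumes a hypothetical field order (field A first, then the noise bits of fields B and C, then the zero fill of field I), whereas the real SID code word bits occupy specific positions defined in the GSM specifications. It also assumes SP=1 marks a speech frame and SP=0 a SID frame, and it ignores the hangover period and the schedule of periodic SID updates.

```python
from typing import List, Tuple

FRAME_BITS = 260         # same size as a speech frame
SID_CODEWORD_BITS = 95   # field A: all bits zero


def build_sid_frame(noise_bits: List[int]) -> List[int]:
    """Build a SID frame under a HYPOTHETICAL layout:
    field A (95 zero bits), then noise parameters (fields B and C),
    then zero fill (field I) up to 260 bits."""
    if len(noise_bits) > FRAME_BITS - SID_CODEWORD_BITS:
        raise ValueError("noise parameters do not fit in the SID frame")
    frame = [0] * SID_CODEWORD_BITS + list(noise_bits)
    frame += [0] * (FRAME_BITS - len(frame))
    return frame


def select_traffic_frame(vad: int, speech_frame: List[int],
                         noise_bits: List[int]) -> Tuple[List[int], int]:
    """Return (traffic_frame, sp_flag). Assumed convention: SP=1 speech, SP=0 SID."""
    if vad == 1:
        return speech_frame, 1
    return build_sid_frame(noise_bits), 0


if __name__ == "__main__":
    frame, sp = select_traffic_frame(0, [1] * FRAME_BITS, [1, 0, 1])
    print(sp, len(frame), frame[95:98])  # 0 260 [1, 0, 1]
```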
FIG. 3 shows a known receiver arrangement used in the GSM mobile telephone system. Channel decoding and detection are performed on the received radio signal in a block 35. The detected traffic frame, error-corrected in the channel decoding, is provided with a flag BFI (Bad Frame Indicator), which indicates whether the received traffic frame is erroneous or error-free. For each traffic frame, a SID frame detection block 36 checks whether it is a SID frame containing noise information. This is done by comparing the code word of the received traffic frame bit by bit with the code word stored in the receiver. Depending on how many bits deviate from the stored code word, a SID flag is given one of three possible values. In addition, traffic frame synchronization information is provided by means of a TAF flag (Time Alignment Flag). The inputs of a DTX control and operation block 34 are thus the information bits of the traffic frame, the erroneous/error-free indication BFI of the frame, and the indication of whether the frame is a SID frame containing noise parameters. If the traffic frame is an error-free speech frame, it is applied to the input of a speech decoder 31, which reconstructs the original speech from the parameters. If the traffic frame is classified on the basis of the BFI flag as a bad or lost speech or SID frame, a replacement procedure for bad frames is performed in a block 32, for instance by applying the latest good parameter values, attenuated, to the speech decoder. If the traffic frame is an error-free SID frame, it is applied to a noise RX function block 33, which adjusts the speech decoder 31 to produce noise resembling the original noise until speech frames are received again.
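The SID classification and the routing performed by the DTX control and operation block 34 can be sketched as below. The three SID flag values and their thresholds (fewer than 2 deviating bits: valid SID frame; 2 to 15: invalid SID frame; 16 or more: speech frame) are our reading of the GSM DTX specification and should be checked against it. The routing is simplified: everything that is neither an error-free speech frame nor a valid SID frame goes to the bad-frame replacement block, and the destination names are hypothetical labels for blocks 31, 32 and 33.

```python
from typing import List

SID_CODEWORD_BITS = 95  # stored SID code word: 95 zero bits


def sid_flag(codeword_bits: List[int]) -> int:
    """Compare the received code word bit by bit with the stored all-zero word.
    ASSUMED thresholds: 2 = valid SID, 1 = invalid SID, 0 = speech frame."""
    if len(codeword_bits) != SID_CODEWORD_BITS:
        raise ValueError("expected the 95 code word bits")
    deviations = sum(1 for bit in codeword_bits if bit != 0)
    if deviations < 2:
        return 2
    if deviations < 16:
        return 1
    return 0


def route_frame(bfi: int, sid: int) -> str:
    """Simplified routing of a traffic frame (hypothetical destination labels)."""
    if bfi == 0 and sid == 0:
        return "speech_decoder_31"        # error-free speech frame
    if bfi == 0 and sid == 2:
        return "noise_rx_33"              # error-free SID frame
    return "bad_frame_replacement_32"     # bad/lost frame (simplification)


if __name__ == "__main__":
    print(sid_flag([0] * 95), route_frame(0, sid_flag([0] * 95)))  # 2 noise_rx_33
```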
A basic characteristic of digital networks is that they do not pass voice-frequency signals through transparently, as a conventional telephone network does. They do not pass DTMF signals properly, let alone the signal of a V.29 modem used in fax machines. In a telephone network, DTMF (Dual Tone Multifrequency) signals, unlike dialling pulses, propagate over the entire connection all the way to the B subscriber, which is why they are especially useful for remote control of equipment, for instance remote interrogation of telephone answering machines, or for data transmission using voice-frequency codes. In DTMF signalling, two simultaneous voice-frequency tones indicate a specific character. Each of the digits 0-9 and the characters * and # is indicated by a combination of two frequencies, one from the low group 697 Hz, 770 Hz, 852 Hz and 941 Hz and the other from the high group 1209 Hz, 1336 Hz and 1477 Hz, giving 12 defined combinations. Adding the frequency 1633 Hz to the high group yields the letter symbols A, B, C and D, so the number of allowed frequency combinations is 16.
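The low-group/high-group structure maps naturally onto the familiar 4×4 keypad grid. The sketch below looks up the frequency pair of a character and synthesizes the tone as a sum of two sinusoids at the 8000 samples/s rate used above. The equal amplitudes are a simplification; real DTMF generators must also meet level and twist requirements that this sketch ignores.

```python
import math
from typing import List, Tuple

LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
# Keypad layout: the row selects the low-group tone, the column the high-group tone.
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(char: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for one DTMF character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(char)
        if col >= 0:
            return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError(f"not a DTMF character: {char!r}")


def dtmf_samples(char: str, duration_ms: int, sample_rate_hz: int = 8000,
                 amplitude: float = 0.5) -> List[float]:
    """Synthesize the tone pair as the sum of two equal-amplitude sinusoids."""
    low, high = dtmf_pair(char)
    n = sample_rate_hz * duration_ms // 1000
    return [
        amplitude * (math.sin(2 * math.pi * low * t / sample_rate_hz)
                     + math.sin(2 * math.pi * high * t / sample_rate_hz))
        for t in range(n)
    ]


if __name__ == "__main__":
    print(dtmf_pair("5"))              # (770, 1336)
    print(len(dtmf_samples("5", 40)))  # 320 samples for 40 ms
```

Excluding the 1633 Hz column leaves exactly the 12 combinations for 0-9, * and #.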
For telefax machines, a special adaptation function is specified in GSM networks; it adapts the analog signal of the machine to the digital radio channel. Transmission of DTMF signals from a mobile station to the network, i.e. in the uplink direction, has also been specified. According to the specification, DTMF tones are generated not by the mobile station but by the mobile exchange, so the tone signals do not have to pass through a speech coder. When the user presses the numeric keys of the mobile station during a speech connection, the mobile station transmits a message, and the mobile exchange, having received the message, generates the corresponding DTMF signal.
The problem with present-day networks is thus the transmission of DTMF signals in the downlink direction, which is not specified at all in present-day mobile networks. DTMF signals travelling from the network to a mobile station do reach the mobile station, but in distorted form, since they pass through a speech coder on the network side and then through the speech decoder of the mobile station. Because of the distortion, they do not meet the requirements set by the DTMF detectors of a fixed network. Transmission in the uplink direction also presents problems despite the specification mentioned above: when the user uses the DTMF facility of the mobile station, the station transmits both a start message and an end message for each DTMF signal, and the mobile exchange acknowledges both with acknowledgement messages. Each character therefore requires four messages, so transmitting a number of, for instance, ten characters requires a total of 40 messages. This loads the network.
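The uplink signalling load described above can be made concrete by listing the messages exchanged for a dialled number. The message labels below are hypothetical descriptions of the start, end and acknowledgement messages, not the exact message names of the GSM signalling specification.

```python
from typing import List, Tuple


def uplink_dtmf_signalling(digits: str) -> List[Tuple[str, str, str]]:
    """List (direction, message, character) for each key press.
    Message labels are HYPOTHETICAL; each start and end message is acknowledged."""
    messages: List[Tuple[str, str, str]] = []
    for d in digits:
        messages += [
            ("MS->exchange", "DTMF start", d),
            ("exchange->MS", "DTMF start acknowledgement", d),
            ("MS->exchange", "DTMF end", d),
            ("exchange->MS", "DTMF end acknowledgement", d),
        ]
    return messages


if __name__ == "__main__":
    print(len(uplink_dtmf_signalling("0123456789")))  # 40
```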
The problem is particularly pronounced in telephone systems in which the fixed connection between an exchange and the subscriber stations of a fixed telephone network is replaced with a radio connection. Such a solution is referred to as a telephone system implementing a wireless subscriber connection, i.e. a WLL system (Wireless Local Loop System). In the WLL system, a wireless fixed terminal equipment comprises a radio unit provided with an antenna, and a telephone adapter that connects a standard subscriber station to the terminal equipment. The subscriber station may be a conventional telephone set to which a telephone answering machine is connected. The user uses the subscriber station in the same manner as in a conventional fixed network, even though the subscriber line consists of a radio connection between the terminal equipment and a base station. The base station is connected to a special subscriber network element, which is connected to a standard telephone exchange. The WLL system can be constructed, for instance, from the components of the digital GSM system, in which case its signalling follows that system. In the WLL system, transmission of DTMF signals from the network over the radio path to a subscriber station would be extremely desirable.
One proposed solution to the problems presented is disclosed in European Patent Application 534 852. According to it, a DTMF detector and a DTMF coder are provided at the transmitting end, in the transcoder of a base station, in addition to a speech coder. The minimum period the detector needs to detect a DTMF signal is short, only 5 ms. When the detector detects a DTMF signal arriving from the network, it gives a control signal to the DTMF coder associated with the speech coder and to the transmitter. The DTMF coder then forms a frame that resembles a SID frame and contains information on the detected DTMF signal. Under the control of a controller, the transmitter selects this SID-like DTMF frame instead of a speech frame.
The fields of such a DTMF frame are shown in FIG. 4. The first three fields A, B and C correspond to the fields of the SID frame of FIG. 2: the 95-bit field A contains the SID frame identifier, field B contains information on the background noise quality, and field C on the background noise level. Unlike the frame of FIG. 2, a DTMF frame according to the European Patent Application contains additional fields D, E and F. Field D contains a DTMF frame identifier consisting of 8 bits, each in the 1-state. Field E contains a 4-bit DTMF frequency pair code, allowing 16 frequency pairs. The 4-bit field F indicates the DTMF tone duration in multiples of 5 ms.
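Fields D, E and F together occupy 16 bits. The sketch below packs and unpacks them as one 16-bit word. The bit order (D first, most significant bit first) is an assumption, since the positions of these fields within the 260-bit frame are not given here, and the numbering of the 16 frequency pairs in `pair_for_code` (two high bits select the low-group tone, two low bits the high-group tone) is a hypothetical choice for illustration only.

```python
from typing import Optional, Tuple

DTMF_FRAME_ID = 0xFF  # field D: 8 bits, all in the 1-state
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)


def pack_dtmf_fields(code: int, duration: int) -> int:
    """Pack D (8 bits), E (4 bits) and F (4 bits) into a 16-bit word, D first (ASSUMED order)."""
    if not (0 <= code <= 0xF and 0 <= duration <= 0xF):
        raise ValueError("code and duration must each fit in 4 bits")
    return (DTMF_FRAME_ID << 8) | (code << 4) | duration


def unpack_dtmf_fields(word: int) -> Optional[Tuple[int, int]]:
    """Return (code, duration), or None if field D is not the DTMF frame identifier."""
    if ((word >> 8) & 0xFF) != DTMF_FRAME_ID:
        return None
    return (word >> 4) & 0xF, word & 0xF


def pair_for_code(code: int) -> Tuple[int, int]:
    """HYPOTHETICAL numbering of the 16 frequency pairs carried in field E."""
    if not 0 <= code <= 0xF:
        raise ValueError("code must fit in 4 bits")
    return LOW_GROUP_HZ[code >> 2], HIGH_GROUP_HZ[code & 0x3]


if __name__ == "__main__":
    word = pack_dtmf_fields(5, 4)
    print(hex(word), unpack_dtmf_fields(word))  # 0xff54 (5, 4)
    print(pair_for_code(5))                     # (770, 1336)
```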
At the receiving end, the DTMF frame is identified by means of the SID frame identifier (field A) and the DTMF frame identifier (field D). The `code` parameter in field E defines the DTMF frequency pair in question, and the `duration` parameter in field F indicates which of the 5 ms periods (subframes) of the 20 ms frame contain a DTMF signal. The receiver generates the DTMF signal given by the code of field E for those 5 ms periods which, according to the duration parameter, contain a DTMF signal. For the other periods of the frame, background noise defined by the SID parameters is generated.
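Field F is described both as a duration in multiples of 5 ms and as an indication of which 5 ms periods carry the tone. The sketch below adopts one possible reading: the four bits flag the four subframes individually, most significant bit first. This interpretation is an assumption, not taken from the cited application. The function returns a per-subframe plan (tone or comfort noise) rather than audio samples.

```python
from typing import List

SUBFRAMES_PER_FRAME = 4  # four 5 ms subframes in a 20 ms frame


def subframe_plan(code: int, duration_mask: int) -> List[str]:
    """Decide, per 5 ms subframe, whether to generate the DTMF pair or comfort noise.
    ASSUMED reading of field F: bit i (MSB first) flags subframe i as containing DTMF."""
    if not 0 <= duration_mask <= 0xF:
        raise ValueError("field F has 4 bits")
    plan = []
    for i in range(SUBFRAMES_PER_FRAME):
        bit = (duration_mask >> (SUBFRAMES_PER_FRAME - 1 - i)) & 1
        plan.append(f"dtmf:{code}" if bit else "comfort_noise")
    return plan


if __name__ == "__main__":
    # Tone starting halfway through the frame: noise, noise, tone, tone.
    print(subframe_plan(7, 0b0011))
```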
A disadvantage of this known solution is that errors caused by the radio path are not taken into account. If erroneous DTMF frames are occasionally received at the receiving end, regenerating the DTMF signal at the receiver may become problematic, since successive frames are in no way linked to each other. There is no way of knowing for certain how long the same DTMF signal has been received, or should have been received, or whether a new DTMF signal has already begun: the code of a DTMF signal may be the same in successive DTMF frames even though two separate DTMF signals are concerned. According to recommendation CEPT T/CS 46-02, the conditions for reliable DTMF detection are that a DTMF signal lasts more than 40 ms and that it is preceded by a state which lasts more than 40 ms and either contains no voice-frequency signal or is the detection state of a different voice-frequency signal. Since the detector in the transcoder at the transmitting end needs at least 5 ms to detect a DTMF signal, there is no time to detect the signal if it begins during the last 5 ms period of a frame. Consequently, a DTMF signal lasting less than 5 ms may occur at the end of the frame; this signal passes through the speech coder-decoder chain and is distorted. It is immediately followed by a pure voice-frequency signal generated from the received DTMF frames. If a pause of more than 40 ms is not maintained between the distorted DTMF signal and the pure DTMF signal, detection of the DTMF signal at the subscriber end may fail altogether. Furthermore, the prior art solution never checks whether a DTMF signal arriving from the network at the transmitting end has lasted more than 40 ms. The transcoder at the transmitting end may thus detect, for instance, a DTMF signal lasting less than 20 ms and transmit it over the radio path all the way to a subscriber station.
In the subscriber station, such a tone is not recognized as a DTMF signal, so it is heard as a disturbing sound in the speech.
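The two timing conditions quoted from CEPT T/CS 46-02 can be checked on a timeline sampled in 5 ms slots, as in the following sketch. Each slot holds the detected character, or `None` for no voice-frequency signal. The marker `"?"` for a distorted, unidentifiable tone fragment is a hypothetical convention, and a tone at the very start of the timeline is rejected because its preceding state is unknown (also an assumption). The last example reproduces the failure described above: a short distorted fragment immediately followed by a clean tone.

```python
from itertools import groupby
from typing import List, Optional

SLOT_MS = 5        # timeline resolution
MIN_TONE_MS = 40   # tone must last MORE than this
MIN_GAP_MS = 40    # preceding silence must last MORE than this


def accepted_tones(slots: List[Optional[str]]) -> List[str]:
    """Return the characters whose tones satisfy both timing conditions."""
    runs = [(sym, SLOT_MS * len(list(grp))) for sym, grp in groupby(slots)]
    detected = [False] * len(runs)
    result: List[str] = []
    for i, (sym, ms) in enumerate(runs):
        if sym is None or ms <= MIN_TONE_MS or i == 0:
            continue  # silence, too short, or unknown preceding state
        prev_sym, prev_ms = runs[i - 1]
        quiet_before = prev_sym is None and prev_ms > MIN_GAP_MS
        other_tone_before = prev_sym is not None and prev_sym != sym and detected[i - 1]
        if quiet_before or other_tone_before:
            detected[i] = True
            result.append(sym)
    return result


if __name__ == "__main__":
    silence = [None] * 12                               # 60 ms without tone
    print(accepted_tones(silence + ["5"] * 10))         # ['5']
    print(accepted_tones(silence + ["5"] * 8))          # [] (40 ms is not enough)
    print(accepted_tones(silence + ["?"] + ["7"] * 10)) # [] (distorted fragment first)
```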