The present invention consists in a method and a system for transmitting data on a speech channel, in particular in the field of mobile telephony. However, it could also be used in the field of networks of fixed telephones, known as switched networks. The invention solves problems specific to data transmission on a speech channel, where the data in the speech channel is transcoded in the same way as encoded speech in a network.
Speech transmission channels and data transmission channels are known in themselves in the field of mobile telephony. Data transmission channels require encoding different from speech encoding. They use network plant that is specific to data mode. In practice, a special contract must be entered into with a mobile telephony operator for this purpose. This provides access to point-to-point transmission of data in circuit-switched mode at a bit rate of 9 600 bit/s.
In the field of GSM cellular telephony, there are data transmission means using signaling channels of the cellular system. A distinction is drawn between SMS (Short Message Service) channels which can transmit at up to 300 bit/s and USSD (Unstructured Supplementary Service Data) channels which can handle bit rates in the order of 800 bit/s. The bit rate is low in both cases. In the case of USSD channels, the information is transmitted only from a user to the network. In the case of SMS channels, the information can be exchanged user to user or from the network to a user and is billed per packet exchanged, the cost at present being high.
The aim of the invention is to enable data of any kind to be transmitted over a network, in particular a mobile telephony network, at a high bit rate and without having to enter into an additional contract. In particular, the invention makes use of Internet access services. It also enables a manufacturer to update and maintain terminals.
What distinguishes speech encoding from data encoding, in particular for transmission in mobile telephony, is essentially the nature of the digitized data representing the speech. Speech digitized in a simple way produces a vast amount of digital data. In the context of mobile telephony, particular types of speech encoding have been developed to prevent the transmission channel frequency congestion that would result from excessively high data bit rates.
These particular types of encoding, known as source encoding, consist in principle in seeking characteristics representative of how speech is produced. These characteristics include three magnitudes, namely:
a fundamental frequency (pitch) corresponding to the vibration of the vocal cords,
filtering corresponding to modification of the fundamental vibration and resulting from the propagation of the vibration in the speech system, i.e. the larynx, pharynx and mouth, and
an excitation (or error) corresponding to a residue of the preceding modeling of the speech uttered.
A GSM source encoder establishes best values of these three types of magnitude from a PCM (Pulse Code Modulation) signal. A PCM signal is produced by sampling a speech signal at a frequency of 8 000 Hz and quantizing it on 13 bits, for example. The bit rate of the PCM signal is therefore 104 kbit/s in this example. The source encoder performs an operation known as analyzing or encoding the PCM signal.
The remainder of the description refers to a GSM network and transmission of speech when the source encoding is of the “Full Rate” type (ETSI recommendation SMG 6.10). The principles of the invention are nevertheless applicable to other forms of source encoding, or speech formats, in the GSM network (Half-Rate or Adaptive Multi-Rate). They are also applicable to other mobile telephone networks (DCS-1800, PCS, etc.).
FIG. 1a shows the source encoding of the corresponding PCM signal for a 20 ms frame of a speech signal. This source encoding includes generating 36 bits of a pitch signal (corresponding to a long-term prediction), generating 36 bits of a filter signal and generating 188 bits of an excitation signal, for example. The 36 bits of the filter signal correspond to eight coefficients of a short-term linear prediction filter. The 188 bits of the excitation signal correspond to 60 excitation parameters.
At the receiving end, a synthesizing decoder receives corresponding streams of 260 bits per 20 ms period (and thus at a bit rate of 13 kbit/s). This synthesizing decoder includes programmable filters in cascade. A long-term first filter receives the excitation signals and filters them with filter values corresponding to the 36 bits of the pitch signal. A short-term second filter connected downstream of the first filter filters the resulting signal with filter values corresponding to the 36 bits of the short-term filter signal. Like the original PCM signal, the reconstructed signal has a bit rate of 104 kbit/s.
All of the processing shown in FIG. 1a is effected repetitively. The period of this repetition is 20 ms in the currently-applicable standard. A stream of 260 bits which represent the parameters of the three magnitudes must be produced in each period of this repetition. In the aforementioned standard there are 260 bits to be transmitted every 20 ms, which corresponds to a bit rate of 13 kbit/s.
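The rates quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch using the figures given in the text; the constant and variable names are chosen for this example.

```python
# Figures taken from the text; the names are illustrative only.
SAMPLE_RATE_HZ = 8_000       # PCM sampling frequency
PCM_BITS_PER_SAMPLE = 13     # quantization of each PCM sample
FRAME_BITS = 260             # source-encoded bits per frame
FRAME_PERIOD_MS = 20         # repetition period of the processing

# Bit rate of the PCM signal before source encoding.
pcm_rate = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE

# Bit rate after source encoding: 260 bits every 20 ms.
coded_rate = FRAME_BITS * 1000 // FRAME_PERIOD_MS

# Number of PCM samples summarized by one 260-bit frame.
samples_per_frame = SAMPLE_RATE_HZ * FRAME_PERIOD_MS // 1000

# Reduction factor achieved by the source encoder.
compression = pcm_rate / coded_rate

print(pcm_rate, coded_rate, samples_per_frame, compression)
```

With these figures the source encoder reduces the bit rate by a factor of eight, each frame standing for 160 PCM samples.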
The source encoding includes the conversion of an analog amplitude (the level of the pressure wave representative of the sound) into three types of magnitude. The first magnitude represents the fundamental frequency or pitch and this parameter is routinely known as the Long Term Prediction (LTP). This first LTP magnitude is encoded in 5 ms sub-frames (four sub-frames per 20 ms) and 9 bits are encoded in each sub-frame, representing a total of 36 bits per 20 ms frame. The 9 bits encoded for the LTP pitch magnitude in each sub-frame correspond to two components: a delay or lag (encoded on 7 bits) defining a pitch period or delay size of the long-term prediction filter and an amplitude (encoded on 2 bits) defining an optimum coefficient of the long-term prediction filter.
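As an illustration of the LTP bit budget, the sketch below packs a 7-bit lag and a 2-bit amplitude index into a 9-bit word, and four such words into the 36-bit LTP field. The function names and the ordering of the fields inside the word are assumptions made for this example; the text does not specify them.

```python
LAG_BITS = 7      # delay (lag) of the long-term prediction filter
GAIN_BITS = 2     # amplitude (coefficient) of the long-term prediction filter
SUBFRAMES = 4     # four 5 ms sub-frames per 20 ms frame


def pack_ltp(lag, gain):
    """Pack one sub-frame's lag and gain index into a 9-bit word.

    The field order (lag in the high bits) is an assumption for illustration.
    """
    if not 0 <= lag < (1 << LAG_BITS):
        raise ValueError("lag must fit in 7 bits")
    if not 0 <= gain < (1 << GAIN_BITS):
        raise ValueError("gain must fit in 2 bits")
    return (lag << GAIN_BITS) | gain


def unpack_ltp(word):
    """Recover (lag, gain) from a 9-bit word built by pack_ltp."""
    return word >> GAIN_BITS, word & ((1 << GAIN_BITS) - 1)


def pack_frame_ltp(subframes):
    """Concatenate four (lag, gain) pairs into one 36-bit LTP field."""
    if len(subframes) != SUBFRAMES:
        raise ValueError("expected four sub-frames")
    field = 0
    for lag, gain in subframes:
        field = (field << (LAG_BITS + GAIN_BITS)) | pack_ltp(lag, gain)
    return field
```

For example, `pack_ltp(40, 2)` yields the 9-bit word 162, and `unpack_ltp(162)` returns `(40, 2)`.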
The eight coefficients of the short-term filter are expressed in a transformed system as Log Area Ratio (LAR) coefficients, LAR1 to LAR8. These coefficients are quantized with variable dynamic ranges depending on their size or their associated energy. Thus, the first two coefficients LAR1 and LAR2 of the short-term filter are quantized on 6 bits. The next two coefficients LAR3 and LAR4 are assigned a dynamic range of 5 bits. The next two, LAR5 and LAR6, are assigned a dynamic range of 4 bits and the last two, LAR7 and LAR8, are assigned a dynamic range of 3 bits. In practice, 36 bits are allocated in this way to the representation of the short-term filter.
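The allocation of the 36 bits among the eight LAR coefficients can be sketched as follows. The bit widths come from the text; the packing order and the function names are assumptions made for this example.

```python
# Bit widths of LAR1..LAR8 as given in the text.
LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]


def pack_lar(codes):
    """Pack eight quantized LAR codes into one 36-bit field (LAR1 first).

    The ordering is an assumption for illustration.
    """
    if len(codes) != len(LAR_BITS):
        raise ValueError("expected eight LAR codes")
    field = 0
    for code, width in zip(codes, LAR_BITS):
        if not 0 <= code < (1 << width):
            raise ValueError("LAR code does not fit its dynamic range")
        field = (field << width) | code
    return field


def unpack_lar(field):
    """Recover the eight LAR codes from a field built by pack_lar."""
    codes = []
    for width in reversed(LAR_BITS):
        codes.append(field & ((1 << width) - 1))
        field >>= width
    return codes[::-1]
```

The widths sum to 36 bits, which matches the size of the filter signal.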
In the 260 bits transmitted, the remaining 188 bits (260−36−36) are used to encode the 60 excitation or RPE (Regular Pulse Excitation) parameters. The RPE is calculated, like the pitch signal, in four sub-frames each corresponding to 40 samples (5 ms). The four RPEs calculated in this way are each described in the form of regularly spaced grids with a pitch of three at the initial sampling frequency of 8 kHz. Each grid is described by 15 RPE parameters, namely:
an RPE grid position, encoded on 2 bits,
an amplitude on the sub-frame, encoded on 6 bits, and
thirteen coefficients describing a relative amplitude of each pulse of the grid (RPE pulses), each encoded on 3 bits.
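The RPE bit budget and the geometry of a grid can be verified with the short sketch below. The reading of the grid position as a sample offset from 0 to 3 is an interpretation made for this example, and all names are illustrative.

```python
SUBFRAMES_PER_FRAME = 4
SAMPLES_PER_SUBFRAME = 40
GRID_SPACING = 3          # "pitch of three" at 8 kHz
PULSES_PER_GRID = 13

# Bits per RPE field in one sub-frame, as listed in the text.
RPE_FIELD_BITS = {
    "grid_position": 2,
    "block_amplitude": 6,
    "pulses": PULSES_PER_GRID * 3,
}

rpe_params_per_subframe = 2 + PULSES_PER_GRID            # position, amplitude, pulses
rpe_bits_per_subframe = sum(RPE_FIELD_BITS.values())
rpe_params_per_frame = rpe_params_per_subframe * SUBFRAMES_PER_FRAME
rpe_bits_per_frame = rpe_bits_per_subframe * SUBFRAMES_PER_FRAME

LTP_FIELD_BITS = 36
LAR_FIELD_BITS = 36
frame_bits = LTP_FIELD_BITS + LAR_FIELD_BITS + rpe_bits_per_frame


def grid_sample_indices(grid_position):
    """Sample indices covered by a grid, assuming the 2-bit position is an offset."""
    if not 0 <= grid_position < 4:
        raise ValueError("grid position must fit in 2 bits")
    return [grid_position + GRID_SPACING * k for k in range(PULSES_PER_GRID)]
```

Under this reading, thirteen pulses spaced three samples apart stay inside the 40-sample sub-frame for every grid position, the last index being 39 for position 3.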
Once a digital message of this kind has been source encoded, it is channel encoded for transmission so that it can be sent on a radio channel subject to a high transmission error rate. The form of channel encoding applied in GSM telephony comprises the following steps shown in FIG. 1b. The first step is concerned with systematic classification of the bits into three categories according to their sensitivity to errors as established by the standard:
class 1a: 50 bits, highly sensitive,
class 1b: 132 bits, sensitive, and
class 2: 78 bits, insensitive.
This classification is defined in GSM recommendation 5.03. Class 1a essentially concerns the more significant bits of the various parameters. Classes 1b and 2 contain the less significant bits.
A second step includes protecting the bits according to their sensitivity class. This protection is obtained:
for class 1a by adding a cyclic error detection code (CRC) on 3 bits (53 bits at the output),
for protected classes 1a and 1b, in combination, by adding four tail bits (189 bits at the output); a convolutional error correcting code of ratio ½ is applied to this set of 189 bits, which produces 378 bits at the output, and
for the preceding result and the bits of class 2, i.e. 456 bits in total, no additional protection.
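The bit counts of the protection step can be followed with the sketch below. It only tallies bits and does not implement the CRC or the convolutional code; the names are chosen for this example.

```python
# Class sizes from the classification step.
CLASS_1A = 50
CLASS_1B = 132
CLASS_2 = 78

CRC_BITS = 3             # cyclic error detection code added to class 1a
EXTRA_BITS = 4           # the four bits added before convolutional coding
OUTPUT_PER_INPUT = 2     # a ratio 1/2 code emits two bits per input bit
FRAME_PERIOD_MS = 20

source_bits = CLASS_1A + CLASS_1B + CLASS_2
class_1a_protected = CLASS_1A + CRC_BITS
conv_input = class_1a_protected + CLASS_1B + EXTRA_BITS
conv_output = conv_input * OUTPUT_PER_INPUT
channel_bits = conv_output + CLASS_2      # class 2 is sent unprotected

# Gross rate needed to carry one protected frame every 20 ms.
gross_rate = channel_bits * 1000 // FRAME_PERIOD_MS

print(source_bits, class_1a_protected, conv_input, conv_output, channel_bits, gross_rate)
```

The 260 source bits therefore become 456 channel bits per 20 ms period, that is 22.8 kbit/s after protection.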
The total of 456 bits obtained over the 20 millisecond period is then divided between four successive frames for transmission, for example. Each transmission frame includes, in particular in the TDMA version of the GSM, 577-microsecond time slots during each of which 156 bits are transmitted, as shown in FIG. 1c. The messages on 156 bits include 10 meaningless bits at the start and end of the message, used in particular for synchronization. These 10 bits are essentially used for setting the transmitted power of the mobile telephone transmitter to prevent this power being applied too suddenly and causing percussive distortion.
Then two information streams of 57 bits are sent: they are representative of the message to be transmitted. They are placed on either side of a message on 22 bits relating to the identity of the mobile telephone, the call, or the originator of the message. In the final analysis, twice 57 payload bits representative of the message are therefore sent in a given time slot of a frame. Because the message is sent in four successive frames, 2×57×4=456 bits are sent in four frames, with a duration of 18.46 milliseconds. This means that the bit rate of the channel is slightly higher than the bit rate of the source encoder, and thus the digital message can be transmitted in its entirety.
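The timing argument can be reproduced numerically as sketched below. The figure of eight time slots per TDMA frame is not stated in the text and is assumed here from the usual GSM frame structure; the names are illustrative.

```python
SLOT_DURATION_US = 577        # duration of one time slot (from the text)
SLOTS_PER_TDMA_FRAME = 8      # assumption: not given in the text
PAYLOAD_BITS_PER_SLOT = 2 * 57
FRAMES_PER_MESSAGE = 4
SPEECH_PERIOD_US = 20_000     # one protected speech frame every 20 ms

payload_bits = PAYLOAD_BITS_PER_SLOT * FRAMES_PER_MESSAGE

# One slot per TDMA frame belongs to the call, so four frames must elapse.
duration_us = FRAMES_PER_MESSAGE * SLOTS_PER_TDMA_FRAME * SLOT_DURATION_US

# Rate offered by the channel versus rate required by the protected speech.
channel_rate = payload_bits * 1_000_000 / duration_us
required_rate = payload_bits * 1_000_000 / SPEECH_PERIOD_US

print(payload_bits, duration_us / 1000, round(channel_rate), round(required_rate))
```

Under that assumption the 456 bits are carried in about 18.46 ms, which is shorter than the 20 ms period in which they are produced.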
CDMA type encoding or another cellular (or otherwise) system radio interface can be adopted instead of TDMA encoding, provided that the bit rate of the system is higher than the bit rate of the speech encoder.
At the receiving end, the received digital message undergoes channel decoding corresponding to the above channel encoding and supplies the three types of digital data mentioned above at a rate of 260 bits every 20 milliseconds, plus an indication of any residual errors in the bits of class 1a. 
FIG. 2 shows that the encoded and protected speech information is then transmitted by a mobile telephone MS1 to a Public Land Mobile Network (PLMN). The PLMN performs the channel decoding of the speech data in Base Transceiver Stations (BTS) to produce data with the GSM speech format, i.e. as if it had come from a source encoder. The PLMN then transcodes the data with the GSM speech format into data with a format commonly used in circuit-switched networks. This latter format, known as A-law PCM, results in a bit rate of 64 kbit/s corresponding to 8 000 samples per second encoded on 8 bits. This transcoding is performed by a Transcoder Rate Adaptation Unit (TRAU) of the PLMN, generally located at the Mobile Services Switching Centers (MSC) of the networks.
The transcoding performed by the TRAU consists in synthesizing the speech by means of a decoder using the inverse process to that used by the source encoder described above. This produces a speech signal in the Pulse Code Modulation (PCM) form of representation and comprising 8 000 samples per second encoded on 13 bits, like the original signal, to which A-law logarithmic transformation is applied to encode each sample on 8 bits (64 kbit/s). This new form of the speech signal contains all the physical information of the previous signal with the GSM format, ignoring transmission and transcoding errors. This signal with the A-law PCM format is transmitted to a public switched telephone network (PSTN), for example a cable network.
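The logarithmic transformation from 13 bits to 8 bits per sample can be sketched as follows. This is a simplified illustration based on the continuous A-law curve with the usual parameter A = 87.6, which is an assumption not given in the text; it is not a bit-exact implementation of the segmented encoding used in real networks, and the function names are hypothetical.

```python
import math

A = 87.6                      # assumption: usual A-law parameter, not from the text
LOG_TERM = 1.0 + math.log(A)
FULL_SCALE = 4096             # magnitude range of a 13-bit signed sample


def alaw_compress(x):
    """Map x in [-1, 1] to [-1, 1] with the continuous A-law curve."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / LOG_TERM
    else:
        y = (1.0 + math.log(A * ax)) / LOG_TERM
    return math.copysign(y, x)


def alaw_expand(y):
    """Inverse of alaw_compress."""
    ay = abs(y)
    if ay < 1.0 / LOG_TERM:
        x = ay * LOG_TERM / A
    else:
        x = math.exp(ay * LOG_TERM - 1.0) / A
    return math.copysign(x, y)


def encode_sample(pcm13):
    """Encode a 13-bit signed sample as a sign bit plus a 7-bit magnitude."""
    clipped = max(-FULL_SCALE, min(FULL_SCALE - 1, pcm13))
    y = alaw_compress(clipped / FULL_SCALE)
    magnitude = min(127, int(abs(y) * 128))
    sign = 0x80 if clipped >= 0 else 0x00
    return sign | magnitude


def decode_sample(code):
    """Decode an 8-bit code to an approximate 13-bit sample (mid-step value)."""
    y = ((code & 0x7F) + 0.5) / 128.0
    value = int(round(alaw_expand(y) * FULL_SCALE))
    return value if code & 0x80 else -value


BIT_RATE = 8_000 * 8          # 8 000 samples per second on 8 bits
```

The round trip is lossy: small samples are recovered to within a few units and large samples to within a few percent, which illustrates why parameters re-derived after transcoding can differ from the original ones.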
In the PSTN, the 64 kbit/s signal is either converted into an analog signal or is transmitted over digital circuits. It is then routed to an addressee over the switched network or to another TRAU if the addressee is also on a cellular mobile telephone network. This other TRAU, referred to hereinafter as the inverse TRAU because it performs the inverse transformation to that performed by the first TRAU, returns it to the GSM format and bit rate.
FIG. 2 shows that the initial speech signal with the GSM format at 13 kbit/s can thus undergo a plurality of TRAU or inverse TRAU transcoding steps. Each transcoding and transmission step can degrade the content of the signal. In particular, if it is required to re-establish the GSM format at 13 kbit/s at the receive end system, the audible speech will be very similar to that at the transmit end system, but the values of the parameters of the model (LTP, LAR, RPE) may be significantly different.
If a voice channel is to be used to transmit data, it is therefore not possible merely to replace the bits descriptive of these parameters in the transmitter with the data to be transmitted if they are to be recoverable at the receiver. This procedure leads to errors at the receiving end of the system. Moreover, transcoders are designed to manipulate parameters typical of human speech. Random bit configurations are obtained if the bits descriptive of these parameters are replaced with raw data. The analyzer and synthesizer circuits may then not know how to reproduce these configurations. This occurs, for example, if the bit configurations represent sudden variations in the energy of the blocks or the value of the pitch.
The aim of the invention is to remedy the above drawbacks by placing the data bits to be transmitted in the 260 bits of the GSM speech format. The principle of the invention is to satisfy certain constraints. These constraints are:
Constraint 1: it is necessary to make the best possible use of the bits that are most protected by channel encoding.
Constraint 2: it is necessary to avoid introducing configurations or variations that are incompatible with proper functioning of existing TRAU transcoding plant.
Constraint 3: it is necessary to avoid placing information in bits that are not secure with regard to transcoding and transmission.
The invention uses the existing speech transmitter circuits of the PLMN and the PSTN. Mobile telephony and switched network operators must not be required to change their plants. The invention nevertheless uses the speech channels to transmit data, in particular with a bit rate higher than the signaling channels are capable of and at a lower cost than the data channels.
In practice, with the words of 260 bits referred to, there can be 2^260 different binary configurations. Using the invention, some of these configurations will not be recoverable at the remote plant. In the invention, a large number of these are eliminated. The bit configurations eliminated relate to:
constraint 1: in this case the bits of class 2 are not used;
constraint 2: in this case high variations in the amplitude of the pitch or energy values of the frames are excluded;
constraint 3: the least significant bits of the source encoder parameters and the parameters depending on the first analysis steps (RPE) are not used. The values analyzed for the RPE depend greatly on the long-term and short-term filtering steps. Accordingly, the new RPE parameters can be very different from the initial values if these two filters are modified during transmission or transcoding. It is sometimes possible to retain certain magnitudes and an overall trend of the RPE grids throughout transmission, as shown below. These include, for example, the maximum amplitude value of the block and a set of grid configurations resulting from sub-quantizing of the universe of possible “grid-position”/“RPE pulse” values.
The invention looks for configurations of bits among the 260 bits of the 20 ms frame which are interpreted as well as possible in the transcoding and transmission steps. These configurations are authorized configurations. They are referred to as robust configurations. Robust configurations are not fixed as such. A dynamic aspect is introduced by constraint 2. The other configurations are prohibited configurations.
The invention uses a number of robust configurations much lower than 2^260, for example 2^64 configurations. This then constitutes a transcoding system similar to a MODEM function which transcodes each word of the data to be transmitted into robust configurations. The effect of this transcoding is to transcode 64-bit data words to be transmitted into 260-bit words with the GSM speech format. In these 260-bit words, 64 bits identified precisely by their position can have a value which is significant of the message to be transmitted. The remaining 196 bits have fixed values independent of the data to be transmitted. These fixed values can be 1 or 0 chosen to give the greatest robustness.
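The mapping of a 64-bit data word onto a 260-bit speech-format word can be sketched as below. The positions of the 64 data bits and the fixed values of the other 196 bits are purely hypothetical placeholders; the text does not give them at this point, and the dynamic aspect introduced by constraint 2 is not modeled.

```python
FRAME_BITS = 260
DATA_BITS = 64

# Hypothetical placeholders: positions carrying data and fixed values elsewhere.
DATA_POSITIONS = list(range(0, 2 * DATA_BITS, 2))
FIXED_TEMPLATE = [0] * FRAME_BITS


def encode_word(word):
    """Place a 64-bit data word in the data positions of a 260-bit frame."""
    if not 0 <= word < (1 << DATA_BITS):
        raise ValueError("data word must fit in 64 bits")
    frame = list(FIXED_TEMPLATE)
    for i, position in enumerate(DATA_POSITIONS):
        frame[position] = (word >> (DATA_BITS - 1 - i)) & 1
    return frame


def decode_frame(frame):
    """Read the 64-bit data word back from the data positions of a frame."""
    if len(frame) != FRAME_BITS:
        raise ValueError("expected a 260-bit frame")
    word = 0
    for position in DATA_POSITIONS:
        word = (word << 1) | frame[position]
    return word


fixed_bits = FRAME_BITS - len(DATA_POSITIONS)

# Raw data rate if one 64-bit word is carried per 20 ms frame.
raw_data_rate = DATA_BITS * 1000 // 20
```

With one 64-bit word per 20 ms frame, the raw data rate would be 3 200 bit/s before any allowance for errors, which is above the 300 bit/s and 800 bit/s figures quoted for the signaling channels.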
The invention replaces the source encoder with a specific transcoder. The robust configurations of the transmitted bits are then interpreted correctly by the inverse TRAU circuits and conveyed normally on the PSTN. At the receiving end, in the case of a cable network terminal unit, robust messages are synthesized again with a TRAU from the data transmitted as 64 kbit/s and conveyed at 13 kbit/s with the GSM speech format. A transcoder that performs the inverse process to that performed by the specific transcoder is then used to reconstitute the transmitted data. In the case of a GSM terminal unit, only the inverse transcoding step is needed because the TRAU transformation will have been done already by a PLMN.
The invention therefore requires, in addition to the standard plant of the networks passed through, only a specific encoder at the transmitter which is substituted for the source encoder and a specific transcoder at the receiver which is the inverse of the previous one.
The invention therefore consists in a method of transmitting data on a speech channel, in particular a mobile telephone channel, wherein:
configurations of binary streams that correspond to prohibited non-robust configurations are identified in a speech encoding scheme,
the data to be transmitted is transcoded into data with the speech transmission format, retaining only bit configurations other than the prohibited configurations,
the transcoded data is transmitted over a network, in particular a mobile telephone network, and
the transcoded data is correspondingly decoded at the receiver.
The invention also consists in a device for transmitting data on a speech channel, in particular a mobile telephone channel, the device including:
a transcoder for transcoding a block of bits to be transmitted available in a block format into a message of bits formatted with a speech transmission format, the block format including a smaller number of bits than the speech transmission format, and
a switch for substituting, in the transmission, the message of bits to be transmitted formatted with a speech transmission format for a message delivered by a speech source encoder.