This application also claims the national phase of international application PCT/FI97/00194 filed Mar. 27, 1997 which designated the U.S.
The invention relates to speech transmission in a packet network and especially to transmission between a transcoder and a base station of a digital mobile communication network.
The invention will be explained in connection with speech processing and speech frames, but the same technique can be applied to the transmission of music and video signals. Common to these signals is that the signal samples have to be conducted isochronously to a decoder, that is, essentially at intervals equal to the intervals at which the samples are formed in the encoder.
In a digital telephone system a speech signal is encoded in some manner before it is channel coded and sent to the radio path. For example, in the case of the GSM system, digitalized speech is processed frame by frame at intervals of about 20 ms by using different methods so that the result is a parameter group representing speech for each frame. This information, that is, the parameter group, is channel coded and sent to the transmission path. The speech coding algorithms used are RPE-LTP (Regular Pulse Excitation LPC with Long Term Prediction) and various code excited algorithms CELP (Code Excited Linear Prediction), of which VSELP (Vector-Sum Excited Linear Prediction) should be mentioned.
In addition to actual coding, the following functions are also built in for speech processing: a) on the transmitter side Voice Activity Detection VAD with which the transmitter can be instructed to be switched on only when there is speech to be sent (Discontinuous Transmission, DTX), b) on the transmitter side the evaluation of background noise and the generation of respective noise parameters and on the reception side the generation of comfort noise in a decoder from the parameters, and c) acoustic echo suppression. Noise during a break makes the connection sound more pleasant than absolute silence.
In a known GSM mobile telephone system the input of a speech encoder is either a PCM signal of 13 bits from the network or an A/D converted PCM of 13 bits from the audio part of the mobile station. The speech frame obtained from the output of the encoder is 20 ms in duration and comprises 260 audio bits which are formed by encoding 160 PCM-encoded speech samples. Voice Activity Detection (VAD) defines from the parameters in the speech frame whether or not the frame contains speech. If speech is detected, the frames transmitted to the radio path as so-called traffic frames are speech frames. After a speech burst, and at specified intervals also during speech pauses indicated by the VAD, the traffic frames are SID frames (Silence Descriptor) containing noise parameters, in which case the receiver is able to generate from these parameters noise similar to the original noise also during pauses.
A traffic frame thus contains a speech block of 260 bits representing 20 ms of encoded speech/data or noise. Furthermore, the frame has 56 bits available for frame synchronization, speech and data indication, timing and other information, the total length of the traffic frame being 316 bits. Uplink and downlink traffic frames differ slightly from one another in these 56 bits.
FIG. 1 shows a simplified view of the present GSM network from the point of view of transmission. The network subsystem comprises a mobile services switching centre, the mobile communication network being connected via the system interface of the mobile services switching centre to other networks, such as the Public Switched Telephone Network PSTN. Via the A interface the network subsystem is connected to the base station subsystem BSS comprising base station controllers BSC and base stations BTS connected thereto. The interface between the base station controller and the base stations connected thereto is an Abis interface. The base stations are in radio communication with mobile stations via the radio interface. The traffic frame forming unit TRAU explained above is in the figure placed in association with the base station but it may also be situated in association with the mobile services switching centre.
The mobile services switching centre MSC is shown in a simplified way in FIG. 2. Control of the base station system BSS is one function of the mobile services switching centre in addition to call control. The function of the switching matrix is to select, switch and separate speech/data and signalling paths passing through it in a desired way. The switching matrix switches in this way its part of the connection between a mobile subscriber and a subscriber of another network or of the connection between two mobile subscribers. The function of the Network Interworking Functions IWF 1 is to adapt the GSM network to other networks. The PCM trunk line is connected to a PBX system by a terminal circuit trunk interface 3 so that the physical interface of layer 1 between the exchange and the base station controller BSC is a line of 2 Mbit/s, that is, 32 time slots of 64 kbit/s (=2048 kbit/s). The signalling terminal 4 carries out signalling according to CCITT recommendation No. 7.
The functions of the base station controller BSC indicated with reference 14 in FIG. 1 include selection of a channel between it and the mobile station, link control and channel release. It carries out mapping from the radio channel to the channel of the PCM time slot of the interconnecting line between the base station and the base station controller. The base station controller shown in a simplified way in FIG. 3 comprises terminal circuits, trunk interfaces 31 and 32 by means of which the base station controller is connected on the one hand to the mobile services switching centre over the A interface and on the other hand to the base stations over the Abis interface. The Transcoder and Rate Adaptation Unit TRAU is an element of the base station system BSS and it may be situated in association with the base station controller BSC as shown in FIG. 1, or also in association with the mobile services switching centre, for example. The transcoders convert speech from one digital format to another, for example, they convert the 64 kbit/s A-law PCM from the exchange over the A interface into encoded speech of 13 kbit/s to be sent to the base station line and vice versa. Rate adaptation for data is carried out between the rate 64 kbit/s and the rates 3.6, 6 or 12 kbit/s.
The base station controller BSC configures, allocates and supervises the circuits of 64 kbit/s in the direction of the base station. It also controls the switching circuits of the base station by means of the PCM signalling link and allows the circuits of 64 kbit/s to be used efficiently, that is, a switch at the base station, which the base station controller controls, switches transmitter/receivers to PCM links. This switch hence operates as a drop/insert multiplexer, i.e. as an add/drop multiplexer which drops a PCM time slot for the transmitter of the data or inserts a reception time slot to a PCM time slot of the data or links the PCM time slots forwards to other base stations. The base station controller thus sets up and releases connections to the mobile station. The connections from the base stations to the PCM line or lines over the A interface and the procedure in the opposite way are multiplexed in a switching matrix 33.
The physical interface of layer 1 between the base station BTS and the base station controller BSC is a line of 2 Mbit/s, that is, 32 time slots of 64 kbit/s (=2048 kbit/s). The base station is totally controlled by the base station controller BSC and it mainly contains transmitter/receivers TRX which implement the radio interface towards the mobile station. Four full rate traffic channels via the radio interface can be multiplexed into one PCM channel of 64 kbit/s between the base station controller and the base station, in which case the rate of the speech/data channel is in this interval 16 kbit/s. In that case, one PCM link of 64 kbit/s can transmit four speech/data connections.
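The rate figures in this passage can be checked with a few lines of arithmetic. The following is only an illustrative sketch in Python; the constant names are assumptions introduced here, and the frame length of 316 bits per 20 ms is the traffic frame length stated earlier in the text.

```python
# Figures taken from the text; all names are illustrative.
SLOT_RATE_KBITS = 64       # one PCM time slot
SLOTS_PER_LINE = 32        # time slots on a 2 Mbit/s line
CHANNELS_PER_SLOT = 4      # full-rate traffic channels multiplexed per time slot
FRAME_BITS = 316           # traffic frame length
FRAME_INTERVAL_MS = 20     # one traffic frame per 20 ms

line_rate_kbits = SLOT_RATE_KBITS * SLOTS_PER_LINE            # total line rate
subchannel_rate_kbits = SLOT_RATE_KBITS / CHANNELS_PER_SLOT   # rate per connection
frame_rate_kbits = FRAME_BITS / FRAME_INTERVAL_MS             # bits per ms equals kbit/s

print(line_rate_kbits, subchannel_rate_kbits, frame_rate_kbits)
```

Under these figures a 316-bit traffic frame every 20 ms needs 15.8 kbit/s, which fits within the 16 kbit/s subchannel.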
FIG. 1 illustrates the transmission rates per channel used in the GSM. The mobile station sends speech or data information over the radio interface on the radio channel as traffic frames. A base station 13 receives the information and transmits it to the time slot of 64 kbit/s of the PCM line. The other three traffic channels of the same carrier wave are also inserted in the same time slot, that is, the channel, so that the transmission rate for a connection is 16 kbit/s. In a base station controller 14 the transcoder/rate adaptation unit TRAU converts the rate 16 kbit/s of the encoded digital information into the rate 64 kbit/s and at this rate the data is transmitted to the mobile services switching centre after which, subsequent to possibly necessary modulation and rate modification, the information is transmitted to some other network.
In accordance with the foregoing explanation, the base station controller selects the circuits with which a connection is set up between it and the transmitter/receivers of the base station. The radio channel (TDMA time slot) and the PCM time slot of the line between the base station and the base station controller have during the connection a one-to-one correspondence, that is, in the uplink direction the information of a specified time slot of a specified carrier wave is always inserted in the same PCM channel of 16 kbit/s and correspondingly, in the downlink direction the information of this PCM channel is always transmitted to the same TDMA time slot. The base station controller signals to the base station which TDMA time slot of the base station has to be bound to which PCM channel. In that way the base station controller alone allocates the channel through the Abis interface and radio interface as far as the mobile station. When the base station has allocated a channel as far as the mobile station, a mobile services switching centre 15 selects the circuits with which the connection between the mobile services switching centre and the base station controller/TRAU is set up, that is, the circuits towards the A interface of the exchange and the base station controller. Finally, the generated links are connected to each other.
Data transmission standard ATM (Asynchronous Transfer Mode) has been introduced for combinations of narrow band and broad band implementations and for transmission of packets and signalling. ATM is a connection-oriented packet switching technique which the international telecommunication standardization organization ITU-T has chosen as an implementation technique of Broadband Integrated Services Digital Network (B-ISDN). In the ATM, data is packed in frames which comprise several packets of a constant length known as cells. The length of a cell is 53 bytes and a cell comprises a header of 5 bytes in length and 48 bytes have been reserved for a payload. When ATM cells are sent, each cell can be directed to different destinations on the basis of its header.
ATM technique is best suited for use in broadband networks, especially in transmission networks using fibre optics. It is therefore probable that in the mobile communication network the present PCM technique using trunk lines of 2 Mbit/s, which the mobile operator has often hired from another teleoperator, will be replaced with ATM technique. It is necessary to operate in this way especially if the transmission capacity of the radio path is increased so much that the present PCM connection is no longer sufficient. In that case the data transmission capacity and the rate of the mobile communication network would increase considerably. It is also possible that the premises where a new base station is positioned already have an existing ATM connection, in which case it would be tempting to use it.
Speech transmission in ATM cells has become a problem. In present circuit-switched connections, speech transmission is very fast and delays hardly ever cause problems. Instead, it has become a problem how to manage transmission delays when various audio signals to the network from any of the several input points are transmitted by the ATM technique to any of the numerous output points of the network. It is a particular problem how to transmit audio signals converted into PCM encoded signals and multiplexed in PCM devices between the nodes of the network and across the network, which network contains ATM transfer devices and exchanges.
The solutions given to this problem are at least the following: a) use of microcells, b) incomplete filling of cells, and c) emulation of circuit switching. When microcells are used, several speech channels are multiplexed into one ATM cell for transport. It is a problem with the microcell technique that an ATM cell is no longer the basic unit of switching, in which case ordinary ATM switching devices cannot be used to switch speech channels but special arrangements and devices are needed for releasing speech channels inside the microcells. In incomplete filling of ATM cells, the payload of the cell is left incomplete. In this way the capacity is underused, but it has to be done if delays are to be avoided. In emulation of circuit switching, information moving on the PCM line of 2 Mbit/s is transmitted transparently in one ATM cell stream. A disadvantage of this method is that transmission capacity is always reserved regardless of whether or not there are calls to be transmitted, wherefore the transmission of empty cells cannot be avoided. Another disadvantage is that, the connection being of a point-to-point nature, its speech channels cannot be switched into different directions with ATM devices inside the network.
Patent Application WO 94/11975 discloses a method, a telecommunication network and a switching system for transmitting several PCM encoded speech channels through the ATM network. The method includes features of solutions a and c mentioned above. According to the application, several speech channels assigned to the same output node of the ATM network are packed in one ATM cell, whereby sound and narrowband data channels are transmitted in these cells which are transmitted at a reproducing rate which is the same or an integral part of the reproducing rate of a sound-containing PCM signal. Cells are transmitted in the network between the input node and the output node via virtual circuits maintaining a constant rate. When there are no great changes in the traffic so that permanent virtual paths need to be added or deleted between two nodes, the switching system carries out a simple operation: a frame of PCM samples at the input point of 125 microseconds in duration, inserted in one ATM cell, is routed through the network to the output node, which means that cells are sent at intervals of 125 microseconds. One PCM sample comprises one byte, wherefore 48 speech channels at the maximum can be transmitted in one cell. If the capacity of the PCM channel is more than 64 kbit/s, e.g. 384 kbit/s, more bytes of the cell are used for one channel, for example 6 bytes.
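The figures quoted for this prior-art scheme follow from simple arithmetic. The sketch below, in Python, is only an illustration with names introduced here; it assumes one byte per 64 kbit/s channel in each cell, as the text describes.

```python
# Figures taken from the text; all names are illustrative.
PCM_FRAME_INTERVAL_US = 125   # duration of one frame of PCM samples
PAYLOAD_BYTES = 48            # ATM cell payload
BYTES_PER_SAMPLE = 1          # one PCM sample is one byte
BASE_CHANNEL_KBITS = 64       # rate of an ordinary PCM channel

cells_per_second = 1_000_000 // PCM_FRAME_INTERVAL_US   # one cell per PCM frame
max_channels_per_cell = PAYLOAD_BYTES // BYTES_PER_SAMPLE

def bytes_per_cell(channel_kbits: int) -> int:
    """Bytes of each cell used by a channel that is a multiple of 64 kbit/s."""
    return channel_kbits // BASE_CHANNEL_KBITS

print(cells_per_second, max_channels_per_cell, bytes_per_cell(384))
```

One cell per 125 microseconds corresponds to 8000 cells per second, and a 384 kbit/s channel occupies 6 of the 48 payload bytes.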
None of the above explained methods is as such suitable when the transmission of audio information of the PCM channel between the base station and TRAU is replaced with the ATM connection in order that speech information can be transmitted, when required, directly from one base station to another without the connection passing through the TRAU or the mobile services switching centre as in the prior art GSM system.
A full-rate speech frame in the GSM system is 316 bits. This is about 85% of the length of the payload of an ATM cell (47 to 48 bytes or 376 to 384 bits). It is conceivable that one speech frame is packed into one ATM cell, in which case about 15% of the maximum bandwidth would be lost. Efficiency is, however, considerably worse when half-rate speech frames, for example, are packed into the ATM cell. The method cannot be used at all if the length of the speech frame exceeds the length of the cell payload in the packet network.
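The share of the payload used when one frame is packed per cell can be computed directly. This is a small illustrative sketch in Python; the function name `fill_ratio` is an assumption introduced here, while the bit counts come from the text.

```python
# Figures taken from the text; fill_ratio is an illustrative helper.
SPEECH_BLOCK_BITS = 260
OVERHEAD_BITS = 56
FRAME_BITS = SPEECH_BLOCK_BITS + OVERHEAD_BITS   # 316-bit traffic frame

def fill_ratio(frame_bits: int, payload_bytes: int) -> float:
    """Fraction of a cell payload occupied by a single frame."""
    return frame_bits / (payload_bytes * 8)

for payload_bytes in (47, 48):
    ratio = fill_ratio(FRAME_BITS, payload_bytes)
    print(f"{payload_bytes}-byte payload: {ratio:.1%} used, {1 - ratio:.1%} unused")
```

With these numbers the used share comes out at roughly 82 to 84 percent, in line with the approximate figure given in the text.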
Another possible packet network to which the method of the invention could be applied is the Internet. The length of an Internet packet is variable, but from the point of view of bandwidth, it is not efficient to send each traffic frame as an individual packet.
The object of the present invention is thus to develop a method by means of which speech comprising speech frames generated from a PCM encoded speech signal of the speech encoder can be transmitted in a packet network, such as the ATM or Internet network, without a disadvantageous delay and by utilizing bandwidth as well as possible and so that in case of a speech signal, voice quality will remain as good as possible. Another object is that the method can also be employed for transmitting music and video samples. A further object of the invention is to develop a method by means of which a speech/audio/video signal of good quality can be transmitted efficiently in packet mode between a base station and a TRAU or two base stations in the mobile communication system.
The object is attained with the method that is characterized by what is stated in claim 1. The dependent claims are directed to the preferred embodiments of the invention.
The invention is based on the idea that the payload of the frames in the packet network is filled as full as possible, in which case some of the speech frames have to be divided into two consecutive frames of the packet network.
A digitalized speech signal is converted frame by frame in a speech encoder into a parameter group which is inserted in a traffic frame. A traffic frame may be a speech frame as such but mostly additional bits are needed for different purposes for the transmission, in which case the length of the frame is greater than the length of a mere speech frame.
The provided traffic frames are inserted immediately in the payload part of the data packet so that the payload parts of the packets are filled completely. A traffic frame, which does not fit into the payload part of the preceding packet, is divided between two distinct packets. The packets are sent via the transmission network to the destination. At the destination the parts of the traffic frame are separated from the payload of the received packet, the parts being assembled into whole traffic frames. The speech frames contained in the traffic frames are passed to a speech decoder for producing the original digitalized speech signal.
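The packing and reassembly described above can be sketched as follows. This is a minimal illustration in Python, not the claimed implementation: the function names `pack_frames` and `unpack_frames`, the use of bit strings, and the handling of the last, partly filled payload (sent short rather than padded) are all assumptions made for the example. The sketch works on a whole batch of frames, whereas a real transmitter would send each packet as soon as its payload is full.

```python
from typing import List

FRAME_BITS = 316      # traffic frame length from the text
PAYLOAD_BITS = 384    # 48-byte ATM cell payload

def pack_frames(frames: List[str], payload_bits: int = PAYLOAD_BITS) -> List[str]:
    """Concatenate traffic frames and cut the bit stream into payloads.

    Every payload except possibly the last is completely full, so a frame
    that does not fit is split across two consecutive payloads.
    """
    stream = "".join(frames)
    return [stream[i:i + payload_bits] for i in range(0, len(stream), payload_bits)]

def unpack_frames(payloads: List[str], frame_bits: int = FRAME_BITS) -> List[str]:
    """Rejoin the payloads and cut the stream back into whole traffic frames."""
    stream = "".join(payloads)
    whole = len(stream) // frame_bits   # ignore any incomplete trailing frame
    return [stream[i * frame_bits:(i + 1) * frame_bits] for i in range(whole)]

# Six dummy frames, each a distinct 316-bit pattern (illustrative data only).
frames = [format(n, "b").zfill(FRAME_BITS) for n in range(1, 7)]
payloads = pack_frames(frames)
restored = unpack_frames(payloads)

print(len(payloads), [len(p) for p in payloads], restored == frames)
```

In this example six frames (1896 bits) occupy four full payloads and one partly filled payload, instead of the six cells that one frame per cell would require.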
The method as such would lead to deterioration of speech quality as some speech frames are sent immediately and some are sent only with a part of the following speech frame. According to the preferred embodiment of the invention, speech quality is improved by buffering speech frames in the memory of the receiver so that the received speech frames are passed to the speech decoder at intervals equal to the intervals in which they were originally formed.
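The effect of the receiver buffer can be illustrated with a small timing model. The sketch below, in Python, rests on several assumptions that are not in the text: the encoder delivers frame k at the end of its 20 ms interval, a cell is sent only when its 384-bit payload is full, network delay is zero, and the names `arrival_ms`, `playout_schedule` and `late_frames` are hypothetical.

```python
from typing import List

FRAME_BITS = 316          # traffic frame length from the text
PAYLOAD_BITS = 384        # 48-byte ATM cell payload
FRAME_INTERVAL_MS = 20    # frames are formed every 20 ms

def arrival_ms(k: int) -> int:
    """Time at which frame k (0-based) is complete at the receiver."""
    last_bit = FRAME_BITS * (k + 1) - 1
    cell = last_bit // PAYLOAD_BITS                 # cell carrying the frame's last bit
    bits_needed = PAYLOAD_BITS * (cell + 1)         # bits required to fill that cell
    frames_needed = -(-bits_needed // FRAME_BITS)   # integer ceiling division
    return frames_needed * FRAME_INTERVAL_MS

def playout_schedule(arrivals: List[int], initial_delay_ms: int) -> List[int]:
    """Pass frames to the decoder at fixed 20 ms spacing after an initial delay."""
    start = arrivals[0] + initial_delay_ms
    return [start + i * FRAME_INTERVAL_MS for i in range(len(arrivals))]

def late_frames(arrivals: List[int], schedule: List[int]) -> List[int]:
    """Indices of frames that arrive after their scheduled playout time."""
    return [i for i, (a, p) in enumerate(zip(arrivals, schedule)) if a > p]

arrivals = [arrival_ms(k) for k in range(7)]
print(arrivals)
print(late_frames(arrivals, playout_schedule(arrivals, 0)))
print(late_frames(arrivals, playout_schedule(arrivals, 20)))
```

Under these assumptions the first seven frames arrive unevenly spaced; without buffering one frame misses its playout time, while an initial delay of one frame interval lets all seven be passed to the decoder at regular 20 ms intervals.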
The advantages of the invention are first of all a reduced transmission delay in the network and secondly, the transmission of one call in one packet of the packet network enables packet switching of cells and thus directing the call to the desired destination. This results in a telephone network that utilizes packet network technique efficiently.
Furthermore, the transmission of the call in one packet of the packet network makes it possible that after the call has been terminated, the transmission of the cells also ends, which is contrary to when circuit switching is emulated. The cells need not be sent during pauses in speech but only when noise parameters are transmitted. Transmission capacity is thus released during pauses for other use, such as for other simultaneous connections, which is contrary to a circuit-switched network where pauses in the connection cannot be utilized with other connections.
As frames associated with one speech signal are inserted in one packet network packet, all the frames in the same packet are transmitted to the same destination, in which case releasing and rerouting of the packets will be avoided at the destination. The use of the method of the invention can be restricted only to audio/video connections, whereby the packets can be sent in a data transmission immediately, without delays.
In place of a speech signal, another audio or video signal may be transmitted, in which case instead of a speech frame, it could be generally called a parameter group. According to the preferred embodiment, the transmission network is an ATM or Internet network, in which case the packet is an ATM cell or an Internet packet.