The invention relates to a method and equipment for maintaining end-to-end synchronization on a telecommunications connection.
A vitally important feature of telecommunications systems, such as public authority networks, is that the traffic is secure from eavesdropping. The air interface is typically encrypted, so although radio traffic may be monitored, an outsider is not capable of removing the encryption. Infrastructure traffic, however, is not necessarily encrypted, which means that traffic, such as speech, can be decoded using the codec of the system in question. Although in principle it is not possible for an outsider to listen to the speech flow in the infrastructure, the most demanding users consider this a security risk. For this reason, a solution has been developed to allow speech to be protected with end-to-end encryption. One example of a system enabling end-to-end encryption is the TETRA (TErrestrial Trunked RAdio) system.
The underlying idea of end-to-end encryption is that the network user, such as a public authority, carries out the encryption and decryption independently of the transfer network employed, for example in connection with the terminal equipment.
When end-to-end encryption is used in the TETRA system, for example, the sender first encodes a voice sample of 60 ms using a TETRA codec to produce a plaintext sample. Using a specific key stream segment, the transmitting terminal equipment creates an encrypted sample, which is transmitted to the network. With the same key stream segment the recipient then decrypts the encrypted sample to reproduce the plaintext sample.
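The symmetry between encryption and decryption described above can be illustrated with a minimal sketch. It assumes that the key stream segment is combined with the sample by a bytewise XOR, which is a common stream-cipher construction but is not stated in the text; the function name `xor_bytes`, the 16-byte sample size and the sample values are hypothetical.

```python
def xor_bytes(sample: bytes, key_stream_segment: bytes) -> bytes:
    """Combine a sample with a key stream segment of the same length.

    Assumption: the combination is a bytewise XOR. The same call
    encrypts a plaintext sample and decrypts an encrypted one.
    """
    if len(sample) != len(key_stream_segment):
        raise ValueError("sample and key stream segment must be the same length")
    return bytes(a ^ b for a, b in zip(sample, key_stream_segment))


# Hypothetical stand-ins for one coded 60 ms sample and its key stream segment.
plaintext = bytes(range(16))
segment = bytes((7 * i + 3) % 256 for i in range(16))

encrypted = xor_bytes(plaintext, segment)   # transmitting terminal
decrypted = xor_bytes(encrypted, segment)   # receiving terminal, same segment
```

Decryption only reproduces the plaintext sample if the recipient applies exactly the same key stream segment as the sender, which is why both ends must agree on the segment for every frame.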
To prevent the breaking of the encryption, the key stream segment is constantly changed, each 60-ms voice sample being encrypted using a separate key stream segment. Both key stream generators must therefore agree on the key stream segment to be used for each frame. This task belongs to synchronization control and it is carried out using synchronization vectors transmitted between terminal equipment by means of an in-band signal.
The key stream generator generates a key stream segment on the basis of a specific key and an initialisation vector. The keys are distributed to each terminal participating in the encrypted call; this forms part of the terminal equipment settings. A new key stream segment is thus created once every 60 milliseconds. After each frame, the initialisation vector is changed. The simplest alternative is to increment it by one, but each encryption algorithm has its own incrementation method, which may be more complex, to prevent the breaking of the encryption.
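The per-frame generation of key stream segments can be sketched as follows. This is an illustration only: SHA-256 over the key and the initialisation vector is used as a stand-in for the actual encryption algorithm, which the text does not name, and the incrementation is the simplest alternative mentioned above (add one). The names `key_stream_segment`, `next_iv`, `FRAME_BYTES` and all values are hypothetical.

```python
import hashlib

FRAME_BYTES = 16  # hypothetical size of one coded 60 ms sample


def next_iv(iv: int) -> int:
    """Simplest incrementation: add one (kept within 64 bits)."""
    return (iv + 1) % 2**64


def key_stream_segment(key: bytes, iv: int, length: int = FRAME_BYTES) -> bytes:
    """Derive a key stream segment from a key and an initialisation vector.

    Assumption: SHA-256 is only a placeholder for the real algorithm,
    so length must not exceed the 32-byte digest size.
    """
    digest = hashlib.sha256(key + iv.to_bytes(8, "big")).digest()
    return digest[:length]


key = b"shared-demo-key"   # distributed to the terminals in advance
iv = 1000                  # initialisation vector for the first frame
segments = []
for _ in range(3):         # three consecutive 60 ms frames
    segments.append(key_stream_segment(key, iv))
    iv = next_iv(iv)
```

Two terminals that hold the same key and start from the same initialisation vector derive the same sequence of segments without exchanging any further data.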
Synchronization control is responsible for ensuring that both ends know the initialisation vector with which each frame is encrypted. To allow the encrypter and the decrypter to agree on the value of the initialisation vector, a synchronization vector is sent at the beginning of a speech item. In a group call, it must also be possible to join the call during a speech item. For this reason, the synchronization vector is sent repeatedly, for example 1 to 4 times a second. In addition to the initialisation vector, the synchronization vector comprises, for example, a key identifier and a CRC error check to enable the terminal equipment to verify the integrity of the synchronization vector. The recipient thus counts the number of frames transmitted after the synchronization vector, and on the basis of the last received initialisation vector and this frame count, the key stream generator generates a new initialisation vector.
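A minimal sketch of the receiving side of this synchronization control is given below. The field layout (16-bit key identifier, 64-bit initialisation vector, CRC-32) is an assumption made for illustration and is not the TETRA format. The sketch further assumes that the initialisation vector carried in the synchronization vector applies to the first frame after it and that the incrementation is by one per frame. The names `pack_sync_vector`, `unpack_sync_vector` and `ReceiverSync` are hypothetical.

```python
import struct
import zlib


def pack_sync_vector(key_id: int, iv: int) -> bytes:
    """Build a synchronization vector: key id, initialisation vector, CRC-32."""
    body = struct.pack(">HQ", key_id, iv)
    return body + struct.pack(">I", zlib.crc32(body))


def unpack_sync_vector(data: bytes):
    """Return (key_id, iv), or None if the CRC check fails."""
    body = data[:-4]
    (crc,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(body) != crc:
        return None
    return struct.unpack(">HQ", body)


class ReceiverSync:
    """Tracks the last received initialisation vector and the frames since."""

    def __init__(self) -> None:
        self.base_iv = None
        self.frames_since_sync = 0

    def on_sync_vector(self, data: bytes) -> bool:
        parsed = unpack_sync_vector(data)
        if parsed is None:
            return False                 # corrupted vector: keep the old state
        _, self.base_iv = parsed
        self.frames_since_sync = 0
        return True

    def on_frame(self) -> int:
        """Return the initialisation vector for the frame just received."""
        if self.base_iv is None:
            raise RuntimeError("no synchronization vector received yet")
        iv = self.base_iv + self.frames_since_sync
        self.frames_since_sync += 1
        return iv


rx = ReceiverSync()
accepted = rx.on_sync_vector(pack_sync_vector(key_id=5, iv=1000))
ivs = [rx.on_frame() for _ in range(3)]

corrupted = bytearray(pack_sync_vector(key_id=5, iv=2000))
corrupted[3] ^= 0x01                     # flip one bit in transit
rejected = not rx.on_sync_vector(bytes(corrupted))
```

Because the receiver derives each initialisation vector from a frame count, its result is only correct as long as the number of frames it sees matches the number of frames the sender encrypted.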
A data transmission network may comprise one or more packet-switched connections, such as IP (Internet Protocol) connections, in which data are transmitted using voice over IP (VoIP), for example. A standard protocol for transferring real-time data, such as voice and video image, in an IP network, for example, is RTP (Real-time Transport Protocol). The IP network typically causes a varying delay in the transfer of packets. For speech intelligibility, for example, variation in the delay is most harmful. To compensate for this, the receiving end of the RTP transmission buffers incoming packets in a jitter buffer and reproduces them at a specific reproduction time. A packet that has arrived before the reproduction time participates in the reconstruction of the original signal, whereas a packet arriving after the reproduction time remains unused and is discarded.
On the one hand, a real-time application requires as short an end-to-end delay as possible, and therefore the reproduction delay should be reduced. On the other hand, a long reproduction delay allows a long time for the arrival of the packets, and thus more packets can be accepted. Consequently, the reproduction delay value should be continuously adjusted according to the network conditions. Most RTP algorithms comprise a facility that adjusts the reproduction delay automatically according to the network conditions to improve voice quality. To shift the reproduction delay forward by 60 ms, for example, the IP gateway creates a replacement packet of 60 ms. In other words, an extra frame is added to the frame flow being transmitted. To shift the reproduction delay backward, at least one frame is removed.
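The reproduction-time rule of the jitter buffer can be sketched as follows. The sketch assumes that packet number n is reproduced at n times 60 ms plus a fixed reproduction delay, measured from the start of the stream, and that a packet arriving exactly at its reproduction time is still accepted; the function name `play_out` and the arrival times are hypothetical.

```python
FRAME_MS = 60  # duration of one voice frame


def play_out(arrivals: dict, reproduction_delay_ms: int):
    """Split packets into those reproduced and those discarded as late.

    arrivals maps a packet sequence number to its arrival time in ms.
    """
    played, discarded = [], []
    for seq in sorted(arrivals):
        reproduction_time = seq * FRAME_MS + reproduction_delay_ms
        if arrivals[seq] <= reproduction_time:
            played.append(seq)       # arrived in time: used for reconstruction
        else:
            discarded.append(seq)    # arrived too late: remains unused
    return played, discarded


# Hypothetical arrival times; packet 2 is delayed heavily by the network.
arrivals = {0: 20, 1: 95, 2: 250, 3: 210}

played_short, discarded_short = play_out(arrivals, reproduction_delay_ms=40)
played_long, discarded_long = play_out(arrivals, reproduction_delay_ms=130)
```

With the 40 ms delay, packet 2 misses its reproduction time and is discarded; with the 130 ms delay, all four packets are accepted at the cost of a longer end-to-end delay.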
A problem with the arrangement described above is that when synchronized end-to-end encryption coding is used and an extra frame is added to the frame flow, this causes the frame counter at the receiving end to be one frame ahead with regard to incoming frames and, therefore, the key stream segment of the receiving end no longer corresponds to that of the transmitting end. Correspondingly, if a frame is removed from the frame flow, the frame counter at the receiving end is delayed by one frame in relation to incoming frames, and the key stream segment no longer corresponds to that of the transmitting end.
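The effect of an added frame can be reproduced with a small simulation. It is a sketch under stated assumptions: SHA-256 stands in for the unnamed key stream algorithm, encryption is a bytewise XOR, the initialisation vector is incremented by one per frame, and the replacement frame created by the gateway is all zeros. All names and values are hypothetical.

```python
import hashlib


def segment(key: bytes, iv: int, length: int = 8) -> bytes:
    """Placeholder key stream generator (assumption: SHA-256 based)."""
    return hashlib.sha256(key + iv.to_bytes(8, "big")).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = b"demo-key"
base_iv = 500                                  # value from the last sync vector
plain = [bytes([n]) * 8 for n in range(5)]     # five plaintext frames

# Sender: frame n is encrypted with the segment for base_iv + n.
sent = [xor(p, segment(key, base_iv + n)) for n, p in enumerate(plain)]

# Gateway: shifts the reproduction delay by adding a replacement frame
# after the second frame.
received = sent[:2] + [bytes(8)] + sent[2:]

# Receiver: counts every incoming frame, including the replacement.
decoded = [xor(f, segment(key, base_iv + count))
           for count, f in enumerate(received)]

before_insert_ok = [decoded[i] == plain[i] for i in range(2)]
after_insert_ok = [decoded[i + 1] == plain[i] for i in range(2, 5)]
```

The frames before the added frame decrypt correctly, while every frame after it is decrypted with the segment belonging to the following frame and therefore yields garbage until the counter is reset.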
Shifting the reproduction delay in the middle of a speech item, for example, therefore causes end-to-end synchronization to be lost, and the encrypted speech can no longer be decoded. This continues until the transmitting end sends a new synchronization vector to synchronize the receiving end. This can be avoided in semi-duplex calls, for example, by changing the reproduction delay only after speech items. If the speech items are long, however, opportunities to change the reproduction delay may arise too seldom, and speech quality may thus remain poor until the end of the entire speech item because the reproduction delay cannot be changed earlier. Moreover, in duplex calls, for example, where there are no speech items and the terminal transmits continuously, the reproduction delay cannot be changed at all during the entire call if loss of synchronization is to be avoided.