The invention relates to a method and apparatus for maintaining an end-to-end synchronization on a telecommunications connection.
In telecommunications systems, such as an authority network, it is very important that electronic interception of the traffic is not possible. The air interface is typically encrypted, so even if the radio traffic is monitored, an outsider cannot decrypt it. Within the infrastructure, however, the traffic is not necessarily encrypted, so the traffic, such as speech, can be decoded using the codec of the system in question. Even though an outsider cannot in principle listen to the speech flow inside the infrastructure, this is a possible security risk for the most demanding users. Therefore, a solution has been developed in which speech can be encrypted with end-to-end encryption. An example of a system enabling end-to-end encryption is the TETRA (Terrestrial Trunked Radio) system.
The basic idea of end-to-end encryption is that a network user, such as an authority, can encrypt and decrypt traffic independently and regardless of the transmission network used, for instance in the terminal equipment.
In the TETRA system, for instance, when end-to-end encryption is employed, the sender first encodes a 60-ms voice sample using a TETRA codec, thus creating a plaintext sample. The transmitting terminal then creates an encrypted sample from the plaintext sample using a certain key stream segment. The encrypted sample is then transmitted to the network. The recipient decrypts the encrypted sample by using the same key stream segment, thus again obtaining the plaintext sample.
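The symmetry between encryption and decryption described above can be illustrated with a minimal sketch. It assumes that the sample and the key stream segment are combined with a bytewise XOR, which the text does not specify; the function name `xor_with_key_stream` and the placeholder key stream are hypothetical.

```python
def xor_with_key_stream(sample: bytes, key_stream_segment: bytes) -> bytes:
    """Combine a sample with a key stream segment of equal length.

    Assumption: the combination is a bytewise XOR. The text only says that
    the same key stream segment is used at both ends.
    """
    if len(sample) != len(key_stream_segment):
        raise ValueError("sample and key stream segment must have equal length")
    return bytes(s ^ k for s, k in zip(sample, key_stream_segment))


# Placeholder data standing in for one coded 60-ms voice frame.
plaintext_sample = b"coded 60-ms voice frame"
# Placeholder key stream; a real generator derives it from a key and an IV.
key_stream_segment = bytes(range(len(plaintext_sample)))

encrypted_sample = xor_with_key_stream(plaintext_sample, key_stream_segment)
recovered_sample = xor_with_key_stream(encrypted_sample, key_stream_segment)
```

Under this assumption, applying the same key stream segment a second time restores the plaintext sample, which is why both ends must select exactly the same segment for a given frame.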
To prevent the encryption from being broken, the key stream segment is changed continuously, which means that each frame comprising a 60-ms voice sample is encrypted with its own key stream segment. Both encryption key stream generators should thus agree on what key stream segment to use for each frame. This task belongs to synchronization control. For the task, synchronization vectors are used that are transmitted between terminals by means of an in-band signal.
The encryption key stream generator generates a key stream segment on the basis of a certain key and an initialization vector. The keys are distributed to each terminal participating in the encrypted call as part of the terminal settings. A new key stream segment is thus generated once every 60 milliseconds. After each frame, the initialization vector is changed. The simplest alternative is to increment it by one, but each encryption algorithm has its own incrementation method, which may be considerably more complex in order to make breaking the encryption more difficult.
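The per-frame generation of key stream segments can be sketched as follows. This is only an illustration: SHAKE-256 from the Python standard library stands in for the actual encryption algorithm, which the text does not name, and the frame size `FRAME_BYTES`, the 64-bit initialization vector width and the increment-by-one rule are assumptions.

```python
import hashlib

FRAME_BYTES = 35  # hypothetical size of one coded 60-ms frame


def next_iv(iv: int) -> int:
    """Simplest incrementation method: add one (64-bit wrap-around assumed)."""
    return (iv + 1) % 2**64


def key_stream_segment(key: bytes, iv: int, length: int = FRAME_BYTES) -> bytes:
    """Derive a key stream segment from a key and an initialization vector.

    SHAKE-256 is a stand-in here, not the algorithm used by any real system.
    """
    return hashlib.shake_256(key + iv.to_bytes(8, "big")).digest(length)


key = b"shared-call-key"  # placeholder for a key distributed to the terminals
iv = 1000                 # placeholder starting initialization vector
segments = []
for _ in range(3):        # three consecutive 60-ms frames
    segments.append(key_stream_segment(key, iv))
    iv = next_iv(iv)
```

Because the segment depends only on the key and the initialization vector, two terminals holding the same key produce the same segment for a frame as long as they agree on the initialization vector for that frame.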
The task of synchronization control is to make sure that both ends know the initialization vector used to encrypt each frame. For the encrypter and decrypter to agree on the value of the initialization vector, a synchronization vector is transmitted at the beginning of the speech item. In the case of a group call, joining the call must be possible even during a speech item. Therefore, the synchronization vector is transmitted repeatedly, for instance 1 to 4 times a second. In addition to the initialization vector, the synchronization vector contains, for instance, a key identifier and a CRC error check so that the terminal can verify the integrity of the synchronization vector. The recipient thus counts the number of frames transmitted after the synchronization vector, and the encryption key stream generator generates a new initialization vector on the basis of the last received initialization vector and the number of frames.
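The receiving side of synchronization control can be sketched as below. The field layout (a 64-bit initialization vector, a 16-bit key identifier and a CRC-32), the increment-by-one rule and the convention that the received initialization vector applies to the first frame after the synchronization vector are all assumptions made for illustration; the names `pack_sync_vector`, `unpack_sync_vector` and `ReceiverSync` are hypothetical.

```python
import struct
import zlib

SYNC_VECTOR_BYTES = 14  # 8-byte IV + 2-byte key identifier + 4-byte CRC-32


def pack_sync_vector(iv: int, key_id: int) -> bytes:
    """Build a synchronization vector: IV, key identifier, CRC over both."""
    body = struct.pack(">QH", iv, key_id)
    return body + struct.pack(">I", zlib.crc32(body))


def unpack_sync_vector(data: bytes):
    """Return (iv, key_id), or None if the length or CRC check fails."""
    if len(data) != SYNC_VECTOR_BYTES:
        return None
    body = data[:-4]
    (crc,) = struct.unpack(">I", data[-4:])
    if zlib.crc32(body) != crc:
        return None
    return struct.unpack(">QH", body)


class ReceiverSync:
    """Tracks the IV by counting frames since the last valid sync vector."""

    def __init__(self):
        self.last_iv = None
        self.key_id = None
        self.frames_since_sync = 0

    def on_sync_vector(self, data: bytes) -> bool:
        parsed = unpack_sync_vector(data)
        if parsed is None:
            return False  # corrupted vector: keep the current state
        self.last_iv, self.key_id = parsed
        self.frames_since_sync = 0
        return True

    def on_frame(self) -> int:
        """Return the IV to use for the frame that has just arrived."""
        if self.last_iv is None:
            raise RuntimeError("no synchronization vector received yet")
        iv = self.last_iv + self.frames_since_sync
        self.frames_since_sync += 1
        return iv


receiver = ReceiverSync()
accepted = receiver.on_sync_vector(pack_sync_vector(500, 7))
frame_ivs = [receiver.on_frame() for _ in range(3)]
```

A terminal joining a group call in the middle of a speech item would start in the same way: it waits for the next valid synchronization vector and counts frames from there.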
A data transmission network may comprise one or more packet-switched connections, for instance IP (Internet Protocol) connections, in which data is transmitted using, for instance, voice over IP technology. RTP (Real-time Transport Protocol) is one standard protocol for transmitting real-time data, such as sound and video images, in an IP network, for instance. The IP network typically causes a varying delay in packet transmission. For speech intelligibility, for instance, a varying delay is very deleterious. To compensate for this, the receiving end of the RTP transmission buffers incoming packets in a jitter buffer and reproduces them at a given reproduction time. A packet arriving before the reproduction time participates in the reconstruction of the original signal. A packet arriving after the reproduction time is rejected and remains unused.
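The effect of the reproduction time on individual packets can be illustrated with a small sketch. The arrival times, the 60-ms frame interval applied to the reproduction schedule and the function name `playout` are assumptions for illustration; a packet arriving exactly at its reproduction time is treated as on time here.

```python
FRAME_MS = 60  # one frame is reproduced every 60 ms (assumed schedule)


def playout(arrivals: dict, first_playout_ms: int, frame_count: int) -> list:
    """Decide, frame by frame, whether a packet can be reproduced.

    arrivals maps a sequence number to the packet's arrival time in ms.
    Returns the sequence number for each frame that arrived in time and
    None for each frame that was missing or late.
    """
    result = []
    for seq in range(frame_count):
        reproduction_time = first_playout_ms + seq * FRAME_MS
        arrival = arrivals.get(seq)
        in_time = arrival is not None and arrival <= reproduction_time
        result.append(seq if in_time else None)
    return result


# Hypothetical arrival times in ms; packet 2 is delayed by the network.
arrivals = {0: 20, 1: 85, 2: 230, 3: 210}
played = playout(arrivals, first_playout_ms=100, frame_count=4)
```

With these figures the reproduction times are 100, 160, 220 and 280 ms, so packet 2, which arrives at 230 ms, misses its slot and is rejected, while a reproduction time starting 60 ms later would have accepted it.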
On the one hand, a real-time application requires as short an end-to-end delay as possible, and consequently the reproduction delay should be reduced. On the other hand, a long reproduction delay allows a long time for the packets to arrive, and thus more packets can be accepted. The value of the reproduction delay should therefore be adjusted continuously according to the network conditions. Most RTP algorithms have a facility that adjusts the reproduction delay automatically according to the network conditions to improve sound quality. The reproduction delay can be shifted 60 ms forward, for instance, by having the IP gateway create a 60-ms replacement packet. In other words, an extra frame is added to the frame flow being transmitted.
A problem with the arrangement described above is that if synchronized end-to-end encryption coding is used and an extra frame is added to the frame flow, the result is that the frame counter at the receiving end is one frame ahead in relation to the incoming frames and the key stream segment of the receiving end no longer matches the key stream segment of the transmitting end.
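The loss of synchronization can be reproduced with a short sketch. It assumes that the initialization vector is incremented by one per frame and that the receiving end counts every frame it is handed, including the replacement frame; the names and values are hypothetical.

```python
def receiver_ivs(first_iv: int, incoming: list) -> list:
    """IVs the receiving end assigns by counting every incoming frame."""
    return [first_iv + n for n in range(len(incoming))]


# Each sent frame is (kind, IV used by the transmitting end).
sent = [("voice", 100 + n) for n in range(4)]

# The gateway adds one replacement frame after the second voice frame
# to shift the reproduction delay 60 ms forward.
incoming = sent[:2] + [("replacement", None)] + sent[2:]

rx_ivs = receiver_ivs(100, incoming)

# Voice frames for which the two ends now disagree on the IV.
mismatches = [
    (kind, tx_iv, rx_iv)
    for (kind, tx_iv), rx_iv in zip(incoming, rx_ivs)
    if kind == "voice" and tx_iv != rx_iv
]
```

Every voice frame after the inserted frame is processed with an initialization vector that is one step ahead of the one used by the transmitting end, so the key stream segments differ from that point on.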
Increasing the reproduction delay in the middle of a speech item, for instance, thus has the consequence that end-to-end synchronization is lost and the encrypted speech can no longer be decrypted. This continues until the transmitting end sends a new synchronization vector to synchronize the receiving end. This phenomenon can be prevented, in semi-duplex calls for instance, by changing the reproduction delay only between speech items. If the speech items are long, however, the reproduction delay can then be changed only at disadvantageously long intervals: the quality of speech may be poor until the end of the entire speech item, because the reproduction delay cannot be changed earlier. Further, in duplex calls, for instance, in which there are no speech items and the terminal transmits continuously, the reproduction delay cannot be changed at all during the call if loss of synchronization is to be avoided.