1. Field of the Invention
The invention relates generally to telecommunication traffic measurement. In particular, the invention relates to methods, computer programs and apparatuses for providing anonymous user identifications for use in processing telecommunication traffic measurement data.
2. Description of the Related Art
Today, various kinds of traffic measurements, e.g. traffic traces, are routinely performed on both packet switched and circuit switched telecommunication networks. In the case of packet switched networks, for example, these traffic measurements may contain packet headers, signaling messages, and/or authorization log-files. Such traffic measurements are utilized e.g. in examining the status and performance of a network, and in ensuring the correct operation of the network. Furthermore, traffic analysis based on these measurements provides valuable data about user behavior and trends in application and network usage.
Typically, the traffic measurements contain identification information that can be used to identify subscribers and the kind of services the subscribers are using. Obviously, such identification information is highly confidential and usually only the network operator is legally allowed to handle it and even then only for certain reasons, such as troubleshooting and accounting.
Traditionally, this confidentiality has not caused problems since such measurements have been conducted by the network operator using e.g. specialized Synchronous Digital Hierarchy (SDH) or Signaling System #7 (SS7) signaling analyzers.
However, there is an increasing trend of outsourcing network management tasks. As a result, traffic measurement data including subscriber identification information of a given network may today be handled or processed by staff external to the operator of the given network. Obviously, this contradicts the above confidentiality requirement.
Given that in most traffic measurement and analysis cases it is not necessary to know the actual identities of the subscribers (being able to find out which packet or call belongs to which particular anonymous subscriber is sufficient), the above confidentiality requirement may be met by anonymizing the traffic measurement data by replacing each included real user identification with a unique label. Often, the traffic measurement data contains multiple information fields that need to be anonymized, e.g. telephone numbers, subscriber line identifications, IP addresses, and the like. Even anonymized measurement data can be used to track the traffic from and to a given subscriber: the network operator can provide an anonymized user identification of the given subscriber to outsourced network management staff and ask them to find out, for example, whether something in the network is degrading the performance for the given subscriber.
While there are prior art concepts for anonymizing traffic traces, they all have significant drawbacks: usually they are either not secure enough, not fast enough, or not suitable for distributed on-line measurements.
For example, it is known to encrypt the user identifications included in the traffic measurement data using straightforward symmetric encryption. However, given that a user identification (e.g. a telephone number or an IP address) to be encrypted, i.e. the plaintext, is relatively short (typically 32-128 bits), and given that typically there is only a limited set of possible user identifications, symmetric encryption based anonymization schemes are insecure. If an attacker knows or has enough hints to guess from which network a traffic trace originates, the attacker can use known addresses to find out ciphertext-plaintext pairs. For example, in the case of TCP/IP traces, port numbers can easily reveal well known servers in the target network, such as Domain Name System (DNS), mail, and Post Office Protocol (POP) servers. Furthermore, the attacker can launch an active attack if the attacker knows that traffic trace collection is presently ongoing. In the active attack, the attacker starts e.g. a TCP/IP session at a certain time and records that session. Later, the attacker can search the traffic trace for the fingerprint of that TCP/IP session and, by repeating this, gain many plaintext-ciphertext pairs.
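The passive attack described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function `toy_encrypt` is a toy 32-bit Feistel permutation standing in for a real symmetric cipher, and the key, addresses, and port-to-server mapping are hypothetical. The only property that matters is that the same address always maps to the same ciphertext under a fixed key.

```python
import hashlib
import ipaddress


def toy_encrypt(key: bytes, addr: int, rounds: int = 4) -> int:
    """Toy 32-bit Feistel permutation; an illustrative stand-in for a block cipher."""
    left, right = addr >> 16, addr & 0xFFFF
    for r in range(rounds):
        digest = hashlib.sha256(key + bytes([r]) + right.to_bytes(2, "big")).digest()
        left, right = right, left ^ int.from_bytes(digest[:2], "big")
    return (left << 16) | right


def ip(text: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(text))


# Operator side: anonymize the addresses of each (source, destination, port) record.
key = b"operator-secret"  # hypothetical key, unknown to the attacker
traffic = [
    ("192.0.2.10", "192.0.2.53", 53),
    ("192.0.2.11", "192.0.2.53", 53),
    ("192.0.2.10", "192.0.2.25", 25),
]
trace = [(toy_encrypt(key, ip(s)), toy_encrypt(key, ip(d)), port) for s, d, port in traffic]

# Attacker side: no key, only the trace plus public knowledge of well known servers.
known_servers = {53: "192.0.2.53", 25: "192.0.2.25"}
recovered = {dst: known_servers[port] for _, dst, port in trace if port in known_servers}
```

Each entry of `recovered` is a ciphertext-plaintext pair obtained without the key. Because the mapping is deterministic, every further record involving those servers is de-anonymized as well.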
Furthermore, it is known to use cryptographic hash functions to encrypt the user identifications included in the traffic measurement data. However, cryptographic hash functions, such as those based on public key encryption, are computationally expensive and thus too slow for on-line anonymizations at line-speed. For example, tests performed by the applicant with a 1.89 GHz Fujitsu SparcV show that, while normal encryption speed of 64-bit blocks with Data Encryption Standard (DES) is 2.5×10^6 1/s, the speed of hashing with DES is only 47×10^3 1/s.
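To put the two measured rates in proportion, the short calculation below derives the slowdown factor and the time per operation. It uses only the two figures quoted above; the variable names are illustrative.

```python
# Rates quoted in the text, in operations per second.
encrypt_rate = 2.5e6  # DES encryption of 64-bit blocks
hash_rate = 47e3      # hashing with DES

# How many times slower hashing is than plain encryption (roughly 53x).
slowdown = encrypt_rate / hash_rate

# Time spent per operation, in microseconds.
per_encrypt_us = 1e6 / encrypt_rate  # 0.4 microseconds
per_hash_us = 1e6 / hash_rate        # roughly 21 microseconds
```

At roughly 21 microseconds per user identification, hashing is about 53 times slower than plain encryption on the same hardware.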
In addition, it is known to replace a user identification included in the traffic measurement data with a unique label or the like. Such unique labels or the like may be stored e.g. in a replacement table. However, such replacement schemes are not suitable for distributed on-line measurements, particularly given that such a replacement table is usually generated on-the-fly. While a pre-made replacement table could theoretically be distributed to measurement locations, such replacement tables would be extremely large—e.g. approximately 32 GB for 32-bit IPv4 addresses—impeding distribution of such replacement tables significantly. Thus, this replacement scheme is typically used with post-processing measurement data in a centralized location where it is easy to share the replacement table.
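The following minimal sketch illustrates both limitations. The class `ReplacementTable` and the helper `premade_table_bytes` are hypothetical names, and the 8 bytes per label is an assumption chosen because it reproduces the approximate 32 GB figure for 32-bit addresses; the text does not state the entry size.

```python
from itertools import count


class ReplacementTable:
    """On-the-fly replacement table: each new identifier gets the next free label."""

    def __init__(self):
        self._labels = {}
        self._next = count(1)

    def anonymize(self, user_id: str) -> int:
        # Assign a label on first sight, then always return the same one.
        if user_id not in self._labels:
            self._labels[user_id] = next(self._next)
        return self._labels[user_id]


def premade_table_bytes(id_bits: int, bytes_per_entry: int) -> int:
    """Size of a pre-made table with one fixed-width label per possible identifier."""
    return (2 ** id_bits) * bytes_per_entry


# Two measurement points observe the same subscribers in a different order.
probe_a = ReplacementTable()
probe_b = ReplacementTable()
labels_a = [probe_a.anonymize(addr) for addr in ["10.0.0.1", "10.0.0.2"]]
labels_b = [probe_b.anonymize(addr) for addr in ["10.0.0.2", "10.0.0.1"]]

# Pre-made table for all IPv4 addresses, assuming 8 bytes per label (in GiB).
table_gib = premade_table_bytes(32, 8) / 2**30
```

Here the address 10.0.0.1 receives label 1 at the first measurement point but label 2 at the second, so traces from the two points cannot be correlated unless the table itself is shared. Under the 8-byte assumption, `table_gib` evaluates to 32.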
Anonymization of user identifications included in or otherwise associated with telecommunication traffic measurement data needs to be secure, fast enough to allow the anonymization to be performed on-line, and easy to use with distributed traffic measurements. The anonymization speed is important because, if anonymizations can be done at line-speed, there is no need to store user identifications temporarily on hard disk or in memory. Distributed traffic measurements are needed e.g. when it is necessary to inspect the performance of various parts of a network. Such measurements are becoming more and more important, particularly as traditional TDM (Time-Division Multiplexing) transport networks are being replaced with heterogeneous packet networks. Locating faults (such as degraded performance) and ensuring Quality of Service are much harder tasks in packet based networks than they used to be in legacy telecommunication networks. Furthermore, few common monitoring functions are shared by various vendors. Thus, distributed traffic measurements are usually required to pinpoint hard-to-catch errors in heterogeneous networks.
Therefore, an object of the present invention is to alleviate the problems described above and to introduce anonymization of user identifications included in or otherwise associated with telecommunication traffic measurement data that is fast, secure, and easy to use with distributed traffic measurements.