Data compression is the process of reducing the size of a data file, and it is mainly applied to static files. For instance, the lossless Lempel-Ziv (LZ) method, together with tools based on it such as ‘pkzip’ and ‘gzip’, is widespread and well known for compressing static data files.
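As a small illustration of lossless LZ-family compression on static data, the sketch below uses Python's standard zlib module, which implements DEFLATE (LZ77 combined with Huffman coding), the format also used inside gzip and ZIP files. The helper name deflate_roundtrip and the sample payloads are illustrative only.

```python
import zlib
from typing import Tuple


def deflate_roundtrip(data: bytes, level: int = 9) -> Tuple[bytes, bytes]:
    """Compress with DEFLATE (zlib container), then decompress again."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


# A repetitive "static" payload shrinks considerably...
static = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 100
packed, unpacked = deflate_roundtrip(static)
assert unpacked == static  # lossless: the exact original bytes come back
print(len(static), "->", len(packed))

# ...while a tiny payload compressed on its own grows, because the zlib
# container alone adds a 2-byte header and a 4-byte Adler-32 checksum.
tiny = b"hi"
print(len(tiny), "->", len(deflate_roundtrip(tiny)[0]))
```

The second case shows that per-item overhead matters when each item is very small and is compressed in isolation.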
In addition, there are data compression standards applicable to specific protocols, such as RFC 3173 and RFC 1444, for instance for TCP/IP headers. In radio networks specifically, WCDMA and LTE make use of a Packet Data Convergence Protocol (PDCP) based on Robust Header Compression (RoHC) to compress IP/TCP headers.
Generally speaking, the above techniques are not used to compress data packets from higher application layers, but only TCP/IP headers; and where an attempt is made to apply them to data packets from higher application layers, i.e. OSI layer 7, the compression rate achieved is poor.
Currently, there are approaches to developing compression and decompression methods applicable to specific protocols. For instance, Anat Bremler-Barr (Interdisciplinary Center Herzliya), Shimrit Tzur David (Interdisciplinary Center Herzliya & The Hebrew University of Jerusalem), David Hay (The Hebrew University of Jerusalem) and Yaron Koral (Tel Aviv University) have developed a method, disclosed in http://www.eng.tau.ac.il/~boaz/inetsem/tzur.pptx and hereinafter referred to as the Tzur method, for compression and decompression over the HTTP protocol using Deep Packet Inspection (DPI) with a shared dictionary. The Tzur method applies an Aho-Corasick (AC) algorithm to a dictionary represented by a finite state machine with a number of states (S0, ..., S14). The occurrence of an individual character (B) represents a transition from one state (S0) to another (S2), expressed as the function g(S0, B) → S2. A sequence of characters to be replaced in the text (C, D, B, C), namely a pattern, is associated with a corresponding sequence of states (S7, S8, S9, S10), and these states are sequentially ordered (5, 6, 7, 8). The pattern (C, D, B, C) can thus be replaced by a pair comprising a pointer (5), namely the ordinal number of the first state in the pattern, and a distance (4) to jump in order to reach the first next character not included in the pattern to replace. For example, if the uncompressed data were ABDBCDBCAAB, under the assumptions above, the compressed data would be ABDB(5, 4)AAB.
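To make the pointer and distance mechanism concrete, the following Python sketch builds the goto function g of a trie over a small dictionary, numbers the states in creation order, and replaces each matched pattern with a (pointer, distance) pair. It is a simplified illustration under stated assumptions, not the Tzur implementation: it omits the Aho-Corasick failure function and instead scans greedily for the longest pattern at each position. The dictionary {EFGH, CDBC} and the names SharedDictionary and render are hypothetical, chosen so that the states of CDBC receive the ordinals 5 to 8 used in the example above.

```python
from typing import Dict, List, Optional, Tuple, Union

# A token is either a literal character or a (pointer, distance) pair.
Token = Union[str, Tuple[int, int]]


class SharedDictionary:
    """Hypothetical shared dictionary built as a trie (AC goto function only).

    States are numbered in creation order (root = 0); label[s] is the
    character on the transition entering state s.
    """

    def __init__(self, patterns: List[str]) -> None:
        self.goto: Dict[Tuple[int, str], int] = {}  # g(state, char) -> next state
        self.label: List[str] = [""]                # root has no incoming character
        self.first_state: Dict[str, int] = {}       # pattern -> pointer
        for pattern in patterns:
            if not pattern:
                continue
            state, path = 0, []
            for ch in pattern:
                if (state, ch) not in self.goto:
                    self.goto[(state, ch)] = len(self.label)
                    self.label.append(ch)
                state = self.goto[(state, ch)]
                path.append(state)
            # (pointer, distance) only identifies the pattern if its states are
            # consecutive, which breaks when it shares a prefix with another one.
            if path != list(range(path[0], path[0] + len(path))):
                raise ValueError(f"{pattern!r} shares a prefix; states not consecutive")
            self.first_state[pattern] = path[0]

    def compress(self, data: str) -> List[Token]:
        tokens: List[Token] = []
        i = 0
        while i < len(data):
            # Greedy: longest dictionary pattern starting at position i.
            match: Optional[str] = max(
                (p for p in self.first_state if data.startswith(p, i)),
                key=len, default=None)
            if match:
                tokens.append((self.first_state[match], len(match)))
                i += len(match)
            else:
                tokens.append(data[i])
                i += 1
        return tokens

    def decompress(self, tokens: List[Token]) -> str:
        out: List[str] = []
        for tok in tokens:
            if isinstance(tok, tuple):
                pointer, distance = tok
                # Read `distance` consecutive states starting at `pointer`.
                out.append("".join(self.label[pointer:pointer + distance]))
            else:
                out.append(tok)
        return "".join(out)


def render(tokens: List[Token]) -> str:
    """Format tokens the way the text writes them, e.g. ABDB(5, 4)AAB."""
    return "".join(f"({t[0]}, {t[1]})" if isinstance(t, tuple) else t for t in tokens)


# Hypothetical dictionary: "EFGH" occupies states 1-4, so "CDBC" gets 5-8.
d = SharedDictionary(["EFGH", "CDBC"])
packed = d.compress("ABDBCDBCAAB")
print(render(packed))  # ABDB(5, 4)AAB
assert d.decompress(packed) == "ABDBCDBCAAB"
```

Adding a pattern such as CDX, which shares the prefix CD with CDBC, makes the constructor raise ValueError, because the states of CDX are no longer consecutive; this mirrors the prefix-handling difficulty noted for the Tzur method.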
Whilst it is recognized that the Tzur method can operate with patterns of different lengths and benefits from replacing larger patterns with the pointer and distance pair, the required association of individual characters with individual states, and the association of a particular character with as many states as the number of times that character appears in different patterns, may require a very large and complex state machine, as well as complex rules or restrictions to handle prefixes shared between different patterns.
Moreover, mechanisms of this sort, where individual characters or even groups of characters are directly associated with one or more states and represent transitions between states, become quite complex when used to compress binary files, especially as regards the handling of prefixes shared between different patterns.
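The growth in state count described in the two preceding paragraphs can be illustrated with a byte-level trie in which, as in the Tzur method, every new character (here, byte) transition creates a new state. The helper build_trie and the two binary sample patterns are hypothetical; the sketch counts how many distinct states each byte value enters and, assuming only for illustration that transitions are stored as a dense table with one column per possible byte value, how many table cells that would take.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_trie(patterns: List[bytes]) -> Tuple[int, Counter]:
    """Build a goto-function trie with one state per new byte transition.

    Returns the number of states (root included) and a Counter mapping each
    byte value to the number of distinct states entered by that byte.
    """
    goto: Dict[Tuple[int, int], int] = {}
    entered_by: Counter = Counter()
    num_states = 1  # state 0 is the root
    for pattern in patterns:
        state = 0
        for byte in pattern:  # iterating over bytes yields ints 0..255
            if (state, byte) not in goto:
                goto[(state, byte)] = num_states
                entered_by[byte] += 1
                num_states += 1
            state = goto[(state, byte)]
    return num_states, entered_by


patterns = [b"\x00\x01\x00\x01", b"\x01\x00\x01\x00"]
states, entered_by = build_trie(patterns)
print(states)          # 9: root plus 8 states, no prefix shared
print(entered_by[0])   # 4: byte 0x00 enters four different states
print(states * 256)    # 2304 cells in a dense byte-indexed table
```

Even two 4-byte patterns over a two-symbol alphabet give each byte value four separate states; with realistic binary dictionaries, both effects scale with the total pattern length.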
As regards compression and decompression of data packets, where a sequence of data packets might include packets based on different protocols, the Tzur method might be quite unfeasible because of the difficulty of configuring a dictionary with patterns for different protocols, the huge number of states in the state machine (especially due to the individual association of one character with one transition between two states), and the handling of prefixes shared between different patterns.
Regarding DPI, current implementations of entities of a Policy and Charging Control (PCC) architecture, as specified in 3GPP TS 23.203 v12.2.0 (2013 Sep. 12), already incorporate DPI techniques. For instance, a Policy and Charging Enforcement Function (PCEF) of the PCC architecture encompasses service data flow detection, policy enforcement and flow based charging functionalities. A DPI technology, when embedded in the PCEF, supports packet inspection and service classification, which consists in classifying IP packets according to a configured tree of rules so that they are assigned to a particular service session. In addition, some current embodiments of a Traffic Detection Function (TDF), which is the entity of the PCC architecture in charge of detecting application traffic and reporting the detected application, and even an Application Function (AF) of the PCC architecture, may also incorporate DPI techniques for traffic inspection.
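The service classification described above, in which packets are matched against a configured tree of rules, might be sketched as follows. The rule tree, the Packet fields and the service names are hypothetical and do not follow any format defined in 3GPP TS 23.203; real DPI engines inspect far more than a protocol, a port and a payload prefix.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Packet:
    protocol: str  # e.g. "TCP" or "UDP"
    dst_port: int
    payload: bytes


@dataclass
class RuleNode:
    """One node of a hypothetical classification tree."""
    name: str
    matches: Callable[[Packet], bool]
    service: Optional[str] = None  # used if no child yields a service
    children: List["RuleNode"] = field(default_factory=list)


def classify(node: RuleNode, pkt: Packet) -> Optional[str]:
    """Descend depth-first; return the first service found in a matching
    child branch, else this node's own service (None if it does not match)."""
    if not node.matches(pkt):
        return None
    for child in node.children:
        service = classify(child, pkt)
        if service is not None:
            return service
    return node.service


rules = RuleNode("any", lambda p: True, service="default", children=[
    RuleNode("tcp", lambda p: p.protocol == "TCP", children=[
        RuleNode("http", lambda p: p.dst_port == 80
                 and p.payload.startswith((b"GET ", b"POST ")),
                 service="web-browsing"),
    ]),
    RuleNode("dns", lambda p: p.protocol == "UDP" and p.dst_port == 53,
             service="dns"),
])

print(classify(rules, Packet("TCP", 80, b"GET / HTTP/1.1\r\n")))  # web-browsing
print(classify(rules, Packet("UDP", 53, b"\x12\x34")))            # dns
print(classify(rules, Packet("TCP", 443, b"\x16\x03\x01")))       # default
```

Refining matches in child nodes lets general rules (protocol) be evaluated once, with more specific rules (port, payload) only checked for packets that reach them.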
A central point of the PCC architecture, interfacing with the PCEF, the TDF and the AF, is the Policy and Charging Rules Function (PCRF). The PCRF is a functional element that performs policy control decisions and flow based charging control. The PCRF provides network control regarding service data flow detection, gating, quality of service (QoS) and flow based charging (except credit management) towards the PCEF.
At present, the PCC architecture as specified in 3GPP TS 23.203 v12.2.0 (2013 Sep. 12) does not provide for compression and decompression of data packets.
There is thus a need to develop an alternative mechanism for compression and decompression of data packets which could offer a harmonized structure and operation irrespective of the protocol involved in the data packets, and which may optimize the compression for the most frequently occurring patterns.