The current space mission concept comprises a large set of different strategies, instruments and scientific objectives. As the accuracy of the instruments increases, more telemetry resources are required to download the huge amounts of data generated. Considering the severe telemetry bandwidth constraints in space, a direct consequence is the need to compress the data before transmission. Because the data are so diverse, a universal coder offering the highest possible compression ratios for almost any kind of data benefits the vast majority of space missions.
As an answer to this challenge, the CCSDS (Consultative Committee for Space Data Systems) proposed a universal compression solution based on a two-stage strategy. The pre-processing stage changes the statistics of the data by applying a reversible function, while the coding stage outputs a variable number of bits for each symbol to be coded. This method meets the objectives mentioned above: it is a fast algorithm that yields good compression ratios and, most importantly, it fully adapts to the data being compressed. Additionally, the system works with data blocks of just 8 or 16 samples, so it adapts rapidly to changes in the data.
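The per-block adaptation can be illustrated with a simplified sketch. It assumes the Rice (split-sample) code on which the recommendation relies, and picks, for every block of samples, the parameter k that yields the shortest coded block. The names `rice_length` and `best_k_per_block`, and the search bound `max_k`, are hypothetical; the real CCSDS selector also weighs other coding options and signals its choice with an option identifier, both of which are left out here.

```python
def rice_length(value: int, k: int) -> int:
    """Length in bits of the Rice codeword for a non-negative integer.

    Unary quotient (value >> k), one stop bit, then k low-order bits.
    """
    return (value >> k) + 1 + k


def best_k_per_block(samples: list[int], block_size: int = 16,
                     max_k: int = 13) -> list[int]:
    """For each block of `block_size` samples, return the k (0..max_k)
    that minimizes the total Rice-coded length of that block.

    Simplification (assumption): only the split-sample (Rice) option is
    searched; the other CCSDS options and the option ID are ignored.
    """
    choices = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        # Brute force: ties resolve to the smallest k.
        best = min(range(max_k + 1),
                   key=lambda k: sum(rice_length(v, k) for v in block))
        choices.append(best)
    return choices


# A quiet block followed by a block of larger values: k follows the data.
print(best_k_per_block([0] * 16 + [100] * 16))  # [0, 6]
```

Because the parameter is re-chosen every 16 samples, a change in the data statistics only affects the blocks in which it occurs.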
This two-stage strategy is a common approach, also used by several other data compression systems. One of the reasons is the availability of many coding algorithms, such as Huffman, arithmetic, range encoding or Rice-Golomb coding. The latter is the algorithm used in the CCSDS recommendation, owing to its simplicity and low processing requirements. In most cases, however, the best compression ratios cannot be obtained with such a coding stage alone, because of the large variety of data mentioned above. The introduction of a relatively simple pre-processing stage, on the other hand, can usually boost the final ratios obtained by the system. Moreover, the pre-processing stage can be adapted to specific needs if necessary, while the coding stage is kept unchanged.
The pre-processor essentially reduces the entropy of the data, easing the task of the coding stage, which is most frequently designed as an entropy coder, as in the CCSDS recommendation. Pre-processing is often based on a data predictor followed by a differentiator, thus outputting prediction errors. In some cases, such as the CCSDS standard, it is followed by a value mapper, which transforms positive and negative values into non-negative ones through a reversible algorithm. In theory, this helps improve the compression ratio, since no sign bit is needed. Conversely, if signed values are coded directly with a separate sign bit, there will be two codes representing zero (“+0” and “−0”).
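A minimal sketch of the predictor-plus-differentiator stage, assuming the simplest possible predictor (the previous sample, with the first sample predicted as 0). The functions `prediction_errors` and `reconstruct` are illustrative names, not part of any standard. Because the transform is reversible, the decoder recovers the original samples exactly by accumulating the errors.

```python
def prediction_errors(samples: list[int]) -> list[int]:
    """Unit-delay predictor + differentiator: error_i = x_i - x_{i-1}.

    Assumption: the first sample is predicted as 0 (real systems often
    transmit it uncoded as a reference sample instead).
    """
    errors = []
    prediction = 0
    for x in samples:
        errors.append(x - prediction)
        prediction = x  # the next prediction is the current sample
    return errors


def reconstruct(errors: list[int]) -> list[int]:
    """Invert the differentiator by accumulating the prediction errors."""
    samples = []
    prediction = 0
    for e in errors:
        x = prediction + e
        samples.append(x)
        prediction = x
    return samples


errs = prediction_errors([100, 102, 101, 101, 105])
print(errs)               # [100, 2, -1, 0, 4]
print(reconstruct(errs))  # [100, 102, 101, 101, 105]
```

Apart from the first value, the errors are small and signed, which is exactly the kind of data the coding stage then has to handle.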
If the prediction in the first stage is accurate, the error will be small and hence fewer bits will be required to code it. Most input data sets lead to prediction errors that roughly follow a Laplacian distribution. The adaptive entropy coder recommended by the CCSDS is optimal for this distribution, but any deviation from it can lead to a significant decrease in the final compression ratio. To mitigate this weakness of the CCSDS recommendation, the inventors developed the so-called PEC (Prediction Error Coder; cf. J. Portell et al., “Designing optimum solutions for lossless data compression in space”, Proceedings of the On-Board Payload Data Compression Workshop ESA, June 2008). It is a partially adaptive solution that must be tuned according to the expected distribution of the prediction errors to be compressed. The tests performed revealed that, under realistic conditions, PEC outperforms the Rice coder if adequately calibrated. An automatic PEC calibrator has also been developed, making in-orbit re-calibration of the coder possible if necessary. This feature is mandatory despite the robustness of PEC: even the most realistic instrument simulator cannot provide the data that will actually be observed by a satellite, so a PEC calibrated on ground will offer non-optimal compression ratios. An adaptive version of the coder is therefore desirable to guarantee the best ratios under any situation.
Data pre-processing is mandatory in any data compression system in order to obtain the best possible results. It makes it possible to achieve the highest compression ratios with the lowest processing requirements. There are many strategies depending on the type of data received, and no universal method can be given. However, it is important to point out that the objective of the pre-processing stage is to remove the correlation between samples and, in general, to decrease the entropy of the input data. The data are thus transformed towards more favorable statistics, usually a distribution resembling a Laplacian one. A universal coder can then be applied after this stage.
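The entropy reduction can be checked numerically with a zeroth-order (memoryless) entropy estimate computed from the value histogram. The linear ramp below is a synthetic signal chosen only for illustration, and `empirical_entropy` is a hypothetical helper; real instrument data will give far less extreme figures.

```python
import math
from collections import Counter


def empirical_entropy(values: list[int]) -> float:
    """Zeroth-order entropy in bits/sample, estimated from the value histogram."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Synthetic linear ramp: every raw value is distinct (flat histogram) ...
signal = list(range(1000, 1064))
# ... but after a previous-sample predictor almost every error equals 1.
diffs = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]

print(empirical_entropy(signal))  # 6.0 bits/sample (64 equiprobable values)
print(empirical_entropy(diffs))   # about 0.116 bits/sample
```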
The CCSDS recommendation introduces the PEM (Prediction Error Mapper) immediately after the pre-processing stage. It maps the signed prediction errors into the initial (unsigned) data range, which avoids coding signed values and thus saves the sign bit.
Nevertheless, in software implementations the mapping implies a non-negligible computational cost. Moreover, it rules out generating a “−0” (minus zero) code, which is required by the new coding algorithm. For these reasons, this mapping stage has been removed from the coding algorithm used in the present invention.
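For reference, the sketch below follows the PEM mapping as we understand it from the CCSDS lossless data compression recommendation (CCSDS 121.0-B); the recommendation itself should be consulted for the normative definition. It illustrates both points made above: computing theta and branching on every sample is extra work in software, and since the output is a single non-negative integer, a “−0” code can never be produced. The function name `pem_map` is hypothetical.

```python
def pem_map(delta: int, prediction: int, x_min: int, x_max: int) -> int:
    """Map a signed prediction error to a non-negative integer.

    theta is the distance from the prediction to the nearer end of the
    valid sample range [x_min, x_max]. Errors within +/-theta are
    interleaved (0, -1, +1, -2, +2, ... -> 0, 1, 2, 3, 4, ...); larger
    errors can only lie on one side and are placed after them.
    """
    theta = min(prediction - x_min, x_max - prediction)
    if 0 <= delta <= theta:
        return 2 * delta
    if -theta <= delta < 0:
        return 2 * (-delta) - 1
    return theta + abs(delta)


# 3-bit samples (0..7) predicted as 2: the mapping is a permutation of 0..7,
# so no sign bit is needed and zero has exactly one code.
print([pem_map(x - 2, 2, 0, 7) for x in range(8)])  # [3, 1, 0, 2, 4, 5, 6, 7]
```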
The CCSDS recommendation for lossless data compression in space missions includes the split-sample mode (the Rice coder) as its general-purpose compression technique. It is worth mentioning that the Rice code is a particular Golomb code whose parameter is a power of 2. This restriction is very convenient because the code can then be computed with very few logical operations, which makes it very efficient in combinational logic circuits. Owing to its simplicity and potential, it is a very attractive option for space missions. The adaptive algorithm of the CCSDS standard can also select other techniques, although in most cases the Rice coder will be chosen. The remaining options are only useful in extreme cases, that is, with very good or very bad statistics.
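A minimal sketch of a Rice encoder, returning the codeword as a bit string for readability. Because the Golomb parameter is 2**k, the quotient is a right shift and the remainder a bit mask, which is why so few logical operations are needed. The unary part follows the fundamental-sequence convention of zeros terminated by a one, as we understand the CCSDS convention; `rice_encode` is a hypothetical name, and block headers and option identifiers are omitted.

```python
def rice_encode(value: int, k: int) -> str:
    """Rice (Golomb with m = 2**k) codeword for a non-negative integer.

    The quotient value >> k is sent as a unary 'fundamental sequence'
    (that many zeros followed by a one); then the k least-significant
    bits of value are appended verbatim (the 'split' bits).
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k                      # division by 2**k
    unary = "0" * quotient + "1"
    remainder = value & ((1 << k) - 1)         # value mod 2**k
    split_bits = format(remainder, f"0{k}b") if k else ""
    return unary + split_bits


print(rice_encode(9, 2))  # 00101  (quotient 2 -> "001", remainder 1 -> "01")
```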
Several studies have shown that the CCSDS standard is highly sensitive to noise and, especially, to outliers, i.e. values completely outside the expected statistics. This is to be expected, considering that the Rice coder was devised for noiseless data. The excellent performance achieved with pure Laplacian data distributions degrades rapidly with noise, and also with non-Laplacian statistics. This is due to the operation of the Rice-Golomb algorithm, which depends entirely on the correct choice of its calibration parameter k. This parameter must be chosen very carefully: values of k that are too large lead to low compression ratios, since more bits of the original value are output verbatim. The greater risk, however, lies in choosing values of k that are too small, which produce a huge number of bits in the fundamental sequence (unary) codeword. Hence, even low noise levels lead to a significant decrease in the compression ratio achievable with the CCSDS method.
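The sensitivity to k can be quantified with a short calculation. The sketch assumes a single block of 16 already-mapped (non-negative) prediction errors coded with one Rice parameter for the whole block; the sample values and helper names are hypothetical, and the other CCSDS options (which may limit the damage in practice) are ignored.

```python
def rice_length(value: int, k: int) -> int:
    """Rice codeword length: unary quotient, stop bit and k split bits."""
    return (value >> k) + 1 + k


def block_length(block: list[int], k: int) -> int:
    """Total bits needed to Rice-code a block with a single parameter k."""
    return sum(rice_length(v, k) for v in block)


def best_k(block: list[int], max_k: int = 13) -> int:
    """Smallest k giving the shortest block (brute-force search)."""
    return min(range(max_k + 1), key=lambda k: block_length(block, k))


quiet = [1, 0, 2, 1, 0, 1, 3, 1, 0, 2, 1, 1, 0, 2, 1, 0]  # small mapped errors
noisy = quiet[:-1] + [1000]                              # same block, one outlier

k_quiet = best_k(quiet)                # 0
print(block_length(quiet, k_quiet))    # 32 bits with k tuned for quiet data
print(block_length(noisy, k_quiet))    # 1032 bits: k too small for the outlier
k_noisy = best_k(noisy)                # 5
print(block_length(noisy, k_noisy))    # 127 bits, but each small value costs 6 bits
```

A single outlier either inflates the unary part enormously (k kept small) or forces every other sample in the block to carry extra split bits (k raised).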
The PEC system, on the other hand, is specifically designed for coding prediction errors. That is, it implements the second (coding) stage of a data compression system, receiving the results of the first (pre-processing) stage. The simplicity of PEC comes from its basic strategy of coding value ranges, which requires only a few simple calculations. It is actually similar to a coding tree, but with very short branches, because these represent only the prefixes, not the values themselves. It is worth emphasizing that PEC is a signed coding algorithm, that is, it assumes that a sign bit is included in the values to be coded. It can be applied to unsigned values as well, but its compression performance will not be optimal. The PEC coder includes three coding strategies. The first one is the Double-Smoothed (DS) algorithm, which is a ranged variable-length code (VLC). The second, the Large Coding (LC) algorithm, is based on prefix codes; in fact, the prefixes are unary codes with one sign bit appended at the end. Finally, the third one is the Low Entropy (LE) algorithm, a modification of DS that optimizes the results for highly compressible data.
PEC relies on the adequate definition of a coding table and a coding option. The coding table defines the number of coding ranges, or segments, and the size of each range (the number of bits used to code values within it), which determines the size of the variable-length code generated. The coding option defines how these ranges are used, that is, how the variable-length code is formed from each value and the table definition.
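The notion of coding by value ranges can be sketched as follows. The layout is an assumption made for illustration: the coding table lists the number of bits of each segment, segment i covers 2**table[i] consecutive magnitudes starting right after the previous segment, and the sign is handled separately. The actual PEC options also reserve escape values, which this sketch ignores; `segment_bounds`, `locate` and the table (3, 5, 7, 10) are hypothetical.

```python
from itertools import accumulate


def segment_bounds(table: tuple[int, ...]) -> list[tuple[int, int]]:
    """Return the (first, last) magnitude covered by each segment.

    Hypothetical layout: segment i holds 2**table[i] consecutive
    magnitudes, starting right after the previous segment ends.
    """
    sizes = [1 << bits for bits in table]
    starts = [0] + list(accumulate(sizes))[:-1]
    return [(s, s + size - 1) for s, size in zip(starts, sizes)]


def locate(value: int, table: tuple[int, ...]) -> tuple[int, int]:
    """Return (segment index, offset within segment) for |value|."""
    magnitude = abs(value)
    for index, (first, last) in enumerate(segment_bounds(table)):
        if magnitude <= last:
            return index, magnitude - first
    raise ValueError("value exceeds the range covered by the coding table")


table = (3, 5, 7, 10)           # hypothetical widths, in bits
print(segment_bounds(table))    # [(0, 7), (8, 39), (40, 167), (168, 1191)]
print(locate(-20, table))       # (1, 12); the sign is coded separately
```

Only the segment index and a short offset need to be coded, so the cost of a value grows with its segment rather than with its full bit width.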
1) Double-Smoothed option: The double-smoothed variant has been defined with four coding ranges, that is, with a coding table of four components. A five-range scheme was also tested, but it was finally discarded because the compression gain was small. This algorithm is called double-smoothed because it has two shortcuts to reach the last two coding ranges, which are indicated with an escape value based on the “−0” code. Combining coding segments with this escape value significantly smooths the code lengths generated by large values, while keeping short codes for low values.
2) Large coding option: The large coding algorithm operates similarly. A coding table of four segments is still used, but a small header is now introduced in the output code that directly indicates the coding segment being used. This header, consisting of a unary code sequence, avoids the use of “leap segments” and, thus, large values lead to shorter codes. The counterpart is a very small increase in the code size for the smallest values. Also, the “−0” code is not required in this variant, so no sign bit is included for zero. This code is usually more efficient than the double-smoothed one when the entropy is large. For this reason, it will be used mainly for data fields in which the pre-processing results are significantly spread.
3) Low Entropy option: When the expected entropy of the data to be compressed is very low, a modification of the double-smoothed option is applied. More specifically, the escape sequences for the second and third segments are exchanged. That is, the second segment is indicated by a “−0” code in the first segment, while the third segment is indicated by a leap value in the second segment. In this way, the full range of the first segment can be used (without reserving any leap value), which benefits the coding of very small values, at the expense of longer codes for intermediate and large values.
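The large coding option can be illustrated with a small encoder/decoder pair. Several details are assumptions rather than the documented PEC bit layout: contiguous segments of 2**table[i] magnitudes, a truncated unary header (i ones, followed by a zero except for the last segment), and a sign bit placed after the offset bits and omitted for zero, so that the decoder knows when to expect it. The table (3, 5, 7, 10) and all names are hypothetical, and the escape mechanisms of the double-smoothed and low entropy options are not modelled.

```python
TABLE = (3, 5, 7, 10)  # hypothetical bits per segment


def segment_starts(table: tuple[int, ...]) -> tuple[list[int], int]:
    """First magnitude of each segment, plus the first uncovered magnitude."""
    starts, s = [], 0
    for bits in table:
        starts.append(s)
        s += 1 << bits
    return starts, s


def lc_encode(value: int, table: tuple[int, ...] = TABLE) -> str:
    starts, limit = segment_starts(table)
    magnitude, last = abs(value), len(table) - 1
    if magnitude >= limit:
        raise ValueError("value outside the coding table range")
    seg = max(i for i, s in enumerate(starts) if magnitude >= s)
    header = "1" * seg + ("0" if seg < last else "")  # truncated unary index
    offset = format(magnitude - starts[seg], f"0{table[seg]}b")
    sign = "" if magnitude == 0 else ("1" if value < 0 else "0")  # no sign for 0
    return header + offset + sign


def lc_decode(bits: str, table: tuple[int, ...] = TABLE) -> tuple[int, str]:
    """Decode one value from the front of `bits`; return (value, rest)."""
    starts, _ = segment_starts(table)
    last = len(table) - 1
    seg = pos = 0
    while seg < last and bits[pos] == "1":  # count header ones
        seg += 1
        pos += 1
    if seg < last:
        pos += 1  # skip the terminating zero
    width = table[seg]
    magnitude = starts[seg] + int(bits[pos:pos + width], 2)
    pos += width
    if magnitude == 0:
        return 0, bits[pos:]
    return (-magnitude if bits[pos] == "1" else magnitude), bits[pos + 1:]


print(lc_encode(5))    # 01010
print(lc_encode(-20))  # 10011001
print(lc_decode(lc_encode(-20) + lc_encode(0)))  # (-20, '0000')
```

With this table, the worst-case codeword is 3 + 10 + 1 = 14 bits, and small values stay short because they only pay a one-bit header.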
It is worth mentioning here that the choice of coding option is also made automatically by the PEC calibrator described below. With an adequate combination of these coding options, high compression ratios can be achieved with a simple (and therefore fast) coding algorithm.
Hence, for PEC to operate properly, the system must be calibrated, i.e. an adequate coding table and coding option must be selected (this is one of the main disadvantages of the PEC coder, which is solved by the present invention). To determine the best configuration for each case easily, an automated PEC calibrator is available. First, a representative histogram of the values to be coded is prepared. Once normalized, this histogram is equivalent to the probability density function (PDF) of the data field to compress, from which the entropy can be calculated. The histogram is passed to the PEC calibrator software, which runs an exhaustive analysis of the data and determines the best coding table and coding option. The optimal solution is obtained by a trial-and-error process, i.e. testing all possible combinations and calculating the expected compression ratio for each. After this training stage, PEC is ready to operate on the real data. Although this calibration process is much quicker than it may seem, it is too expensive computationally to be included in the coder. In space missions, the calibrator must therefore be run on ground with simulated data before launch. It should also be run periodically during the mission, re-calibrating PEC with the data actually being compressed.
Owing to its coding strategy, PEC can be considered a partially adaptive algorithm: the appropriate segment (and hence the code size) is selected for each individual value. This is an advantage over the Rice coder, which in the CCSDS recommendation uses a fixed parameter for all the values, at least within a coding block. Another advantage is that PEC limits the maximum code length to twice the symbol size in the worst case (depending on the coding table). Nevertheless, despite these features, it should be noted that PEC must be trained for each case to obtain the best compression ratios. Therefore, if the statistics of the real data differ significantly from those of the training data, the compression ratio will decrease.
In summary, on the one hand, the CCSDS system is too sensitive to any deviation in the input data, and its software implementations are too slow in most cases. On the other hand, the PEC system is quite robust to deviations in the input data, and its software implementations are faster and more efficient than the CCSDS ones. However, the PEC system requires a prior calibration phase that is computationally quite slow.