In overview, contemporary data communication networks and data processing apparatus are required to handle increasingly large volumes of data. Such handling of data correspondingly requires more data communication bandwidth and/or more data storage capacity. Such bandwidth and/or storage capacity is costly to provide. Thus, there is a considerable benefit that is derivable from compressing data when communicating and/or storing the data.
Contemporary information is often represented in a form of data, for example audio, image, video, graphics, ECG, seismic, measurement data, numbers, Excel charts, characters, text, news, ASCII characters, Unicode characters, binary data, commercials, multidimensional data. Moreover, such data is expressible in different formats such as bits, bytes, words, characters, numbers, figures and so forth. Moreover, contemporary information is encodable by employing potentially a multitude of different encoding methods which have been developed in recent decades. As aforementioned, it is most often necessary to store and/or transmit information, and thus it is beneficial that the information can be expressed with as small an amount of encoded data as possible, for example entropy coded data and additional information, for example as regards the data size in bits.
When considering methods of encoding data, it is convenient to consider each individual piece of information as an element or a symbol. Such representation of the pieces of information as elements or symbols allows an entropy for the information to be calculated, for example using Shannon entropy computation, see references [2], [3] and [4]; such computation can be executed for various different kinds of symbol representations before and after a multitude of different algorithms, for example entropy coding algorithms and/or entropy modifying algorithms. For example, the individual symbols can be entropy-encoded using multiple different entropy encoding methods. Moreover, symbols can also be converted from one form to another form, for example numbers can be converted to text, text can be converted to words, and bits can be converted to bytes.
Examples of an individual symbol include a bit value (1, 6, 8, 10, . . . bit), a byte value (8-bit), a word value (16, 32, 64, 128, . . . -bit), a character (ASCII, Unicode, Chinese, Arabic, . . . ), a positional notation such as a binary (base=2) notation, an octal (base=8) notation, a decimal (base=10) notation or a hexadecimal (base=16) notation, or a Roman numeral notation. Optionally, numeric symbols can have a radix point, namely a fractional or real value, or a non-radix form (natural or integer value). Moreover, optionally, symbols can also include pictures, data or database elements, and so forth. Furthermore, numbers and characters can also be represented as symbols that are based on individual numbers or characters, for example ASCII, or as a combination of multiple ASCII symbols that represents, for example, numerical values, words, or sentences.
As aforementioned, a symbolic representation enables a computation for entropy of information to be executed. Moreover, entropy for the same given piece of information can be calculated by using different symbol representations, whereby distinctly different entropy results are produced; for example, different sets of symbols employed to represent the information can result in different entropies for the information as represented by one or more symbols of the sets. Additionally, different symbols can also be entropy-encoded very differently, as required. It is feasible for some representations of the aforesaid information to be entropy-encoded very closely to their ideal entropy, for example as derived via bits with an Arithmetic Coder or Range Coder, whereas some representations require more additional information for entropy encoding to succeed, for example as encountered for words or database elements.
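The dependence of entropy on the chosen symbol representation can be illustrated with a minimal Python sketch. The sample data and the helper names `shannon_entropy_bits` and `as_bits` are assumptions made for illustration only and do not appear in the disclosure. The same 16 bytes are analyzed once as byte symbols and once as individual bit symbols.

```python
import math
from collections import Counter


def shannon_entropy_bits(symbols):
    """Average Shannon entropy per symbol, in bits (logarithm base 2)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def as_bits(data):
    """View bytes as a sequence of single-bit symbols, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


# Hypothetical sample: the same information, two symbol representations.
data = b"ABABABABABABABAB"
byte_symbols = list(data)      # 16 symbols drawn from {65, 66}
bit_symbols = as_bits(data)    # 128 symbols drawn from {0, 1}

# Total entropy estimate = entropy per symbol * number of symbols.
byte_total = shannon_entropy_bits(byte_symbols) * len(byte_symbols)
bit_total = shannon_entropy_bits(bit_symbols) * len(bit_symbols)

print(f"as bytes: {byte_total:.1f} bits")
print(f"as bits:  {bit_total:.1f} bits")
```

In this sample, the byte representation has two equiprobable symbols and therefore a much lower total entropy than the bit representation, in which the regular structure of the bytes is not visible to a per-symbol probability model.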
The aforesaid additional information needs to be delivered in one way or another from a given encoder to a corresponding decoder, so as to enable unique decoding of encoded data to be achieved. Furthermore, it is beneficial when some additional information is already available both at the given encoder and at the corresponding decoder, so that this information does not need to be delivered at all, or it can be delivered in a very small format, for example by using one or more indexes identifying one or more tables.
In other words, a manner in which the aforesaid information is delivered by way of corresponding encoded data makes a big difference to a degree of data compressing that is achievable in the encoded data; possible alternatives are, for example, sending the entire original information itself as original symbols, as symbols in compressed form, or as a selection index to available information alternatives. Moreover, the entire delivered information, or portions thereof, can be reused, which also creates multiple options for compressing the information, corresponding data or corresponding encoded data to an even greater degree.
Especially, when the amount of original data representing the information increases, there often are no suitable static tables or databases available for selection when communicating the information. However, after the delivery of one or more tables in association with communication of the information in encoded form, there is potentially some table that can be reused, for example for other information communicated at a later time whose encoding method refers to that table. It will also be appreciated that the piece of information that is to be compressed is potentially a part of a larger information entity; for example, the piece of information can be analysis results of full or partial data, method parameters for one or more data blocks and so forth, for example multilevel method levels, database references, a part of the original data (for example, ROI, slice of frame, image from video).
When there is a lot of data to be communicated, the entropy of the data dominates the amount of data to be delivered. Conversely, when there is only relatively little data to be delivered, the additional information is often a major part of the delivered data; in other words, the additional information can potentially represent a considerable data overhead. Thus, there is a need for an optimization, so that a sum of entropy encoded data and the additional data is minimized; as will be elucidated later, an invented Continuum Operator pursuant to the present disclosure is a very good tool for such purpose of optimization.
There is a large variety of different data compression methods which are contemporarily available for compressing data. Some of the compression methods are specialized for some particular kind of data, for example JPEG/PNG for compressing images, AAC/MP3 for compressing audio, PNG/GIF for compressing graphics, HEVC/VP9 for compressing video, and so forth. Some of the methods are more eclectic, for example BZip, 7Zip, RLE, SRLE, VLC, Range Coding, Arithmetic Coding. Moreover, there are also methods available for modifying an entropy of bit data, for example as employed in an Entropy Modifier (EM), as described in a United Kingdom patent application GB1303658.7 which corresponds to allowed U.S. patent application Ser. No. 13/782,757, and ODelta Coding, as described in United Kingdom patent application GB1303661.1 which corresponds to allowed U.S. patent application Ser. No. 13/782,819, and methods of modifying entropy of symbol data that is not represented as individual bits, for example DPCM, Delta Coding, ODelta Coding, RLE, SRLE, as described in a United Kingdom patent application GB1303660.3, corresponding to allowed U.S. patent application Ser. No. 13/782,872. Although Shannon entropy, as described in references [2], [3] and [4], is well known, it is not generally utilized properly in current compression methods. Shannon entropy can be computed using Equation 1 (Eq. 1) as follows:
$$\text{Entropy} = -\sum_{i=1}^{n} p(x_i) \cdot \log p(x_i) \qquad \text{(Eq. 1)}$$
wherein:
n is the number of different symbols; and
p(x_i) is the probability of the symbol indexed by i.
Entropy is often multiplied by the number of all symbols so as to make the value more comparable to other calculated entropy values. This comparable entropy value can also be changed to estimate the used bits by dividing the value of comparable entropy by a log(2) value.
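The scaling described above can be sketched in Python as follows. It is assumed here that the logarithm in Eq. 1 is the natural logarithm, so that dividing by log(2) converts the result to bits; the function names and the sample symbols are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy_eq1(symbols):
    """Eq. 1 evaluated with the natural logarithm (an assumption)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def estimated_bits(symbols):
    """Comparable entropy (Eq. 1 times symbol count), converted to bits."""
    comparable_entropy = entropy_eq1(symbols) * len(symbols)
    return comparable_entropy / math.log(2)


# Hypothetical sample: p(a) = 1/2, p(b) = 1/4, p(c) = p(d) = 1/8.
symbols = list("aaaabbcd")
print(f"estimated size: {estimated_bits(symbols):.2f} bits")
```

For this sample the entropy is 1.75 bits per symbol, so the eight symbols give an estimate of 14 bits, excluding any additional information that a decoder would need.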
Instead of entropy, rate-distortion (RD) optimization is often used in lossy coding for selecting a best compression method or method combination. In lossless coding, entropy per se can be used to select methods or algorithms, because in lossless coding, there is no distortion on which RD-optimization is based, and so the rate alone is conveniently estimated by entropy only, together with additional information.
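A lossless selection of this kind can be sketched in Python as shown below. The sketch is illustrative only and is not the method of the present disclosure: the overhead model (a fixed number of bits per distinct symbol for a delivered table), the two candidate representations, and all function names are assumptions introduced for the example.

```python
import math
from collections import Counter


def entropy_bits_total(symbols):
    """Total Shannon entropy estimate for the sequence, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def delta_encode(values):
    """Keep the first value, then store successive differences."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def estimated_cost(symbols, bits_per_table_entry=8):
    """Rate estimate: entropy bits plus a hypothetical symbol-table overhead."""
    table_bits = len(set(symbols)) * bits_per_table_entry
    return entropy_bits_total(symbols) + table_bits


# Hypothetical sample: 32 slowly rising values.
values = list(range(100, 132))
candidates = {"raw": values, "delta": delta_encode(values)}
costs = {name: estimated_cost(syms) for name, syms in candidates.items()}
best = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: {cost:.1f} bits")
print("selected:", best)
```

In this sample the raw values are all distinct, so both their entropy and the assumed table overhead are large, whereas the delta-coded form contains only two distinct symbols and is selected.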
Interleaving the data corresponding to the aforesaid information is also a known prior art method. For example, pixel color values, for example expressed as RGB as described in reference [11], in a given image can be expressed in a planar form as (RRRR . . . , GGGG . . . , BBBB . . . ) or in an interleaved form as (RGB, RGB, RGB, RGB, . . . ).
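The two layouts can be converted into one another as in the following minimal Python sketch, which assumes a flat list of 8-bit sample values with three channels per pixel; the function names and pixel values are hypothetical.

```python
def interleaved_to_planar(samples):
    """RGBRGB... -> RRR...GGG...BBB... for a flat list of channel samples."""
    return samples[0::3] + samples[1::3] + samples[2::3]


def planar_to_interleaved(samples):
    """RRR...GGG...BBB... -> RGBRGB... for a flat list of channel samples."""
    n = len(samples) // 3
    r, g, b = samples[:n], samples[n:2 * n], samples[2 * n:]
    return [value for pixel in zip(r, g, b) for value in pixel]


# Four hypothetical pixels in interleaved (RGB, RGB, ...) order.
interleaved = [255, 0, 0, 254, 1, 0, 253, 2, 0, 252, 3, 0]
planar = interleaved_to_planar(interleaved)
print(planar)  # [255, 254, 253, 252, 0, 1, 2, 3, 0, 0, 0, 0]
```

In the planar form, samples of the same channel become neighbors, which can change how well a subsequent method such as delta coding or run-length coding performs on the data.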
In a United Kingdom patent application GB2301252, see reference [10], there is described a known method for encoding bits present in data. The known method employs multiple remainder symbols of different lengths, but these remainder symbols are utilized in a strictly defined way one after another, they represent different bit dynamics, and the known method is only suitable for bit symbols. However, known methods, whether used separately or in known combinations, are not able to address, to a sufficient degree, any of three major problems related to data compression. All of the aforesaid methods, and combinations thereof, have a multitude of disadvantages.
When information is encoded, for example compressed, three major problems arise:
1) A first problem relating to a manner of selecting a most appropriate form of symbols to be used when information is to be compressed;
2) A second problem relating to a manner of reducing similar consequent symbols most efficiently; and
3) A third problem relating to a manner in which to reduce, for example to a minimum, a data size of encoded data and additional information most efficiently, while still enabling unique data to be decoded, for example decompressed in a decoder.