This invention relates to coding of data.
A binary constant weight code is a code where each member of the code (i.e., each codeword) has the same number of 1's. Constant weight codes have numerous applications.
A conventional general purpose technique for encoding data into constant weight codes is based on a recursive expression for determining the lexicographic index of an element of a codebook. The operation of encoding is equivalent to determining the codeword, given its index, and the operation of decoding is equivalent to determining the index, given the codeword. If $b=(b_1, b_2, \ldots, b_n)$ is used to denote the codeword, $b_i \in \{0, 1\}$, the lexicographic index $v(b)$ is
$$v(b) = \sum_{m=1}^{n} b_m \binom{n-m}{w_m} \qquad (1)$$
where $w_m$ is the number of 1's in positions m through n of b, i.e., the weight of the suffix of b beginning at position m. See T. M. Cover, “Enumerative source encoding,” IEEE Trans. Information Theory, vol. 19, no. 1, pp. 73-77, January 1973; and J. P. M. Schalkwijk, “An algorithm for source coding,” IEEE Trans. Information Theory, vol. IT-18, pp. 395-399, May 1972. The resulting code is fully efficient, but the complexity of the technique limits its direct application to small block lengths. This is mainly because the binomial coefficients in (1) become extremely large, requiring extended-precision arithmetic to prevent overflow errors.
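To make the enumerative technique concrete, the following Python sketch implements the indexing of equation (1) and its inverse (the function names are illustrative, not taken from the cited references). Here $w_m$ is taken as the number of 1's from position m onward, the convention under which the map is a bijection between the weight-w words of length n and the integers 0, ..., C(n,w)−1. Python's arbitrary-precision integers absorb the large binomial coefficients; in fixed-width hardware this is exactly the extended-precision burden discussed above.

```python
from math import comb

def cw_index(bits):
    """Lexicographic index of a constant weight codeword, per equation (1).

    Each 1 at position m contributes C(n-m, w_m), where w_m counts the
    ones remaining from position m onward (including position m).
    """
    n = len(bits)
    remaining = sum(bits)          # ones left from the current position on
    v = 0
    for m, b in enumerate(bits, start=1):
        if b:
            v += comb(n - m, remaining)
            remaining -= 1
    return v

def cw_codeword(v, n, w):
    """Inverse operation: recover the weight-w, length-n codeword of index v."""
    bits = []
    for m in range(1, n + 1):
        c = comb(n - m, w)         # codewords that put a 0 in this position
        if v < c:
            bits.append(0)         # index falls in the 0-branch
        else:
            bits.append(1)         # skip the 0-branch and place a 1
            v -= c
            w -= 1
    return bits
```

For example, with n=5 and w=2 the smallest codeword 00011 has index 0 and the largest, 11000, has index 9 = C(5,2)−1.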
Arithmetic coding is an efficient variable length coding technique for finite alphabet sources. Given a source alphabet and a simple probability model for sequences, with p(x) and F(x) denoting the probability and cumulative distribution function of sequence x, respectively, an arithmetic encoder represents x by a number in the interval [F(x)−p(x), F(x)). The implementation of such an arithmetic coder can also run into problems with very long registers, but elegant finite-length implementations are known and widely used. See I. H. Witten et al., “Arithmetic coding for data compression,” Communications of the ACM, vol. 30, pp. 520-540, June 1987. For constant weight codes, the idea is to reverse the roles of encoder and decoder, i.e., to use an arithmetic decoder as the constant weight encoder and an arithmetic encoder as the constant weight decoder. An efficient algorithm for implementing such codes using the arithmetic coding approach is given in T. V. Ramabadran, “A coding scheme for m-out-of-n codes,” IEEE Trans. Communications, vol. 38, no. 8, pp. 1156-1163, August 1990. The probability model used by the coder is adaptive, in the sense that the probability that the incoming bit is a 1 depends on the number of 1's that have already occurred. This approach successfully overcomes the finite-register-length constraints associated with computing the binomial coefficients, and the resulting efficiency is often very high: in most cases the loss is one information bit or less. The encoding complexity of the method is O(n).
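The role reversal can be sketched as follows: an arithmetic decoder, driven by the adaptive model P(next bit is 1) = r/t, where r ones remain to be placed among t remaining positions, maps a data value in [0,1) to a constant weight word. The sketch below uses exact rational arithmetic for clarity, so it deliberately sidesteps the finite-register machinery of a practical implementation; the function name and the uniform index-to-point mapping are illustrative assumptions, not Ramabadran's algorithm.

```python
from fractions import Fraction
from math import comb

def arith_decode_to_codeword(v, n, w):
    """Use an arithmetic DECODER as a constant weight encoder.

    Maps index v (0 <= v < C(n, w)) to a weight-w, length-n word by
    interval subdivision under the adaptive model P(bit = 1) = r/t.
    """
    target = Fraction(v, comb(n, w))      # uniform index -> point in [0, 1)
    low, high = Fraction(0), Fraction(1)
    bits, r = [], w
    for t in range(n, 0, -1):
        # probability mass assigned to "next bit is 0" is (t - r)/t
        split = low + (high - low) * Fraction(t - r, t)
        if target < split:
            bits.append(0)
            high = split                  # narrow to the 0 sub-interval
        else:
            bits.append(1)
            low = split                   # narrow to the 1 sub-interval
            r -= 1
    return bits                           # exactly w ones by construction
```

Because the split point at each step equals the fraction of remaining codewords beginning with a 0, this exact-arithmetic version reproduces the lexicographic ordering of equation (1); a practical coder replaces the rationals with fixed-width registers and renormalization.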
A different method for encoding and decoding balanced constant weight codes was developed by Knuth, as described in D. E. Knuth, “Efficient balanced codes,” IEEE Trans. Information Theory, vol. 32, no. 1, pp. 51-53, January 1986, and is referred to as the complementation method. The method relies on the key observation that if the bits of a length-k binary sequence are complemented sequentially, starting from the left, there must be a point at which the weight equals ⌊k/2⌋. Given the transformed sequence, it is possible to recover the original sequence by specifying how many bits were complemented (or, equivalently, the weight of the original sequence). This information is provided using check bits of constant weight, and the resulting code consists of the transformed original sequence followed by the constant weight check bits.
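A minimal sketch of the complementation step (function names hypothetical; the constant weight check bits that complete Knuth's construction are omitted):

```python
def knuth_balance(bits):
    """Complement the first j bits so the result has weight floor(k/2).

    Such a j always exists: complementing one more bit changes the weight
    by exactly 1, and the weight moves from w (at j = 0) to k - w (at j = k),
    so it must pass through floor(k/2) along the way.
    """
    k = len(bits)
    for j in range(k + 1):
        candidate = [1 - b for b in bits[:j]] + bits[j:]
        if sum(candidate) == k // 2:
            return candidate, j  # j is what the check bits would convey
    raise AssertionError("a balancing point always exists")

def knuth_recover(candidate, j):
    """Invert the transform: re-complement the first j bits."""
    return [1 - b for b in candidate[:j]] + candidate[j:]
```

For instance, 111100 (weight 4) is balanced by complementing its first bit, giving 011100 (weight 3), and re-complementing that bit recovers the original.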
In a series of papers, Bose and colleagues extended Knuth's method in various ways and determined the limits of this approach. See, for example, J.-H. Youn and B. Bose, “Efficient encoding and decoding schemes for balanced codes,” IEEE Trans. Computers, vol. 52, no. 9, pp. 1229-1232, September 2003, and the references therein. Knuth's method is simple and efficient; although its overall complexity is also O(n), for n=100 it can be eight times as fast as the method based on arithmetic codes. However, this method only works for balanced codes, which restricts its applicability.
In light of the available prior art, what is still needed is an effective and fast method for encoding and decoding constant weight codes that is not restricted in its applicability.