Field of the Invention
The present invention relates generally to methods, apparatus and systems for communication encoders, decoders, transmitters, receivers and infrastructure and/or user devices. More particularly, aspects of the invention relate to constrained turbo block convolutional codes, constrained interleaving, and related methods, apparatus, and systems for improved constrained interleaving, encoding, decoding, signal mapping, MIMO applications, spatial modulation, and rate matching. The present invention also relates to efficient parallel ASICs and VLSI architectures and optical integrated circuit architectures to implement these methods, apparatus, and systems.
Description of the Related Art
A large body of prior art includes technical publications, patents, and standards that relate to 4G LTE (fourth generation long term evolution) wireless systems. In particular, the relevant prior art relates to encoding and decoding architectures and algorithms for use with the CTC (convolutional turbo code) specified for use with 4G LTE. Specifically, important prior art relates to algorithms and high performance ASIC architectures for CTC encoding/decoding, deterministic contention-free interleavers such as the QPP (quadratic polynomial permutation) based interleavers, and rate matching/puncturing architectures.
A parallel decoding ASIC for the CTCs used in 4G LTE can be found in C. Studer, C. Benkeser, S. Belfanti, and Q. Huang, “Design and Implementation of a Parallel Turbo-Decoder ASIC for 3GPP-LTE,” IEEE J. Solid State Circuits, Vol. 46, No. 1, January 2011 (referred to as the “Studer reference” herein). A follow-on paper explains further improvements and details about efficient parallel decoding of the CTC used in 4G LTE. This second technical publication is: C. Roth, S. Belfanti, C. Benkeser, and Q. Huang, “Efficient parallel turbo-decoding for high throughput wireless systems,” IEEE Transactions on Circuits and Systems, 2012 (referred to as the “Roth reference” herein).
One of ordinary skill in the art would be familiar with the Studer reference, which explains how to design a highly optimized parallel real time ASIC to implement the CTC specified for use in 4G LTE. The Roth reference provides further details and optimizations to the same architecture as described in the Studer reference. One of ordinary skill in the art would also be familiar with the following prior art reference: A. Nimbalker, Y. Blankenship, B. Classon, T. K. Blankenship, “ARP and QPP Interleavers for LTE Turbo Coding,” WCNC 2008 proceedings (referred to as the “Nimbalker reference” herein). The architecture in the Studer reference uses the QPP interleaver as described in the Nimbalker reference. The QPP interleaver is important because it is used in the 4G LTE standard and because it can be described as a “contention free,” “vectorizable,” and “deterministic” interleaver.
As is well known, “contention free”/“vectorizable” means that the permutation function has a particular property that aids in parallel processing implementations. Consider a case where there are N=8 parallel processors. Then, as long as N divides the frame size, K, the contention free interleaver places on a given row in memory all of the N elements to be processed by the N processors in a given clock cycle, so that no two processors compete for the same memory access. The QPP only supports up to N=8 level vectorization.
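The contention free property can be illustrated with a short sketch. The check below assumes the commonly used window-based formulation, in which the frame is split into M windows of length W=K/M, each processor handles one window, and in every clock cycle the M permuted addresses must fall into M distinct memory banks. The function names, the bank model, and the parameter values K=40, f1=3, f2=10 are assumptions chosen for illustration and are not taken from the cited references; the permutation used is the QPP formula of equation (1).

```python
def qpp(i, K, f1, f2):
    """Direct evaluation of the QPP permutation (f1*i + f2*i^2) mod K."""
    return (f1 * i + f2 * i * i) % K


def is_contention_free(perm, K, M):
    """Window-based contention check for M parallel processors.

    Assumed model: processor t reads index j + t*W in cycle j, and the
    address a lives in memory bank a // W, where W = K // M.
    """
    if K % M != 0:
        raise ValueError("M must divide K")
    W = K // M
    for j in range(W):
        # Banks touched by the M processors in clock cycle j.
        banks = {perm(j + t * W) // W for t in range(M)}
        if len(banks) != M:
            return False  # two processors hit the same bank
    return True


# Illustrative parameters (assumed for this sketch).
K, F1, F2 = 40, 3, 10
results = {M: is_contention_free(lambda i: qpp(i, K, F1, F2), K, M)
           for M in (2, 4, 8)}
```

For these assumed parameters every tested degree of parallelism passes the check, which is what allows each processor to read its operand in the same cycle without arbitration logic.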
The Studer reference also points out a very efficient way to compute the QPP address sequence. As per the Nimbalker reference, the QPP interleaved address sequence can be written as
$$\pi_{\mathrm{QPP}}(i) = (f_1 i + f_2 i^2) \bmod K \qquad (1)$$
where $f_1$ and $f_2$ are suitably chosen interleaver parameters that depend on the code-block size K. Note that in this notation the sequentially incremented symbol i is used to denote a coded bit position in the transmitted frame, and the permuted version of the indexing sequence, $\pi_{\mathrm{QPP}}(i)$, is used to look up a bit position in the non-permuted sequence of input bits. The Studer reference explains that a very efficient way to compute equation (1) is to use the following set of recursions, which use only additions and modulo operations and can therefore be implemented very efficiently in hardware. Hence at runtime, in hardware, equation (1) is computed as
$$\pi_{\mathrm{QPP}}(i+1) = (\pi_{\mathrm{QPP}}(i) + \delta(i)) \bmod K \qquad (2)$$
and
$$\delta(i+1) = (\delta(i) + b) \bmod K \qquad (3)$$
where $\pi_{\mathrm{QPP}}(0)=0$, $\delta(0)=f_1+f_2$, and $b=2f_2$.
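The equivalence between the closed form of equation (1) and the recursions of equations (2)-(3) can be checked with a minimal sketch. The function names and the parameter values K=40, f1=3, f2=10 are assumptions made for illustration only; a hardware implementation would replace the modulo operations with a compare-and-subtract step.

```python
def qpp_direct(K, f1, f2):
    """Address sequence from the closed form of equation (1)."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def qpp_recursive(K, f1, f2):
    """Address sequence from the recursions of equations (2) and (3).

    Only additions and modulo reductions are used inside the loop.
    """
    addrs = []
    pi = 0                   # pi(0) = 0
    delta = (f1 + f2) % K    # delta(0) = f1 + f2
    b = (2 * f2) % K         # constant increment b = 2*f2
    for _ in range(K):
        addrs.append(pi)
        pi = (pi + delta) % K        # equation (2)
        delta = (delta + b) % K      # equation (3)
    return addrs


# Illustrative parameters (assumed for this sketch).
K, F1, F2 = 40, 3, 10
direct = qpp_direct(K, F1, F2)
recursive = qpp_recursive(K, F1, F2)
```

The recursion follows from the first difference of equation (1): the difference between consecutive addresses is f1 + f2 + 2·f2·i, which itself grows by the constant 2·f2 at each step.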
Another prior art reference that is known to those of skill in the art and that goes into further detail about QPP recursions is: Y. Sun and J. Cavallaro, “Efficient hardware implementation of a highly-parallel 3GPP LTE/LTE-advanced turbo decoder,” Integration, the VLSI Journal, No. 44, 2011, pp. 305-315 (referred to as the “Sun reference” herein). This reference provides additional recursions that allow QPP addresses to be incremented by an integer step, d=Δi, that can be any positive integer. This allows forward and backward sequences of QPP addresses to be generated for the forward and backward recursions used in decoding. Also, this allows recursions similar to equations (2)-(3) to increment by more than one element, for example, Δi=K/M, where K is the frame size and M is the number of processors in a system. The Sun reference also explains the prior art knowledge that a set of M different QPP address generators can be run in parallel with relative offsets of one and with Δi=K/M to generate a set of M consecutive QPP addresses in parallel. The Sun reference also provides efficient hardware circuits to implement such an addressing scheme.
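A stride-d generalization of the recursions can be sketched as follows. This is derived directly from equation (1) rather than copied from the Sun reference, so the function name, argument list, and parameter values (K=40, f1=3, f2=10, M=8) are assumptions for illustration. The first difference over a step d is f1·d + f2·d² + 2·f2·d·i, and it grows by the constant 2·f2·d² per step.

```python
def qpp_stride(K, f1, f2, start, d, count):
    """Generate pi(start), pi(start + d), pi(start + 2d), ... recursively.

    d may be negative, which yields a backward address sequence.
    """
    pi = (f1 * start + f2 * start * start) % K
    # delta_d(i) = pi(i + d) - pi(i), evaluated at i = start.
    delta = (f1 * d + f2 * d * d + 2 * f2 * d * start) % K
    b = (2 * f2 * d * d) % K  # constant second difference for stride d
    out = []
    for _ in range(count):
        out.append(pi)
        pi = (pi + delta) % K
        delta = (delta + b) % K
    return out


# Illustrative parameters (assumed for this sketch).
K, F1, F2, M = 40, 3, 10, 8
W = K // M
reference = [(F1 * i + F2 * i * i) % K for i in range(K)]

forward = qpp_stride(K, F1, F2, start=0, d=1, count=K)
backward = qpp_stride(K, F1, F2, start=K - 1, d=-1, count=K)
windowed = qpp_stride(K, F1, F2, start=0, d=W, count=M)
```

With d=1 the sketch reduces to equations (2)-(3); with d=-1 it produces the addresses needed by a backward recursion; with d=K/M it steps from one processor window to the next.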
Another relevant field of art is called rate matching. Rate matching is also known as “puncturing.” The CTC mother code defined in the LTE standards is a rate ⅓ parallel concatenated turbo code. This CTC leads to very complicated rate matching circuits at both the encoder and the decoder, thus increasing the overall hardware complexity of 4G LTE CTC encoding and decoding. A reference that discusses rate matching for LTE turbo codes is C. Ma and P. Lin, “Efficient implementation of rate matching for LTE codes,” IEEE ICFCC 2010 international conference proceedings, pp. V1-704-708 (referred to as the “Ma reference” herein). FIG. 1 of the Ma reference shows the basic configuration of 4G LTE rate matching at the transmitter side. The data stream plus two streams of parity bits from the rate ⅓ parallel concatenated CTC pass through three parallel blocks labeled “sub-block interleaver.” That is, three interleavers are used, one each to process the total number of bits in a non-punctured frame. Another reference that explains the rate matching used in 4G LTE is L. Yu et al., “An improved rate matching algorithm for 3GPP LTE Turbo code,” Conference on Communications and Mobile Computing (CMC), pp. 345-348, April 2011. FIG. 2 of this article and the discussion thereof are very helpful in understanding the 4G LTE rate matching algorithm.
There also exists a vast body of literature related to OTN (optical transport network) applications. OTN applications are demanding because they require very high data rates and powerful codes, and the frame size used in coding/decoding is long (122,368 message bits plus coding overhead bits). OTN systems are either already available or still being researched and developed to support data rates of 100 Gbps (usually referred to as 100G), 400 Gbps, and even up to 1000 Gbps (1 Terabit per second, 1T). These very high speed systems demand very powerful codes to achieve specified high NCGs (net coding gains) at very low BERs (bit error rates) below 10^-15. High speed digital hardware that employs extensive parallel processing is needed to decode these powerful codes in real time.
It can be noted that in OTN applications, the codes being used or considered now are LDPC (low density parity check) codes, concatenations of LDPC codes with one or more long block codes, or TPCs (turbo product codes). OTN applications cannot use CTCs like LTE does because the error floors required by OTN applications are far below those afforded by CTCs. Hence it would be desirable to have a much lower complexity parallel coding/decoding technique and parallel architecture than those that are currently proposed for use in or used in the OTN field. It would be desirable if this low complexity coding/decoding technique could meet the stringent NCG requirements at BERs of 10^-15 and outperform all known coding/decoding techniques that are currently proposed for use in or used in the OTN field.
The prior art also includes U.S. Pat. No. 8,537,919, “Encoding and decoding using constrained interleaving,” and its continuation-in-part, U.S. Pat. No. 8,532,209, “Methods, apparatus and systems for coding with constrained interleaving.” Both of these U.S. patents are incorporated herein by reference in order to provide the reader with written description level details of known constrained interleaver design techniques and known encoder/decoder structures that use constrained interleaving. These patents are incorporated by reference, but it is to be understood that for claim construction purposes, the instant written description should be used, and not any of the written description in the incorporated-by-reference patents. In this patent application, some terms are defined differently than in the U.S. patents incorporated by reference herein. Therefore, it is to be understood that the interpretation of terms and phrases used in the claims herein should be taken in the context of the present application and not the references incorporated herein. The prior art also includes J. Fonseka, E. Dowling, S. I. Han and Y. Hu, “Constrained interleaving of serially concatenated codes with inner recursive codes,” IEEE Communications Letters, Vol. 17, No. 7, July 2013, referred to herein as “the Fonseka [1] reference.” The prior art also includes J. Fonseka, E. Dowling, T. Brown and S. I. Han, “Constrained interleaving of turbo product codes,” IEEE Communications Letters, vol. 16, 2012, pp. 1365-1368, September 2012, referred to herein as “the Fonseka [2] reference.” The prior art also includes S. I. Han, J. P. Fonseka and E. M. Dowling, “Constrained Turbo Block Convolutional Codes for 100G and Beyond Optical Transmissions,” IEEE Photonics Technology Letters, Vol. 26, No. 10, May 2014, referred to herein as “the Fonseka [3] reference.” The above-listed patents and technical publications also cite to related articles in the technical literature and to other U.S. patent references, which are also part of the prior art. It can be noted that the above referenced patents and technical papers constitute at least a portion of what would be known to one of skill in the art of CTBC (constrained turbo block convolutional) codes.
Consider FIG. 1, which corresponds to FIG. 4 in U.S. Pat. Nos. 8,537,919 and 8,532,209. FIG. 1 shows an encoder structure that can represent a method and/or an apparatus for encoding in accordance with a CTBC code. The CTBC encoder embodiment of FIG. 1 makes use of an outer block code (OBC) encoder 405 that encodes in accordance with a selected OBC. For example, the OBC can be an (n,k) block code, B, where n>k and n,k are positive integers. The message bit stream at the input can be considered to be a sequence of k-bit blocks consisting of message bits. Each k-bit message block is first processed by the OBC encoder 405 which, in the exemplary embodiment of FIG. 1, encodes according to an (n,k) outer code with minimum Hamming distance (MHD) given by MHD=d0. In some embodiments the outer encoder 405 can perform outer encoding in accordance with other types of fixed-length codes, such as a finite-length convolutional code or an LDPC code, for example. A characterizing feature of the embodiment of FIG. 1 is that it also makes use of an inner recursive convolutional code (IRCC) encoder 415 that encodes its input bit stream in accordance with an inner recursive convolutional code (the selected IRCC). An appropriate IRCC is chosen to have an MHD given by MHD=di. For example, the IRCC could be selected to be the rate-1 accumulator given by G(D)=1/(1+D). Another specific example of an IRCC is to use the rate-1 accumulator followed by a (λ,λ−1) SPC encoder (or any other block code), a finite-length (finite impulse response) convolutional code, or any other recursive convolutional code (RCC). The value of λ can be chosen to provide design flexibility to choose the IRCC to fine tune the rate and/or the di value to design a CTBC code to meet a particular set of design specifications. In some embodiments, the CTBC code is designed using the rate-1 accumulator as the IRCC, but this CTBC code is then followed by another block code like the (λ,λ−1) SPC encoder mentioned above.
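As a minimal sketch of the inner encoder options named above, the code below implements the rate-1 accumulator G(D)=1/(1+D) and a (λ,λ−1) single parity check (SPC) encoder that can follow it. The function names, the zero initial state, and the example values are assumptions made for illustration; the outer block code and the constrained interleaver are not modeled here.

```python
def accumulate(bits):
    """Rate-1 accumulator G(D) = 1/(1+D): y[n] = x[n] XOR y[n-1].

    The encoder is assumed to start in the all-zero state.
    """
    out, state = [], 0
    for b in bits:
        state ^= b
        out.append(state)
    return out


def spc_encode(bits, lam):
    """(lam, lam-1) SPC encoder: append one even-parity bit per lam-1 bits."""
    k = lam - 1
    if k < 1 or len(bits) % k != 0:
        raise ValueError("input length must be a multiple of lam - 1")
    out = []
    for i in range(0, len(bits), k):
        block = bits[i:i + k]
        out.extend(block)
        out.append(sum(block) % 2)  # parity makes each block weight even
    return out


# Example: four interleaved bits through the accumulator, then a (3,2) SPC.
interleaved = [1, 0, 1, 1]
accumulated = accumulate(interleaved)
coded = spc_encode(accumulated, lam=3)
```

The overall rate of this inner stage is (λ−1)/λ, which is one way the value of λ can be used to trade rate against distance.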
Another characterizing feature of the CTBC encoder 400 is that it makes use of a constrained interleaver 410. Any specific CTBC code is defined in terms of the specifically selected outer block code B used in the OBC encoder 405, the specifically selected recursive convolutional code (RCC) used in the IRCC encoder 415, and a specifically selected constrained interleaver having a specified size and permutation function used in block 410. The constrained interleaver 410, and various forms of its interleaver constraints are described in the above-cited prior art references. The constrained interleaver 410 can be designed to provide an interleaver gain, Gl, similar to uniform interleaving, but also can be designed to ensure that the net MHD of the entire CTBC code satisfies some target MHD, dt≧d0di. It can be noted that if the constrained interleaver used in the CTBC were to be replaced by a uniform interleaver of the same length, a “Uniform-interleaved Turbo Block Convolutional” (UTBC) code would result, and the MHD of this corresponding UTBC code would typically be close to MHD, dt=di.
Various forms of constrained interleavers are defined in the above-referenced U.S. patents and the three above-cited references related to constrained interleaving. A constrained interleaver type 2, i.e., the “CI-2,” is introduced and used in the block 410 of FIG. 1. The above-referenced U.S. patents teach how CI-2 interleaver constraints can be defined to design the constrained interleaver 410 to enforce the property MHD, dt≧d0di. In U.S. Pat. No. 8,532,209, the term and notation “Constrained interleaver type 2” and its abbreviation “CI-2” are introduced. In the Fonseka [2] reference, it is shown that CI-2s can be designed to achieve a specified target MHD that satisfies d0·di ≦ MHD ≦ d0^2·di. CI-2s use inter-row constraints in order to achieve this. Note that the constrained interleaver block 410 in FIG. 1 is labeled “r×ρn constrained interleaver.” This is because, as discussed in the above-referenced U.S. patents, the constrained interleaver's permutation function is designed using an r×ρn row-column matrix structure. That is, the prior art relies upon the CI-2 design matrix, [A]r×ρn, and requires certain relations to hold for coded bit positions from different codewords of the OBC that are loaded into [A]r×ρn. In the Fonseka [1] reference, the symbol for the number of rows of the CI-2 design matrix was changed to the symbol “L,” and the CI-2 design matrix is thus written as [A]L×ρn. In the rest of this patent application, from here forward, the symbol L will be used to refer to the number of rows in the CI-2 design matrix.
An objective of the CI-2 interleaver is to create CTBC codes that simultaneously provide a specified high MHD while achieving as high an interleaver gain as possible. The high MHD provides a lower error floor and has other desirable effects in various types of channels, and the high interleaver gain ensures a high coding gain for the CTBC code. However, the interleaver gain attainable by the CI-2 is limited to a large extent by the number of rows, L, in the CI-2 design matrix. The lower the number L, for a fixed frame size K, the higher the CI-2 interleaver gain. However, when CI-2 interleavers are used, lowering L will eventually limit the achievable MHD.
It would be desirable to have improved constrained interleavers that do not require a CI-2 design matrix, but instead use L=1, and can thus lead to improved CTBC codes that have higher interleaver gains as compared to a CI-2 interleaver of the same length. It would be desirable to further include improved signal mapping methods, apparatus and systems to map a CTBC code onto a target signal constellation in such a way as to provide a constellation mapping gain, similar to the kinds of gains provided by trellis coded modulation (TCM) and bit interleaved coded modulation (BICM). It would also be desirable to have new rate matching algorithms that could efficiently interoperate with these new and improved CTBC codes and signal mapping subsystems. It would also be desirable to have algorithms developed for applications in multiple input multiple output (MIMO) systems and spatial modulation, and subsystems for use in communications devices that include multi-antenna subsystems.
Next consider FIG. 2, which corresponds to FIG. 5 in U.S. Pat. Nos. 8,537,919 and 8,532,209. FIG. 2 shows a prior art receiver method and apparatus for a receiver 500 used to receive and decode a signal r(t) which was generated in accordance with FIG. 1 or a version of a serial concatenated code whose inner code is a block code that is also discussed in U.S. Pat. Nos. 8,537,919 and 8,532,209. It is important to note that when CTBC codes are generated using an IRCC as shown in FIG. 1 herein, block 510 and the connection between blocks 510 and 525 will be missing. The block 510 is only used to decode coded signals generated by an alternative embodiment shown in FIG. 2 of the above two referenced patents. Accordingly, block 510 and its connection to block 525 should be ignored herein.
Block 1105 processes or otherwise demodulates a received signal r(t) to generate an initial vector rs, which preferably corresponds to a vector of bit metrics. The bit metrics are preferably used in decoding of the component codes using an a posteriori probability (APP) decoding technique.
The IRCC soft in soft out (SISO) decoder 515 can implement a well known soft decoding algorithm such as the BCJR algorithm, a soft output Viterbi algorithm (SOVA), or the min-sum algorithm. Such algorithms are known to generate extrinsic information indicative of the reliability of the soft decoded results. The BCJR algorithm can be embodied using any of the MAP, Log-MAP, or Max-Log-MAP algorithms. For example, if the IRCC SISO decoder 515 involves the BCJR algorithm, then the IRCC SISO decoder 515 will need to compute a sequence of branch transition probabilities, γ's, that each are a function of a respective element of the received signal metrics, rs, and a corresponding respective element of updated or initial extrinsic information, the Le's. The IRCC SISO decoder 515 will use this sequence of branch transition probabilities, γ's, while making one forward recursion pass to update a set of state metrics, α's, and one backward recursion pass to update a set of state metrics, β's. Such concepts are well known in the art in the context of decoding convolutional turbo codes (CTCs). Using the calculated α, β and γ values, the BCJR decoding of the IRCC decoder calculates the extrinsic information of all its input bits. For example, see P. Robertson, et al., “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” IEEE ICC 1995, pp. 1009-1013.
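The forward/backward structure described above can be sketched for the simplest IRCC, the two-state rate-1 accumulator, using the Max-Log-MAP approximation. This is an illustrative sketch under stated assumptions and not the decoder of FIG. 2: the function name, the LLR sign convention (positive values favor bit 0), the zero initial state, and the unterminated trellis are all assumptions made here.

```python
NEG_INF = float("-inf")


def max_log_map_accumulator(Lc, La):
    """Max-Log-MAP SISO decoding of the accumulator y[n] = u[n] XOR y[n-1].

    Lc: channel LLRs of the coded bits y; La: a priori LLRs of the inputs u.
    Returns the extrinsic LLRs of the input bits.
    """
    n = len(Lc)

    def gamma(k, s, s_next):
        u, y = s ^ s_next, s_next  # input bit and output bit on this branch
        return 0.5 * ((1 - 2 * u) * La[k] + (1 - 2 * y) * Lc[k])

    # Forward pass: alpha state metrics, starting in state 0.
    alpha = [[NEG_INF, NEG_INF] for _ in range(n + 1)]
    alpha[0][0] = 0.0
    for k in range(n):
        for s_next in (0, 1):
            alpha[k + 1][s_next] = max(
                alpha[k][s] + gamma(k, s, s_next) for s in (0, 1))

    # Backward pass: beta state metrics, unknown (equiprobable) final state.
    beta = [[0.0, 0.0] for _ in range(n + 1)]
    for k in range(n - 1, -1, -1):
        for s in (0, 1):
            beta[k][s] = max(
                gamma(k, s, s_next) + beta[k + 1][s_next] for s_next in (0, 1))

    # Combine alpha, gamma and beta, then remove the a priori term.
    Le = []
    for k in range(n):
        best = {0: NEG_INF, 1: NEG_INF}
        for s in (0, 1):
            for s_next in (0, 1):
                m = alpha[k][s] + gamma(k, s, s_next) + beta[k + 1][s_next]
                best[s ^ s_next] = max(best[s ^ s_next], m)
        Le.append(best[0] - best[1] - La[k])
    return Le


# Noise-free example: u = [1, 0, 1, 1] gives y = [1, 1, 0, 1].
channel_llrs = [-8.0, -8.0, 8.0, -8.0]
extrinsic = max_log_map_accumulator(channel_llrs, [0.0] * 4)
decisions = [1 if llr < 0 else 0 for llr in extrinsic]
```

In an iterative receiver, the returned extrinsic values would be deinterleaved and passed to the outer decoder, and the outer decoder's extrinsic output would be fed back as the a priori input on the next iteration.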
The IRCC SISO decoder 515 couples its extrinsic information output to a constrained deinterleaver 520 which deinterleaves the extrinsic information received from the IRCC SISO decoder 515, for example, in accordance with the inverse CI-2 permutation function. The OBC SISO decoder 525 is coupled to receive the deinterleaved extrinsic information from the constrained deinterleaver 520. The OBC SISO decoder 525 also preferably implements a known soft decoding algorithm such as the well known Chase-Pyndiah algorithm (also referred to as the Pyndiah algorithm), low complexity Chase-Pyndiah algorithm, the OSD algorithm and its low complexity variations, or any similar soft decoding algorithm for decoding of block codes, for example. In general, different well known (or proprietary) soft decoding algorithms can be used in the blocks 515 and 525. All such algorithms are well known to those of skill in the art, for example, see J. Cho and W. Sung, “Reduced complexity Chase-Pyndiah decoding for turbo product codes,” pp. 210-215, IEEE workshop on signal processing systems, October, 2011.
It would be desirable to have decoding architectures that could be used to efficiently decode the new improved CTBC codes. It would be desirable to have additional efficient algorithms and parallel architectures to decode the improved CTBC codes that have undergone additional constrained interleaving based signal mapping and/or rate matching and/or constrained interleaving based spatial modulation.
While the above mentioned prior art relating to constrained interleaving for use with an OBC and an IRCC provides very powerful CTBC codes, the CI-2 is based on the CI-2 design matrix, [A]L×ρn, and the concept of a random interleaver. The construction of the CI-2 requires many randomization operations performed in the CI-2 design matrix and a complicated process of ensuring that the randomizations do not violate any constraints in the CI-2 design matrix. As discussed below, this CI-2 design matrix and design process actually limits BER performance. Also, the CI-2 is not a vectorizable/contention free interleaver. Herein a “random interleaver” is also defined in opposition to a “deterministic interleaver” that uses a mathematical formula to generate the deterministic interleaver permutation. A random interleaver is thus often implemented as a table look up or with a state-machine logic circuit whose sequencing logic does not use a fixed mathematical equation but whose state transition logic needs to be specifically designed for each frame size.
It would be desirable to have a family of contention free, vectorizable constrained interleavers, both deterministic and semi-random. It would be desirable to have an SCC that is constructed by coupling the output of the OBC to the IRCC via a contention free, vectorizable and deterministic version of a constrained interleaver. It would further be desirable to be able to design a system that could achieve the memory efficient benefits of the Studer reference, and to also greatly simplify the rate matching requirements of the system. It would be desirable to have a parallel architecture that could meet the encoding and decoding performance requirements of the 4G LTE CTC encoders and decoders, but with simpler computational functional units, less overall computational complexity, and thus lower power consumption. It would be desirable to have a CTBC encoder/decoder architecture that could eliminate the complicated and hardware intensive rate matching and inverse rate matching subsystems required by 4G LTE encoders and decoders. It would also be desirable if the parameters of this same CTBC encoder/decoder architecture could be scaled to higher values of N levels of parallelism and designed to provide the NCGs needed at BERs of 10^-15 for 400 GHz and beyond OTN applications. It would be desirable to also have new coded modulation techniques that could be used to map codes onto higher order constellations and to implement advanced functions such as rate matching, spatial modulation, and MIMO systems. It would be desirable if the advanced modulation technique could be used along with optical integrated circuits and similar technology to implement higher capacity optical communication channels, for example 400 GHz and beyond, and 1 Tera Hz and beyond. It would be desirable to have a constrained interleaver design process that did not rely on the CI-2 design matrix and was able to provide higher BER performance for random and deterministic constrained interleavers.