The invention relates to high-speed, low-power channel coding in communication systems, and to coders and decoders providing channel coding and decoding.
In digital communication systems, reliable transmission is achieved by means of channel coding, a class of Forward Error Correction (FEC) techniques. Coding the information means adding redundancy to the bit stream at the transmitter side, so that it can be properly reproduced at the receiver side.
Ever more (wireless) networks and services are emerging. (Wireless) communication systems should therefore strive to use the available spectrum capacity to the maximum. The theoretical limits of the achievable capacity on a communication channel were set by Shannon's fundamental concepts almost 60 years ago, as described in C. Shannon, “A mathematical theory of communication”, Bell Sys. Tech. Journal, vol. 27, October 1948. Decades of innovations in digital communication, signal processing and very large scale integration (VLSI) were needed to bring the efficiency of practical systems near the theoretical bounds. Only recently was a new FEC coding scheme, turbo coding, conceived; it approaches Shannon's limit much more closely than any previously known FEC scheme. This coding scheme is described in C. Berrou, A. Glavieux, P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: turbo-codes”, Proc. IEEE ICC, pp. 1064-1070, May 1993. In this technique, large coding gains (meaning less transmission power for the same bit error rate (BER)) are obtained using two or more constituent codes working on different versions of the information to be transmitted. Decoding is done iteratively, using a separate decoder for each constituent encoder. The information provided by one decoder is processed iteratively by the other decoder until a certain degree of refinement is achieved.
A general turbo coding/decoding scheme for a Parallel Concatenated Convolutional Code (PCCC) is depicted in FIG. 1. The information bitstream I to be transmitted is encoded by a first encoder C1 and a second encoder C2, e.g. in a pipeline. The second encoder C2 works on an interleaved version of the information bitstream I, produced by an interleaver Π. The interleaver Π randomises the information bitstream I to decorrelate the inputs of the two encoders C1, C2.
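To make the encoder side of FIG. 1 concrete, the following Python sketch builds a PCCC encoder from two identical rate-1/2 recursive systematic convolutional (RSC) encoders. The text does not specify the constituent codes, so the generator pair (feedback 7, feedforward 5 in octal), the pseudo-random interleaver and the absence of trellis termination are assumptions chosen only for brevity; the function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def rsc_encode(bits: Sequence[int]) -> List[int]:
    """Parity stream of a memory-2 RSC encoder (assumed generators 7/5 octal).

    Feedback polynomial 1 + D + D^2, feedforward polynomial 1 + D^2.
    No trellis termination is applied in this sketch.
    """
    s1 = s2 = 0  # register contents: a[k-1] and a[k-2]
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback taps D and D^2
        parity.append(a ^ s2)  # feedforward taps 1 and D^2
        s1, s2 = a, s1
    return parity


def pccc_encode(info: Sequence[int], perm: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (Xk, Yk1, Yk2): the systematic stream and the two parity streams."""
    interleaved = [info[p] for p in perm]  # interleaver Pi feeding encoder C2
    return list(info), rsc_encode(info), rsc_encode(interleaved)


if __name__ == "__main__":
    rng = random.Random(0)
    info = [rng.randint(0, 1) for _ in range(16)]
    perm = list(range(len(info)))
    rng.shuffle(perm)  # pseudo-random interleaver, for illustration only
    xk, yk1, yk2 = pccc_encode(info, perm)
    print(xk, yk1, yk2, sep="\n")
```

Only the parity streams are produced by the constituent encoders; the systematic stream Xk is the information bitstream itself, which is why the scheme is called systematic.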
Three bitstreams are transmitted: the information bitstream itself, Xk (called the systematic sequence), and the coded sequences Yk1 and Yk2 (both called parity sequences). The decoding process begins by receiving partial information from the channel (Xk and Yk1) and passing it to a first decoder D1. The rest of the information, parity 2 (Yk2), goes to a second decoder D2 and waits for the rest of the information to catch up. Decoding is preferably based on, e.g., a Maximum A Posteriori (MAP) decoding algorithm or a Soft Output Viterbi Algorithm (SOVA). While the second decoder D2 is waiting, the first decoder D1 makes an estimate of the transmitted information, interleaves it in a first interleaver Π1 to match the format of parity 2, and sends it to the second decoder D2. The second decoder D2 takes information from both the first decoder D1 and the channel and re-estimates the information. This second estimate is looped back through a second interleaver, the deinterleaver Π1⁻¹, to the first decoder D1, where the process starts again. The main idea behind iterative decoding is that the decoded data is continuously refined. Part of the decoded data produced by each decoder D1 (resp. D2) in each iteration, called extrinsic information, is fed back to the other decoder D2 (resp. D1) for use in the next iteration step. The interleaving/deinterleaving stages Π1, Π1⁻¹ between the two decoders D1, D2 adapt the sequences to the order defined in the encoding step. This cycle of iterations continues until certain conditions are met, such as a given number of iterations having been performed. The resulting extrinsic information is then no longer relevant, and the process may stop. The result is the decoded information bitstream U.
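The iteration schedule described above can be sketched as follows. This is a minimal sketch, not a decoder: the soft-in/soft-out function `siso` is a hypothetical caller-supplied stand-in for a MAP or SOVA decoder, the inputs are assumed to be channel log-likelihood ratios (LLRs) with the assumed convention LLR = log P(bit=0)/P(bit=1), and all function names are illustrative.

```python
from typing import Callable, List, Sequence

# Hypothetical soft-in/soft-out decoder: (systematic LLRs, parity LLRs,
# a-priori LLRs) -> extrinsic LLRs. A MAP or SOVA decoder would fit here.
Siso = Callable[[Sequence[float], Sequence[float], Sequence[float]], List[float]]


def interleave(seq: Sequence, perm: Sequence[int]) -> list:
    """Pi1: output position k takes input element perm[k]."""
    return [seq[p] for p in perm]


def deinterleave(seq: Sequence, perm: Sequence[int]) -> list:
    """Pi1^-1: element k is put back at position perm[k]."""
    out = [None] * len(seq)
    for k, p in enumerate(perm):
        out[p] = seq[k]
    return out


def turbo_decode(xk, yk1, yk2, perm, siso: Siso, n_iter: int = 8) -> List[int]:
    """Extrinsic-information exchange between D1 and D2 (sketch)."""
    apriori1 = [0.0] * len(xk)
    ext1 = [0.0] * len(xk)
    for _ in range(n_iter):
        ext1 = siso(xk, yk1, apriori1)                                  # D1, natural order
        ext2 = siso(interleave(xk, perm), yk2, interleave(ext1, perm))  # D2, interleaved order
        apriori1 = deinterleave(ext2, perm)                             # back to D1's order
    # A-posteriori LLR = channel + both extrinsic terms; a negative value means bit 1
    return [1 if x + a + e < 0 else 0 for x, a, e in zip(xk, apriori1, ext1)]
```

Note that every iteration needs both an interleaving and a deinterleaving step; in a straightforward implementation these two reorderings are where the latency and the extra address tables come from.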
Turbo codes rapidly received a lot of attention and have been a focus of research since their first publication. Indeed, a gain of 3 dB over conventional coding schemes can be translated into a doubling of battery time, or a gain of 20% in bandwidth efficiency. Given the value of these resources, the enormous interest in turbo coding is easy to understand. As a consequence of their near-optimal performance, turbo coding schemes are now among the main candidates for upcoming systems such as the Universal Mobile Telecommunications System (UMTS), satellite UMTS and Digital Video Broadcasting (DVB), as described in 3rd Generation Partnership Project (3GPP), Technical Specification Group (TSG), Radio Access Network (RAN), Working Group 1, “Multiplexing and channel coding”, TS 25.222 V1.0.0 Technical Specification, 1999-04. The acceptance of turbo coding has been spectacular, as evidenced e.g. by the number of publications and theoretical developments presented at the 2nd International Symposium on Turbo Codes and Related Topics, September 2000, Brest, France. In contrast, the hardware implementation of turbo codes is following this evolution only very slowly. Speed, latency and, most of all, power consumption are significant technical problems in implementing the turbo coding principles. Ideally, speeds in the order of 100 Mbps should be achieved to meet the ever-growing speed demands. High-speed data services require high coding gains, making concatenated coding with iterative decoding (turbo coding) highly suitable. However, the performance advantage of turbo coding comes at the cost of increased digital processing complexity and decoding latency. If the turbo coding scheme is implemented in a straightforward way, the penalty in complexity (operations per bit) is typically an order of magnitude. The latency bottleneck needs to be solved if high-speed, low-power turbo coders for real-time applications are envisaged.
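The battery-time figure follows from the definition of the decibel. Assuming that battery life scales inversely with the transmit power needed for a target BER (an assumption; the symbols P_conv and P_turbo are illustrative), a 3 dB coding gain corresponds to roughly halving that power:

```latex
10\log_{10}\frac{P_{\mathrm{conv}}}{P_{\mathrm{turbo}}} = 3\,\mathrm{dB}
\quad\Longrightarrow\quad
\frac{P_{\mathrm{conv}}}{P_{\mathrm{turbo}}} = 10^{3/10} \approx 1.995 \approx 2
```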
Current commercially available turbo coding solutions, such as those from Small World Communications (Payneham South, Australia), sci-worx (Hannover, Germany) or Soft DSP (Seoul, Korea), do not meet the speed and power requirements imposed by current high-end communication systems.
Recently, some components for high-speed turbo coders suitable for real-time wireless communication (i.e. with low power consumption and low latency) have been reported in the literature, e.g. in G. Masera, G. Piccinini, M. Ruo Roch, M. Zamboni, “VLSI architectures for Turbo codes”, IEEE Transactions on VLSI Systems, 7(3):369-378, September 1999; in J. Dielissen et al., “Power-Efficient Application-Specific VLIW Processor for Turbo decoding”, ISSCC 2001, San Francisco, February 2001; and in Hong, Waynem, and Stark, “Design and Implementation of a Low Complexity VLSI Turbo-Code Decoder Architecture for Low Energy Mobile Wireless Communications”, Proceedings of ISLPED 99, 1999. Almost all of the advanced turbo coders described in these recent publications use ‘overlapping sliding windows’ (OSW) in the decoding process to increase speed and decrease power consumption. Even better architectures for turbo decoding at high speed, low power and low latency have been reported in A. Giulietti, M. Sturm, F. Maessen, B. Gyselinckx, L. van der Perre, “A study on fast, low-power VLSI architectures for turbo codes”, International Microelectronics Symposium and Packaging, September 2000, as well as in U.S. patent application Ser. No. 09/507,545, entitled “Method and System Architectures for Turbo-Decoding”. While solutions for optimizing the decoding processes are available, no attractive results for speeding up the interleaving and deinterleaving operations have been proposed.
As discussed above, turbo decoders, despite their performance close to the channel limits at reasonable decoding complexity, suffer from high latency due to the iterative decoding process, the recursion in the decoding algorithm and the interleaving/deinterleaving between decoding stages. Parallelising the MAP decoding algorithm helps to avoid these drawbacks. The use of Overlapping Sliding Windows (OSW) is reported to be a good means of parallelisation, unrolling the recursion of the MAP algorithm in space. The OSW scheme requires separate storage elements at the input/output of each window in order to maximize throughput.
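The window partitioning behind OSW can be sketched as follows. This is a minimal sketch under stated assumptions: a block of trellis steps is split into equal windows that are decoded in parallel, and each window's forward and backward recursions start a fixed number of steps (`overlap`) outside the window so that their unknown initial state metrics settle before the window's own outputs are computed. The `Window` type, its field names and the equal-size split are hypothetical choices, not taken from the text.

```python
import math
from typing import List, NamedTuple


class Window(NamedTuple):
    start: int             # first trellis step whose output this window delivers
    end: int               # one past the last delivered step
    fwd_warmup_start: int  # forward (alpha) recursion begins here
    bwd_warmup_end: int    # backward (beta) recursion begins at step bwd_warmup_end - 1


def osw_windows(block_len: int, num_windows: int, overlap: int) -> List[Window]:
    """Split a block into parallel windows with `overlap` warm-up steps on each side."""
    size = math.ceil(block_len / num_windows)
    windows = []
    for j in range(num_windows):
        start = j * size
        end = min(start + size, block_len)
        if start >= end:
            break  # fewer steps than windows: drop empty windows
        windows.append(Window(start, end,
                              max(0, start - overlap),
                              min(block_len, end + overlap)))
    return windows
```

Each window only needs its own slice of the input plus the overlap, which is why separate storage per window maximizes throughput: the windows never have to wait for one another's memory ports.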
One aspect of the invention enables fast and easy successive interleaving and deinterleaving operations. This is of utmost importance in channel coding systems for communication systems featuring low latency, high speed and low power.
In another aspect the invention enables the use of small memories for implementing an interleaving/deinterleaving operation.
Another aspect of the invention implements fast interleaving/deinterleaving operations.
The aspects of the invention are accomplished by a method of the type comprising: executing a first process, thus producing a first output array; writing the first output array into a first memory structure; thereafter reading an input array from the first memory structure; executing a second process, consuming the input array and producing a second output array; and writing the second output array into a second memory structure. The writing to the first memory structure may be in a different order than the reading from the first memory structure, such that the input array is a first permutation of the first output array. The writing to the second memory structure may be in a different order than both the writing to and the reading from the first memory structure, such that the second output array is a second permutation of the input array, the second permutation being the inverse of the first permutation. The memory structures are physical structures comprising memory elements, which may be binary elements, i.e. they can take one of two values. One value may represent a “1” and the other value a “0” in a binary numbering system. In the method, the first memory structure and the second memory structure may be a single memory structure. The skilled person will appreciate that means to control and drive memories are available, including means for generating addresses and means for selecting an address. The memory structures may comprise means for storing at least an array of the size of the input and output arrays, and may comprise separate sub-memories, each comprising means for storing at least parts of the input arrays and output arrays.
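The ordering of the write and read steps can be illustrated with a minimal sketch, assuming that the first output array is written with linear addresses, that one permutation table `perm` drives both the permuted read and the second write, and that the second process is element-wise aligned (its output element i belongs to its input element i, as extrinsic values of a decoder do). The function name is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def permute_through_memory(
    first_output: Sequence[T],
    perm: Sequence[int],
    second_process: Callable[[List[T]], List[U]],
) -> Tuple[List[T], list]:
    size = len(first_output)
    # Write the first output array with linear addresses 0, 1, ..., size-1.
    memory1 = list(first_output)
    # Read with permuted addresses: the input array is the first permutation.
    input_array = [memory1[perm[i]] for i in range(size)]
    second_output = second_process(input_array)
    # Write with the same permuted addresses: element i lands at address perm[i],
    # which applies the inverse permutation and restores the natural order.
    memory2 = [None] * size
    for i in range(size):
        memory2[perm[i]] = second_output[i]
    return input_array, memory2


if __name__ == "__main__":
    inp, mem2 = permute_through_memory(["a", "b", "c", "d"], [2, 0, 3, 1],
                                       lambda xs: [x.upper() for x in xs])
    print(inp, mem2)
```

In this sketch no separate inverse-permutation table is needed: writing through the same addresses that were used for reading undoes the permutation.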
Each of the separate sub-memories may comprise means for storing at most an array of the size of the input and output arrays divided by N−1, N being the number of sub-memories in a memory structure.
In the method, at least one of the first and second processes comprises sub-processes, each sub-process consuming part of the related input array and producing part of the related output array. The sub-processes of a process may be executed substantially simultaneously.
The writing of the parts of the output array may be carried out substantially simultaneously, each part being produced by one of the sub-processes of the first or second process. The reading of the parts of the input array may be carried out substantially simultaneously, each part being consumed by one of the sub-processes of the first or second process.
Another embodiment of the invention comprises an apparatus having: a first computing device capable of executing a first process and producing a first output array; a first memory structure into which the first output array can be written; a second computing device capable of executing a second process, consuming an input array read from the first memory structure and producing a second output array; a second memory structure into which the second output array can be written; means for writing to the first memory structure in a different order than the reading from the first memory structure, such that the input array is a first permutation of the first output array; and means for writing to the second memory structure in a different order than both the writing to and the reading from the first memory structure, such that the second output array is a second permutation of the input array, the second permutation being the inverse of the first permutation. The first and second memory structures may be a single memory structure. The memory structures are physical structures comprising memory elements, which may be binary elements, i.e. they can take one of two values. One value may represent a “1” and the other value a “0” in a binary numbering system. The memory structures may comprise a plurality of sub-memories. The computing devices may comprise means for executing a plurality of sub-processes substantially simultaneously, the sub-processes together defining the first and second processes. The first and second computing devices may be a single computing device.
Addresses for writing data elements to, or reading data elements from, the sub-memories may be determined in accordance with an algorithm that gives the same result as the following method: an output order of the elements of a matrix is determined by writing serial elements into the matrix along a first direction of the matrix and then reading the elements out of the matrix along a second direction, different from the first; the order in which the matrix elements are read out determines the storage locations of the data elements in the sub-memories. The dimensions of the matrix are selected such that the writing and reading result in collision-free reading from or writing to the sub-memories.
The dimensions of the matrix may be selected such that one of the dimensions does not divide the other of the dimensions. A shifting operation may be performed on the matrix elements read out of the matrix in the first or second directions.
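The matrix method and the collision-free requirement can be checked with a small sketch. The bank mapping here is an assumption, not taken from the text: a rows × cols matrix is filled row by row and read column by column; there are N = rows sub-memories, element k is stored in sub-memory k // cols, and sub-process j consumes interleaved positions j*cols + t at cycles t = 0, 1, .... A cycle is collision-free when no two sub-processes access the same sub-memory. The shifting operation mentioned above is not modelled, and all function names are hypothetical.

```python
from math import gcd
from typing import List


def row_column_order(rows: int, cols: int) -> List[int]:
    """Write 0..rows*cols-1 row by row into a matrix, read it out column by column."""
    matrix = [[r * cols + c for c in range(cols)] for r in range(rows)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]


def bank_schedule(rows: int, cols: int) -> List[List[int]]:
    """Sub-memory touched by each of the `rows` sub-processes in every cycle."""
    order = row_column_order(rows, cols)
    return [[order[j * cols + t] // cols for j in range(rows)] for t in range(cols)]


def collision_free(rows: int, cols: int) -> bool:
    """True if no cycle has two sub-processes hitting the same sub-memory."""
    return all(len(set(banks)) == len(banks) for banks in bank_schedule(rows, cols))


if __name__ == "__main__":
    for r, c in [(4, 5), (4, 6), (3, 8)]:
        print(r, c, collision_free(r, c), gcd(r, c) == 1)
```

Under this particular mapping the accessed sub-memory is (j*cols + t) mod rows, so a cycle is collision-free exactly when gcd(rows, cols) = 1. That condition implies that neither dimension divides the other (for dimensions greater than 1), but it is stronger: a 4 × 6 matrix still collides here. The exact condition therefore depends on the bank mapping, which this sketch only assumes.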
Another embodiment of the invention includes an apparatus for performing iterative decoding on a serial data stream, comprising a plurality of memories, a plurality of decoders, each decoder at least partially decoding a portion of the serial data stream in parallel with the remaining decoders, an address generation circuit for generating addresses for the plurality of memories, a first data router for routing data from the plurality of memories to the plurality of decoders, and a second data router for routing data from the plurality of decoders to the plurality of memories.
The first and second data routers, the plurality of memories and decoders and the address generator may co-operate to provide a first decoder for executing a first process, thus producing a first output array, means for writing the first output array into a first memory, means for reading from the first memory an input array, a second decoder for executing a second process, consuming the input array and producing a second output array, means for writing the second output array into a second memory; such that the input array is a first permutation of the first output array; and such that the second output array is a second permutation of the input array, the second permutation being the inverse of the first permutation.
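A cycle-level sketch of this co-operation is given below. The assumptions are: `rows` parallel decoders, one memory bank per decoder with one read and one write per bank per cycle, the row-by-row/column-by-column matrix addressing with the bank mapping chosen for this sketch, and a per-element function `second_process_elem` standing in for the second decoder. The function and parameter names are hypothetical. In the first phase each decoder writes its own window linearly into its own bank; in the second phase the address generator supplies one (bank, offset) pair per decoder, the first router delivers the data read from those addresses to the decoders, and the second router writes the results back to the same addresses.

```python
from typing import Callable, List, Sequence, Tuple


def simulate_decoder_banks(rows: int, cols: int, first_output: Sequence,
                           second_process_elem: Callable) -> Tuple[List, List]:
    banks = [[None] * cols for _ in range(rows)]
    # Phase 1: decoder j writes its window linearly into bank j (router is identity).
    for t in range(cols):
        for j in range(rows):
            banks[j][t] = first_output[j * cols + t]
    # Phase 2: decoder j at cycle t consumes interleaved position p = j*cols + t,
    # i.e. element (p % rows)*cols + p // rows, held in bank p % rows at offset p // rows.
    consumed = [None] * (rows * cols)
    for t in range(cols):
        addrs = [((j * cols + t) % rows, (j * cols + t) // rows) for j in range(rows)]
        if len({bank for bank, _ in addrs}) != rows:
            raise ValueError(f"bank conflict at cycle {t}")
        inputs = [banks[b][o] for b, o in addrs]            # first router: banks -> decoders
        for j, value in enumerate(inputs):
            consumed[j * cols + t] = value
        outputs = [second_process_elem(v) for v in inputs]  # all decoders in parallel
        for (b, o), value in zip(addrs, outputs):           # second router: decoders -> banks
            banks[b][o] = value                             # same address: inverse permutation
    natural = [banks[k // cols][k % cols] for k in range(rows * cols)]
    return consumed, natural


if __name__ == "__main__":
    consumed, natural = simulate_decoder_banks(3, 4, list(range(12)), lambda v: v * 10)
    print(consumed)  # interleaved order seen by the second decoders
    print(natural)   # results back in natural order
```

Because every address is read and rewritten in the same cycle, and each address is visited exactly once in the second phase, no data is overwritten before it is consumed, and the banks end up holding the second output in natural order.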
The address generator may have means for generating addresses for writing data elements to, or reading data elements from, the memories in accordance with an algorithm that gives the same result as the following method: an output order of the elements of a matrix is determined by writing serial elements into the matrix along a first direction of the matrix and then reading the elements out of the matrix along a second direction, different from the first; the order in which the matrix elements are read out determines the storage locations of the data elements in the sub-memories.
Another embodiment of the invention concerns methods in which successive permutation and inverse permutation operations are needed to provide the inputs of first and second processes in the correct order. Processes needing inputs in the original order can be distinguished from processes needing inputs in permuted order, one of the processes being used as the reference process. These different processes can be executed by the same physical hardware.
An embodiment of the invention introduces permutation and inverse permutation operations, and thus fits in a turbo coding system and in systems applying the turbo coding principle. According to one embodiment of the invention, at least one permutation and one inverse permutation operation are performed successively.
An aspect of the invention comprises alternating permutation and inverse permutation operations by scheduling linear writing and reading operations together with permuted or inverse-permuted writing and reading operations.
An embodiment of the invention enables parallel execution of sub-processes. In the parallel methods, the processes producing and consuming data can be performed in a parallel way. Also in the parallel methods, the writing operations to and the reading operations from a memory can be performed in a parallel way. A process producing data in a parallel way means that several data are produced at the same time by different sub-processes of the process. A process consuming data in a parallel way means that several data are consumed at the same time by different sub-processes of the process.
These and other features and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the invention. The detailed description is given for the sake of example only, without limiting the scope of the invention. The reference figures quoted below refer to the attached drawings.