The transmission of information (data, images, speech, etc.) increasingly relies on digital transmission techniques. A great deal of effort has been made in source encoding to reduce the digital bit rate while preserving high quality. These techniques naturally require improved protection of the bits against transmission-related disturbance. The use of powerful error-correction codes in these transmission systems has proved to be indispensable. It is especially for this purpose that the technique of “turbo-codes” has been proposed.
The general principle of “turbo-codes” is presented especially in the French patent No FR-91 05280, entitled “Procédé de codage correcteur d'erreurs à au moins deux codages convolutifs systématiques parallèles” (“Method of error correction encoding with at least two parallel systematic convolutional encoding operations”), and in C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes”, in IEEE International Conference on Communication, ICC'93, vol. 2/3, pages 1064 to 1071, May 1993. A prior art technique is recalled in C. Berrou and A. Glavieux, “Near Optimum Error Correcting Coding and Decoding: Turbo-Codes” (IEEE Transactions on Communications, Vol. 44, No. 10, pages 1261–1271, October 1996).
This technique proposes the implementation of “parallel concatenation” encoding, which relies on the use of at least two elementary encoders. This makes available two redundancy symbols, coming from two distinct encoders. Between the two elementary encoders, permutation means are implemented so that each of these elementary encoders is supplied with the same source digital data, but taken in a different order each time.
A complement to this type of technique is used to obtain codes known as “block turbo-codes” or BTCs. This complementary technique is designed for block encoding (concatenated codes). This improved technique is described by R. Pyndiah, A. Glavieux, A. Picart and S. Jacq in “Near optimum decoding of product code” (IEEE Transactions on Communications, volume 46, No 8, pages 1003 to 1010, August 1998), in the patent FR-93 13858, “Procédé pour transmettre des bits d'information en appliquant des codes en blocs concaténés” (Method for the Transmission of Information Bits by the Application of Concatenated Block Codes) and by O. Aitsab and R. Pyndiah in “Performance of Reed Solomon Block Turbo-Code” (IEEE Globecom'96 Conference, Vol. 1/3, pages 121–125, London, November 1996).
This technique relies especially on the use of product codes introduced by P. Elias and described in his article “Error-Free Coding” in “IRE Transactions on Information Theory” (Vol. IT-4, pages 29–37, September 1954). The product codes are based on the serial concatenation of block codes. The product codes have long been decoded according to hard-input and hard-output algorithms in which an elementary block code decoder accepts bits at input and gives bits at output.
To decode block “turbo-codes”, soft-input and soft-output decoding means are used, in which an elementary block code decoder accepts bits weighted as a function of their likelihood at input and delivers likewise weighted bits at output.
Block “turbo-codes” are particularly attractive when data encoding is applied to small-sized blocks (for example blocks smaller than 100 bits) or when the efficiency of the code (that is, the number of useful data bits divided by the number of encoded data bits, for example, 0.95) is high and the error rate desired is low. Indeed, the performance level of the code, generally measured in terms of residual error rate as a function of a given signal-to-noise ratio, varies as a function of the minimum Hamming distance of the code which is very high in the case of block “turbo-codes” (9, 16, 24, 36 or more).
It is recalled first of all that a serial concatenated code can generally be represented in the form of a binary matrix [C] with a dimension 2 as illustrated in FIG. 1. This matrix [C] contains n1 rows and n2 columns and:
        the binary information samples are represented by a sub-matrix 10, [M], with k1 rows and k2 columns;
        each of the k1 rows of the matrix [M] is encoded by an elementary code C2(n2, k2, δ2) (the redundancy is represented by a row redundancy sub-matrix 11);
        each of the k2 columns of the matrix [M] and of the row redundancy is encoded by an elementary code C1(n1, k1, δ1) (the redundancy corresponding to the binary information samples is represented by a column redundancy sub-matrix 12; the redundancy corresponding to the row redundancy of the sub-matrix 11 is represented by a redundancy of redundancy sub-matrix 13).
If the code C1 is linear, the (n1–k1) rows built by C1 are words of the code C2 and may therefore be decoded in the same way as the first k1 rows. A serial concatenated code is characterized by n1 code words of C2 along the rows and by n2 code words of C1 along the columns. The codes C1 and C2 may be obtained from convolutional elementary codes used as block codes or from linear block codes.
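By way of illustration only, the construction of the matrix [C] can be sketched as follows. The sketch assumes, purely for brevity, that both elementary codes C1 and C2 are single-parity-check codes; this choice, and the names spc_encode and product_encode, are assumptions of the example and do not come from the cited documents.

```python
def spc_encode(bits):
    """Append one even-parity bit: a (k+1, k, 2) single-parity-check code."""
    return bits + [sum(bits) % 2]


def product_encode(M):
    """Serial concatenation: encode each row of [M], then every column of the result."""
    # k1 rows of n2 bits: information sub-matrix plus row redundancy
    rows = [spc_encode(list(row)) for row in M]
    # n2 columns of n1 bits: adds column redundancy and redundancy of redundancy
    cols = [spc_encode(list(col)) for col in zip(*rows)]
    # transpose back to n1 rows by n2 columns
    return [list(r) for r in zip(*cols)]


M = [[1, 0, 1],
     [0, 1, 1]]          # k1 = 2 rows, k2 = 3 columns
C = product_encode(M)    # n1 = 3 rows, n2 = 4 columns
```

In this toy case the last row of C, although built by the column code, has even parity and is therefore itself a word of the row code, which is the property stated above for linear codes.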
It is recalled that a parallel concatenated code can generally be represented in the form of a binary matrix [C] with a dimension 2 as illustrated in FIG. 1. This matrix [C] contains n1 rows and n2 columns and:
        the binary information samples are represented by a sub-matrix 10, [M], with k1 rows and k2 columns;
        each of the k1 rows of the matrix [M] is encoded by an elementary code C2(n2, k2, δ2) (the redundancy is represented by a row redundancy sub-matrix 11);
        each of the k2 columns of the matrix [M] is encoded by an elementary code C1(n1, k1, δ1) (the redundancy corresponding to the binary information samples is represented by a column redundancy sub-matrix 12; there is no redundancy of redundancy in the case of parallel concatenated codes).
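The difference between the two representations can be made concrete by counting bits. The following sketch is given as an illustration under the definitions recalled above; the function name concatenated_sizes and the numerical values are hypothetical.

```python
from fractions import Fraction


def concatenated_sizes(n1, k1, n2, k2):
    """Bit counts and code efficiency for serial and parallel concatenation."""
    info = k1 * k2                               # sub-matrix 10
    serial = n1 * n2                             # sub-matrices 10, 11, 12 and 13
    parallel = serial - (n1 - k1) * (n2 - k2)    # no redundancy of redundancy (13)
    return {
        "info": info,
        "serial": serial,
        "parallel": parallel,
        "serial_rate": Fraction(info, serial),
        "parallel_rate": Fraction(info, parallel),
    }


sizes = concatenated_sizes(n1=4, k1=3, n2=4, k2=3)
```

For these illustrative parameters the parallel form transmits one bit fewer than the serial form, namely the single bit of the redundancy of redundancy sub-matrix.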
The different techniques of “turbo-decoding” are increasingly valuable for digital communications systems which require ever greater reliability. Furthermore, the transmission rates are increasingly high. The use of transmission channels on optical fibers is making it possible, in particular, to attain bit rates in the gigabit and even the terabit range.
The “turbo-decoding” of a code corresponding to the matrix C of FIG. 1 consists in carrying out a weighted-input and weighted-output decoding on all the rows and then all the columns of the matrix C, according to the iterative process illustrated in FIG. 2.
After reception 21 of the data to be processed, a pre-determined number (Nb_Iter_Max) of the following operations is performed:
        the decoding 22 of the columns (one half-iteration);
        the reconstruction 23 of the matrix;
        the decoding 24 of the rows (one half-iteration);
        the reconstruction 25 of the matrix.
These operations are therefore repeated so long as the number i of iterations, incremented (26) at each iteration, is smaller than Nb_Iter_Max (27), the number i having been initialized beforehand at zero (28).
The decoded data, referenced Dk, are then processed (29).
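The iterative process of FIG. 2 can be sketched, as an illustration only, in the following form. The callback half_iteration is a hypothetical stand-in for a weighted-input and weighted-output decoding of all the columns or all the rows, the matrix reconstruction steps 23 and 25 being assumed to take place inside it; the toy callback shown decodes nothing and merely records the order of the passes.

```python
def turbo_decode(R, half_iteration, nb_iter_max):
    """Schedule of FIG. 2: columns, then rows, repeated nb_iter_max times.

    half_iteration(R, W, axis) is a hypothetical callback returning
    (new_W, decisions) for one half-iteration.
    """
    W = [[0.0] * len(R[0]) for _ in R]   # no extrinsic information before the first pass
    D = None
    i = 0                                # initialization (28)
    while i < nb_iter_max:               # test (27)
        W, D = half_iteration(R, W, "columns")   # decoding 22
        W, D = half_iteration(R, W, "rows")      # decoding 24
        i += 1                           # increment (26)
    return D                             # decoded data Dk (29)


calls = []


def dummy_half_iteration(R, W, axis):
    """Toy stand-in: hard decision on the sign of the channel samples (assumed mapping)."""
    calls.append(axis)
    return W, [[1 if x > 0 else 0 for x in row] for row in R]


D = turbo_decode([[0.9, -1.1], [-0.2, 0.4]], dummy_half_iteration, nb_iter_max=4)
```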
In general, the information exchanged from one half-iteration 22, 24 to another is defined by FIG. 3.
Rk corresponds to the information received from the channel, R′k corresponds to the information coming from the prior half-iteration and R′k+ corresponds to the information sent to the next half-iteration. The output of each half-iteration is therefore equal to the sum 36 of Rk and of the extrinsic information Wk, the latter being multiplied (31) by a feedback or convergence coefficient alpha. This extrinsic information corresponds to the contribution of the decoder 32. It is obtained by taking the difference 33 between the weighted output Fk of the decoder and the weighted input of this same decoder.
Time delays 34 and 35 are planned to compensate for the latency of the decoder 32.
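The data flow of FIG. 3 can be summarized by the following sketch, given as an illustration. It assumes that the coefficient alpha is applied to Wk before the sum 36, ignores the time delays 34 and 35, and uses a hypothetical soft_decoder function in place of the decoder 32; the sign convention for the binary decision is likewise an assumption.

```python
def processing_unit(R, R_prime, soft_decoder, alpha):
    """One half-iteration on a row or column, following the data flow of FIG. 3."""
    F = soft_decoder(R_prime)                              # weighted output Fk of the decoder 32
    W = [f - rp for f, rp in zip(F, R_prime)]              # difference 33: extrinsic information Wk
    R_prime_next = [r + alpha * w for r, w in zip(R, W)]   # multiplier 31, then sum 36
    D = [1 if f > 0 else 0 for f in F]                     # binary decisions Dk (assumed sign mapping)
    return R_prime_next, W, D


def toy_decoder(samples):
    """Hypothetical stand-in: simply doubles the reliability of each sample."""
    return [2.0 * v for v in samples]


R_next, W, D = processing_unit([1.0, -2.0], [0.5, -1.0], toy_decoder, alpha=0.5)
```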
Hereinafter, the weighted-input and weighted-output decoder will be considered to be a block having Rk and R′k (sampled on q bits) as inputs, delivering R′k+ and Rk+ (sampled on q bits) at the output with a certain latency L (the delay necessary to implement the decoding algorithm). It is called a Processing Unit (PU) 30.
The decoder 32 furthermore gives a binary decision Dk used during the last half-iteration of a “turbo-decoding” operation, which corresponds to a decoded data element sent out during the operation 29 illustrated in FIG. 2.
If we consider another sub-division of the block diagram of FIG. 3, R′k may be replaced by the extrinsic information Wk which becomes the input-output of the processing unit 40. R′k which is still used as an input of the decoder 32 is then an internal variable. This variant is illustrated by FIG. 4.
In the prior art, there are two known types of decoder architecture for block “turbo-codes”, based on:
        a modular structure; or
        a Von Neumann structure.
In the modular structure, modules or elementary decoders are cascaded, each of these modules being responsible for a half-iteration. This processing is well suited to weighted-input and weighted-output decoding algorithms inasmuch as many functions in these algorithms are classically carried out in sequence and are then simple to implement.
A major drawback of this prior art technique is that it introduces high latency into data processing, the latency being the number of samples that comes out of the decoder before a piece of data present at input is located, in its turn, at output. This latency increases with the number of modules. Furthermore, the space requirement of the circuit is also relatively great and increases with the number of modules. The latency and space requirement of the circuit constitute a major drawback when the number of iterations and/or the length of the code increase.
In the Von Neumann structure, the circuit carries out several iterations by using a single storage unit and a single processing unit for all the iterations. An elementary decoding module is looped back on itself. With this architecture, the number of memories necessary is reduced. The gain in storage circuit surface area is considerable since the storage surface area is independent of the number of iterations. Nevertheless, a major drawback of this structure is that it leads to a reduction in the data throughput rate.
Thus, as already mentioned, a functional analysis of the “turbo-decoding” algorithm was used to identify two possible architectures for a product code “turbo-decoder” circuit (one architecture being modular and the other one being likened to a machine known as a Von Neumann machine). These two structures are now described with some greater precision.
a) Modular Structure
From the operating scheme of the algorithm, a modular structure may be imagined for the “turbo-decoder” in which each sub-circuit carries out a decoding half-iteration (i.e. a decoding of the rows or columns of a data matrix [R] and [W] or [R′]). It is necessary to memorize [R] and [W] (or [R′], depending on the block diagram of the chosen processing unit 30 or 40).
The complete circuit is then constituted by cascaded, identical modules as shown in FIG. 5. For four iterations for example, the circuit uses eight modules, or elementary decoders.
With the modular architecture, the data are processed sequentially (sample after sample). This processing is well suited to the weighted-input and weighted-output decoding algorithms inasmuch as many functions in these algorithms are classically performed in sequence and are then simple to implement.
Each module introduces a latency of (n1n2+L) samples. The latency is the number of samples coming out of the decoder before a piece of data present at input is located, in its turn, at output. In this expression, the n1n2 first samples correspond to the filling of a data matrix and the L next samples correspond to the decoding proper of a row (or column) of this matrix.
b) Von Neumann Structure
The second architecture can be likened to a Von Neumann sequential machine. It uses one and the same processing unit to carry out several iterations. In comparison with the previous solution, this one is aimed chiefly at reducing the space requirement of the “turbo-decoder”. It furthermore has the advantage of limiting the overall latency introduced by the circuit, independently of the number of iterations performed, to 2.n1n2 samples at the maximum (n1n2 to fill a matrix and n1n2 additional samples for the decoding).
Each sample is processed sequentially and must be decoded in a time that does not exceed the inverse of the product of the data throughput rate multiplied by the number of half-iterations to be performed. Thus, for four iterations, the data throughput rate is at least eight times lower than the data processing rate. This means that, between the modular architecture and the Von Neumann architecture, the maximum data throughput rate is divided by a factor at least equal to the number of half-iterations used. The latency is lower for the Von Neumann structure (2.n1n2 samples at the maximum as against (n1n2+L).it in the other, it being the number of half-iterations) but the data throughput rate is lower for a same data processing speed.
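The trade-off between the two structures can be illustrated numerically with the expressions given above. In the following sketch the function names and the numerical values (code size, decoder latency L, processing rate) are hypothetical and serve only as an example.

```python
def modular_figures(n1, n2, L, it, f_proc):
    """Latency (in samples) and throughput of `it` cascaded half-iteration modules."""
    latency = it * (n1 * n2 + L)
    throughput = f_proc              # one sample accepted per processing cycle
    return latency, throughput


def von_neumann_figures(n1, n2, it, f_proc):
    """Upper bounds for a single looped-back module running `it` half-iterations."""
    max_latency = 2 * n1 * n2
    max_throughput = f_proc / it     # each sample is processed `it` times
    return max_latency, max_throughput


# Hypothetical example: 32 x 32 matrix, L = 64 samples, four iterations (it = 8),
# processing rate of 100 million samples per second.
mod_latency, mod_rate = modular_figures(32, 32, 64, 8, 100_000_000)
vn_latency, vn_rate = von_neumann_figures(32, 32, 8, 100_000_000)
```

With these values the modular structure shows a latency of 8704 samples at the full processing rate, whereas the Von Neumann structure is bounded by 2048 samples of latency at one eighth of that rate.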
The maximum number of iterations that can be integrated into the circuit is limited by the bit rate to be attained and by the maximum frequency of operation authorized by the technology used.
The memory aspects shall now be described with reference to these two structures. In any case, the space requirement of the circuit essentially arises out of the size and number of the memories used. Independently of the general architecture chosen, it is indeed indispensable to memorize the matrices [R] and [W] (or [R′]) for the entire duration of the half-iteration in progress (a half-iteration corresponds to a decoding of the rows or columns of a data matrix). The processing of the data in rows and then in columns makes it necessary to provide for a first memory to receive the data and a second memory to process the data. These two memories work alternately in write and read mode, with an automaton managing the sequencing. Each memory is organized in a matrix and, for a code with a length n1n2 and a quantization of the data on q bits, it is formed by memory arrays of q.n1n2 bits each.
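As an order of magnitude, the storage implied by this organization can be computed as follows. The sketch is illustrative; the function names and the values n1 = n2 = 32 and q = 5 are hypothetical.

```python
def memory_bits(n1, n2, q):
    """Size in bits of one memory holding a matrix of n1*n2 samples of q bits."""
    return q * n1 * n2


def double_buffered_bits(n1, n2, q, matrices=2):
    """Reception plus processing memory for each stored matrix ([R] and [W] by default)."""
    return 2 * matrices * memory_bits(n1, n2, q)


one_memory = memory_bits(32, 32, 5)
per_half_iteration = double_buffered_bits(32, 32, 5)
```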
a) Modular Structure
In the case of a modular structure, the general organization of the circuit on a half-iteration is that of FIGS. 5 and 6.
The module 50 illustrated in FIG. 5 contains a processing unit 40 (as illustrated in FIG. 4) and four memories:
        a storage memory 51 containing the data [R];
        a processing memory 52 containing the data [R];
        a storage memory 53 containing the data [W] (or [R′] depending on the processing unit); and
        a processing memory 54 containing the data [W] (or [R′]).
The data [R] 571 (and [W] 572 respectively) encoded on q bits which reach the storage module 50 are arranged along the rows of the reception memory 51 (and 53 respectively) working in write mode, the logic switch 551 (and 553 respectively) at input of the memory 51 (and 53 respectively) (implemented, for example in the form of an addressing bit enabling the selection of the memory 51 (and 53 respectively) during a write operation) being then closed and the switch 561 (and 563 respectively) at input of the memory 52 (and 54 respectively) being open. The data [R] at input of the first module come directly from the transmission channel while the data [R] of each of the following modules come from the output [R] 591 of the previous module. The data [W] at input of the first module are zeros while the data [W] of each of the next modules come from the output [W] 592 of the previous module.
The data of the matrix received previously are read out along the columns of the processing memories 52 and 54 which, for their part, work in read mode, the logic switch 552 (and 554 respectively) at output of the memory 52 (and 54 respectively) (implemented, for example, in the form of an addressing bit enabling the selection of the memory 52 (and 54 respectively) during a read operation) being then closed and the switch 562 (and 564 respectively) at output of the memory 51 (and 53 respectively) being open.
Once the reception memories are filled, the processing memories go into write mode (in other words, the roles of the memories 51 and 52 (53 and 54 respectively) are exchanged, and the logic switches 551, 552, 561 and 562 (and 553, 554, 563 and 564 respectively) “change position”) in order to store the data corresponding to the next code word. By cascading two modules, one for the decoding of the columns and the other for the decoding of the rows of an encoded matrix, a full iteration is performed.
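The alternation between a reception memory written along the rows and a processing memory read along the columns can be sketched as follows. This is an illustrative software model only: the class PingPongMemory and its methods are hypothetical names, and a single index stands in for the logic switches.

```python
class PingPongMemory:
    """Two n1 x n2 banks: one in write (reception) mode, one in read (processing) mode."""

    def __init__(self, n1, n2):
        self.n1, self.n2 = n1, n2
        self.banks = [[[0] * n2 for _ in range(n1)] for _ in range(2)]
        self.rx = 0                      # index of the bank currently in write mode

    def write_rows(self, samples):
        """Store n1*n2 incoming samples row by row in the reception bank."""
        bank = self.banks[self.rx]
        for idx, s in enumerate(samples):
            bank[idx // self.n2][idx % self.n2] = s

    def read_columns(self):
        """Return the samples of the processing bank column by column."""
        bank = self.banks[1 - self.rx]
        return [bank[r][c] for c in range(self.n2) for r in range(self.n1)]

    def swap(self):
        """Exchange the roles of the two banks (the switches 'change position')."""
        self.rx = 1 - self.rx


mem = PingPongMemory(n1=2, n2=3)
mem.write_rows([1, 2, 3, 4, 5, 6])       # first matrix received
mem.swap()
mem.write_rows([7, 8, 9, 10, 11, 12])    # next matrix arrives meanwhile
cols = mem.read_columns()                # first matrix read out column by column
```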
The memories 51, 52, 53 and 54 may be designed without difficulty from typical row/column-addressable single-port RAMs (Random Access Memories). Other approaches (for example using shift registers) may be envisaged, but they take up more space.
It is noted that the data exchanged on the data bus as illustrated in FIG. 5 are encoded on q bits while, in a variant illustrated in FIG. 6, the data are encoded on 2.q bits, each of the data then containing q bits corresponding to a piece of data [R] and q bits corresponding to a piece of data [W] (or [R′]).
The module 60 illustrated in FIG. 6 makes it possible to perform a decoding half-iteration and contains a processing unit 40 (as illustrated with reference to FIG. 4) and two memories:
        a storage or reception memory 62 containing the data [R] and [W] (or [R′] if the processing unit is like the unit 30 illustrated in FIG. 3); and
        a processing memory 63 containing the data [R] and [W] (or [R′]).
The data 61 encoded on 2.q bits which arrive at the decoding module are arranged in order along the rows of the reception memory 62 working in write mode. In parallel, the data of the matrix received earlier are picked up along the columns of the processing memory 63, which itself works in read mode. Once the reception memory 62 is filled, the processing memory 63 goes into write mode in order to store the data corresponding to the next code word. By cascading two modules, one for the decoding of the columns and the other for the decoding of the rows of an encoded matrix, a full iteration is performed.
The memories 62, 63 may be designed without difficulty from typical row/column-addressable single-port RAMs (Random Access Memories). Other approaches (for example using shift registers) may be envisaged, but they take up more space.
From a practical point of view, the modular approach has the advantage of enabling high operating frequency and of being very flexible in its use. As a trade-off, the cascade-connection of several modules leads to an increase in the latency and the amount of space taken up by the circuit. These parameters soon constitute an essential defect when there is an increase in the number of iterations and/or the length of the code.
b) The Structure known as the Von Neumann Structure
This time, the circuit carries out several iterations using four storage units 70, 71, 72 and 73 illustrated in FIG. 7. The decoding module is looped back to itself. With this architecture, the full circuit has only four memories 70, 71, 72 and 73, independently of the number of iterations performed. However, these memories 70, 71, 72 and 73 should be capable of being read and written by row/column addresses.
The memories 70, 71, 72 and 73 are typical single-port RAMs in which it is possible to read or write a piece of data identified by its address. Since each sample is accessed directly, the matrix can be decoded along either its rows or its columns. The memories are similar to those chosen for the modular solution. However, since the full circuit has only four of them, the gain in surface area is considerable (80% for four iterations). It must be noted however that this reduction in surface area is obtained, for a same speed of operation of the circuits, to the detriment of the data throughput rate (divided by at least it for it/2 iterations, where it denotes the number of half-iterations; it is indeed necessary, in this computation of the latency, to take account of each elementary decoding).
The data [R] 76 (and [W] 75 respectively) encoded on q bits are arranged in order along the rows of the reception memory 70 (and 72 respectively) working in write mode, the logic router 771 (and 781 respectively) routing the data towards the memory 70 (and 72 respectively) (implemented, for example, in the form of an addressing bit enabling the selection of the memory 70 (and 72 respectively) during a write operation). The data [R] 76 at input come directly from the transmission channel. The data [W] at input are zeros during the first half-iteration while the data [W] of each of the following half-iterations come from the output [W] 75 of the previous half-iteration.
In parallel, the data [R] received earlier are picked up along the columns of the processing memory 71 which, for its part, works in read mode. The logic router 772 at output of the memories 71 and 70 (implemented, for example, in the form of an addressing bit) enables the selection of the memory 71 during a read operation. In parallel, the data [W] coming from a previous half-iteration (or zeros if it is a first half-iteration) are picked up along the columns of the processing memory 73, which for its part works in read mode. The logic router 782 at output of the memories 72 and 73 enables the selection of the memory 73 during a read operation.
Once the reception memory of [W] is filled (i.e. at the end of each operation of turbo-decoding of a block if it is assumed that the data are transmitted continuously), the roles of the processing and reception memories of [W] are exchanged: the processing memory of [W] goes into write mode and becomes a reception memory (in other words, the logic routers 781 and 782 “change position”) in order to store the data corresponding to the following code word, and the reception memory of [W] goes into read mode and becomes a processing memory.
Once the reception memory of [R] is filled (i.e. at the end of each operation of turbo-decoding of a block if it is assumed that the data are transmitted continuously), the roles of the processing and reception memories of [R] are exchanged: the processing memory of [R] goes into write mode and becomes a reception memory (in other words, the logic routers 771 and 772 “change position”) in order to store the data corresponding to the following code word, and the reception memory of [R] goes into read mode and becomes a processing memory. If, as a variant, the data are transmitted in packet (or burst) mode, and if each packet is to be decoded only once, the decoding being completed before the arrival of a new packet, it is not necessary, in a Von Neumann structure, to have separate processing and reception memories for the data [R]: a single memory is enough.
The memories 70, 71, 72 and 73 used may be designed without difficulty from classic, row-addressable and column-addressable, single-port RAMs (Random Access Memories). Other approaches (for example using shift registers) may be envisaged, but they take up more space.
It may be noted that the data exchanged on the data bus, as illustrated in FIG. 7, are encoded on q bits.
It may be noted that, as a variant to the embodiments illustrated in FIGS. 5, 6 and 7, a processing unit 30 as illustrated in FIG. 3 may replace the processing unit 40. The [W] type data are then replaced by the [R′] type data in the memories.
According to the prior art, a high-throughput-rate architecture duplicates the number of modules illustrated in FIG. 6 or 7.
The invention according to its different aspects is designed especially to overcome these drawbacks of the prior art.
More specifically, it is a goal of the invention to provide a decoding module, method and device adapted to providing high performance in terms of error rate while, at the same time, limiting the surface area of the circuits needed for the processing operations (elementary decoding) and the memories.
It is another goal of the invention to provide a decoding module, method and device capable of processing high throughput rates for a given clock frequency of operation.
It is also a goal of the invention to reduce the decoding latency in a decoding module, method and device of this kind.