Publications and documents referred to throughout this description will be indicated simply by a reference numeral between square brackets (e.g. [X]). A full list of those publications and documents, with the respective reference numerals, is reproduced at the end of this description under the heading “List Of References”.
Modern communication systems often require channel codes providing high reliability and decoding schemes suitable to support high data rates. Low-Density Parity-Check (LDPC) codes are among the best candidates to meet these requirements, because they offer near-Shannon-limit performance along with an iterative decoding algorithm that allows a high degree of parallelism in the processing, thus favoring the design of high-throughput decoder architectures.
However, routing congestion and memory collisions might limit a practical exploitation of the inherent parallelism of the decoding algorithm. In order to solve this problem, upcoming standards featuring LDPC codes, such as IEEE 802.11n (WiFi) [1], IEEE 802.16e (WiMax) [2] and DVB-S2 [3], adopt joint code-decoder design techniques [4]. According to these approaches, the codes are designed with a block structure (having blocks of size P) that naturally fits the vectorization of the decoder architecture, thus guaranteeing a collision-free parallelism of P.
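As an illustration of why a block structure of size P gives collision-free parallelism of P, the following minimal sketch expands a small base matrix into a binary parity-check matrix. It assumes each non-zero block is a cyclically shifted P x P identity matrix, which is one common construction and is not stated in the text above. The base matrix `BASE`, the shift convention and both function names are hypothetical and are not taken from any of the cited standards.

```python
def expand_base_matrix(base, P):
    """Expand a base matrix of shift values into a binary parity-check matrix.

    Assumed convention: base[i][j] == -1 denotes an all-zero P x P block;
    a value s >= 0 denotes the P x P identity cyclically shifted by s columns.
    """
    rows = len(base) * P
    cols = len(base[0]) * P
    H = [[0] * cols for _ in range(rows)]
    for bi, block_row in enumerate(base):
        for bj, shift in enumerate(block_row):
            if shift < 0:
                continue  # all-zero block
            for r in range(P):
                H[bi * P + r][bj * P + (r + shift) % P] = 1
    return H


def max_column_weight_per_layer(H, P):
    """For each group of P consecutive rows, return the largest column weight."""
    weights = []
    for start in range(0, len(H), P):
        layer = H[start:start + P]
        weights.append(max(sum(col) for col in zip(*layer)))
    return weights


# Hypothetical 2 x 3 base matrix with P = 4 (illustrative values only).
BASE = [[0, 1, -1],
        [2, -1, 0]]
H = expand_base_matrix(BASE, P=4)
LAYER_WEIGHTS = max_column_weight_per_layer(H, P=4)
```

Under these assumptions, every column has weight at most one within a block-row, so the P rows of a block-row never access the same column and can be processed by P units without memory collisions.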
Joint code-decoder design techniques and the possibility of vectorizing the decoder architecture permit a reduction in the iteration latency, because P processing units work in parallel. Consequently, higher throughputs can be achieved with decoder architectures using more than P processing units in parallel; however, because of the memory collision problem, the complexity overhead (or even the latency overhead introduced by those processing steps that cannot be performed in parallel) can be significant.
Another technique to reduce decoding latency is to consider schedules capable of increasing the convergence speed of the algorithm. In this way, fewer iterations are needed to reach a reference communication reliability, thus increasing the achieved throughput.
Specifically, in [6], a decoder architecture is disclosed which performs a layered decoding of LDPC codes. Changing the sequence of the block-rows is proposed in order to improve error correction capability and convergence speed of the decoder. Presumably, the optimization of the sequence of layers is assessed through off-line computer simulations.
Document [7] discloses an architecture for a layered decoder and considers the problem of the same Soft-Output estimates being used by more than one processing unit; to this end, reordering of the layers in the codes is proposed: specifically, the number of Soft-Output (SO) items in common between consecutive layers is minimized. An approximate method is disclosed for this purpose, which is able to use two independent metrics for the SO items common to consecutive layers, at the cost of redundant memory. However, no optimization algorithm (i.e. by means of a cost function) is proposed to reorder the sequence of layers.
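The quantity that the layer reordering described above aims to reduce can be sketched as follows. This is only an illustrative reading of the description, not the method of [7]: each layer is modelled as the set of SO (column) indices it uses, and the count wraps around from the last layer to the first on the assumption that decoding iterations repeat the same sequence. The function name and the example layers are hypothetical.

```python
def shared_so_between_consecutive_layers(layers, order):
    """Count SO indices shared by each pair of consecutive layers in `order`.

    layers: list of sets, each holding the SO (column) indices used by a layer.
    order:  permutation of layer indices giving the processing sequence.
    Assumption: the sequence is cyclic, so the last layer is followed by the first.
    """
    total = 0
    for pos, current in enumerate(order):
        following = order[(pos + 1) % len(order)]
        total += len(layers[current] & layers[following])
    return total


# Hypothetical layers, each described by the SO indices it touches.
LAYERS = [{0, 1, 2}, {2, 3, 4}, {0, 1, 5}, {3, 4, 5}]

NATURAL_ORDER_COST = shared_so_between_consecutive_layers(LAYERS, [0, 1, 2, 3])
SWAPPED_ORDER_COST = shared_so_between_consecutive_layers(LAYERS, [0, 2, 1, 3])
```

With these example layers, the two orderings give different counts, which shows that the sequence of layers alone changes how many SO items consecutive layers have in common.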
Document [8] also discloses the possibility of inter-layer permutation. Intra-layer reordering is additionally mentioned as a way to improve performance in terms of error correction capability and convergence speed. Emphasis is placed on an approximate architecture for layered decoding of LDPC codes, which is based on incremental updates of the Soft-Output items processed in parallel by more than one processing unit. Specifically, the number of occurrences of this event (multiple updates of the same metric) is minimized in order not to degrade the decoding performance. This is achieved by acting on the number of rows of the parity-check matrix that are grouped together in a vectorized architecture and on the way in which they are grouped. In particular, in grouping rows, care is devoted to minimizing the occurrences of Soft-Output items common to more than one row in the layer. However, multiple updates of the same metric (SO) can result in conflicts during memory access, and no strategy is described to optimize the system. The number of conflicts is simply derived (counted) from off-line computer simulation.
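The dependence of multiple updates on the row grouping can be illustrated with a small counting sketch. It is a simplified model under stated assumptions and not the procedure of [8]: each row of the parity-check matrix is represented by the set of SO (column) indices it involves, and a conflict is counted once for every SO index that appears in more than one row of the same group. The function name and the example rows are hypothetical.

```python
from collections import Counter


def count_multiple_updates(rows, groups):
    """Count SO indices that occur in more than one row of the same group.

    rows:   list of sets, each holding the SO (column) indices of one row.
    groups: list of lists of row indices; each inner list is processed in parallel.
    """
    conflicts = 0
    for group in groups:
        occurrences = Counter()
        for row_index in group:
            occurrences.update(rows[row_index])
        # An SO index seen more than once would be updated by several units.
        conflicts += sum(1 for count in occurrences.values() if count > 1)
    return conflicts


# Hypothetical rows, each described by the SO indices it touches.
ROWS = [{0, 1}, {1, 2}, {3, 4}, {0, 4}]

ADJACENT_GROUPING = count_multiple_updates(ROWS, [[0, 1], [2, 3]])
INTERLEAVED_GROUPING = count_multiple_updates(ROWS, [[0, 2], [1, 3]])
```

In this example, grouping adjacent rows leaves one shared SO index in each group, whereas the interleaved grouping leaves none.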
Finally, document [9] discloses an architecture for a layered decoder in which modifications are introduced to the basic formulation to support increased flexibility in terms of the size of the data vector to be processed in parallel. The possibility of changing the order of the output data computed by the serial check-node processor is considered. However, no attempt is made to optimize decoder performance. Specifically, updated messages relating to an earlier layer are output according to the same column order as the messages currently being acquired (which belong to the next layer to be processed). This approach relies on the presence of storage memory on board the serial processor itself.