The present invention relates generally to an apparatus and method for decoding data and, in particular, to a method and apparatus for decoding Enhanced Turbo Product Codes in an efficient Turbo Product Code Decoder System.
When data is transmitted using non-binary lower and higher order modulation, a binary turbo product code encoder and decoder are used, along with Gray code mapping and log-likelihood ratio (LLR) computation. This scheme is often called pragmatic coding because it avoids the complex task of constructing a forward error correction code that matches the given channel requirement. Some prior art coding techniques, such as Ungerboeck Trellis Coded Modulation (TCM), require the construction of convolutional codes that are built based on the desired constellation. Such a code could be built, for example, to match 8-PSK (8-ary phase shift keying) modulation. However, the code must be redesigned if the modulation is changed from 8-PSK to 16-PSK or to 16-QAM (quadrature amplitude modulation). This makes practical use of such a coding scheme difficult. Other schemes, such as Block Coded Modulation, have been developed for block codes, but these suffer from the same code redesign issue.
A pragmatic TCM approach was discovered which alleviated these complex design issues by using a standard binary convolutional code mapped to a higher order modulation system. This approach has also been applied to block codes and to Turbo Product Codes (TPCs). A simple Gray code map is used to map the binary bits output from a TPC encoder to a signal constellation. For example, if 16-QAM is chosen as the modulation type, then bits output from the encoder are grouped into words having 4 bits each.
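The grouping of encoder bits into 4-bit words and their Gray-coded placement on a 16-QAM constellation can be sketched in Python. The bit order within each word (first two bits on the in-phase axis, last two on the quadrature axis) and the particular Gray labeling of the amplitude levels are assumptions chosen for illustration, and `map_16qam` is a hypothetical name.

```python
# Gray-coded 4-PAM: adjacent amplitude levels differ in exactly one bit.
# Assumed labeling (one common choice): 00 -> -3, 01 -> -1, 11 -> +1, 10 -> +3.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}


def map_16qam(bits):
    """Map a flat list of encoder output bits to 16-QAM symbols (complex numbers).

    Bits are grouped into 4-bit words; the first two bits choose the
    in-phase (I) level and the last two choose the quadrature (Q) level.
    """
    if len(bits) % 4 != 0:
        raise ValueError("bit count must be a multiple of 4")
    symbols = []
    for k in range(0, len(bits), 4):
        b0, b1, b2, b3 = bits[k:k + 4]
        i_level = GRAY_4PAM[(b0, b1)]
        q_level = GRAY_4PAM[(b2, b3)]
        symbols.append(complex(i_level, q_level))
    return symbols


print(map_16qam([0, 0, 1, 0, 1, 1, 0, 1]))  # [(-3+3j), (1-1j)]
```

Because the map is a fixed table, changing the modulation only changes this table; the binary TPC encoder in front of it is unchanged, which is the point of the pragmatic approach.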
In order to get optimum performance from a TPC decoder, soft decision information is generated from the channel. This is accomplished by computing the log-likelihood ratio (LLR), which gives a confidence (soft decision) value for each bit in each 4-bit word. The optimal LLR is very complex to compute, as it requires the computation of logarithms, Euclidean distances, and exponentials. The usual method in prior art decoders is to pre-compute the value of the LLR for each possible received channel value. The resulting data is stored in a ROM or other storage medium, and the LLR is obtained by a table lookup from that medium. The problem with this method is that it requires a different lookup table for each supported modulation format. In addition, the lookup tables become very large for very high order modulations, requiring large storage media.
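The computation that such lookup tables pre-store can be sketched directly. This minimal version assumes additive white Gaussian noise with variance `sigma2` per real dimension, equally likely symbols, the same illustrative Gray labeling as a typical 16-QAM map, and the sign convention that a positive LLR favours bit 0; `exact_llr` is a hypothetical name. Each bit's LLR is the log of a ratio of sums of exponentials of squared Euclidean distances, evaluated with a log-sum-exp for numerical stability.

```python
import math
from itertools import product

# Assumed Gray labeling per axis: 00->-3, 01->-1, 11->+1, 10->+3.
GRAY_4PAM = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

# 16-point constellation as (4-bit label, complex point) pairs.
CONSTELLATION = [
    ((b0, b1, b2, b3), complex(GRAY_4PAM[(b0, b1)], GRAY_4PAM[(b2, b3)]))
    for b0, b1, b2, b3 in product((0, 1), repeat=4)
]


def _log_sum_exp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def exact_llr(y, sigma2):
    """Exact LLRs for the 4 bits of one received 16-QAM sample y (complex).

    LLR_k = ln( sum_{s: b_k=0} exp(-|y-s|^2 / (2 sigma2))
              / sum_{s: b_k=1} exp(-|y-s|^2 / (2 sigma2)) ).
    Positive values favour bit 0.
    """
    llrs = []
    for k in range(4):
        num = [-abs(y - s) ** 2 / (2 * sigma2) for bits, s in CONSTELLATION if bits[k] == 0]
        den = [-abs(y - s) ** 2 / (2 * sigma2) for bits, s in CONSTELLATION if bits[k] == 1]
        llrs.append(_log_sum_exp(num) - _log_sum_exp(den))
    return llrs
```

Every call evaluates 16 distances and 16 exponentials per bit plus logarithms, which is why prior art decoders tabulate the result per modulation instead of computing it in hardware.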
What is needed is an LLR approximation method and apparatus that takes an expression containing a natural logarithm and exponentials and reduces it to a set of linear equations. In addition, the LLR approximation method should be simple enough to implement in hardware and should determine soft-input values without using a lookup table.
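One well-known way to reduce the logarithm and exponentials to linear expressions is the max-log approximation, ln Σ exp(x_i) ≈ max x_i. The sketch below applies it to Gray-coded 16-QAM, which separates into two independent 4-PAM axes, so each bit's LLR becomes a piecewise-linear function of the received I or Q value with no table lookup. It is offered only as an example of the kind of approximation called for, not as the method of this invention; the labeling (00→-3, 01→-1, 11→+1, 10→+3), the noise variance `sigma2` per dimension, and the function names are assumptions.

```python
def maxlog_llr_4pam(y, sigma2):
    """Max-log LLRs of the two Gray-coded bits on one 4-PAM axis.

    Assumed labeling: 00->-3, 01->-1, 11->+1, 10->+3; positive favours bit 0.
    LLR ~ (min dist^2 to a '1' point - min dist^2 to a '0' point) / (2 sigma2),
    which works out to the linear pieces below: no log, exp or lookup table.
    """
    scale = 1.0 / (2.0 * sigma2)
    # First bit: separates {-3, -1} (bit 0) from {+1, +3} (bit 1).
    if abs(y) <= 2.0:
        d0 = -4.0 * y
    elif y > 2.0:
        d0 = -8.0 * (y - 1.0)
    else:
        d0 = -8.0 * (y + 1.0)
    # Second bit: separates outer levels {-3, +3} (bit 0) from inner {-1, +1} (bit 1).
    d1 = 4.0 * (abs(y) - 2.0)
    return d0 * scale, d1 * scale


def maxlog_llr_16qam(y, sigma2):
    """Apply the per-axis formula to I (bits 0, 1) and Q (bits 2, 3)."""
    return [*maxlog_llr_4pam(y.real, sigma2), *maxlog_llr_4pam(y.imag, sigma2)]
```

For example, `maxlog_llr_16qam(complex(-3, 3), 0.5)` gives `[16.0, 4.0, -16.0, 4.0]`. Only comparisons, additions, and constant multiplications are needed, which map directly onto hardware; the price is a small loss of accuracy relative to the exact LLR at low signal-to-noise ratio.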
Previous methods of locating synchronization patterns in an input data stream scan the stream as it passes a point and start a counter when a synchronization mark is found, to indicate when the next mark is expected. The problem with this method is that whenever a false synchronization mark is found, all other synchronization marks are ignored until the mark is determined to be false, which happens only when no mark is found at the expected location.
This problem can be addressed by using larger synchronization marks. However, larger marks cause higher overhead for the synchronization modules, and solutions that increase the size of a synchronization mark perform poorly in a noisy environment: as the length of a synchronization mark increases, so does the probability that one or more of its bits are incorrect. Another possibility is scanning the data stream at two or more locations so that two or more synchronization marks can be expected at the same time. This is equivalent to multiplying the length of the synchronization mark by the number of marks observed. It is also undesirable because all data between the observation points must be buffered in RAM, where it takes up space.
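The effect of mark length on robustness can be quantified under a simple assumption that the text does not state: each received bit is independently in error with probability p (a binary symmetric channel). The probability that an L-bit mark contains at least one bit error is then

```latex
P_{\text{mark}}(L) = 1 - (1 - p)^{L},
\qquad
P_{\text{mark}}(16)\big|_{p=10^{-2}} \approx 0.149,
\qquad
P_{\text{mark}}(64)\big|_{p=10^{-2}} \approx 0.474 .
```

Under this model, observing four 16-bit marks at once behaves like a single 64-bit mark, so an exact-match detector would miss almost half of the true mark sets at p = 10^{-2}.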
Thus, what is needed is a method and apparatus that scans the data stream for synchronization marks using only one observation point. What is also needed is a method and apparatus that scans the input bit stream for periodic synchronization marks and, once synchronized, outputs a data stream that is bit and block aligned.
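A minimal sketch of a single-observation-point search, assuming a fixed block length with one sync mark at the start of each block: every window position that matches the mark increments a counter for its phase (position modulo the block length), and a phase whose expected mark is missing is cleared. Because each phase is tracked independently, a false mark does not cause true marks to be ignored. The function names and the `lock_count` and `max_errors` parameters are hypothetical, and this is not presented as the invention's circuit.

```python
def find_block_alignment(bits, sync_mark, block_len, lock_count=3, max_errors=0):
    """Return the start offset of the first confirmed block, or None.

    The stream is examined through one sliding window (a single observation
    point). A window matching `sync_mark` within `max_errors` bit errors
    increments the counter of its phase; a mismatch at that phase clears it.
    Lock is declared when one phase sees `lock_count` consecutive marks.
    """
    mark_len = len(sync_mark)
    hits = {}  # phase -> number of consecutive marks seen at that phase
    for end in range(mark_len - 1, len(bits)):
        start = end - mark_len + 1
        window = bits[start:end + 1]
        errors = sum(a != b for a, b in zip(window, sync_mark))
        phase = start % block_len
        if errors <= max_errors:
            hits[phase] = hits.get(phase, 0) + 1
            if hits[phase] >= lock_count:
                # Back up to the first of the consecutive marks that confirmed lock.
                return start - (lock_count - 1) * block_len
        else:
            hits.pop(phase, None)  # expected mark missing: drop this candidate
    return None


def aligned_blocks(bits, offset, block_len):
    """Split the stream into whole, block-aligned blocks starting at `offset`."""
    return [bits[k:k + block_len] for k in range(offset, len(bits) - block_len + 1, block_len)]
```

In a usage example with an 8-bit mark, 16-bit blocks, and one false mark planted ahead of the first real block, the false candidate is cleared one block later while the true phase keeps accumulating, and lock is reported at the first real block. Memory grows with the number of phases that have recently matched, not with buffered data between observation points.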
Prior art iterative decoders use a single microprocessor to execute the steps required to decode data entering the system. These decoders are relatively slow because the data is stored in the system's memory. Hardware implementations of turbo decoders generally use a serial concatenation of soft-input soft-output (SISO) decoders to achieve faster decoding speeds, with each SISO performing one iteration and passing the data to succeeding SISOs for later iterations. Such decoders increase system latency and require more logic to implement.
Some prior art decoders use parallel processing to achieve higher data throughput rates. These decoders store four codeword bits per RAM location. The data is read and sent directly to four parallel SISO decoders, each of which can input only one codeword bit per clock cycle. Such decoders have four times the data throughput of a decoder using a single SISO; that is, processing power grows linearly with the number of parallel SISOs. For example, a decoder that uses 8 SISOs instead of 4 operates at roughly twice the speed. If a decoder operating at 100 Mbit/sec or even 1 Gbit/sec is required, this method of decoding becomes too complex to build. Further, prior art decoders cannot support Enhanced TPCs (ETPCs), which are codes whose constituent coding includes, for example, extended Hamming codes and/or parity codes along with hyper diagonal parity. Also, prior art SISO decoders generally input one codeword bit per clock cycle: the SISO executes the decoding steps as the data is received and after the entire codeword has been input, and then outputs the result one codeword bit per clock cycle.
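The prior-art layout of four codeword bits per RAM location can be sketched for the row-decoding pass of a product code. The assumption here is that each RAM word holds the same column of four adjacent rows, so one read delivers one value to each of four row SISOs per clock cycle; the function names and the exact layout are hypothetical.

```python
def pack_rows_for_parallel_sisos(codeword, parallelism=4):
    """Pack a 2-D codeword (list of rows) into RAM words of `parallelism` entries.

    Hypothetical layout: RAM word g * n_cols + c holds column c of rows
    g*P .. g*P + P - 1, so one read delivers one value to each of P row SISOs.
    """
    n_rows, n_cols = len(codeword), len(codeword[0])
    if n_rows % parallelism:
        raise ValueError("row count must be a multiple of the parallelism")
    ram = []
    for g in range(n_rows // parallelism):
        for c in range(n_cols):
            ram.append(tuple(codeword[g * parallelism + p][c] for p in range(parallelism)))
    return ram


def cycles_per_pass(n_rows, n_cols, parallelism):
    """Clock cycles to stream every codeword value through the SISOs once,
    assuming one RAM word (one value per SISO) is read per clock cycle."""
    return (n_rows * n_cols) // parallelism
```

For a 64×64 array, `cycles_per_pass(64, 64, 4)` is 1024 and `cycles_per_pass(64, 64, 8)` is 512: doubling the number of SISOs halves the time, but also doubles the SISO logic and the RAM word width, which is the linear cost the paragraph describes.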
Instead, what is needed is a SISO decoder that can process multiple codeword bits per clock cycle, and a decoding method and apparatus that can process data in parallel and scale to higher decoding throughput rates. The method and apparatus should also support scalable decoding and be able to decode ETPCs. What is also needed is a RAM organization method in the apparatus that results in low-complexity, high-throughput RAM access.
Prior art decoders find codewords near the center codeword using a search algorithm that requires a used bit location register, syndrome calculations, and error lookup tables. With these algorithms and registers, the decoder requires a significant amount of hardware, including large syndrome generating circuits that are slow because of the many syndrome calculations, plus the used bit location registers and lookup tables. What is needed is a method and apparatus to calculate nearest neighbor codewords in a reduced search set. What is also needed is that the method and apparatus simplify the nearest neighbor search and reduce the codeword search while using much less logic than the prior art.
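A widely used way to find codewords near the center codeword is a Chase-II style test-pattern search. The sketch below applies it to an extended Hamming (8,4) constituent code. It is a named, swapped-in technique for illustration, not the reduced-search method this document calls for, and it makes visible the per-pattern syndrome computation identified above as a hardware cost. The bit layout (bits 1 to 7 form a Hamming (7,4) code whose syndrome equals the 1-based error position; bit 0 is overall parity), the soft-value sign convention, and the function names are assumptions.

```python
from itertools import product


def hard_decode_ext_hamming(bits):
    """Hard-decision decoder for an extended Hamming (8,4) code.

    Returns the corrected codeword, or None when a double error is detected.
    """
    word = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # parity-check column for position pos is pos in binary
    overall = sum(word) % 2
    if syndrome == 0:
        if overall:
            word[0] ^= 1  # single error in the overall-parity bit
        return word
    if overall:
        word[syndrome] ^= 1  # single error at the position named by the syndrome
        return word
    return None  # non-zero syndrome with even parity: two errors


def chase_decode(soft, num_unreliable=2):
    """Chase-II style search for the codeword nearest to the soft input.

    soft[j] > 0 favours bit 0. The `num_unreliable` least reliable positions
    of the hard decision are flipped in every combination; each test pattern
    is hard-decoded, and the candidate with the largest correlation wins.
    """
    hard = [0 if v >= 0 else 1 for v in soft]
    weakest = sorted(range(len(soft)), key=lambda j: abs(soft[j]))[:num_unreliable]
    best, best_metric = None, float("-inf")
    for flips in product((0, 1), repeat=num_unreliable):
        test = list(hard)
        for j, f in zip(weakest, flips):
            test[j] ^= f
        cand = hard_decode_ext_hamming(test)
        if cand is None:
            continue
        metric = sum(v * (1 - 2 * c) for v, c in zip(soft, cand))  # correlation
        if metric > best_metric:
            best, best_metric = cand, metric
    return best
```

With two weak bit errors, the plain hard decoder only detects the error, while the test-pattern search recovers the transmitted codeword. The cost is 2^p syndrome evaluations per row or column, which grows quickly with the number of unreliable positions p.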
The number of iterations required to correct a block of data varies from block to block, even when the channel adds only white Gaussian noise. The location and number of errors created by the channel can change the rate at which the decoder converges. FIG. 1 shows a probability density function of the number of iterations required per block. The x-axis of FIG. 1 shows the number of iterations, ranging from 1 to 30, and the y-axis shows the probability of a given block requiring that number of iterations. There is a long tail extending out to 20 iterations; in fact, for this set of blocks, the maximum number of iterations required is 26.
When an iterative decoder is limited to a fixed maximum number of iterations, all blocks of data that do not converge within that limit are output from the decoder with errors. This results in poor bit error rate performance, because the decoder is not allowed to iterate longer on those blocks to correct the errors. Prior art decoders can stop iterating once they converge on a block of data. However, they have difficulty converging on data that enters as a continuous stream, because it is very difficult to stop the transmission of data when the decoder requires a larger number of iterations to converge.
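The link between the iteration distribution and the error floor of a fixed-limit decoder can be sketched from a list of per-block iteration counts: every block whose requirement exceeds the limit leaves the decoder uncorrected. The function names are hypothetical, and any counts passed in are supplied by the caller (for example from simulation); none are taken from FIG. 1.

```python
from collections import Counter


def iteration_pmf(required):
    """Empirical probability of each required iteration count."""
    counts = Counter(required)
    n = len(required)
    return {k: counts[k] / n for k in sorted(counts)}


def fraction_unconverged(required, max_iterations):
    """Fraction of blocks output with errors when iterations are capped."""
    return sum(1 for r in required if r > max_iterations) / len(required)
```

With a long-tailed distribution, raising `max_iterations` for every block removes failures only slowly while multiplying decoder work, which motivates iterating longer only on the hard blocks.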
What is needed is a decoder that is able to determine when it has converged on a codeword. What is also needed is a decoder which iterates more for more difficult blocks and iterates less for less difficult blocks. What is also needed is a decoder that can converge on blocks of data that are input into the decoder in a continuous stream. It is also desired that the decoder utilize a design that allows it to run a variable number of iterations.
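One way to let a decoder fed by a continuous stream spend more iterations on hard blocks and fewer on easy ones is to share an input buffer: a block may keep iterating until it converges or until the next arrivals would overflow the buffer. The timing model below is a hypothetical sketch, not the design described in this document. Block i arrives at time i × `period` (measured in iterations), the buffer holds `buffer_blocks` blocks, and convergence is simulated by a known per-block requirement; a real decoder would instead detect it, for example when the hard decisions stop changing.

```python
def schedule_variable_iterations(required, period, buffer_blocks, max_iterations=None):
    """Simulate variable-iteration decoding of a continuous block stream.

    Block i arrives at time i * period and must be finished before block
    i + buffer_blocks arrives (otherwise the input buffer overflows). Each
    block iterates until it converges (required[i] iterations) or its
    deadline or the optional fixed cap is reached.
    Returns a list of (iterations used, converged) per block.
    """
    results = []
    finish = 0  # time at which the previous block left the decoder
    for i, need in enumerate(required):
        start = max(finish, i * period)
        deadline = (i + buffer_blocks) * period
        allowed = deadline - start
        if max_iterations is not None:
            allowed = min(allowed, max_iterations)
        used = min(need, allowed)
        results.append((used, need <= allowed))
        finish = start + used
    return results
```

For `required = [2, 2, 7, 2, 2]` with `period = 4` and `buffer_blocks = 2`, every block converges: the difficult third block borrows time saved on the easy blocks around it. With a fixed cap of 4 iterations, the same block is output unconverged after 4 iterations.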