Modern communication systems often deploy multilayer forward error correction (FEC) schemes to improve their performance. A typical scheme concatenates a convolutional code with a block code. However, these coding schemes can be computationally expensive to implement and can introduce substantial encoding and decoding delays. Such delays reduce the effective data rate attainable on a channel and therefore need to be minimized. One way to accelerate the computation is to add specialized hardware, but this adds substantial cost to the system. Because these communication systems typically already include at least one general-purpose processor to implement and control the modem, it would be advantageous to use this existing processor to perform the error correction as well.

A common choice for the block code is the Reed-Solomon code, a non-binary BCH code. Encoding and decoding this code requires Galois field arithmetic (addition and multiplication) over a binary extension field GF(2^m). Galois field addition reduces to a bitwise exclusive-OR, but Galois field multiplication is not built into standard processors, even those with hardware integer multipliers. Implementing these multiplications in software can require too many processor cycles to be feasible, given the need to minimize coding delays in the system.
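To illustrate the cost gap between the two field operations, the following Python sketch contrasts addition in GF(2^8) with a straightforward shift-and-add multiplication. It assumes m = 8 and the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D), a common choice for Reed-Solomon codes; the field size, the polynomial, and the names `GF_POLY`, `gf_add`, and `gf_mul` are illustrative assumptions, not details of any particular system.

```python
# Assumed field: GF(2^8) with reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
GF_POLY = 0x11D


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) in GF(2^8) is carry-less: a single XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8), reducing modulo GF_POLY.

    Each of up to 8 iterations needs a bit test, a conditional XOR, a shift,
    and a conditional reduction, compared with one XOR for gf_add.
    """
    product = 0
    while b:
        if b & 1:
            product ^= a      # "add" the current multiple of a (XOR, no carries)
        a <<= 1               # multiply a by x
        if a & 0x100:
            a ^= GF_POLY      # reduce when an x^8 term appears
        b >>= 1
    return product


print(hex(gf_add(0x57, 0x83)))  # 0xd4
print(hex(gf_mul(0x02, 0x80)))  # 0x1d
```

A standard integer multiplier combines partial products with carries and performs no modular reduction, so 0x02 * 0x80 yields 0x100, which lies outside GF(2^8); the field product is instead x * x^7 = x^8, which reduces to x^4 + x^3 + x^2 + 1 = 0x1D.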