Expressed Sequence Tags (ESTs) are sequenced from mRNA in a known direction, from a random start point, for a length of approximately 100 to 400 bases. Sequencing reliability varies along the length of an EST but generally exceeds 95% accuracy. The cDNA from which an EST is read comes from a single clone, and several ESTs are often generated. The goal of the assembly process is to collect all ESTs derived from the clones encoding a particular gene and combine them into a multiple sequence alignment. A consensus of that alignment is generated and taken to be the cDNA sequence of the underlying gene. This allows further analysis of the gene's relationship to known proteins, and many such relationships only become apparent at the amino acid level.
The sequence errors inherent in the EST generation process, namely miscalls, unknowns, insertions and deletions, lead to inconsistencies in the multiple sequence alignment. These errors can be propagated into the assembly consensus where they occur in several ESTs or where coverage is low, i.e., few ESTs align over the corresponding part of the multiple sequence alignment. Miscalls and unknowns may introduce a stop codon into the correct reading frame. Insertions and deletions whose length is not a multiple of three will lead to a frameshift. These events make the prediction of the correct open reading frame very difficult.
Inconsistencies in a multiple sequence alignment are generally resolved by a majority vote. Where the EST components of an assembly disagree about which base to enter at a particular site, the base that occurs most frequently at that site across all the ESTs is chosen. Where there is no clear majority, some heuristic must be applied to determine the "best" consensus. In many cases, this heuristic is largely frequency-based, order-dependent or ad hoc, and ignores the fact that the consensus cDNA sequence should code for a protein sequence.
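
The majority vote described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular assembler: the function name `majority_consensus`, the `.` padding character for columns a read does not cover, and the choice to emit `N` and report the column when the vote is tied are all hypothetical conventions chosen for the example.

```python
from collections import Counter

PAD = "."      # assumed marker: this EST does not cover the column
UNKNOWN = "N"  # base the sequencer could not call


def majority_consensus(rows):
    """Column-wise majority vote over equal-length aligned ESTs.

    Returns (consensus, tied), where tied lists (column, candidates)
    for every column without a clear majority.
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")

    consensus = []
    tied = []
    for col, column in enumerate(zip(*rows)):
        # Padding and unknowns carry no information, so they do not vote.
        # A gap character such as '-' does vote, like any other symbol.
        votes = Counter(c for c in column if c not in (PAD, UNKNOWN))
        if not votes:
            consensus.append(UNKNOWN)
            continue
        best_count = max(votes.values())
        winners = sorted(s for s, n in votes.items() if n == best_count)
        if len(winners) > 1:
            # No clear majority: defer the decision to a separate heuristic.
            tied.append((col, winners))
            consensus.append(UNKNOWN)
        else:
            consensus.append(winners[0])
    return "".join(consensus), tied


if __name__ == "__main__":
    ests = ["ATGGC.", "ATGAC.", ".TGACT", "..GNCT"]
    print(majority_consensus(ests))
```

Columns returned in `tied` are exactly the sites where a further heuristic is needed. Low coverage shows up as columns with only one or two voting reads, where a single miscall can decide the consensus.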
Since the aim of EST assembly is to improve the potential quality of the cDNA sequence, a need exists for heuristics that consider the coding potential of the assembly consensus when choosing between alternative alignments.
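
As one illustration of what considering coding potential could look like, the sketch below ranks alternative consensus sequences by the longest stop-free run of codons in any of the three forward reading frames. It is not the heuristic proposed here, only a simple stand-in: the names `longest_open_run` and `pick_by_coding_potential` are hypothetical, the standard stop codons (TAA, TAG, TGA) are assumed, and only forward frames are scanned because ESTs are read in a known direction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def longest_open_run(seq):
    """Length, in codons, of the longest stop-free stretch in any forward frame."""
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


def pick_by_coding_potential(candidates):
    """Return the candidate with the longest stop-free run (first one wins ties)."""
    return max(candidates, key=longest_open_run)


if __name__ == "__main__":
    # Two alternative consensus sequences differing at one tied site (A vs C).
    with_stop = "ATGGCATAAGGC"      # ATG GCA TAA GGC: stop codon in frame 0
    without_stop = "ATGGCATACGGC"   # ATG GCA TAC GGC: frame 0 stays open
    print(pick_by_coding_potential([with_stop, without_stop]))
```

A score like this penalises both in-frame stop codons and frameshifts, since either one truncates the open stretch. A practical heuristic would likely weigh it against the base frequencies at the disputed site rather than use it alone.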