Sequencing reactions of next generation sequencers often take place on amplified templates randomly arrayed on a surface of a solid support or thin gel layer, e.g. Bentley et al, Nature, 456: 53-59 (2008); Kim et al, Science, 316: 1481-1484 (2007); or the like. As the density of such amplified sequences (or equivalently “amplicons” or “clusters”) becomes higher, the frequency of contiguous and overlapping amplicons increases and presents a challenge for determining whether contributions from one, two, or more amplicons are represented in signals collected from the same location, e.g. Krueger et al, PLoS ONE, 6(1): e16607 (January 2011). Software for identifying amplicons on these surfaces or layers typically assumes that signals generated from the population of amplicons are evenly distributed among those corresponding to the four different bases, so that in any given cycle of a sequencing operation roughly one quarter of the amplicons generate an “A” signal, one quarter generate a “C” signal, one quarter generate a “G” signal, and one quarter generate a “T” signal. This assumption is reasonable for many sequencing projects, such as sequencing whole genomes, where the distribution of bases on genome fragments of different amplicons can be treated as being random. However, if the actual distribution is skewed, for example, because templates are from a selected subset of related genes, such as immune system genes, then amplicons may be mis-identified or removed from analysis, leading to reduced sequencing yields.
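The even-distribution assumption described above can be illustrated with a short sketch that tallies base composition at each sequencing cycle and flags cycles that depart from one quarter per base. This is an illustration only and is not the method of any particular amplicon-identification software; the function names `per_cycle_base_fractions` and `skewed_cycles` and the tolerance of 0.15 are assumptions chosen for the example.

```python
from collections import Counter

BASES = "ACGT"

def per_cycle_base_fractions(reads):
    """Return, for each cycle, the fraction of reads showing each base.

    `reads` is a list of base-call strings, one per amplicon. Only the
    cycles shared by every read are considered.
    """
    if not reads:
        return []
    n_cycles = min(len(r) for r in reads)
    fractions = []
    for cycle in range(n_cycles):
        counts = Counter(r[cycle] for r in reads)
        total = sum(counts[b] for b in BASES)  # ignore non-ACGT calls such as N
        fractions.append(
            {b: (counts[b] / total if total else 0.0) for b in BASES}
        )
    return fractions

def skewed_cycles(fractions, tolerance=0.15):
    """List the cycle indices where any base deviates from 0.25 by more
    than `tolerance` (an arbitrary, illustrative threshold)."""
    return [
        i for i, f in enumerate(fractions)
        if any(abs(f[b] - 0.25) > tolerance for b in BASES)
    ]

# Hypothetical inputs: a balanced set of reads and a set of closely
# related reads that share their first three bases.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
related = ["ACGT", "ACGA", "ACGC", "ACGG"]

print(skewed_cycles(per_cycle_base_fractions(balanced)))
print(skewed_cycles(per_cycle_base_fractions(related)))
```

In the balanced set every base appears once per cycle, so no cycle is flagged. In the related set the first three cycles each consist of a single base, which is the kind of skew expected when templates derive from a selected subset of related genes.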
It would be highly advantageous for sequencing libraries of related sequences, such as repertoires of recombined immune molecules, in random arrays if methods were available to ensure that closely spaced amplicons were accurately identified.