Analysis of biological or medical samples often requires the determination of nucleic acid sequences of large and complex populations of DNA and/or RNA, e.g. Gloor et al, PLoS ONE 5(10): e15406 (2010); Petrosino et al, Clinical Chemistry, 55(5): 856-866 (2009); Arstila et al, Science, 286: 958-961 (1999). In particular, profiles of nucleic acids encoding immune molecules, such as T cell or B cell receptors, or their components, contain a wealth of information on the state of health or disease of an organism, so that the use of such profiles as diagnostic or prognostic indicators has been proposed for a wide variety of conditions, e.g. Faham and Willis, U.S. patent publication 2010/0151471; Freeman et al, Genome Research, 19: 1817-1824 (2009); Boyd et al, Sci. Transl. Med., 1(12): 12ra23 (2009); He et al, Oncotarget (Mar. 8, 2011). Such sequence-based profiles provide much greater sensitivity than approaches based on size distributions of amplified target nucleic acids, sequence sampling by microarrays, hybridization kinetics curves from PCR amplicons, or the like, e.g. Morley et al, U.S. Pat. No. 5,418,134; van Dongen et al, Leukemia, 17: 2257-2317 (2003); Ogle et al, Nucleic Acids Research, 31: e139 (2003); Wang et al, BMC Genomics, 8: 329 (2007); Baum et al, Nature Methods, 3(11): 895-901 (2006). However, because of the size and diversity of such nucleic acid populations, constructing useful profiles by sequence analysis poses significant challenges even for next-generation sequencing platforms, e.g. Warren et al, Bioinformatics, 25: 458-464 (2009); Warren et al, Genome Research (Epub 24 Feb. 2011); Garcia-Castillo et al, Cardiovascular & Haematological Disorders-Drug Targets, 9: 124-135 (2009).
Such challenges include uniform amplification of target populations; maintenance of the quality of sequence reads; and selection of the number, composition and positioning of sequencing primers, in view of unknown target sequence variability caused, for example, by somatic hypermutation, clonal evolution, or like phenomena, e.g. Li et al, Blood, 102(13): 4520-4526 (2003); Tichopad et al, Clin. Chem., 55: 1816-1823 (2009); Brockman et al, Genome Research, 18: 763-770 (2008).
It would be highly advantageous for many fields in medicine and biology if methods were available for overcoming the drawbacks of current methodologies for analyzing complex populations of nucleic acids, particularly with respect to high-throughput sequencing platforms that have limited sequence read lengths or whose sequence quality declines significantly as a function of read length.