Analysis of biological or medical samples often requires the determination of nucleic acid sequences of large and complex populations of DNA and/or RNA, e.g. Gloor et al, PLoS ONE 5(10): e15406 (2010); Petrosino et al, Clinical Chemistry, 55(5): 856-866 (2009); Arstila et al, Science, 286: 958-961 (1999). In particular, profiles of nucleic acids encoding immune molecules, such as T cell or B cell receptors, or their components, contain a wealth of information on the state of health or disease of an organism, so that the use of such profiles as diagnostic or prognostic indicators has been proposed for a wide variety of conditions, e.g. Faham and Willis, U.S. patent publication 2010/0151471; Freeman et al, Genome Research, 19: 1817-1824 (2009); Boyd et al, Sci. Transl. Med., 1(12): 12ra23 (2009); He et al, Oncotarget (Mar. 8, 2011). Such sequence-based profiles are capable of much greater sensitivity than approaches based on size distributions of amplified target nucleic acids, sequence sampling by microarrays, hybridization kinetics curves from PCR amplicons, or other approaches, e.g. Morley et al, U.S. Pat. No. 5,418,134; van Dongen et al, Leukemia, 17: 2257-2317 (2003); Ogle et al, Nucleic Acids Research, 31: e139 (2003); Wang et al, BMC Genomics, 8: 329 (2007); Baum et al, Nature Methods, 3(11): 895-901 (2006). However, the efficient determination of clonotypes and clonotype profiles from sequence data poses challenges because of the size of the populations to be analyzed, the similarity of sequences in such populations, the limited predictability of natural variability among the sequences, and noise introduced into the data by a host of sample preparation and measurement steps, e.g. Warren et al, Genome Research, 21(5): 790-797 (2011).
Sequence tags, or barcodes, have been used in a variety of ways to assist in the analysis of nucleic acid populations, including labeling, contamination monitoring, rare mutant detection, physical sorting, molecular counting, and the like, e.g. Kinde et al, Proc. Natl. Acad. Sci., 108(23): 9530-9535 (2011); Casbon et al, U.S. patent publication 2012/0071331; Brenner, U.S. Pat. No. 5,635,440; Brenner and Macevicz, U.S. Pat. No. 7,537,897; Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Church et al, European patent publication 0 303 459; Shoemaker et al, Nature Genetics, 14: 450-456 (1996); Morris et al, European patent publication 0799897A1; Wallace, U.S. Pat. No. 5,981,179. Recently, Kinde et al (cited above) showed how sequence tags could be used to distinguish sequencing and amplification errors from rare mutations in a reference sequence.
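The general idea behind tag-based error discrimination can be sketched as follows: reads that carry the same tag are presumed to derive from the same original molecule, so a base seen in only a minority of a tag family's reads is likely an amplification or sequencing error, whereas a base shared by nearly all reads of the family is more likely present in the original molecule. The sketch below is a simplified illustration under stated assumptions, not the published procedure of Kinde et al; the function name `family_consensus`, the per-position majority rule, the `min_fraction` threshold and its default value, and the requirement that all reads in a family have equal length are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict


def family_consensus(reads, min_fraction=0.7):
    """Collapse tagged reads into one consensus sequence per tag.

    Hypothetical sketch: `reads` is an iterable of (tag, sequence) pairs.
    At each position, the most common base is kept only if it occurs in at
    least `min_fraction` of the family's reads; otherwise "N" is emitted.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        length = len(seqs[0])
        # Assumption: reads sharing a tag are aligned and of equal length.
        if any(len(s) != length for s in seqs):
            raise ValueError(f"reads for tag {tag!r} differ in length")
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_fraction else "N")
        consensus[tag] = "".join(bases)
    return consensus


reads = [
    ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACGT"),
    ("AAA", "ACTT"),                    # single-read G->T: treated as an error
    ("CCC", "ACCT"), ("CCC", "ACCT"),   # G->C in every read: treated as real
]
print(family_consensus(reads))  # {'AAA': 'ACGT', 'CCC': 'ACCT'}
```

In this toy input, the G-to-T change seen in one of four "AAA" reads is outvoted, while the G-to-C change shared by both "CCC" reads is retained as a candidate true variant. A real pipeline would also need to handle tag sequencing errors, small families, and alignment, none of which this sketch attempts.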
In view of the importance of accurate sequencing for medical and diagnostic applications, it would be highly advantageous if the use of sequence tags could be expanded to increase the efficiency and accuracy of sequence determination in such applications.