More and more diagnostic and prognostic applications that use large-scale DNA sequencing are being developed as the per-base cost of DNA sequencing has dropped and sequencing technologies have become more reliable and convenient, e.g. Faham and Willis, U.S. patent publication 2010/0151471; Freeman et al, Genome Research, 19: 1817-1824 (2009); Boyd et al, Sci. Transl. Med., 1(12): 12ra23 (2009); He et al, Oncotarget (Mar. 8, 2011); Palomaki et al, Genet. Med., 14(3): 296-305 (2012); Kohlmann et al, Semin. Oncol., 39(1): 26-36 (2012). In particular, profiles of nucleic acids encoding immune molecules, such as T cell or B cell receptors, or their components, contain a wealth of information on the state of health or disease of an organism, so that the use of such profiles as diagnostic or prognostic indicators has been proposed for a wide variety of conditions, e.g. Faham and Willis (cited above); Boyd et al (cited above); He et al (cited above). Moreover, such sequence-based profiles are capable of much greater sensitivity than approaches based on size distributions of amplified CDR-encoding regions, sequence sampling by microarrays, hybridization kinetics curves from PCR amplicons, or other approaches, e.g. Morley et al, U.S. Pat. No. 5,418,134; van Dongen et al, Leukemia, 17: 2257-2317 (2003); Ogle et al, Nucleic Acids Research, 31: e139 (2003); Wang et al, BMC Genomics, 8: 329 (2007); Baum et al, Nature Methods, 3(11): 895-901 (2006).
However, as in other DNA-based assays that employ amplification steps, the presence of contaminating or cross-contaminating DNA may degrade the effective limit of detection in assays employing immune repertoire sequencing. Sources of contaminating DNA include assay reagents, equipment, operator handling, aerosols, and the like, e.g. Urban et al, J. Forensic Sci., 45(6): 1307-1311 (2000); Kwok, pgs. 142-145, in Innis et al, Editors, PCR Protocols (Academic Press, 1990).
Detection of minimal residual disease (MRD) of cancers is impacted by such contamination. Patients treated for many cancers often retain an MRD related to the cancer. That is, even though a patient may have by clinical measures a complete remission of the disease in response to treatment, a small fraction of the cancer cells may remain that have, for one reason or another, escaped destruction. The type and size of this residual population is an important prognostic factor for the patient's continued treatment, e.g. Campana, Hematol. Oncol. Clin. North Am., 23(5): 1083-1098 (2009); Buccisano et al, Blood, 119(2): 332-341 (2012). Thus, the more sensitive the measurement of MRD, the more likely that a subsequent course of treatment will be successful, e.g. Szczepanski et al, Best Pract. Res. Clin. Haematol., 15(1): 37-57 (2002). Several techniques for assessing this population have been developed, including techniques based on flow cytometry, in situ hybridization, cytogenetics, amplification of nucleic acid markers, and the like, e.g. Buccisano et al, Current Opinion in Oncology, 21: 582-588 (2009); van Dongen et al, Leukemia, 17(12): 2257-2317 (2003); and the like. PCR and sequence-based analysis of nucleic acids encoding segments of recombined immune receptors have been particularly useful in assessing MRD in leukemias and lymphomas, since such segments (referred to herein as “clonotypes”) typically have unique sequences which may serve as molecular tags for their associated cancer cells, e.g. van Dongen et al (cited above); Faham and Willis, U.S. patent publication 2011/0207134; and the like. Nevertheless, the sensitivity of such techniques is still limited by the presence of cross-over contamination from other individuals.
In view of the potential impact of sequence-based diagnostic and prognostic applications, it would be highly desirable to have methods available for conveniently detecting and quantifying sample contamination, particularly in assays using immune repertoire sequencing in settings where large numbers of patient samples are processed.