Many processes are characterized or regulated by the absolute or relative amounts of a plurality of items. For example, in biology, the level of expression of particular genes or groups of genes or the number of copies of chromosomal regions can be used to characterize the status of a cell or tissue. Analog methods such as microarray hybridization methods and real-time PCR are alternatives, but digital readout methods such as those disclosed herein have advantages over analog methods. Methods for estimating the abundance or relative abundance of genetic material having increased accuracy of counting would be beneficial.
The availability of convenient and efficient methods for the accurate identification of genetic variation and expression patterns among large sets of genes may be applied to understanding the relationship between an organism's genetic make-up and the state of its health or disease, Collins et al, Science, 282: 682-689 (1998). In this regard, techniques have been developed for the analysis of large populations of polynucleotides based either on specific hybridization of probes to microarrays, e.g. Lockhart et al.; Hacia et al, Nature Genetics, 21: 42-47 (1999), or on the counting of tags or signatures of DNA fragments, e.g. Velculescu et al, Science, 270: 484-487 (1995); Brenner et al, Nature Biotechnology, 18: 630-634 (2000). These techniques have been used in discovery research to identify subsets of genes that have coordinated patterns of expression under a variety of circumstances or that are correlated with, and predictive of, events of interest, such as toxicity, drug responsiveness, risk of relapse, and the like, e.g. Golub et al, Science, 286: 531-537 (1999); Alizadeh et al, Nature, 403: 503-511 (2000); Perou et al, Nature, 406: 747-752 (2000); Shipp et al, Nature Medicine, 8: 68-74 (2002); Hakak et al, Proc. Natl. Acad. Sci., 98: 4745-4751 (2001); Thomas et al, Mol. Pharmacol., 60: 1189-1194 (2001); De Primo et al, BMC Cancer 2003, 3:3. Not infrequently, the subset of genes found to be relevant ranges in size from ten or fewer to a few hundred.
In addition to gene expression, techniques have also been developed to measure genome-wide variation in gene copy number. For example, in the field of oncology, there is interest in measuring genome-wide copy number variation of local regions that characterize many cancers and that may have diagnostic or prognostic implications. For a review see Zhang et al. Annu. Rev. Genomics Hum. Genet. 2009. 10:451-81.
While such hybridization-based techniques offer the advantages of scale and the capability of detecting a wide range of gene expression or copy number levels, such measurements may be subject to variability relating to probe hybridization differences and cross-reactivity, element-to-element differences within microarrays, and microarray-to-microarray differences, Audic and Claverie, Genome Res., 7: 986-995 (1997); Wittes and Friedman, J. Natl. Cancer Inst. 91: 400-401 (1999).
On the other hand, techniques that provide digital representations of abundance, such as SAGE (Velculescu et al. Science 270, 484-487 (1995) and Velculescu et al. Cell 88 (1997)) or MPSS (Brenner et al, cited above), are statistically more robust; they do not require repetition or standardization of counting experiments, as counting statistics are well-modeled by the Poisson distribution, and the precision and accuracy of relative abundance measurements may be increased by increasing the size of the sample of tags or signatures counted, e.g. Audic and Claverie (cited above). SAGE relies on short sequence tags (10-14 bp) within transcripts as an indicator of the presence of a given transcript. The tags are separated from the rest of the RNA and can be linked together to form long serial molecules that can be cloned and sequenced. Quantitation of the number of times a particular tag is observed provides an estimate of the expression level of the corresponding transcript relative to other tagged transcripts, but not an actual count of the number of times that transcript appears. Other methods based on estimating relative abundance have also been described. See, for example, Wang et al. Nat. Rev. Genet. 10, 57-63 (2009).
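The dependence of precision on the number of tags counted can be illustrated with a minimal sketch. It assumes only that the count for a given tag is Poisson-distributed, as stated above, so that its standard deviation equals the square root of its mean. The function names are hypothetical and chosen for illustration; they do not come from any of the cited methods.

```python
import math


def relative_standard_error(tag_count: int) -> float:
    """Relative standard error of a Poisson count: sqrt(n) / n = 1 / sqrt(n)."""
    if tag_count <= 0:
        raise ValueError("tag_count must be a positive integer")
    return 1.0 / math.sqrt(tag_count)


def counts_needed(target_rse: float) -> int:
    """Smallest tag count whose Poisson relative standard error is at most target_rse."""
    if target_rse <= 0:
        raise ValueError("target_rse must be positive")
    return math.ceil(1.0 / target_rse ** 2)
```

Under this assumption, a tag observed 100 times has a relative standard error of 0.1, and a tag observed 10,000 times has a relative standard error of 0.01, so a tenfold gain in precision requires a hundredfold increase in the number of tags counted.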
Both digital and non-digital hybridization-based assays have been implemented using oligonucleotide tags that are hybridized to their complements, typically as part of a detection or signal generation scheme that may include solid phase supports, such as microarrays, microbeads, or the like, e.g. Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Church et al, Science, 240: 185-188 (1988); Chee, Nucleic Acids Research, 19: 3301-3305 (1991); Shoemaker et al., Nature Genetics, 14: 450-456 (1996); Wallace, U.S. Pat. No. 5,981,179; Gerry et al, J. Mol. Biol., 292: 251-262 (1999); Fan et al., Genome Research, 10: 853-860 (2000); Ye et al., Human Mutation, 17: 305-316 (2001); and the like. Bacterial transcript imaging by hybridization of total RNA to nucleic acid arrays may be conducted as described in Saizieu et al., Nature Biotechnology, 16:45-48 (1998). Accessing genetic information using high density DNA arrays is further described in Chee et al., Science 274:610-614 (1996). Additional methods for digital profiling are disclosed, for example, in U.S. Patent Pub. 20050250147 and U.S. Pat. No. 7,537,897. Tagging approaches have also been used in combination with next-generation sequencing methods, see for example, Smith et al. NAR (May 11, 2010), 1-7.
A common feature among all of these approaches is a one-to-one correspondence between probe sequences and oligonucleotide tag sequences. That is, the oligonucleotide tags have been employed as probe surrogates for their favorable hybridization properties, particularly under multiplex assay conditions.
Determining small numbers of biological molecules and their changes is essential when unraveling mechanisms of cellular response, differentiation or signal transduction, and in performing a wide variety of clinical measurements. Although many analytical methods have been developed to measure the relative abundance of different molecules through sampling (e.g., microarrays and sequencing), few techniques are available to determine the absolute number of molecules in a sample. This can be an important goal, for example in single cell measurements of copy number or stochastic gene expression, and is especially challenging when the number of molecules of interest is low in a background of many other species. As an example, measuring the relative copy number or expression level of a gene across a large number of genes can currently be performed using PCR, hybridization to a microarray or direct sequence counting. PCR and microarray analysis rely on the specificity of hybridization to identify the target of interest for amplification or capture, respectively, and then yield an analog signal proportional to the original number of molecules. A major advantage of these approaches is the use of hybridization to isolate the specific molecules of interest within the background of many other molecules, generating specificity for the readout or detection step. The disadvantage is that the signal-to-noise ratio of the readout depends on all molecules (specific and non-specific) selected by the amplification or hybridization. The situation is reversed for sequence counting. No intended sequence specificity is imposed in the sequence capture step, and all molecules are sequenced. The major advantage is that the detection step simply yields a digital list of the sequences found. The disadvantage is that, since there is no specificity in the isolation step, all sequences must be analyzed at a sufficient statistical depth in order to learn about a specific sequence.
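The depth requirement of sequence counting can be made concrete with a short sketch. It assumes, purely for illustration, that reads are drawn uniformly at random from the molecules in the sample, so that the expected share of reads from a target equals its share of the molecules. The function name and its parameters are hypothetical.

```python
def reads_required(min_target_reads: int, target_molecules: int, total_molecules: int) -> int:
    """Total reads needed so that the expected number of target reads reaches
    min_target_reads, assuming reads sample all molecules uniformly at random."""
    if min_target_reads < 0:
        raise ValueError("min_target_reads must be non-negative")
    if not 0 < target_molecules <= total_molecules:
        raise ValueError("target_molecules must be in the range 1..total_molecules")
    # Integer ceiling division avoids floating-point rounding on large counts.
    needed = min_target_reads * total_molecules
    return -(-needed // target_molecules)
```

Under this assumption, a target that makes up one molecule in 100,000 needs `reads_required(100, 1, 100_000)`, that is 10,000,000 reads, to be seen an expected 100 times; nearly all of those reads come from sequences that are not of interest.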
Although significant technical advances in sequencing speed and throughput have occurred, the statistical requirements imposed to accurately measure small changes in concentration of a specific gene (or target) or a few targets within the background of many other sequences require measuring many sequences that are not of interest to find the ones that are of interest. Each of these techniques (PCR, array hybridization and sequence counting) is a comparative technique in that it primarily measures relative abundance and does not typically yield an absolute number of molecules in a solution. Digital PCR is one method that may be used for absolute counting of nucleic acids (B. Vogelstein, K. W. Kinzler, Proc Natl Acad Sci USA 96, 9236 (Aug. 3, 1999)). In this application of PCR, solutions are progressively diluted into individual compartments until there is, on average, one molecule per two wells, and the molecules are then detected by PCR. Although digital PCR can be used as a measure of absolute abundance, the dilutions must be customized for each type of molecule, and thus in practice the method is generally limited to the analysis of a small number of different molecules.
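The counting arithmetic behind such a limiting-dilution readout can be sketched as follows. The sketch assumes that molecules distribute independently among wells, so that the number per well is Poisson-distributed; this is a common statistical model and is an assumption here, not a detail taken from the cited reference. The function names are hypothetical.

```python
import math


def positive_well_fraction(mean_per_well: float) -> float:
    """Expected fraction of wells holding at least one molecule, for a
    Poisson-distributed number of molecules per well."""
    if mean_per_well < 0:
        raise ValueError("mean_per_well must be non-negative")
    return 1.0 - math.exp(-mean_per_well)


def estimate_molecules(positive_wells: int, total_wells: int) -> float:
    """Estimate the total number of molecules loaded from the count of positive
    wells, correcting for wells that received more than one molecule."""
    if not 0 <= positive_wells < total_wells:
        raise ValueError("need 0 <= positive_wells < total_wells")
    mean_per_well = -math.log(1.0 - positive_wells / total_wells)
    return mean_per_well * total_wells
```

Under this model, a dilution to an average of one molecule per two wells (`positive_well_fraction(0.5)`) leaves roughly 39% of wells positive. The estimate is undefined when every well is positive, which is one reason the dilution has to be matched to the abundance of each type of molecule.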