Many processes are characterized or regulated by the absolute or relative amounts of a plurality of items. For example, in biology, the level of expression of particular genes or groups of genes or the number of copies of chromosomal regions can be used to characterize the status of a cell or tissue. Analog methods such as microarray hybridization and real-time PCR are alternatives, but digital readout methods such as those disclosed herein have advantages over analog methods. Methods for estimating the abundance or relative abundance of genetic material with increased counting accuracy would be beneficial.
The availability of convenient and efficient methods for accurately identifying genetic variation and expression patterns among large sets of genes may be applied to understanding the relationship between an organism's genetic make-up and the state of its health or disease, Collins et al, Science, 282: 682-689 (1998). In this regard, techniques have been developed for the analysis of large populations of polynucleotides based either on specific hybridization of probes to microarrays, e.g. Lockhart et al.; Hacia et al, Nature Genetics, 21: 42-47 (1999), or on the counting of tags or signatures of DNA fragments, e.g. Velculescu et al, Science, 270: 484-487 (1995); Brenner et al, Nature Biotechnology, 18: 630-634 (2000). These techniques have been used in discovery research to identify subsets of genes that have coordinated patterns of expression under a variety of circumstances or that are correlated with, and predictive of, events of interest, such as toxicity, drug responsiveness, risk of relapse, and the like, e.g. Golub et al, Science, 286: 531-537 (1999); Alizadeh et al, Nature, 403: 503-511 (2000); Perou et al, Nature, 406: 747-752 (2000); Shipp et al, Nature Medicine, 8: 68-74 (2002); Hakak et al, Proc. Natl. Acad. Sci., 98: 4745-4751 (2001); Thomas et al, Mol. Pharmacol., 60: 1189-1194 (2001); De Primo et al, BMC Cancer 2003, 3:3; and the like. Not infrequently, the subset of genes found to be relevant ranges in size from ten or fewer to a few hundred.
In addition to gene expression, techniques have also been developed to measure genome-wide variation in gene copy number. For example, in the field of oncology, there is interest in measuring genome-wide copy number variation of local regions that characterize many cancers and that may have diagnostic or prognostic implications. For a review, see Zhang et al, Annu. Rev. Genomics Hum. Genet., 10: 451-481 (2009).
While such hybridization-based techniques offer the advantages of scale and the capability of detecting a wide range of gene expression or copy number levels, such measurements may be subject to variability relating to probe hybridization differences and cross-reactivity, element-to-element differences within microarrays, and microarray-to-microarray differences, Audic and Claverie, Genome Res., 7: 986-995 (1997); Wittes and Friedman, J. Natl. Cancer Inst. 91: 400-401 (1999).
On the other hand, techniques that provide digital representations of abundance, such as SAGE (Velculescu et al, cited above) or MPSS (Brenner et al, cited above), are statistically more robust: they do not require repetition or standardization of counting experiments, because counting statistics are well modeled by the Poisson distribution, and the precision and accuracy of relative abundance measurements may be increased by increasing the size of the sample of tags or signatures counted, e.g. Audic and Claverie (cited above).
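The sample-size argument can be made concrete with a minimal sketch that assumes nothing beyond the Poisson model cited above: a count with mean n has variance n, so its coefficient of variation (CV) is 1/√n, and the number of tags needed for a target relative precision grows as the inverse square of that precision. The function names below are illustrative and are not taken from any cited method; the model ignores every source of error other than counting noise.

```python
import math


def poisson_cv(expected_count: float) -> float:
    """Coefficient of variation (std / mean) of a Poisson count with the given mean."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    # For a Poisson variable the variance equals the mean, so std = sqrt(mean).
    return math.sqrt(expected_count) / expected_count


def counts_for_target_cv(target_cv: float) -> int:
    """Smallest expected count whose Poisson CV is at or below target_cv."""
    if target_cv <= 0:
        raise ValueError("target_cv must be positive")
    # 1/sqrt(n) <= cv  <=>  n >= 1/cv**2; round first to absorb float round-off.
    return math.ceil(round(1.0 / target_cv ** 2, 6))


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(f"{n:>9} tags counted -> CV = {poisson_cv(n):.4f}")
    print("tags needed for a 1% CV:", counts_for_target_cv(0.01))
```

Under this model, each hundredfold increase in the number of tags counted improves relative precision tenfold, so precision is set by how many tags are counted rather than by replicate experiments.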
Both digital and non-digital hybridization-based assays have been implemented using oligonucleotide tags that are hybridized to their complements, typically as part of detection or signal generation schemes that may include solid phase supports, such as microarrays, microbeads, or the like, e.g. Brenner et al, Proc. Natl. Acad. Sci., 97: 1665-1670 (2000); Church et al, Science, 240: 185-188 (1988); Chee, Nucleic Acids Research, 19: 3301-3305 (1991); Shoemaker et al., Nature Genetics, 14: 450-456 (1996); Wallace, U.S. Pat. No. 5,981,179; Gerry et al, J. Mol. Biol., 292: 251-262 (1999); Fan et al., Genome Research, 10: 853-860 (2000); Ye et al., Human Mutation, 17: 305-316 (2001); and the like. Bacterial transcript imaging by hybridization of total RNA to nucleic acid arrays may be conducted as described in Saizieu et al., Nature Biotechnology, 16:45-48 (1998). Accessing genetic information using high density DNA arrays is further described in Chee et al., Science 274:610-614 (1996). Tagging approaches have also been used in combination with next-generation sequencing methods; see, for example, Smith et al. NAR (May 11, 2010), 1-7.
A common feature among all of these approaches is a one-to-one correspondence between probe sequences and oligonucleotide tag sequences. That is, the oligonucleotide tags have been employed as probe surrogates for their favorable hybridization properties, particularly under multiplex assay conditions.
Determining small numbers of biological molecules and their changes is essential when unraveling mechanisms of cellular response, differentiation or signal transduction, and in performing a wide variety of clinical measurements. Although many analytical methods have been developed to measure the relative abundance of different molecules through sampling (e.g., microarrays and sequencing), few techniques are available to determine the absolute number of molecules in a sample. This can be an important goal, for example in single-cell measurements of copy number or stochastic gene expression, and is especially challenging when the number of molecules of interest is low in a background of many other species. As an example, measuring the relative copy number or expression level of a gene across a large number of genes can currently be performed using PCR, hybridization to a microarray, or direct sequence counting. PCR and microarray analysis rely on the specificity of hybridization to identify the target of interest for amplification or capture, respectively, and then yield an analog signal proportional to the original number of molecules. A major advantage of these approaches is the use of hybridization to isolate the specific molecules of interest from a background of many other molecules, which provides specificity for the readout or detection step. The disadvantage is that the readout signal-to-noise ratio depends on all molecules (specific and non-specific) selected by amplification or hybridization. The situation is reversed for sequence counting. No intended sequence specificity is imposed in the sequence capture step, and all molecules are sequenced. The major advantage is that the detection step simply yields a digital list of the sequences found; the disadvantage is that, because there is no specificity in the isolation step, all sequences must be analyzed at sufficient statistical depth to learn about any specific sequence.
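The depth requirement for non-specific sequence counting can be estimated with the same Poisson reasoning. The sketch below assumes, hypothetically, that reads are sampled uniformly at random from the library, so a target making up a fraction f of all molecules receives about N·f of N reads; reaching a relative precision c on that target then needs roughly 1/(f·c²) total reads. The function name and parameters are illustrative only and ignore amplification and mapping biases.

```python
import math


def reads_required(target_fraction: float, target_cv: float) -> int:
    """Approximate total reads so a target's Poisson count reaches the requested CV.

    Assumes uniform random sampling of reads, so the target count is roughly
    Poisson with mean total_reads * target_fraction.
    """
    if not (0 < target_fraction <= 1) or target_cv <= 0:
        raise ValueError("need 0 < target_fraction <= 1 and target_cv > 0")
    # Reads that must land on the target: 1/sqrt(k) <= cv  =>  k >= 1/cv**2
    target_reads = 1.0 / target_cv ** 2
    # Every other molecule is sequenced too, so scale up by 1/target_fraction.
    # Rounding before ceil absorbs floating-point round-off.
    return math.ceil(round(target_reads / target_fraction, 6))


if __name__ == "__main__":
    for fraction in (1e-2, 1e-4, 1e-6):
        n = reads_required(fraction, 0.1)
        print(f"target fraction {fraction:g}: {n:,} reads for a 10% CV")
```

Halving a target's abundance doubles the required depth, so a rare species can only be measured by also reading a very large number of sequences that are not of interest.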
Although major technical advances in sequencing speed and throughput have occurred, the statistical requirements for accurately measuring small changes in the concentration of a specific gene within a background of many other sequences mean that many sequences that don't matter must be measured to find the ones that do. Each of these techniques (PCR, array hybridization, and sequence counting) is comparative in that it primarily measures relative abundance and does not typically yield an absolute number of molecules in a solution. One method for absolute counting of nucleic acids is digital PCR (B. Vogelstein, K. W. Kinzler, Proc Natl Acad Sci USA 96, 9236 (Aug. 3, 1999)), in which a solution is progressively diluted into individual compartments until there is an average of one molecule per two wells, and the compartments are then assayed by PCR. Although digital PCR can be used as a measure of absolute abundance, the dilutions must be customized for each type of molecule, so in practice the method is generally limited to the analysis of a small number of different molecules.
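Counts from a dilution-based digital readout are commonly corrected with Poisson statistics, because a positive well may have received more than one molecule. The sketch below shows that generic correction under stated assumptions (independent partitioning into equal-volume wells and perfect detection of any well holding at least one molecule); it is an illustration, not the protocol of the cited work, and the function name is hypothetical.

```python
import math


def estimate_molecules(positive_wells: int, total_wells: int) -> float:
    """Poisson-corrected estimate of input molecules from positive/total well counts.

    Assumes molecules partition independently into equal-volume wells and that
    every well holding at least one molecule amplifies and is scored positive.
    """
    if total_wells <= 0 or not 0 <= positive_wells < total_wells:
        raise ValueError("need 0 <= positive_wells < total_wells; saturated plates cannot be quantified")
    fraction_empty = 1.0 - positive_wells / total_wells
    # P(well empty) = exp(-lam), so the mean molecules per well is -ln(fraction_empty).
    mean_per_well = -math.log(fraction_empty)
    return mean_per_well * total_wells


if __name__ == "__main__":
    # At one molecule per two wells (lam = 0.5), about 1 - exp(-0.5) of wells are positive.
    print(f"expected positive fraction at lam = 0.5: {1 - math.exp(-0.5):.3f}")
    print(f"393 of 1000 wells positive -> {estimate_molecules(393, 1000):.1f} molecules")
```

With 393 of 1,000 wells positive the estimate is about 499 molecules rather than 393, and as the positive fraction approaches 1 the estimate becomes unbounded, which illustrates why the dilution has to be matched to each target's abundance.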