Analysis of complex nucleic acid populations is a common problem in many areas of molecular biology, nowhere more so than in the analysis of patterns of gene expression. Various methods have been developed to allow simultaneous analysis of entire mRNA populations, or their corresponding cDNA populations, to enable us to begin to understand patterns of gene expression in vivo.
The method of "subtractive cloning" (Lee et al., Proc. Nat. Acad. Sci. U.S.A. 88, 2825-2829) allows identification of mRNAs, or rather their corresponding cDNAs, that are differentially expressed in two related cell types. cDNAs common to both cell types can be selectively eliminated by hybridising cDNAs from a library derived from one cell type to a large excess of mRNA from a related but distinct cell type. mRNAs in the second cell type that are complementary to cDNAs from the first will form double-stranded hybrids. Various enzymes exist which degrade such double-stranded hybrids, allowing them to be eliminated and thus enriching the remaining population in cDNAs unique to the first cell type. The method yields highly specific comparative information about differences in gene expression between related cell types and has had moderate success in isolating rare cDNAs.
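The subtraction logic can be sketched as a filter over sequences. This is a minimal Python sketch under simplifying assumptions that are not part of the method as described: sequences are written 5' to 3', each cDNA is treated as a first strand (the reverse complement of its mRNA), hybridisation is modelled as exact full-length complementarity, and the large excess of mRNA is represented simply as a set. The function names are hypothetical.

```python
# Hypothetical sketch: sequences 5'->3'; mRNA uses U, cDNA uses T.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rna_to_dna(seq: str) -> str:
    """Rewrite an RNA sequence in the DNA alphabet (U -> T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def subtract(cdnas_a: list[str], mrnas_b: list[str]) -> list[str]:
    """Keep only cDNAs from cell type A with no complementary mRNA in cell type B.

    A first-strand cDNA hybridises to an mRNA when it is that mRNA's reverse
    complement; those hybrids are eliminated, so cDNAs unique to A remain.
    Exact full-length complementarity is assumed for simplicity.
    """
    targets_b = {rna_to_dna(m) for m in mrnas_b}
    return [c for c in cdnas_a if reverse_complement(c) not in targets_b]
```

For example, with mRNAs AUGGCC, AUGUUU and AUGCCC in the first cell type and AUGGCC and AUGCCC in the second, only the cDNA for AUGUUU (AAACAT) survives the subtraction.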
The method of "differential display" (Liang and Pardee, Science 257, 967-971, 1992) sorts mRNAs by using PCR primers to selectively amplify specific subsets of an mRNA population. The population is primed with a general poly-T primer, which amplifies one strand, and a specific primer of perhaps 10 nucleotides, which amplifies the reverse strand with greater specificity. In this way only mRNAs bearing the second primer sequence are amplified; the longer the second primer, the smaller the proportion of the total cDNA population amplified by any given sequence of that length. The resultant amplified sub-population can then be cloned for screening or sequencing, or the fragments can simply be separated on a sequencing gel. Low-copy-number mRNAs are less likely to be lost in this sort of scheme than in subtractive cloning, for example, and the method is probably more reproducible. Whilst it is more general than subtractive cloning, time-consuming analysis is still required.
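The effect of primer length on selectivity can be sketched as follows. This is a minimal Python sketch under two simplifying assumptions not stated in the text: a cDNA is amplified only if it contains an exact match to the arbitrary primer, and bases are independent and equiprobable, so a given N-nucleotide site occurs at a particular position with probability (1/4)^N. The function names are hypothetical.

```python
def amplified_subset(cdnas: list[str], primer: str) -> list[str]:
    """Return the cDNAs carrying an exact match to the arbitrary primer.

    Simplification (assumption): real primers can also anneal with some
    mismatches; here only exact matches count as amplified.
    """
    return [c for c in cdnas if primer in c]


def expected_fraction(primer_length: int, sequence_length: int) -> float:
    """Approximate chance that a random sequence contains a given site.

    Assumes independent, equiprobable bases and treats the overlapping
    positions as independent trials (an approximation).
    """
    positions = max(sequence_length - primer_length + 1, 0)
    p_site = 0.25 ** primer_length
    return 1.0 - (1.0 - p_site) ** positions
```

For a fixed cDNA length, `expected_fraction` falls as the primer lengthens, which is the trend described above: a longer arbitrary primer selects a smaller share of the population.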
The method of "molecular indexing" (PCT/GB93/01452) categorises the fragments produced by cleaving a nucleic acid with a type IIs restriction endonuclease, using populations of adaptor molecules that hybridise to the ambiguous sticky ends generated by the cleavage (type IIs enzymes cut outside their recognition sites, so the overhang sequence varies from site to site). With suitably engineered adaptors, specific subsets of fragments can be immobilised, amplified or cloned in a manner similar to differential display, but with a greater degree of control. Again, time-consuming analysis is required.
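The indexing step can be sketched as binning fragments by the overhang they carry. The sketch assumes FokI as the type IIs enzyme (recognition site GGATG, cutting 9 nucleotides downstream on one strand and 13 on the other, leaving a 4-nucleotide 5' overhang); the text names no particular enzyme. For brevity only sites on the given strand are scanned, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumed example enzyme: FokI, site GGATG, cutting 9 nt (top strand) and
# 13 nt (bottom strand) downstream, leaving an arbitrary 4-nt 5' overhang.
SITE = "GGATG"
TOP_CUT = 9
BOTTOM_CUT = 13


def overhangs(seq: str) -> list[tuple[int, str]]:
    """Return (cut position, overhang) for each site on the given strand.

    Sites whose cut would fall beyond the end of the sequence are ignored.
    """
    seq = seq.upper()
    result = []
    start = seq.find(SITE)
    while start != -1:
        cut = start + len(SITE) + TOP_CUT
        end = start + len(SITE) + BOTTOM_CUT
        if end <= len(seq):
            result.append((cut, seq[cut:end]))
        start = seq.find(SITE, start + 1)
    return result


def index_fragments(seqs: dict[str, str]) -> dict[str, list[str]]:
    """Group sequences by the overhangs their cleavage generates.

    Each overhang corresponds to one adaptor class, so sequences sharing an
    overhang fall into the same subset.
    """
    bins: dict[str, list[str]] = defaultdict(list)
    for name, seq in seqs.items():
        for _, overhang in overhangs(seq):
            bins[overhang].append(name)
    return dict(bins)
```

Each distinct overhang stands for one adaptor class, so the 4^4 = 256 possible 4-nucleotide overhangs define up to 256 subsets.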
The method of Kato (Nucleic Acids Research 23, 3685-3690, 1995) exemplifies the molecular indexing approach described above: it analyses a cDNA population by sorting terminal cDNA fragments into sub-populations and then selectively amplifying specific subsets of those fragments. Sorting is effected using type IIs restriction endonucleases and adaptors. The adaptors also carry primer sites which, in conjunction with general poly-T primers, allow selective amplification of terminal cDNA fragments as in differential display. The method is possibly more precise than differential display in that it achieves finer sorting: only about 100 cDNAs will be present in a given subset, and sorting can be related to specific sequence features rather than relying on primers chosen by trial and error.
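Sorting by terminal fragment can be sketched in Python. This sketch assumes a FokI-like enzyme (site GGATG, cutting 9 and 13 nucleotides downstream, leaving a 4-nucleotide overhang), cDNAs written 5' to 3' and ending in their poly-A tail, and that each cDNA contributes the fragment running from its 3'-most usable site to the poly-A tail, which is the fragment the poly-T primer can pair with. The expected subset size assumes overhang sequences are uniformly distributed; all names are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Assumed FokI-like enzyme: site GGATG, cuts 9 nt (top) / 13 nt (bottom) downstream.
SITE = "GGATG"
TOP_CUT, BOTTOM_CUT = 9, 13


def terminal_fragment(cdna: str) -> Optional[tuple[str, str]]:
    """Return (overhang, terminal fragment) for the 3'-most usable site.

    The cDNA ends in its poly-A tail, so the terminal fragment runs from the
    cut to the end of the sequence. Only sites on the given strand are
    scanned. Returns None if no usable site exists.
    """
    seq = cdna.upper()
    sites = [i for i in range(len(seq) - len(SITE) + 1) if seq.startswith(SITE, i)]
    for start in reversed(sites):
        cut = start + len(SITE) + TOP_CUT
        end = start + len(SITE) + BOTTOM_CUT
        if end <= len(seq):
            return seq[cut:end], seq[cut:]
    return None


def sort_terminal_fragments(cdnas: dict[str, str]) -> dict[str, list[str]]:
    """Partition cDNAs into subsets keyed by their terminal fragment's overhang."""
    subsets: dict[str, list[str]] = defaultdict(list)
    for name, cdna in cdnas.items():
        hit = terminal_fragment(cdna)
        if hit is not None:
            subsets[hit[0]].append(name)
    return dict(subsets)


def expected_subset_size(n_cdnas: int, overhang_length: int = 4) -> float:
    """Mean subset size if overhang sequences were uniformly distributed."""
    return n_cdnas / 4 ** overhang_length
```

Because each cDNA yields exactly one terminal fragment, the subsets partition the population rather than scattering each cDNA across several bins.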
The method of "serial analysis of gene expression" (SAGE, Science 270, 484-487, 1995) allows identification of mRNAs, or rather their corresponding cDNAs, that are expressed in a given cell type. It also gives quantitative information about the levels of those cDNAs. The process involves isolating a "tag" from every cDNA in a population using adaptors and type IIs restriction endonucleases. A tag is a sample of a cDNA sequence, of a fixed number of nucleotides, sufficient to identify that cDNA uniquely within the population. Tags are then ligated together and sequenced. The method readily identifies novel cDNAs; however, it is extremely time-consuming in view of the large amount of sequencing required.
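Tag extraction, concatenation and counting can be sketched as below. Assumptions not in the text: the anchoring site is CATG (the NlaIII site used in the original SAGE protocol), the tag is the 10 nucleotides immediately 3' of the 3'-most anchoring site, and the anchoring site is kept between tags as punctuation in the concatemer. Function names are hypothetical.

```python
from collections import Counter
from typing import Optional

ANCHOR = "CATG"   # anchoring-enzyme site (NlaIII in the original protocol)
TAG_LENGTH = 10   # assumed tag length for this sketch


def extract_tag(cdna: str) -> Optional[str]:
    """Return the TAG_LENGTH nt immediately 3' of the 3'-most anchoring site."""
    seq = cdna.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR): pos + len(ANCHOR) + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def build_concatemer(cdnas: list[str]) -> str:
    """Ligate the tags together, each preceded by the anchoring site."""
    tags = [t for t in (extract_tag(c) for c in cdnas) if t is not None]
    return "".join(ANCHOR + t for t in tags)


def count_tags(concatemer: str) -> Counter:
    """'Sequence' the concatemer: split at anchoring sites and count each tag.

    Splitting is unambiguous here: a tag comes from downstream of the 3'-most
    CATG, so it contains no CATG, and CATG cannot span a tag/anchor junction.
    """
    return Counter(t for t in concatemer.split(ANCHOR) if t)
```

Counting tags in the parsed concatemer gives the relative abundance of each cDNA, which is the quantitative output the method relies on; a cDNA expressed twice as highly contributes twice as many copies of its tag.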
All of the above methods are relatively laborious and rely upon sequencing by traditional gel methods. Moreover, the methods require amplification by PCR, which is prone to produce artefacts.
Methods involving hybridisation grids, chips and arrays are advantageous in that they avoid gel methods for sequencing and are quantitative. They can be performed entirely in solution and are thus readily automatable. These methods come in two forms. The first involves immobilisation of target nucleic acids on an array of oligonucleotides complementary to the terminal sequences of the target nucleic acids. Immobilisation is followed by partial sequencing of those fragments by a single-base method, e.g. using type IIs restriction endonucleases and adaptors. This approach is advocated by Brenner in PCT/US95/12678.
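The first array format can be sketched as capture by terminal sequence followed by stepwise reading. This is a hypothetical Python model: a fragment is assumed to be captured when its 3'-terminal nucleotides are exactly the reverse complement of an array probe, and the single-base method is modelled abstractly as identifying and removing one base per cycle from the free 5' end; the adaptor and restriction chemistry are not modelled, and all names are illustrative.

```python
# Hypothetical sketch: capture by terminal sequence, then one base read per cycle.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def immobilise(fragments: dict[str, str], probe_length: int) -> dict[str, list[str]]:
    """Assign each fragment to the probe complementary to its 3' terminus."""
    array: dict[str, list[str]] = {}
    for name, seq in fragments.items():
        probe = reverse_complement(seq[-probe_length:])
        array.setdefault(probe, []).append(name)
    return array


def read_single_bases(seq: str, cycles: int) -> str:
    """Model one base identified and removed per cycle from the free 5' end."""
    read = []
    for _ in range(cycles):
        if not seq:
            break
        read.append(seq[0])  # base identified in this cycle
        seq = seq[1:]        # end trimmed to expose the next base
    return "".join(read)


def sequence_array(fragments: dict[str, str], probe_length: int,
                   cycles: int) -> dict[str, list[tuple[str, str]]]:
    """For each array address, list (fragment, partial read) pairs."""
    return {
        probe: [(name, read_single_bases(fragments[name], cycles)) for name in names]
        for probe, names in immobilise(fragments, probe_length).items()
    }
```

The array address supplies the terminal sequence and the cycles supply a short partial sequence from the other end, so together they characterise each captured fragment without gel sequencing.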
The second form involves arrays of oligonucleotides N nucleotides in length. The array carries all 4^N possible oligonucleotides at specific points on the grid. Nucleic acids are hybridised to the array as single strands. Hybridisation is detected by fluorescently labelling each nucleic acid and determining where on the grid the fluorescence arises, which identifies the oligonucleotide to which the nucleic acid has bound. The fluorescent labels also give quantitative information about how much nucleic acid has hybridised to a given oligonucleotide. This information, together with knowledge of the relative quantities of individual nucleic acids, should be sufficient to reconstruct the sequences and quantities of the hybridising population. This approach is advocated by Lehrach in numerous papers; Nucleic Acids Research 22, 3423 contains a recent discussion. A disadvantage of this approach is that the construction of large arrays of oligonucleotides is extremely technically demanding and expensive.
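Array addressing and sequence reconstruction can be sketched for a single idealised nucleic acid. Assumptions: an N-mer's grid position is its value read as a base-4 number, hybridisation is perfect, and the signal at each point equals the N-mer's copy number. Reconstruction uses an Eulerian path through a de Bruijn graph, a standard technique for sequencing by hybridisation that the text itself does not specify; when the sequence contains repeated (N-1)-mers, the path, and hence the reconstruction, may not be unique. Function names are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "ACGT"


def grid_address(oligo: str) -> int:
    """Grid position of an N-mer: its value read as a base-4 number (A=0 ... T=3)."""
    address = 0
    for base in oligo:
        address = address * 4 + BASES.index(base)
    return address


def hybridisation_spectrum(seq: str, n: int) -> Counter:
    """Idealised signal per grid point: how many times each N-mer occurs in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def reconstruct(spectrum: Counter) -> str:
    """Rebuild a sequence from its N-mer spectrum via an Eulerian path.

    Nodes are (N-1)-mers and each N-mer occurrence is a directed edge
    (de Bruijn graph); Hierholzer's algorithm walks every edge exactly once.
    """
    graph = defaultdict(list)
    balance = Counter()  # out-degree minus in-degree for each node
    for kmer, count in spectrum.items():
        for _ in range(count):
            graph[kmer[:-1]].append(kmer[1:])
            balance[kmer[:-1]] += 1
            balance[kmer[1:]] -= 1
    # An Eulerian path starts where out-degree exceeds in-degree, if such a node exists.
    start = next((node for node in graph if balance[node] == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])
```

With N = 4 the grid has 4^4 = 256 points, so `grid_address` ranges from 0 (AAAA) to 255 (TTTT); the N-mer counts, not just their presence, are what let the reconstruction respect repeated N-mers.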