The human body is composed primarily of specialised cells that perform different physiological functions and are organised into tissues and organs. Virtually all human cells contain DNA, arranged in a series of sub-units known as genes. It is estimated that there are approximately 100,000 genes in the human genome. Genes are the blueprints for proteins. Proteins perform a wide variety of biological functions, acting for example as messengers, catalysts and sensors. These molecules are responsible for managing most of the physiological and biochemical functions in humans and all other living organisms. Over the last few decades, there has been a growing recognition that many major diseases have a genetic basis. It is now well established that genes play an important role in cancer, cardiovascular diseases, psychiatric disorders, obesity, and metabolic diseases. Significant resources are being focused on genomic research, based on the notion that the nucleotide sequence of a particular gene, together with its predicted protein product, will lead to an understanding of the gene's function in healthy and malfunctioning cells or tissues. This understanding is expected, in turn, to lead to therapeutic and diagnostic approaches focused on molecular targets associated with the gene and the protein it expresses. The first step towards the development of such applications is to identify the genes specifically involved in the different categories of diseases. Applying this knowledge can yield new and valuable markers that identify the genomic regions involved in major diseases and that can be used for diagnostic and therapeutic benefit.
Faced with the high complexity of the human genome, many approaches are being used to unravel the connection between primary gene structure and function. One well-publicised approach is embodied in the Human Genome Mapping Project, in which the sequence of all the individual genes in the entire human genome is painstakingly being determined. At present, however, little information can be directly retrieved on the function of the identified genes, and still less about their temporal and spatial expression patterns in the developing or mature organism. Other approaches, such as random cDNA sequencing, involve determining the sequence of all genes expressed in a certain tissue, or at a certain developmental stage, of an organism. Like a number of other strategies, this is time-consuming and prone to numerous problems.
Although the flood of data from large-scale sequencing programmes is of enormous benefit to the scientific community, one of the major problems faced by such "shotgun" approaches is the lack of specific information that can be retrieved without significantly more work on the biology of each of the individual genes.
Several other approaches have been taken by molecular biologists to obtain more specific information on the genetic background of particular biological processes. Such approaches rely on a common concept: one gene, or a subset of genes, is switched on, initiating the healthy, pathological, or developmental status of an organ or cell type.
In a large number of experimental systems, the isolation of genes on the basis of their differential expression has been applied successfully. Differential screening and subtractive hybridisation of cDNA libraries have become well established, cf. Zimmerman et al. (1980) and Davis et al. (1979). Differential library screening works well in practice for genes that are highly expressed, but mRNAs of low abundance are difficult to isolate. Subtractive hybridisation provides more sensitive screening, but requires large amounts of RNA. More recently, RNA fingerprinting methods (often referred to as differential display or DD/RT PCR) have been added to these tools, offering attractive new features for isolating genes. RNA fingerprinting methods are PCR based and therefore do not require large amounts of RNA for experiments. In addition, RNA fingerprinting methods allow a large number of RNA pools to be screened for specific mRNAs simultaneously, which makes it possible to investigate a wide range of pathogenic and developmental stages together with their controls. To date, two methods of RNA fingerprinting have proven useful for isolating genes. In 1992, Liang et al. published a protocol (U.S. Pat. No. 5,262,311), and soon afterwards a protocol from Welsh et al. (1992) was presented. Both methods begin with cDNA synthesis from RNA, using at least one arbitrary primer for the initiation of first and second strand synthesis.
Welsh et al. (1992) designed a protocol in which the same arbitrary 20-mer oligo is used for first and second strand synthesis. Because arbitrary primers are used, only a subset of the mRNAs is transcribed to cDNA. The cDNA pools are then used for a standard PCR with the same primers. One of the dNTPs in the PCR mix carries a radioactive label (³⁵S or ³²P) so that the PCR fragments can be visualised by PAGE. The Liang and Welsh methods rely on at least one small arbitrary primer for the selection of specific cDNAs. As a consequence, annealing temperatures are low (approximately 40 °C), and all amplified cDNA fragments originate from a certain degree of mismatch priming. Later, several groups produced refinements and optimisations, leading to a plethora of articles describing the usefulness of the method (Bauer and Warthoe et al. 1993; Warthoe et al. 1995; Liang and Warthoe et al. 1995; Rohde and Warthoe et al. 1996).
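The selection effect of an arbitrary primer under low-stringency annealing can be illustrated with a small computational sketch. The sketch below is not part of any of the cited protocols; it is a toy model under stated assumptions. The transcript names and sequences, the 8-mer primer (used in place of a 20-mer to keep the example short), and the function names `count_mismatches`, `priming_sites` and `select_primed` are all hypothetical. Strand complementarity, annealing thermodynamics and the special importance of the primer's 3' end are ignored; priming is modelled simply as a match between the primer and a window of the transcript with at most a given number of mismatches.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def priming_sites(template, primer, max_mismatches):
    """Start positions where the primer matches the template with at most
    max_mismatches mismatches (toy stand-in for low-stringency annealing)."""
    k = len(primer)
    return [
        i
        for i in range(len(template) - k + 1)
        if count_mismatches(template[i:i + k], primer) <= max_mismatches
    ]


def select_primed(transcripts, primer, max_mismatches):
    """Return only those transcripts that offer at least one priming site."""
    selected = {}
    for name, seq in transcripts.items():
        sites = priming_sites(seq, primer, max_mismatches)
        if sites:
            selected[name] = sites
    return selected


# Hypothetical toy data, for illustration only.
PRIMER = "ACGTTGCA"
TRANSCRIPTS = {
    "geneA": "TTACGTTGCATT",  # contains a perfect match to the primer
    "geneB": "GGACGTAGCAGG",  # contains a site with a single mismatch
    "geneC": "CCCCCCCCCCCC",  # contains no plausible site
}

strict = select_primed(TRANSCRIPTS, PRIMER, max_mismatches=0)
relaxed = select_primed(TRANSCRIPTS, PRIMER, max_mismatches=1)
print("stringent annealing:", sorted(strict))
print("relaxed annealing:  ", sorted(relaxed))
```

In this toy model, raising the permitted number of mismatches plays the role of lowering the annealing temperature: more transcripts are primed, but still only a subset of the pool, and some of the selected fragments arise from mismatch priming rather than from a perfect match.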