The mammalian cell contains approximately 50,000 genes encoded in its DNA. The level of expression of each of these genes can vary over a wide range, so that the number of possible combinations of gene activity is extremely large. While it is clear that a particular pattern of gene activity characterizes a cell in terms of its phenotypic properties, the sheer complexity of the expressed genetic material has made it very difficult to compare overall gene activity between cells or tissue types. Previous studies have relied upon the difficult interpretation of hybridization kinetics of very complex probes which represent the total population of genes that are active, or transcribed, in the cells or tissues under study.
The development of cloning methodology has in one sense made the analysis of differential gene expression much simpler. Using what are now standard techniques, individual gene sequences can be isolated in bacteria or other hosts, and genes can be selected which are expressed only in a particular tissue, such as the globin gene in erythrocytes or the ovalbumin gene in oviduct, or which are expressed at different levels in the two cell types or tissues to be compared. These purified gene sequences can then be radioactively labeled by a variety of procedures and used as probes to assay the level at which they are expressed in any tissue. Such probes can be used to distinguish, for example, an undifferentiated erythroid cell from a differentiated one, as determined by the synthesis of the messenger RNA for globin, which is characteristic of erythroid differentiation. Approaches such as this, however, do not address whether there are changes in expression of the other approximately 50,000 genes in the genome. For example, in comparing the messenger RNA population of the kidney to that of the brain and liver, hybridization kinetics suggest that many thousands of genes are differentially expressed in these tissues, in addition to the few highly abundant sequences which may be characteristic of each tissue (Hastie and Bishop, 1976).
These considerations arise in all experiments which seek to analyze the gene expression that accompanies the transformation of cells, or which seek specific genes that can be used to distinguish between normal cells and cells with neoplastic properties, such as malignancy, drug resistance or sensitivity, or invasiveness. For example, several of the proto-onc genes that are homologous to retroviral transforming genes have been shown to be elevated in expression in some human tumors (Slamon et al., 1984). It is not known, however, how many other changes in gene expression there are between any of these tumors and the corresponding normal tissue. The importance of this is illustrated by the fact that infection of normal chick embryo fibroblasts by the Rous sarcoma virus, which transforms the cells by introduction of the oncogene src and its subsequent expression, is accompanied by the appearance of approximately 1,000 new RNA transcripts (Groudine and Weintraub, 1980). Hence, even in this relatively simple case of viral transformation, where the etiological agent (the src gene) has been identified and is well understood, very complex changes in gene expression accompany, and may be the cause of, various properties that are characteristic of the transformation.
Cloning of gene sequences has become routine in many laboratories. Most cloning procedures are based on the original concepts of Stanley Cohen and Herb Boyer (U.S. Pat. No. 4,237,224). The procedures generally involve generating DNA molecules with "sticky" ends: that is, digesting the DNA with a restriction enzyme that leaves short single-stranded regions which are complementary to those of any other DNA molecule cut with the same enzyme. Hence, DNA fragments bearing the same sticky ends will hybridize to each other. In this way, DNA sequences from a eukaryotic genome, such as the human genome, can be inserted into vectors, such as plasmids and viral genomes, which can be used to introduce the sequence into a bacterial host. Various methods may be utilized to select the bacteria which contain a particular cloned DNA sequence of interest.
One general class of procedures involves immobilizing the DNA from the bacteria or virus on nitrocellulose membrane filters. This can be accomplished in various ways. Bacterial colonies can be grown directly on, or transferred to, the filters. The colonies are subsequently lysed and washed in various buffers, and the DNA is immobilized on the filter by baking at elevated temperature under vacuum. This method was originally developed by Grunstein and Hogness (1975). A similar method for transferring and fixing DNA from viral plaques was developed by Benton and Davis (1977). Purified DNA can also be fixed to nitrocellulose. When the DNA has been "fixed" to the filter by baking, the filter can be hybridized to probes which consist of nucleic acid (RNA or DNA) that has been labeled with radioactivity. Following washing to remove non-specifically bound material, the filter is usually exposed to x-ray film in order to obtain an image of the site at which hybridization has occurred. The extent of hybridization can be determined from the intensity of the signal on the x-ray film. In this way, the bacteria or virus harboring a cloned sequence of interest can be located among a large number of bacteria or viruses.
The most common use of such procedures is the identification of a particular cloned sequence among a large number of clones, by comparison with a plurality of known cloned sequences. For this, the probe used to hybridize to the cloned sequences on the surface of the filters must be enriched or purified in order for the sequence to be located. Other procedures involve hybridizing duplicates of the filters to two probes which are complex but differ in some significant way. For example, one can hybridize cloned sequences to probes made from two different cell or tissue types, the probes representing the total genetic complexity of the two types. Colonies on the replicate filters which hybridize differentially to the two probes then represent gene sequences which are differentially expressed in the two cell types. The procedure has been used to identify sequences which are differentially expressed during development in Xenopus (Dworkin and David, 1980), Aspergillus (Zimmermann et al., 1980), Dictyostelium (Williams and Lloyd, 1979) and sea urchin (Laskey et al., 1980). It has also been used to identify galactose-inducible sequences in yeast (St John and Davis, 1979) and genes differentially expressed in human lymphocytes and fibroblasts (Crampton et al., 1980) and in various mouse tumors and normal tissues (Augenlicht and Kobrin, 1982).
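As an illustration only, the comparison of replicate filters hybridized to two complex probes might be sketched as follows. The sketch assumes that a numerical signal has already been measured for each clone on each filter; the function name, the normalization to the fraction of total filter signal, and the two-fold threshold are hypothetical choices that are not taken from the cited work.

```python
import math


def differential_clones(signal_a, signal_b, min_fold=2.0, floor=1e-6):
    """Flag clones whose normalized signal differs between two probes.

    signal_a, signal_b: dicts mapping a clone identifier to the signal
    measured on the replicate filter hybridized to probe A or probe B.
    Returns a dict mapping each flagged clone to log2(B / A).
    """
    total_a = sum(signal_a.values())
    total_b = sum(signal_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each filter must carry some measurable signal")

    threshold = math.log2(min_fold)
    flagged = {}
    # Only clones present on both replicate filters can be compared.
    for clone in sorted(signal_a.keys() & signal_b.keys()):
        # Express each signal as a fraction of the total filter signal so
        # that probes of different specific activity can be compared
        # (an assumption of this sketch). The floor avoids division by zero.
        frac_a = max(signal_a[clone] / total_a, floor)
        frac_b = max(signal_b[clone] / total_b, floor)
        log_ratio = math.log2(frac_b / frac_a)
        if abs(log_ratio) >= threshold:
            flagged[clone] = log_ratio
    return flagged
```

A positive value indicates a clone that hybridizes more strongly to probe B, and a negative value one that hybridizes more strongly to probe A.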
Screening of libraries of sequences with these procedures has become routine in many laboratories. Usually, large numbers of clones are spread out at random. The screen is done, and a particular clone of interest is located by lining up the plate on which the clones are grown, the filter, and the x-ray film by use of reference marks. Gergen et al. (1979) first enunciated the idea that replicas of an arrayed library would be extremely useful. Since the position of each clone in the array is known and reproducible, every time replicas of the array are screened with a probe, one accumulates data on each member of the library. Gergen et al. (1979) published procedures for storing clones in plates having a defined pattern of wells (one clone to a well) and for replicating this ordered library for screening purposes. However, as with most other work of a similar nature, this work did not quantitate the level of hybridization of each member of the library each time it was screened. Instead, qualitative evaluations of the level of hybridization of some of the clones were recorded. While quantitation of the level of hybridization is routine, Laskey et al. made the first attempt to quantitate the hybridization of members of an arrayed library. Using procedures similar to those published by Gergen et al. (1979), they arrayed clones isolated from sea urchin and hybridized replicas of this arrayed library to probes made from sea urchin tissues at various stages of development (Laskey et al., 1980). The filters to which the probes had been hybridized were then cut into sections, and the sections were counted to determine the amount of radioactivity hybridized to each clone. When compared to a set of standards, this gave an estimate of the number of RNA molecules per cell for each cloned sequence at each stage of development of the sea urchin tissues.
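By way of a hedged example, the conversion of counted radioactivity into an estimate of RNA molecules per cell by reference to a set of standards might be sketched as below. The sketch assumes that the signal is linear in transcript abundance over the range of the standards and fits a straight line by least squares; the source does not state how the standards were actually applied, and the function names are hypothetical.

```python
def fit_standard_curve(standards):
    """Least-squares line through (copies_per_cell, counts) standards.

    standards: list of (known copies per cell, measured counts) pairs.
    Returns (slope, intercept) of counts = slope * copies + intercept.
    """
    n = len(standards)
    mean_x = sum(x for x, _ in standards) / n
    mean_y = sum(y for _, y in standards) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in standards)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in standards)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_per_cell(counts, slope, intercept):
    """Invert the standard curve; estimates below zero are reported as zero."""
    return max((counts - intercept) / slope, 0.0)
```

In this sketch the intercept plays the role of background radioactivity, so a section whose counts do not exceed background is assigned an estimate of zero.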
The phenotype of a cell depends on the complement of genes that are expressed and on the relative levels of expression of those sequences. For example, whether a tissue is malignant or benign, the site to which it would likely metastasize, its resistance or sensitivity to particular drug regimens, and the likely source of an unidentifiable tumor could be determined by comparing the pattern of expression of large numbers of sequences in an unknown sample to known patterns previously obtained. The patterns may be compared by the use of a manual or automated technique.
An example of such a method for analyzing the level of expression of each of a large number of genes is to hybridize dot blots of each of the sequences, or fixed bacterial colonies or phage plaques, each containing a different cloned sequence (Grunstein and Hogness, 1975; Benton and Davis, 1977), to a radio-labeled probe. The resulting hybridized filters are then cut into sections so that the radioactivity hybridized to each cloned sequence can be determined separately by counting in a liquid scintillation counter. This is precisely the manner in which the experiments of Laskey et al. (1980) were done.
The data, which are stored in a computer data base, can be compared with data from samples of known pathology. This permits phenotypes to be distinguished by analyzing the expression of large numbers of sequences, rather than by qualitative or quantitative changes in a single gene or a small number of genes. This approach has several advantages over one that is limited to a single gene. These advantages include the potential for detecting more subtle distinctions between related phenotypes (e.g., malignant cell types) and for detecting those phenotypes which may be determined by complex patterns of gene expression. Based on the assumption that malignancy or premalignancy in human disease may not be determined by changes in a single gene, the invention is based on the examination of a profile of a large number of genes.
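One way in which a stored profile of an unknown sample might be compared with profiles from samples of known pathology is sketched below. The sketch uses the Pearson correlation coefficient as the measure of similarity; this is an assumption made for illustration, since the text does not name a comparison method, and the function names and phenotype labels are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def closest_reference(unknown, references):
    """Score an unknown profile against each reference profile.

    unknown: list of expression levels, one per cloned sequence.
    references: dict mapping a phenotype label to a profile measured on
    the same cloned sequences in the same order.
    Returns (best_label, scores), where scores maps label to correlation.
    """
    scores = {label: pearson(unknown, profile)
              for label, profile in references.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores
```

Because the score is computed over the whole profile, a sample can be assigned to a phenotype even when no single sequence differs decisively between the reference samples.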