1. Field of the Invention
The field of this invention is defining markers for cell types.
2. Background
The identity of a cell is a direct manifestation of the specific complement of genes that it expresses from among the 50,000 to 100,000 genes in the genome. Because individual cell types usually exist to perform specific functions within the organism, a technology that defines cell types through gene expression would not only permit us to assign the expression of genes to functionally defined cell types, but it would also enable us more easily to discover genes imparting functionally relevant properties to individual cells. This assignment of function to gene sequences is a major goal of the field of genomics.
A technology to identify distinct cell types systematically based upon patterns of gene expression would therefore permit very useful, functionally important definitions of cells.
Approaches to such a technology have usually involved performing pairwise comparisons of expressed genes from different cell types (for example, differential display or subtractive hybridization). A shortcoming of such approaches is the impracticality of using pairwise comparisons to identify numerous cell types in a complex tissue. Furthermore, such approaches usually rely upon the ability to isolate cells as pure populations, a situation that does not exist for most cell types in most tissues. Technologies are also needed that would allow the identification of cell types without knowing in advance that they exist. In the human brain, for example, neurons have historically been defined by parameters such as morphology, position, connectivity, and the expression of a small number of marker genes. However, we do not know how many intrinsically different cell types exist in the brain, what functional differences most of these cell types have, and how these differences are manifested in the expression of specific genes. A solution to a problem of this magnitude requires development of new technologies. We describe such a technology here.
Sippel (1973) Eur. J. Biochem. 37, 31-40 discloses the characterization of an ATP:RNA adenyltransferase from E. coli and Wittmann et al. (1997) Biochim. Biophys. Acta 1350, 293-305 disclose the characterization of a mammalian poly(A) polymerase. Gething et al. (1980) Nature 287, 301-306 disclose the use of an ATP:RNA adenyltransferase to polyadenylate the 3′ termini of total influenza virus RNA. Eberwine et al. (1996) U.S. Pat. No. 5,514,545 describes a method for characterizing single cells based on RNA amplification. Eberwine et al. (1992) Proc. Natl. Acad. Sci. USA 89, 3010-3014 describe the analysis of gene expression in single live neurons. Gubler U and Hoffman B J. (1983) Gene (2-3), 263-9, describe a method for generating cDNA libraries; see also the more recent reviews, Gubler (1987) Methods in Enzymology, 152, 325-329 and Gubler (1987) Methods in Enzymology, 152, 330-335. Clontech (Palo Alto, Calif.) produces a "Capfinder" cloning kit that uses "GGG" primers against nascent cDNAs capped with reverse transcriptase, Clontechniques 11, 2-3 (Oct. 1996); see also Maleszka et al. (1997) Gene 202, 39-43.
The invention provides methods and compositions for defining a cell type. The general methods involve the steps of (a) amplifying the mRNA of a single cell of a heterogeneous population of cells; (b) probing a comprehensive expression library with the amplified mRNA to define a gross expression profile of the cell; and (c) comparing the gross expression profile of the cell with a gross expression profile of one or more other cells to define a unique expression profile of the cell, wherein the unique expression profile of the cell provides a marker defining the cell type. In particular embodiments, step (c) comprises comparing the gross expression profile of the cell with a gross expression profile of (i) a plurality of other cells to define a unique expression profile of the cell; (ii) a plurality of other single cells to define a unique expression profile of the cell; and/or (iii) a plurality of gross expression profiles of each of a plurality of other single cells to define a unique expression profile of the cell, and the plurality of other single cells are derived from a functionally or structurally distinct subpopulation of cells. Accordingly, the invention may involve the steps of: (a) defining a heterogeneous subpopulation of cells of an organism; (b) constructing a comprehensive library from the mRNA of the subpopulation of cells; (c) amplifying the mRNA of a single cell of the population; and (d) probing the library with the amplified mRNA to define gene expression of the cell, wherein the gene expression of the cell provides a marker defining the cell type.
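The comparison of step (c) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each gross expression profile is reduced to the set of library clones that give a hybridization signal for that cell; the function name and the clone and cell identifiers are hypothetical, and the disclosure does not prescribe any particular data representation or comparison rule.

```python
from typing import Dict, Set


def unique_expression_profile(cell_profile: Set[str],
                              other_profiles: Dict[str, Set[str]]) -> Set[str]:
    """Return the clones detected in the subject cell but in none of the
    comparison cells (an assumed, simplified reading of step (c))."""
    # Pool every clone detected in any comparison cell.
    seen_elsewhere: Set[str] = set().union(*other_profiles.values())
    # What remains is specific to the subject cell.
    return cell_profile - seen_elsewhere


# Hypothetical profiles: clone identifiers are placeholders, not real data.
subject = {"clone_A", "clone_B", "clone_C"}
others = {
    "cell_2": {"clone_A", "clone_D"},
    "cell_3": {"clone_B", "clone_E"},
}
marker_candidates = unique_expression_profile(subject, others)
print(sorted(marker_candidates))
```

In this toy case only the clone absent from every comparison cell is retained as a candidate marker. A practical comparison would likely work on quantitative signal intensities rather than simple presence or absence.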
The subpopulation of cells comprises a discernible group of cells sharing a common characteristic. For example, the subpopulation may comprise tissue-specific cells, e.g. hippocampal neurons, cells presenting a common marker, such as CD8+ cells, etc. In one embodiment, the marker derives from a common mutation, particularly where the mutation is an inserted genetic construct which encodes and provides each cell with a common selectable marker, such as an epitope or signal-producing protein. In a preferred embodiment, the inserted construct further encodes and provides each cell with an internal ribosome entry sequence, and the construct is inserted into a target gene downstream of the stop codon but upstream of the polyadenylation signal in the last exon of the target gene, such that the internal ribosome entry sequence provides a second open reading frame within a transcript of the target gene. Selection and/or separation of the target subpopulation may be effected by any convenient method. For example, where the marker is an externally accessible, cell-surface associated protein or other epitope-containing molecule, immuno-adsorption panning techniques or fluorescent immuno-labeling coupled with fluorescence activated cell sorting are conveniently applied.
The probed library is typically a cDNA library, preferably normalized or subtracted.
In a particular embodiment, the library comprises a high density ordered array of immobilized nucleic acids.
The mRNA may be amplified by any technique applicable to a single cell. In a particular embodiment, the amplification is a linear method comprising the steps of adding a known nucleotide sequence to the 3′ end of a first RNA having a known sequence at the 5′ end to form a second RNA and reverse transcribing the second RNA to form a cDNA.
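The sequence logic of this embodiment can be sketched as string operations. The sketch assumes, for illustration only, that the added 3′ sequence is a short poly(A) tail and that reverse transcription yields the full-length reverse complement of the tailed RNA; the function names and the example sequence are hypothetical, and the code models no enzymology.

```python
# RNA base -> complementary DNA base
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def add_3prime_tail(rna: str, tail: str = "AAAA") -> str:
    """Append a known sequence to the 3' end of an RNA written 5'->3'."""
    return rna + tail


def reverse_transcribe(rna: str) -> str:
    """Return the first-strand cDNA, written 5'->3', as the reverse
    complement of the RNA template."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna))


# Hypothetical first RNA with a known 5' sequence ("GGG") for illustration.
first_rna = "GGGAUC"
second_rna = add_3prime_tail(first_rna)   # "GGGAUCAAAA"
cdna = reverse_transcribe(second_rna)     # "TTTTGATCCC"
print(second_rna, cdna)
```

The resulting cDNA carries known sequence at both ends: the complement of the added tail at its 5′ end and the complement of the known 5′ RNA sequence at its 3′ end.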
Finally, the library is probed with the amplified mRNA to determine gene expression of the subject cell, wherein unique gene expression or gene expression patterns provide markers for defining the cell type.