The present invention relates to methods of producing modifications in genes of interest in a cell. In particular, the invention provides methods for using nucleic acid sequence-modifying agents to introduce modifications in any gene of interest in the genome of a cell. Also provided are sets of cells which contain at least one modification in any gene of interest. The methods and compositions of the invention are useful in determining the function of the gene of interest.
With the completion of the Human Genome Program approaching, there is an increasing interest in studying the function of genes, particularly those involved in human development and disease. While mapping and nucleotide sequencing of genes is an important first step for understanding the function of genes, the physical characterization of the structure of a gene does not provide insight into the function of that gene in the context of a multicellular organism.
For example, prior art approaches to determining gene function in mammals have relied on targeting mutations to specific genes in embryonic stem (ES) cells, or on genome-wide mutagenesis techniques designed to mutate all genes of an organism (e.g., mice). For example, "knock-out" mutations in ES cells have been widely used to target mutations to specific genes. "Knock-out" mutations shut off or alter gene expression and are currently used to produce a phenotype in the whole animal which reflects the function of the knocked-out gene. This approach has identified many genes which are associated with cancer and other human genetic diseases, and relies either on phenotype-based screens (i.e., screening for a particular phenotype) or on gene-based screens (i.e., screening for a particular alteration in the genome). Phenotype-based screens have primarily been conducted using mice, and involve characterization of thousands of mutagenized mice for specific diseases and traits [Russell et al., Proc. Natl. Acad. Sci. USA 76:5818-5819, 1979; Hitotsumachi et al., Proc. Natl. Acad. Sci. USA 82:6619-6621; Shedlovsky et al., Genetics 134:1205-1210; Marker et al., Genetics 145:435-443, 1997]. While the phenotype-based approach has the advantage that no assumption is made with respect to which genes are associated with a given disease or disorder, it is nevertheless very costly when using organisms such as mice since it requires the maintenance of several lines of mutagenized whole organisms. Furthermore, it is unclear whether phenotype-based screens permit conducting saturation screens for both dominant and recessive mutations of all mouse genes.
Gene-based screens have been carried out in whole animals and in embryonic stem (ES) cells. This approach involves identifying the organism's genes or the ES cell genes which have been mutated. Homologous recombination and retroviral insertion are commonly used in ES cells [Zambrowicz et al. (1998) Nature 392:608-611]. Although mutagenesis by homologous recombination is becoming routine, it remains cumbersome and expensive. Similarly, while the genome-wide approach to mutagenizing ES cells by retroviral insertional mutagenesis allows the generation of a large number of mutagenized ES cells in a cost-effective manner, this approach produces only one, or a limited number of, alleles of a given gene. Additionally, the class of mutations that can be produced with this approach is limited to those mutations which result from integration of a retroviral element. Thus, mutations caused by, for example, single amino acid changes in the protein cannot be produced using this approach. In many instances, for example, it may be desirable to generate mutations which cause single amino acid changes that merely modify gene function (e.g., by generating hypomorphic alleles that express the gene with a reduced efficiency) or that give rise to a new trait in the animal (e.g., by generating dominant neomorphic alleles which result in a gain of function). The generation of hypomorphic and neomorphic alleles of a gene in a model organism by single amino acid substitutions may be desirable to create a model organism for a human trait or disease in which gene function is modified rather than destroyed.
Accordingly, what is needed are methods for determining gene function which may efficiently be applied on a genome-wide scale, which generate more than one mutation in a gene of interest, and which do not only abrogate the function of the gene.
The invention provides methods for generating an allelic series of modifications in any gene of interest contained in a cell using nucleic acid sequence-modifying agents. In particular, the invention provides a method of producing a modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising cells having an unmodified gene of interest and cells having a modified gene of interest; and c) isolating the cells having a modified gene of interest.
In one preferred embodiment, the methods of the invention further comprise step d) comparing the nucleotide sequence of the gene of interest in the cells having a modified gene of interest with the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest. In a more preferred embodiment, the methods further comprise e) manipulating the cells having a modified gene of interest to generate an organism comprising the modification in the gene of interest. In an alternative more preferred embodiment, the method further comprises prior to step d) amplifying the modified gene of interest to produce an amplified modified gene of interest. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.
Without intending to limit the methods of the invention to any particular modification, in one embodiment, the modification is selected from the group consisting of mutation, mismatch, and strand break. In a preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In another preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break. While it is not intended that the scope of the invention be limited to any particular type or source of target cell, in one embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a preferred embodiment, the non-human animal is a mammal. In a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment the non-human animal is zebrafish. In another embodiment, the target cell is an embryonic stem cell.
The invention is not intended to be limited to any particular type or class of agent capable of producing at least one modification in the gene of interest. However, in one preferred embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.
The invention further provides a method of producing an allelic series of modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising cells having an unmodified gene of interest, cells having a first modification in the gene of interest, and cells having a second modification in the gene of interest; and c) isolating the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest, thereby producing an allelic series of modification in the gene of interest. In one preferred embodiment, the method further comprises step d) comparing the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest with the nucleotide sequence of the gene of interest in cells selected from the group consisting of the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest.
In a more preferred embodiment, the method further comprises e) manipulating cells selected from the group consisting of the cells having a first modification in the gene of interest and the cells having a second modification in the gene of interest to generate an organism comprising a modification selected from the group consisting of the first modification in the gene of interest and the second modification in the gene of interest. In an alternative more preferred embodiment, the method further comprises prior to step d) amplifying the gene of interest selected from the group consisting of the gene of interest having the first modification and the gene of interest having the second modification to produce amplified modified gene of interest selected from the group consisting of amplified gene of interest having the first modification and amplified gene of interest having the second modification. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.
Without limiting the invention to any particular class or type of modification, in an alternative preferred embodiment, the first modification and the second modification are selected from the group consisting of mutation, mismatch, and strand break. In a more preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In an alternative preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break. The invention is not limited to any particular type or source of target cell.
However, in one preferred embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a more preferred embodiment, the non-human animal is a mammal. In yet a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment, the non-human animal is zebrafish. In another preferred embodiment, the target cell is an embryonic stem cell.
Without intending to limit the methods of the invention to any particular class or type of agent capable of producing at least one modification in the gene of interest, in one preferred embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.
The invention further provides a method of producing a modification in a gene of interest contained in a cell, comprising: a) providing: i) a plurality of target cells capable of being cultured; ii) an agent capable of producing at least one modification in the gene of interest in the target cell; b) treating the target cells with the agent under conditions such that a mixture of cells is produced, the mixture of cells comprising a cell having an unmodified gene of interest and two or more cells having a modified gene of interest, the two or more cells having different modifications in the gene of interest; and c) isolating the two or more cells having a modified gene of interest.
In one embodiment, the method further comprises step d) comparing the nucleotide sequence of the gene of interest in the cells having a modified gene of interest with the nucleotide sequence of the gene of interest in the cells having an unmodified gene of interest. In a more preferred embodiment, the method further comprises e) manipulating the cells having a modified gene of interest to generate an organism comprising the modification in the gene of interest. In an alternative preferred embodiment, the method further comprises prior to step d) amplifying the modified gene of interest to produce an amplified modified gene of interest. In yet a more preferred embodiment, the method further comprises prior to step d) sequencing the amplified modified gene of interest.
While not intending to limit the invention to any particular type of modification, in one embodiment, the modification is selected from the group consisting of mutation, mismatch, and strand break. In a preferred embodiment, the mutation is selected from the group consisting of deletion, insertion and substitution. In an alternative preferred embodiment, the strand break is selected from the group consisting of single-strand break and double-strand break.
The invention is not limited to any particular type or source of target cell. However, in one embodiment, the target cell is derived from an organism selected from the group consisting of non-human animal, plant, protist, fungus, bacterium, and virus. In a preferred embodiment, the non-human animal is a mammal. In a more preferred embodiment, the mammal is a mouse. In an alternative preferred embodiment, the non-human animal is zebrafish. In another embodiment, the target cell is an embryonic stem cell.
It is not intended that the invention be limited to the type or class of agent capable of producing at least one modification in the gene of interest. However, in one embodiment, the agent is selected from the group consisting of N-ethyl-N-nitrosourea, methylnitrosourea, procarbazine hydrochloride, triethylene melamine, acrylamide monomer, chlorambucil, melphalan, cyclophosphamide, diethyl sulfate, ethyl methane sulfonate, methyl methane sulfonate, 6-mercaptopurine, mitomycin-C, procarbazine, N-methyl-N′-nitro-N-nitrosoguanidine, ³H₂O, urethane, ultraviolet light, X-ray radiation, and gamma-radiation.
Definitions
To facilitate understanding of the invention, a number of terms are defined below.
The term "genomic sequence" refers to any deoxyribonucleic acid sequence located in a cell. Genomic sequences include, but are not limited to, structural genes, regulatory genes, and regulatory elements.
A "transgenic organism" as used herein refers to an organism whose germ line cells have been altered by the introduction of a transgene. The term "transgene" as used herein refers to any nucleic acid sequence which is introduced into the genome of an organism by experimental manipulations. A transgene may be an "endogenous DNA sequence," or a "heterologous DNA sequence" (i.e., "foreign DNA"). The term "endogenous DNA sequence" refers to a nucleotide sequence which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, etc.) relative to the naturally-occurring sequence. The terms "heterologous DNA sequence" and "foreign DNA sequence" are used interchangeably herein to refer to a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to which it is ligated at a different location in nature. Heterologous DNA is not endogenous to the cell into which it is introduced, but has been obtained from another cell. Heterologous DNA also includes an endogenous DNA sequence which contains some modification relative to the endogenous DNA sequence. Generally, although not necessarily, heterologous DNA encodes RNA and proteins that are not normally produced by the cell in which it is expressed. Examples of heterologous DNA include reporter genes, transcriptional and translational regulatory sequences, selectable marker proteins (e.g., proteins which confer drug resistance), etc.
As used herein, the term "gene" means the deoxyribonucleotide sequences comprising the coding region of a structural gene and including sequences located adjacent to the coding region on both the 5′ and 3′ ends for a distance of several kb on either end such that the gene corresponds to the length of the full-length mRNA. The sequences which are located 5′ of the coding region and which are present on the mRNA are referred to as 5′ non-translated sequences. The sequences which are located 3′ or downstream of the coding region and which are present on the mRNA are referred to as 3′ non-translated sequences. A genomic form or clone of a gene contains coding sequences, termed exons, alternating with non-coding sequences termed "introns" or "intervening regions" or "intervening sequences." Introns are segments of a gene which are transcribed into heterogeneous nuclear RNA (hnRNA); introns may contain regulatory elements such as enhancers. Introns are removed or "spliced out" from the nuclear or primary transcript; introns therefore are absent in the messenger RNA (mRNA) transcript. In addition to containing introns, genomic forms of a gene may also include sequences located on both the 5′ and 3′ end of the sequences which are present on the RNA transcript. These sequences are referred to as "flanking" sequences or regions (these flanking sequences are located 5′ or 3′ to the non-translated sequences present on the mRNA transcript). The 5′ flanking region may contain regulatory sequences such as promoters and enhancers which control or influence the transcription of the gene. The 3′ flanking region may contain sequences which direct the termination of transcription, posttranscriptional cleavage and polyadenylation.
As used herein, the term "coding region" when used in reference to a structural gene refers to the nucleotide sequences which encode the amino acids found in the nascent polypeptide as a result of translation of an mRNA molecule. The coding region is bounded, in eukaryotes, on the 5′ side by the nucleotide triplet "ATG" which encodes the initiator methionine and on the 3′ side by one of the three triplets which specify stop codons (i.e., TAA, TAG, TGA).
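By way of illustration only, the boundaries of a coding region as defined above may be located computationally. The following is a minimal sketch, assuming a DNA sequence supplied as a plain string in the 5′ to 3′ direction; the function name `find_coding_region` and the choice to begin at the first ATG encountered are assumptions made for this example and are not part of the invention.

```python
# Stop codons named in the definition of "coding region".
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna):
    """Return the span from the first ATG through the first in-frame stop codon.

    Hypothetical helper: returns None when no ATG is present or when no
    in-frame stop codon follows it. Real genes may require intron removal
    and selection among several candidate ATGs, which this sketch ignores.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    # Walk codon by codon in the reading frame set by the ATG.
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            return dna[start:i + 3]
    return None


print(find_coding_region("GGATGAAATAGCC"))
```

In this sketch the returned string includes both the initiator ATG and the terminating stop codon, so its length is always a multiple of three.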
As used herein, the term "structural gene" refers to a DNA sequence coding for RNA or a protein. In contrast, "regulatory genes" are structural genes which encode products (e.g., transcription factors) which control the expression of other genes.
As used herein, the term "regulatory element" refers to a genetic element which controls some aspect of the expression of nucleic acid sequences. For example, a promoter is a regulatory element which facilitates the initiation of transcription of an operably linked coding region. Other regulatory elements are splicing signals, polyadenylation signals, termination signals, enhancer elements, etc. Promoters and enhancers consist of short arrays of DNA sequences that interact specifically with cellular proteins involved in transcription [Maniatis, et al., Science 236:1237 (1987)]. Promoter and enhancer elements have been isolated from a variety of eukaryotic sources including genes in yeast, insect and mammalian cells and viruses (analogous control elements, i.e., promoters, are also found in prokaryotes). The selection of a particular promoter and enhancer depends on what cell type is to be used to express the protein of interest. Some eukaryotic promoters and enhancers have a broad host range while others are functional in a limited subset of cell types [for review see Voss, et al., Trends Biochem. Sci., 11:287 (1986) and Maniatis, et al., Science 236:1237 (1987)]. For example, the SV40 early gene enhancer is very active in a wide variety of cell types from many mammalian species and has been widely used for the expression of proteins in mammalian cells [Dijkema, et al., EMBO J. 4:761 (1985)]. Other examples of promoter/enhancer elements active in a broad range of mammalian cell types are those from the human elongation factor 1α gene [Uetsuki et al., J. Biol. Chem., 264:5791 (1989); Kim et al., Gene 91:217 (1990); and Mizushima and Nagata, Nuc. Acids. Res., 18:5322 (1990)] and the long terminal repeats of the Rous sarcoma virus [Gorman et al., Proc. Natl. Acad. Sci. USA 79:6777 (1982)] and the human cytomegalovirus [Boshart et al., Cell 41:521 (1985)].
The terms "gene of interest" and "nucleotide sequence of interest" refer to any gene or nucleotide sequence, respectively, the manipulation of which may be deemed desirable for any reason by one of ordinary skill in the art.
The term "expression vector" as used herein refers to a recombinant DNA molecule containing a desired coding sequence and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host organism. Nucleic acid sequences necessary for expression in prokaryotes include a promoter, optionally an operator sequence, a ribosome binding site and possibly other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals.
A "modification" as used herein in reference to a nucleic acid sequence refers to any change in the structure of the nucleic acid sequence. Changes in the structure of a nucleic acid sequence include changes in the covalent and non-covalent bonds in the nucleic acid sequence. Illustrative of these changes are mutations, mismatches, strand breaks, as well as covalent and non-covalent interactions between a nucleic acid sequence (which contains unmodified and/or modified nucleic acids) and other molecules. Illustrative of a covalent interaction between a nucleic acid sequence and another molecule are changes to a nucleotide base (e.g., formation of thymine glycol) and covalent cross-links between double-stranded DNA sequences which are introduced by, for example, ultraviolet radiation or by cis-platinum. Yet another example of a covalent interaction between a nucleic acid sequence and another molecule includes covalent binding of two nucleic acid sequences to psoralen following ultraviolet irradiation. Non-covalent interactions between a nucleic acid sequence and another molecule include non-covalent interactions of a nucleic acid sequence with a molecule other than a nucleic acid sequence and other than a polypeptide sequence. Non-covalent interactions of a nucleic acid sequence with a molecule other than a nucleic acid sequence and other than a polypeptide sequence are illustrated by non-covalent intercalation of ethidium bromide or of psoralen between the two strands of a double-stranded deoxyribonucleic acid sequence. The present invention contemplates modifications which cause changes in a functional property (or properties), such changes manifesting themselves at a variety of time points.
The term "allelic series" when made in reference to a gene refers to wild-type sequences of the gene. An "allelic series of modifications" as used herein in reference to a gene refers to two or more nucleic acid sequences of the gene, where each of the two or more nucleic acid sequences of the gene contains at least one modification when compared to the wild-type sequences of the gene.
As used herein, the term "mutation" refers to a deletion, insertion, or substitution. A "deletion" is defined as a change in a nucleic acid sequence in which one or more nucleotides are absent. An "insertion" or "addition" is that change in a nucleic acid sequence which has resulted in the addition of one or more nucleotides. A "substitution" results from the replacement of one or more nucleotides by a molecule which is a different molecule from the replaced one or more nucleotides. For example, a nucleic acid may be replaced by a different nucleic acid as exemplified by replacement of a thymine by a cytosine, adenine, guanine, or uridine. Alternatively, a nucleic acid may be replaced by a modified nucleic acid as exemplified by replacement of a thymine by thymine glycol.
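The three classes of mutation defined above may be distinguished, in the simplest case, by comparing a modified sequence with its wild-type counterpart. The following is a minimal sketch, assuming each modified sequence carries a single mutation event and that both sequences are supplied as plain strings; the function name `classify_mutation` and the length-based test are assumptions made for this example, and a practical comparison would ordinarily rely on sequence alignment.

```python
def classify_mutation(wild_type, modified):
    """Classify a single mutation by comparing two nucleotide strings.

    Hypothetical helper based only on sequence length and identity:
    equal length with a difference -> substitution,
    shorter modified sequence       -> deletion,
    longer modified sequence        -> insertion.
    It does not detect combined events (e.g., an insertion plus a deletion
    of equal size), which would require alignment.
    """
    if wild_type == modified:
        return "none"
    if len(modified) < len(wild_type):
        return "deletion"
    if len(modified) > len(wild_type):
        return "insertion"
    return "substitution"


# Example: a small allelic series of modifications of the sequence ACGT.
for allele in ["ACAT", "ACT", "ACGGT"]:
    print(allele, classify_mutation("ACGT", allele))
```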
The term "mismatch" refers to a non-covalent interaction between two nucleic acids, each nucleic acid residing on a different polynucleic acid sequence, which does not follow the base-pairing rules. For example, for the partially complementary sequences 5′-AGT-3′ and 5′-AAT-3′, a G-A mismatch is present.
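The example above may be checked by pairing the two strands in antiparallel orientation and testing each position against the base-pairing rules (A with T, G with C). The following is a minimal sketch, assuming both strands are written 5′ to 3′ and are of equal length; the function name `find_mismatches` is an assumption made for this example.

```python
# Watson-Crick base-pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(strand_a, strand_b):
    """Return (position, base_a, base_b) for each non-complementary pair.

    Both strands are given 5' to 3'. Because the strands pair in
    antiparallel orientation, strand_b is reversed before comparison.
    Positions are 0-based along strand_a.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    paired = strand_b[::-1]  # read strand_b 3' to 5'
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(strand_a, paired))
        if COMPLEMENT[a] != b
    ]


print(find_mismatches("AGT", "AAT"))
```

For the sequences 5′-AGT-3′ and 5′-AAT-3′, the outer positions pair as A-T and T-A, and the middle position pairs G with A, which is the G-A mismatch noted in the definition.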
The term "strand break" when made in reference to a double stranded nucleic acid sequence includes a single-strand break and/or a double-strand break. A single-strand break refers to an interruption in one of the two strands of the double stranded nucleic acid sequence. This is in contrast to a double-strand break which refers to an interruption in both strands of the double stranded nucleic acid sequence. Strand breaks may be introduced into a double stranded nucleic acid sequence either directly (e.g., by ionizing radiation) or indirectly (e.g., by enzymatic incision at a nucleic acid base).
The terms "nucleic acid" and "unmodified nucleic acid" as used herein refer to any one of the known four deoxyribonucleic acid bases (i.e., guanine, adenine, cytosine, and thymine). The term "modified nucleic acid" refers to a nucleic acid whose structure is altered relative to the structure of the unmodified nucleic acid. Illustrative of such modifications would be replacement covalent modifications of the bases, such as alkylation of amino and ring nitrogens as well as saturation of double bonds.
The term "modified cell" refers to a cell which contains at least one modification in the cell's genomic sequence.
The term "nucleic acid sequence-modifying agent" refers to an agent which is capable of introducing at least one modification into a nucleic acid sequence. Nucleic acid sequence-modifying agents include, but are not limited to, chemical compounds [e.g., N-ethyl-N-nitrosourea (ENU), methylnitrosourea (MNU), procarbazine hydrochloride (PRC), triethylene melamine (TEM), acrylamide monomer (AA), chlorambucil (CHL), melphalan (MLP), cyclophosphamide (CPP), diethyl sulfate (DES), ethyl methane sulfonate (EMS), methyl methane sulfonate (MMS), 6-mercaptopurine (6 MP), mitomycin-C (MMC), procarbazine (PRC), N-methyl-N′-nitro-N-nitrosoguanidine (MNNG), ³H₂O, and urethane (UR)], and electromagnetic radiation [e.g., X-ray radiation, gamma-radiation, ultraviolet light].
The term "wild-type" when made in reference to a gene refers to a gene which has the characteristics of that gene when isolated from a naturally occurring source. A wild-type gene is that which is most frequently observed in a population and is thus arbitrarily designated the "normal" or "wild-type" form of the gene. In contrast, the term "modified" or "mutant" refers to a gene or gene product which displays modifications in sequence and/or functional properties (i.e., altered characteristics) when compared to the wild-type gene or gene product. It is noted that naturally-occurring mutants can be isolated; these are identified by the fact that they have altered characteristics when compared to the wild-type gene or gene product.