1. Technical Field
The invention relates to methods and materials involved in identifying and isolating a nucleic acid molecule that contains an open reading frame.
2. Background Information
The genomes of higher organisms, such as most crop and livestock species and humans, are complex and consist of more than 90% non-genic sequence. In such cases, genes have been identified by cloning mRNA species as cDNAs into plasmid vectors to form a cDNA library. The cDNA library is then analysed for the presence of open reading frames, regions of polynucleotides that encode proteins. This technique is referred to as the EST (expressed sequence tag) approach. Although in theory a cDNA library should represent all genes that are expressed by a cell at a given time, in practice the library is biased toward genes expressed at high levels. Genes that are highly expressed, or that are expressed under “standard” conditions, are well represented in the cellular mRNA pool, are therefore well represented in the cDNA library, and are readily identified. Genes that are expressed at low levels, however, are poorly represented in the cellular mRNA pool and may not be recovered. Furthermore, genes expressed under “unusual” conditions will not be recovered if those conditions cannot be duplicated in the laboratory. In contrast to the cellular mRNA pool, all genes are represented in equimolar concentrations in the genome. For this reason, a genomic DNA library is more advantageous than a cDNA library for gene discovery, provided a method can be found for differentiating clones containing genic sequences from those containing non-genic sequences.
The invention involves materials and methods for identifying nucleotide fragments that contain uninterrupted open reading frames (ORFs). The materials include isolated nucleic acid molecules that encode histidine tags in each of the three possible reading frames. A histidine tag is defined as a sequence of three or more consecutive histidine amino acid residues. A DNA sequence that codes for histidine tags in all three possible reading frames is referred to as a 3-frame His-tag DNA sequence. The isolated nucleic acid molecules can be of any length, but typically are less than 500 nucleotides in length, for example, less than 200, 150, or 100 nucleotides in length. In some cases, they can be greater than 500 nucleotides in length. The sequences of two representative nucleic acid molecules that encode histidine tags in each of the three reading frames are given.
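The definition of a 3-frame His-tag DNA sequence given above can be checked mechanically. The following Python sketch translates a DNA sequence in each of its three forward reading frames using the standard genetic code and reports whether every frame encodes at least three consecutive histidine residues. The sequence EXAMPLE_TAG is a hypothetical illustration constructed for this sketch and is not one of the representative sequences of the invention; the function names are likewise assumptions.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


def is_three_frame_his_tag(dna, min_his=3):
    """True if every forward reading frame encodes min_his consecutive histidines."""
    return all("H" * min_his in translate(dna, frame) for frame in range(3))


# Hypothetical example: three CAT triplets (His-His-His) repeated three times,
# each repeat shifted by one nucleotide so that it falls in a different frame.
EXAMPLE_TAG = "CATCATCAT" + "A" + "CATCATCAT" + "A" + "CATCATCAT"

if __name__ == "__main__":
    for frame in range(3):
        print(frame, translate(EXAMPLE_TAG, frame))
    print(is_three_frame_his_tag(EXAMPLE_TAG))
```

A conventional single-frame tag such as CATCATCATCATCATCAT encodes histidines in only one of the three frames, so is_three_frame_his_tag returns False for it.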
The invention also includes vectors containing the above-described 3-frame His-tag-encoding DNA sequences. These vectors are plasmids, phage DNA, or other DNA molecules that are able to replicate in a host cell. These vectors may have a selectable marker and any necessary expression control sequences. Such control sequences include, for example, promoters that allow for expression of an ORF in nucleotide sequences operably linked to these promoters.
The vectors may also have multiple cloning sites (MCS) located 3′, 5′, or 3′ and 5′ of the 3-frame His-tag coding sequence for expression of 3′ or 5′ histidine tagged polypeptides.
Other embodiments of the invention include cultured cells containing vectors having a 3-frame His-tag coding sequence. The cells can be prokaryotic or eukaryotic, for example, yeast cells, bacterial cells, plant cells and animal cells.
The invention can be used for determining the presence or absence of an open reading frame in any nucleic acid molecule. The nucleic acid molecule is inserted in a vector having a 3-frame His-tag coding sequence, either 3′ or 5′ of the 3-frame His-tag sequence. The vector is introduced into a host cell and the host cell is then cultured under conditions that allow for expression of the cloned nucleic acid molecule. The presence or absence of an open reading frame in the nucleic acid molecule of interest is then indicated by the presence or absence of a histidine tagged polypeptide encoded by the nucleic acid molecule and produced by the host cell. The advantage of this method is that if a gene exists in a nucleic acid molecule, it will be expressed with a histidine tag regardless of its reading frame in the nucleic acid molecule. Furthermore, this method allows for identification of new genes from cDNAs, ESTs, or genomic DNA. The advantage of using genomic DNA as a source for new gene discovery is the ability to recover genes that are expressed in low amounts or under conditions that may not be reproducible in the laboratory. In addition, since most genes are represented in equimolar amounts in the genome, each gene is more nearly equally likely to be identified than when cDNA libraries derived from cellular mRNA pools are used.
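The frame independence described above can be illustrated in silico. The following Python sketch models an insert cloned 5′ of a tag: translation starts at a vector-supplied ATG, runs through the insert, and continues into the tag unless a stop codon intervenes. It is a simplified model under stated assumptions: only the forward strand is considered, the start codon is fixed, and THREE_FRAME_TAG is a hypothetical sequence constructed for illustration rather than a sequence of the invention. All function names are assumptions.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical tags used only for this sketch.
THREE_FRAME_TAG = "CATCATCAT" + "A" + "CATCATCAT" + "A" + "CATCATCAT"
SINGLE_FRAME_TAG = "CATCATCATCATCATCAT"


def translate_until_stop(dna):
    """Translate dna from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def yields_his_tagged_product(insert, tag, start="ATG", min_his=3):
    """True if translation reads through the insert and into a His run in the tag."""
    fusion = start + insert.upper() + tag
    product = translate_until_stop(fusion)
    read_through = len(product) == len(fusion) // 3  # no stop codon was met
    tail = product[(len(start) + len(insert)) // 3:]  # residues from the junction on
    return read_through and "H" * min_his in tail


if __name__ == "__main__":
    # Open inserts whose lengths leave remainders 0, 1 and 2 on division by 3.
    for insert in ("GCTGCTGCT", "GCTGCTGCTG", "GCTGCTGCTGG"):
        print(len(insert) % 3,
              yields_his_tagged_product(insert, THREE_FRAME_TAG),
              yields_his_tagged_product(insert, SINGLE_FRAME_TAG))
    # An insert with an in-frame stop codon (TAA) gives no tagged product.
    print(yields_his_tagged_product("GCTTAAGCT", THREE_FRAME_TAG))
```

In this model an open insert yields a tagged product with the 3-frame tag whatever its length modulo three, whereas the single-frame tag is reached in frame only when the insert length is a multiple of three.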
In another embodiment, the invention allows for recovery of the corresponding polypeptide encoded by the newly identified gene without prior knowledge of the biochemical properties of the polypeptide, its activity or even characteristics of its gene sequence. Once a nucleic acid molecule is determined to contain an ORF by the method described above, the histidine tagged polypeptide encoded by the ORF can be purified by affinity purification using a Ni-NTA (nickel-nitrilotriacetic acid) substrate.
In yet another embodiment, the 3-frame His-tag DNA sequence of this invention is used in activation tagging vectors. An activation tagging vector containing a 3-frame His-tag coding sequence can be introduced into an organism and allowed to randomly insert into the genome. The organism is then analysed for a change in phenotype. The gene associated with the phenotype is then isolated from other genomic DNA fragments based on its proximity to the 3-frame His-tag sequence. The function of the gene can be elucidated by analysis of the phenotype associated with the insertion event. The invention also provides for the complement of the 3-frame His-tag sequence that can be used for identification of DNA fragments containing the 3-frame His-tag sequence.
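The use of the complement to identify fragments carrying the tag has a simple computational analogue. The following Python sketch searches sequenced fragments for a tag on either strand by also looking for its reverse complement. It stands in for probe hybridization only loosely, since it requires an exact match; EXAMPLE_TAG is a hypothetical sequence constructed for illustration, and the function names are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence of A, C, G and T."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_tag(fragment, tag):
    """Return (strand, position) of the first exact match of tag, or None.

    Strand '+' means the tag itself was found in fragment; strand '-' means
    its reverse complement was found, i.e. the tag lies on the other strand.
    """
    fragment = fragment.upper()
    position = fragment.find(tag)
    if position != -1:
        return ("+", position)
    position = fragment.find(reverse_complement(tag))
    if position != -1:
        return ("-", position)
    return None


# Hypothetical tag used only for this sketch.
EXAMPLE_TAG = "CATCATCAT" + "A" + "CATCATCAT" + "A" + "CATCATCAT"

if __name__ == "__main__":
    forward = "GGGG" + EXAMPLE_TAG + "CCCC"
    print(find_tag(forward, EXAMPLE_TAG))
    print(find_tag(reverse_complement(forward), EXAMPLE_TAG))
    print(find_tag("ACGTACGTACGT", EXAMPLE_TAG))
```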
The term “nucleic acid” as used herein encompasses RNA and DNA, including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. The nucleic acid can be double-stranded or single-stranded. Where single-stranded, the nucleic acid can be the sense strand or the antisense strand. In addition, nucleic acid can be circular or linear.
The term “isolated” as used herein with reference to nucleic acid refers to a naturally-occurring nucleic acid that is not immediately contiguous with both of the sequences with which it is immediately contiguous (one on the 5′ end and one on the 3′ end) in the naturally-occurring genome of the organism from which it is derived. For example, an isolated nucleic acid can be, without limitation, a recombinant DNA molecule of any length, provided one of the nucleic acid sequences normally found immediately flanking that recombinant DNA molecule in a naturally-occurring genome is removed or absent. Thus, an isolated nucleic acid includes, without limitation, a recombinant DNA that exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences as well as recombinant DNA that is incorporated into a vector. In addition, an isolated nucleic acid can include a recombinant DNA molecule that is part of a hybrid or fusion nucleic acid sequence.
The term “isolated” as used herein with reference to nucleic acid also includes any non-naturally-occurring nucleic acid since non-naturally-occurring nucleic acid sequences are not found in nature and do not have immediately contiguous sequences in a naturally occurring genome. For example, non-naturally-occurring nucleic acid such as an engineered nucleic acid is considered to be isolated nucleic acid. Engineered nucleic acid can be made using common molecular cloning or chemical nucleic acid synthesis techniques. Isolated non-naturally-occurring nucleic acid can be independent of other sequences, or incorporated into a vector. In addition, a non-naturally-occurring nucleic acid can include a nucleic acid molecule that is part of a hybrid or fusion nucleic acid sequence.
It will be apparent to those of skill in the art that a nucleic acid existing among hundreds to millions of other nucleic acid molecules within, for example, cDNA or genomic libraries, or gel slices containing a genomic DNA restriction digest is not to be considered an isolated nucleic acid.
The term “operably linked” as used herein means a functional linkage between the expression control sequence and the coding sequence to which it is linked. The operable linkage permits the expression control sequence to control expression of the coding sequence. Expression control sequences can include a promoter, a transcriptional activator binding sequence, an enhancer sequence or any other regulatory or non-regulatory sequence that may be required for transcription and translation of the coding sequence to which the expression control sequence is linked.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.