The present invention relates in general to the field of recombinant nucleic acids, polypeptides and other derived materials and, more particularly, to the identification, isolation and characterization of human transcription factors that are involved in the expression of human genes.
Without limiting the scope of the invention, its background is described in connection with the isolation, characterization and use of human transcription factors that are expressed throughout the organism, as an example.
Unlike the nucleic acid polymerases of prokaryotes, purified RNA polymerase II from eukaryotes initiates transcription very poorly and essentially at random. One key difference between prokaryotic and eukaryotic polymerases is the need for accessory factors that provide for the accurate initiation of transcription. These factors are referred to as the “general” or “basal” transcription factors, in that they are required, in addition to RNA polymerase II, for the transcription of all eukaryotic protein coding genes. As such, the general transcription factors are expected to be active, or at least present, in all or most tissues. One such general factor is called transcription factor IID (TFIID) and is responsible in large part for promoter recognition. Other general transcription factors include TFIIA, TFIIB, TFIIE, TFIIF and TFIIH.
Appropriate levels of gene- and tissue-specific transcription are achieved by another set of factors called activator proteins. These factors are often composed of two domains, a sequence-specific DNA recognition domain and an activation domain. When bound to DNA, the activation domain facilitates the formation and function of a preinitiation complex that consists of the general transcription factors and RNA polymerase II. In this way it is possible to direct the selective transcription of genes in an appropriately regulated fashion.
The structure of a typical promoter for a eukaryotic gene consists of two general regions. The core promoter is located at or near the actual site of transcription initiation and often includes a TATA sequence element located at about 30 base pairs upstream of the initiation site. The other regions are defined as sequence elements which are recognized by activator proteins. These are often located at various distances further upstream, but may also be located downstream relative to the core promoter of the gene being regulated. Interactions between bound regulatory factors and the preinitiation complex are responsible for the precisely regulated transcription of each individual gene.
TFIIA is an essential general transcription factor and the purified factor from higher eukaryotes consists of three subunits, designated alpha (35 kD), beta (19 kD) and gamma (12 kD). In humans, the alpha and beta subunits are encoded by DNA sequences present in the TFIIAα/β cDNA, sometimes referred to as the ‘large’ subunit cDNA. These two subunits are post-translationally processed from a large 55 kD product of TFIIAα/β. The gamma subunit is encoded by DNA sequences present in the TFIIAγ cDNA, sometimes referred to as the ‘small’ subunit cDNA. This sequence is the subject of U.S. Pat. No. 5,562,117 issued to Moore and Rosen. TFIIA has multiple roles in transcription initiation by RNA polymerase II, including an ability to stabilize TBP-TATA element interactions, displace TBP-associated repressors and serve as a cofactor during the processes of transcription activation.
Most of the known human general transcription factors appear to be generally required in all tissues for gene expression by RNA polymerase II. Thus, these factors will be important as markers to evaluate disease states which may arise from inappropriately regulated gene expression and as pharmacological reagents and/or targets with which to modulate patterns of gene expression. Similarly, overexpression via gene therapy or other means should have broad effects on the expression of many or all cellular genes. In contrast, mutations in the genes for activator proteins, which are normally observed to control expression of a select set of genes, often in a tissue or developmentally restricted pattern, typically result in specific defects. Likewise, overexpression of activator proteins only affects expression of cellular genes which contain cognate recognition sequences.
Testis has important endocrine (hormonal) functions and is the site for the production of haploid spermatozoa from undifferentiated stem cells, a process called spermatogenesis. Mutations in some specialized transcriptional activator proteins, such as A-myb and CREM, cause male infertility and show defects in spermatogenesis. The identification of a tissue-specific human general transcription factor would bridge an important gap between the generality of general transcription factor function and the specificity of gene-specific transcriptional activator protein function. If such factors were testis-specific, they would be expected to regulate patterns of gene expression that are important in the endocrine, spermatogenic and other functions of this organ. The present invention satisfies a need in the art for new compositions for polynucleotide sequences and encoded polypeptide products, immunological reagents and other derived materials in terms of providing unique reagents for the detection of defects in testis function such as idiopathic male infertility or other syndromes, for detection of dysfunctional patterns of gene expression and as reagents that can modulate gene expression.
The present invention includes DNA sequences that encode two structurally distinct isoforms of the human general transcription factor TFIIA α/β. One of these sequences is denoted as ALF, for TFIIA α/β-like factor, which is expressed predominantly in human testis. The second sequence contains ALF connected to a unique upstream sequence and is denoted as SALF, for Stoned B/TFIIA α/β-like factor. The present invention is also directed to recombinant polypeptide products and other derived materials. The uses of the invention include, but are not necessarily limited to, the propagation and preparation of the ALF and SALF DNA, RNA and recombinant proteins, and use of these materials as reagents and markers to detect and/or modify the function of eukaryotic cells in normal and disease states.
The present invention may be used in the detection of the endogenous ALF and SALF RNAs in eukaryotic cells using hybridization, polymerase chain reactions, immunological analysis and other methods. The invention may also be used along with the endogenous ALF and SALF DNAs, RNAs and proteins as specific in vivo pharmacological targets to artificially modulate the expression of eukaryotic genes. Furthermore, the ALF, SALF and the variable carboxyl terminal end may be introduced in normal or modified versions of the ALF and SALF genes for expression in eukaryotic cells in order to replace or augment endogenous transcription factor activities (gene therapy). The present invention may also be used as testis-specific antigens for contraceptive vaccine development.
The present invention, in a general and overall sense, concerns the isolation and characterization of a novel transcriptional factor gene, ALF and carboxy terminal variable region. One embodiment of the present invention is a purified nucleic acid segment that encodes a protein having an amino acid sequence as shown in FIG. 2, in accordance with SEQ ID NO.:2. Another embodiment of the present invention is a purified nucleic acid segment, further defined as including a nucleotide sequence in accordance with SEQ ID NO.:1.
The present invention also concerns the isolation and characterization of a novel transcriptional factor gene, SALF and a carboxy terminal variable region. One embodiment of the present invention is a purified nucleic acid segment that encodes a protein having an amino acid sequence as shown in FIG. 3, in accordance with SEQ ID NO.:4. Another embodiment of the present invention is a purified nucleic acid segment, further defined as including a nucleotide sequence in accordance with SEQ ID NO.:3. The 3′ variable region that ALF and SALF have in common is encoded by the nucleic acid segment in accordance with SEQ ID NO.:5 and expressed as an amino acid sequence as shown in SEQ ID NO.:6.
In one embodiment the purified nucleic acid segment includes the nucleotide sequence of SEQ ID NOS.:1, 3 and 5. As used herein, the terms “nucleic acid segment” and “DNA segment” are used interchangeably and refer to a DNA molecule that has been isolated free of total genomic DNA of a particular species. Therefore, a “purified” DNA or nucleic acid segment as used herein, refers to a DNA segment that includes novel transcriptional factor genes, ALF, SALF and a carboxy terminal variable coding sequence, yet is isolated away from, or purified free from, total genomic DNA, for example, total cDNA or human genomic DNA. Included within the term “DNA segment” are DNA segments and smaller fragments of such segments and recombinant vectors, including, for example, plasmids, cosmids, phage, viruses and the like.
Similarly, a DNA segment encoding an isolated or purified novel transcriptional factor gene, namely ALF, SALF or a carboxy terminal variable coding sequence, refers to a DNA segment including ALF, SALF and a carboxy terminal variable coding sequence isolated substantially away from other naturally occurring genes or protein encoding sequences. In this respect, the term “gene” is used for simplicity to refer to a functional protein, polypeptide or peptide encoding unit. As will be understood by those in the art, this functional term includes genomic sequences, cDNA sequences and combinations thereof. “Isolated substantially away from other coding sequences” means that the gene of interest, in this case ALF, SALF and a carboxy terminal variable coding sequence, forms the significant part of the coding region of the DNA segment. Of course, this refers to the DNA segment as originally isolated and does not exclude genes or coding regions later added by the hand of man to the segment.
In particular embodiments, the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode the novel transcriptional factor genes ALF and SALF and a carboxy terminal variable coding sequence, and that include within the encoded amino acid sequence an amino acid sequence in accordance with SEQ ID NO.:2.
Moreover, in other particular embodiments, the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode a gene which includes within its amino acid sequence the amino acid sequence of an ALF, SALF and a carboxy terminal variable coding sequence.
Another embodiment of the present invention is a purified nucleic acid segment that encodes proteins in accordance with SEQ ID NOS.:2, 4 and 6, further defined as a recombinant vector. As used herein, the term “recombinant vector” refers to a vector that has been modified to contain a nucleic acid segment that encodes ALF, SALF, or the carboxy terminal variable coding sequence protein, or a fragment thereof. The recombinant vector may be further defined as an expression vector that includes a promoter operatively linked to the nucleic acid segment encoding ALF, SALF, or the ALF/SALF variants having the carboxy terminal variable coding sequence.
A further embodiment of the present invention is a host cell, made recombinant with a recombinant vector including ALF, or SALF, and if present, a carboxy terminal variable coding sequence. The recombinant host cell may be a prokaryotic cell. In one embodiment, the recombinant host cell is a eukaryotic cell. As used herein, the term “engineered” or “recombinant” cell is intended to refer to a cell into which a recombinant gene, such as a gene encoding ALF, SALF, or the carboxy terminal variable coding sequence, has been introduced. Therefore, engineered cells are distinguishable from naturally occurring cells which do not contain a recombinantly introduced gene. Engineered cells are thus cells having a gene or genes introduced through the hand of man. Recombinantly introduced genes will either be in the form of a cDNA, a copy of a genomic gene, or will include genes positioned adjacent to a promoter not naturally associated with the particular introduced gene.
It may be more convenient, however, to employ as the recombinant gene a cDNA version of the gene. One advantage of working with cDNAs is that the gene is generally smaller and more readily introduced into, or “transfected” into, the targeted cell than a genomic gene, which is typically an order of magnitude larger than the cDNA gene.
Alternatively, a genomic version of a particular gene may be used where desired.
In certain embodiments, the invention concerns isolated DNA segments and recombinant vectors that encode a protein or peptide which includes within its amino acid sequence an amino acid sequence essentially as set forth in SEQ ID NOS.:2, 4 or 6.
Naturally, where the DNA segment or vector encodes a full length ALF or SALF protein, or is intended for use in expressing such a protein, the sequences will be essentially as set forth in SEQ ID NOS.:2, 4 and 6.
The term “a sequence essentially as set forth in SEQ ID NO.:2” means that the sequence substantially corresponds to a portion of SEQ ID NO.:2 and has relatively few amino acids which are not identical to, or a biologically functional equivalent of, the amino acids of SEQ ID NO.:2. Likewise the phrase is equally applied to SEQ ID NOS.:4 and 6.
The term “biologically functional equivalent” is well understood in the art and is further defined in detail herein as a gene having a sequence essentially as set forth in SEQ ID NOS.:2, 4 or 6, and that is associated with RNA transcription. Accordingly, sequences that have between about 70% and about 80%; or between about 81% and about 90%; or even between about 91% and about 99%; of amino acids that are identical or functionally equivalent to the amino acids of SEQ ID NOS.:2, 4 or 6 will be sequences that are essentially as set forth in the respective SEQ ID NOS.
In certain other embodiments, the invention concerns isolated DNA segments and recombinant vectors that include within their sequence a nucleic acid sequence essentially as set forth in SEQ ID NOS.:1, 3 or 5. The term “essentially as set forth in SEQ ID NO.:1” is used in the same sense as described above and means that the nucleic acid sequence substantially corresponds to a portion of SEQ ID NO.:1, and has relatively few codons that are not identical, or functionally equivalent, to the codons of SEQ ID NO.:1. Likewise the phrase is equally applied to SEQ ID NOS.:3 and 5. The functionally equivalent codons are known in the art.
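As an illustration only, the notion of functionally equivalent codons can be sketched with the standard genetic code, under the assumption that two codons are treated as equivalent when they specify the same amino acid. The function names and the short test sequences below are hypothetical and are not taken from SEQ ID NOS.:1, 3 or 5.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_equivalent(codon_a, codon_b):
    """Return True when both codons specify the same amino acid (or stop)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


def count_nonequivalent_codons(seq_a, seq_b):
    """Count codon positions where two in-frame, equal-length coding
    sequences are neither identical nor synonymous."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    return sum(
        not codons_equivalent(seq_a[i:i + 3], seq_b[i:i + 3])
        for i in range(0, len(seq_a), 3)
    )
```

For the hypothetical pair `ATGCTGGAT` and `ATGTTAGAA`, the second codons (CTG and TTA) both specify leucine and are equivalent, whereas the third codons (GAT and GAA) specify aspartate and glutamate respectively, so exactly one codon is not functionally equivalent.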
It will also be understood that amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids or 5′ or 3′ sequences, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned. The addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5′ or 3′ portions of the coding region or may include various internal sequences, i.e., introns, which are known to occur within genes.
Excepting intronic or flanking regions, and allowing for the degeneracy of the genetic code, sequences that have between about 70% and about 80%; or between about 80% and about 90%; or between about 90% and about 99%; of nucleotides that are identical to the nucleotides of SEQ ID NOS.:1, 3 or 5 will be sequences that are “essentially as” the respective SEQ ID NOS. Sequences that are essentially the same as those set forth in SEQ ID NOS.:1, 3 or 5 may also be functionally defined as sequences that are capable of hybridizing to a nucleic acid segment containing the complement of SEQ ID NO.:1 under relatively stringent conditions. Suitable relatively stringent hybridization conditions will be well known to those of skill in the art and are clearly set forth herein, for example conditions for use with Southern and Northern blot analysis as described herein.
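By way of a hedged sketch, the percent identity ranges recited above can be computed for two sequences that have already been aligned to equal length without gaps. The alignment step itself is assumed to have been done elsewhere, the hard cut-offs at 70, 80 and 90 are an assumption made here to approximate the word "about", and the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that are identical between two pre-aligned,
    ungapped sequences of equal length."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def identity_band(pct):
    """Map a percent identity onto the ranges recited in the text.
    The exact boundaries are an assumption standing in for 'about'."""
    if pct >= 100.0:
        return "identical"
    if pct >= 90.0:
        return "about 90% to about 99%"
    if pct >= 80.0:
        return "about 80% to about 90%"
    if pct >= 70.0:
        return "about 70% to about 80%"
    return "below about 70%"
```

For example, two hypothetical ten-nucleotide sequences that differ at a single position share 90% identity and fall in the highest recited range.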
Naturally, the present invention also encompasses DNA segments that are complementary, or essentially complementary, to the sequence set forth in SEQ ID NOS.:1, 3 or 5. The nucleic acid segments of the present invention, regardless of the length of the coding sequence itself, may be combined with other DNA sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. A nucleic acid fragment of almost any length may be employed, with the total length being limited by the ease of preparation and use in the intended recombinant DNA protocol. For example, nucleic acid fragments may be prepared that include a short stretch complementary to SEQ ID NOS.:1, 3 or 5, such as about 10 to 15 or 20, 30, or 40 or so nucleotides, and which are up to 10,000 or 5,000 base pairs in length, with segments of 3,000 being used in certain cases. DNA segments with total lengths of about 1,000, 500, 200, 100 and about 50 base pairs in length are also useful.
Another embodiment of the present invention is a nucleic acid segment that includes at least a 14-nucleotide long stretch that corresponds to, or is complementary to, the nucleic acid sequence of SEQ ID NOS.:1, 3 or 5. In one embodiment the nucleic acid is further defined as including at least a 20, 30, 50, 100, 200, 500, 1000, or at least a 3824 nucleotide long stretch that corresponds to, or is complementary with, the nucleic acid sequence of SEQ ID NOS.:1, 3 or 5. The nucleic acid segment may be further defined as having the nucleic acid sequence of SEQ ID NOS.:1, 3 or 5.
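A minimal sketch of how the 14-nucleotide criterion might be checked in practice follows, assuming exact matching and assuming that "complementary" means matching the reverse complement of the reference strand. The reference sequence used in the usage note is a made-up placeholder and not any of SEQ ID NOS.:1, 3 or 5, and the function names are hypothetical.

```python
# Translation table for Watson-Crick complements of DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_shared_stretch(candidate, reference, min_len=14):
    """Return True when the candidate contains a stretch of at least
    min_len nucleotides that exactly corresponds to, or is exactly
    complementary to, a stretch of the reference."""
    candidate = candidate.upper()
    reference = reference.upper()
    reference_rc = reverse_complement(reference)
    # Any shared stretch of min_len or longer contains a shared window
    # of exactly min_len, so checking fixed-size windows is sufficient.
    for start in range(len(candidate) - min_len + 1):
        window = candidate[start:start + min_len]
        if window in reference or window in reference_rc:
            return True
    return False
```

With a placeholder reference such as `ATGGCGTACGTTAGCCTAGGATCCGATTACA`, a candidate built from the reverse complement of any 14 or more consecutive reference nucleotides satisfies the check, while a candidate shorter than 14 nucleotides never does.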
A related embodiment of the present invention is a nucleic acid segment that includes at least a 14-nucleotide long stretch that corresponds to, or is complementary with, the nucleic acid sequence of SEQ ID NO.:1 or 3, further defined as including a nucleic acid fragment of up to 10,000 base pairs in length. Another embodiment is a nucleic acid fragment including from 14 nucleotides of SEQ ID NO.:1 or 3 up to 5,000, 3,000, 1,000, 500 or 100 base pairs in length.
Naturally, it will also be understood that this invention is not limited to the particular nucleic acid and amino acid sequences of SEQ ID NOS.:2, 4 and 6. Recombinant vectors and isolated DNA segments may therefore variously include the ALF, SALF and variable region coding regions themselves, coding regions bearing selected alterations or modifications in the basic coding region, or they may encode larger polypeptides that nevertheless include ALF, SALF or variable region-coding segments or may encode biologically functional equivalent proteins or peptides that have variant amino acid sequences.
The DNA segments of the present invention encompass biologically functional equivalent ALF, SALF and variable region peptides. Such sequences may arise as a consequence of codon redundancy and functional equivalency that are known to occur naturally. Alternatively, functionally equivalent proteins or peptides may be created via the application of recombinant DNA technology, where changes in the protein structure may be engineered, based on considerations of the properties of the amino acids being exchanged.
Changes designed by man may be introduced through the application of site-directed mutagenesis techniques, e.g., to introduce improvements to the antigenicity of the ALF, SALF or variable region mutants in order to examine transcriptional activity or determine the presence of ALF, SALF or variable region protein in various cells and tissues at the molecular level.
Another embodiment of the present invention is a purified composition comprising a polypeptide having an amino acid sequence in accordance with SEQ ID NOS.:2 or 4, or SEQ ID NOS.:2 or 4 together with 6. The term “purified” as used herein, refers to a transcriptional factor protein composition, wherein the ALF, SALF or ALF and SALF having the variable region proteins are purified to any degree relative to their naturally-obtainable state, i.e., in this case, relative to their purity within a eukaryotic cell extract, or a testis sample. A cell for the isolation of ALF, SALF or variants thereof is a cell of testicular origin; however, these proteins may also be isolated from patient specimens, recombinant cells, tissues, isolated subpopulations of tissues, and the like, as will be known to those of skill in the art, in light of the present disclosure. Purified ALF, SALF or variants thereof also refer to polypeptides having the amino acid sequence of SEQ ID NOS.:2, 4, 2 and 6 or 4 and 6, free from the environment in which they may naturally occur. One may also prepare fusion proteins and peptides, e.g., where the ALF, SALF or variable portion coding regions are aligned within the same expression unit with other proteins or peptides having desired functions, such as for purification or immunodetection purposes (e.g., proteins that may be purified by affinity chromatography and enzyme label coding regions, respectively).
Turning to the expression of ALF, SALF and variable genes whether from cDNA or genomic DNA, protein may be prepared using an expression system to make recombinant preparations of ALF, SALF and variable gene proteins. The engineering of DNA segment(s) for expression in a prokaryotic or eukaryotic system may be performed by techniques generally known to those of skill in recombinant expression. For example, ALF, SALF and variable gene-GST (glutathione-S-transferase) fusion proteins are a convenient means of producing protein in a bacterial expression system. Virtually any expression system may be employed in the expression of ALF, SALF and variable gene products. Eukaryotic expression systems, however, may also be used.
Transformation of host cells with DNA segments encoding ALF, SALF and variable genes also provides a convenient means for obtaining ALF and SALF proteins, including ALF or SALF proteins that include the variable portions. Complementary DNA (cDNA), genomic sequences and combinations thereof, are suitable for eukaryotic expression, as the host cell will, of course, process the genomic transcripts to yield functional mRNA for translation into protein.
Another embodiment is a method of preparing a protein composition comprising growing a recombinant host cell comprising a vector that encodes a protein that includes an amino acid sequence in accordance with SEQ ID NOS.:2, 4 or 6, under conditions permitting nucleic acid expression and protein production, followed by recovering the protein so produced. The host cell, conditions permitting nucleic acid expression, protein production and recovery, will be known to those of skill in the art, in light of the present disclosure of the ALF, SALF and variable region genes.