The invention relates to human nucleic acid sequences from endometrial tumors that code for gene products or portions thereof, to their functional genes that code for at least one bioactive polypeptide, and to their use.
In addition, the invention relates to the polypeptides obtainable by means of these sequences and to their use.
Endometrial tumors are one of the main causes of cancer death in women, and new therapies are needed to control them. Previously used therapies, such as chemotherapy, hormone therapy or surgical removal of tumor tissue, frequently do not result in a complete cure.
Cancer is often accompanied by overexpression or underexpression of certain genes in degenerated cells, although it is still unclear whether these altered expression rates are the cause or the result of malignant transformation. Identifying these genes would be an important step toward the development of new cancer therapies. The spontaneous formation of cancer is often preceded by a large number of mutations. These can have widely varying effects on the expression pattern in the affected tissue, such as underexpression or overexpression, but also expression of truncated genes. Several such changes resulting from these mutation cascades can ultimately lead to malignant degeneration. The complexity of these relationships makes an experimental approach very difficult.
A database consisting of so-called ESTs is used to search for candidate genes, i.e., genes that are more strongly expressed in normal tissue than in tumor tissue. ESTs (expressed sequence tags) are sequences of cDNAs, i.e., of reverse-transcribed mRNAs, and are therefore molecules that reflect gene expression. The EST sequences are determined for normal and degenerated tissue. Some of these databases are offered commercially by various companies. The ESTs of the LifeSeq database used here are generally between 150 and 350 nucleotides long. They represent a pattern that is unmistakable for a certain gene, although the gene itself is normally very much longer (>2000 nucleotides). By comparing the expression patterns of normal and tumor tissue, ESTs can be identified that are important for tumor formation and proliferation. This approach has the following problem, however: because cDNA libraries are constructed differently, the ESTs found for an unknown gene can originate from different regions of that gene. They are then not recognized as belonging to the same gene, and a completely incorrect ratio of their occurrence in the respective tissues results. This error would only be noticed once the complete gene is known and the ESTs can thus be assigned to the same gene.
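The comparison of expression patterns can be pictured as counting, per gene, how many ESTs each tissue library contributes and comparing the normalized frequencies. The following is a minimal sketch under stated assumptions: the function name, the pseudocount and the example counts are hypothetical and do not come from the LifeSeq data or from the procedure described here.

```python
from collections import Counter


def normal_vs_tumor_ratio(normal: Counter, normal_total: int,
                          tumor: Counter, tumor_total: int,
                          pseudocount: float = 1.0) -> dict[str, float]:
    """Ratio of normalized EST frequencies (normal / tumor) for every gene seen.

    normal, tumor: number of ESTs assigned to each gene in each tissue library.
    normal_total, tumor_total: total ESTs sequenced from each library.
    pseudocount: assumption (not from the source) so that a gene absent
    from one library does not cause a division by zero.
    """
    genes = set(normal) | set(tumor)
    return {
        g: ((normal[g] + pseudocount) / normal_total)
           / ((tumor[g] + pseudocount) / tumor_total)
        for g in genes
    }


# Hypothetical counts: geneA is over-represented in the normal-tissue library.
normal_lib = Counter({"geneA": 30, "geneB": 5})
tumor_lib = Counter({"geneA": 3, "geneB": 5})
ratios = normal_vs_tumor_ratio(normal_lib, 1000, tumor_lib, 1000)
# ratios["geneA"] is about 7.75, ratios["geneB"] is 1.0
```

This is also where the error described above enters: if the ESTs of one gene are filed under two unassembled identifiers, each identifier receives only part of the counts, and the ratio is computed on the wrong basis.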
It has now been found that this source of error can be reduced if all ESTs from the respective tissue type are assembled before the expression patterns are compared with one another. Overlapping ESTs of the same gene were thus combined into longer sequences (see FIG. 1, FIG. 2a and FIG. 3). This lengthening, and the resulting coverage of a substantially larger gene region in each of the respective databases, is intended to largely avoid the error described above. Since no existing software products were available for this purpose, programs for assembling genomic sequence sections were used in modified form and supplemented with in-house programs. A flow chart of the assembly procedure is shown in FIGS. 2b1-2b4.
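The description names neither the genomic assembly programs nor the in-house additions, so the following is only a minimal sketch of the underlying idea: overlapping reads are repeatedly merged into longer contigs. It assumes exact suffix-prefix overlaps of at least `min_len` bases and a greedy merge order, which is a simplification; a real EST assembler must tolerate sequencing errors, mismatches and pads (compare the alignment parameters listed at the end of this section). All function names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a[-k:] == b[:k]:
            return k
    return 0


def _drop_contained(seqs: list[str]) -> list[str]:
    """Remove duplicate reads and reads lying entirely inside another read."""
    seqs = list(dict.fromkeys(seqs))
    return [s for s in seqs if not any(s != o and s in o for o in seqs)]


def greedy_assemble(reads: list[str], min_len: int = 20) -> list[str]:
    """Greedily merge the pair with the largest exact overlap until none is left."""
    contigs = _drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = _drop_contained(rest + [merged])
```

For example, `greedy_assemble(["ACGTACGTTT", "CGTTTGGCCA", "GGCCATTAAC"], min_len=5)` chains the three reads into the single contig `"ACGTACGTTTGGCCATTAAC"`. Greedy merging is the simplest strategy to show, but it can mis-join reads from repeated sequences or related genes.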
Nucleic acid sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555, which play a role as candidate genes in endometrial tumors, have now been found.
Nucleic acid sequences Seq. ID Nos. 1-126 and Seq. ID Nos. 531-552, 554, and 555 are of special interest.
The invention thus relates to nucleic acid sequences that code for a gene product or a portion thereof, comprising
a) a nucleic acid sequence selected from the group of nucleic acid sequences Seq. ID Nos. 1-126 and Seq. ID Nos. 531-552, 554, and 555,
b) an allelic variation of the nucleic acid sequences named under a) or
c) a nucleic acid sequence that is complementary to the nucleic acid sequences named under a) or b).
In addition, the invention relates to a nucleic acid sequence according to one of the sequences Seq. ID Nos. 1-126, or a complementary or allelic variant thereof, and to nucleic acid sequences thereof that have 90% to 95% homology to a human nucleic acid sequence.
The invention also relates to nucleic acid sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555, which show elevated expression in endometrial tumors.
The invention further relates to nucleic acid sequences comprising a portion of the above-mentioned nucleic acid sequences that is sufficiently long to hybridize with sequences Seq. ID Nos. 1-126 and Seq. ID Nos. 531-552, 554, and 555.
The nucleic acid sequences according to the invention generally have a length of at least 50 to 4500 bp, preferably a length of at least 150 to 4000 bp, especially a length of 450 to 3500 bp.
With the partial sequences Seq. ID Nos. 1-126 and Seq. ID Nos. 531-552, 554, and 555 according to the invention, expression cassettes can also be constructed by standard methods, in which at least one of the nucleic acid sequences according to the invention is combined on the cassette with at least one control or regulatory sequence generally known to one skilled in the art, such as a suitable promoter. The sequences according to the invention can be inserted in sense or antisense orientation.
A large number of expression cassettes or vectors and promoters which can be used are known in the literature.
Expression cassettes or vectors are defined as: 1. bacterial, such as, e.g., phagescript, pBs, φX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene), pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia), 2. eukaryotic, such as, e.g., pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, pMSG, pSVL (Pharmacia).
Control or regulatory sequences are defined as suitable promoters. Two preferred vectors here are the pKK232-8 and the pCM7 vector. In particular, the following promoters are intended: lacI, lacZ, T3, T7, gpt, lambda PR, trc, CMV, HSV thymidine kinase, SV40, LTRs from retrovirus and mouse metallothionein-I.
The DNA sequences located on the expression cassette can code for a fusion protein that comprises a known protein and a bioactive polypeptide fragment.
The expression cassettes are likewise the subject matter of this invention.
The nucleic acid fragments according to the invention can be used to produce full-length genes. The genes that can be obtained are likewise the subject matter of this invention.
The invention also relates to the use of the nucleic acid sequences according to the invention and to the gene fragments obtainable by this use.
The nucleic acid sequences according to the invention can be introduced into host cells with suitable vectors; these host cells then contain, as the heterologous part, the genetic information carried on the nucleic acid fragments, which is expressed.
The host cells containing the nucleic acid fragments are likewise the subject matter of this invention.
Suitable host cells are, e.g., prokaryotic cell systems such as E. coli or eukaryotic cell systems such as animal or human cells or yeasts.
The nucleic acid sequences according to the invention can be used in the sense or antisense form.
Polypeptides or their fragments are produced by culturing the host cells by standard cultivation methods and then isolating and purifying the peptides or fragments, likewise by standard methods. The invention further relates to nucleic acid sequences that code for at least a partial sequence of a bioactive polypeptide.
This invention further relates to polypeptide partial sequences, so-called ORF (open reading frame) peptides, according to the sequence listings Seq. ID Nos. 142-528 and Seq. ID Nos. 561-575, 577-625, and 630-635.
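The description does not say how the ORF peptides were derived from the nucleic acid sequences. A common approach is to translate all six reading frames and keep stretches that run from a start codon to an in-frame stop codon; the sketch below assumes exactly that, using the standard genetic code (NCBI translation table 1) and a hypothetical minimum length `min_aa`. Because the sequences here are partial cDNAs, reading frames that run off the end of a fragment may also be of interest; this sketch ignores them for simplicity, and all names in it are illustrative.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_aa: int = 30) -> list[str]:
    """Peptides from ATG...stop reading frames on both strands, all three frames."""
    seq = seq.upper()
    peptides = []
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            # Unknown codons (e.g. containing N) become 'X'.
            protein = "".join(CODON_TABLE.get(strand[i:i + 3], "X")
                              for i in range(frame, len(strand) - 2, 3))
            start = None
            for pos, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = pos                      # first in-frame start codon
                elif aa == "*" and start is not None:
                    if pos - start >= min_aa:        # keep sufficiently long ORFs
                        peptides.append(protein[start:pos])
                    start = None
    return peptides
```

For example, `find_orfs("ATGGCTGCTTAA", min_aa=3)` returns `["MAA"]`; with the default threshold the same short sequence yields no ORF.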
The invention further relates to polypeptide sequences that have at least 80% homology, especially 90% homology, to the polypeptide partial sequences Seq. ID Nos. 142-528 and Seq. ID Nos. 561-575, 577-625, and 630-635 according to the invention.
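The description does not state how homology is measured. One common reading is percent identity over a pairwise alignment; the sketch below assumes that reading, uses a plain Needleman-Wunsch global alignment with arbitrary illustrative scores (match +1, mismatch -1, gap -2) rather than a substitution matrix such as BLOSUM62, and divides identical positions by the alignment length. Function names and scores are hypothetical, and the same calculation would apply to the nucleic acid homology ranges mentioned earlier.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)
```

Under this assumption, a candidate sequence would fall within the claimed range if `percent_identity(candidate, reference) >= 80.0`; for instance, two ten-residue peptides differing at one position give 90.0.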
The invention also relates to antibodies that are directed against a polypeptide, or a fragment thereof, encoded by the nucleic acids of sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555 according to the invention.
Antibodies are understood to mean, in particular, monoclonal antibodies.
The antibodies according to the invention can be identified by, among other methods, a phage display process. These antibodies are also the subject matter of the invention.
The polypeptide partial sequences according to the invention can be used in a phage display process. The polypeptides that are identified with this process and that bind to the polypeptide partial sequences according to the invention are also the subject matter of the invention.
The nucleic acid sequences according to the invention can also be used in a phage display process.
The polypeptides of sequences Seq. ID Nos. 142-528 and Seq. ID Nos. 561-575, 577-625, and 630-635 according to the invention can also be used as tools for finding active ingredients against endometrial tumors; this use is likewise the subject matter of this invention.
Likewise the subject matter of this invention is the use of nucleic acid sequences according to sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555 for expression of polypeptides, which can be used as tools for finding active ingredients against endometrial tumors.
The invention also relates to the use of the polypeptide partial sequences found, Seq. ID Nos. 142-528 and Seq. ID Nos. 561-575, 577-625, and 630-635, as pharmaceutical agents in gene therapy for the treatment of uterine tumors, or for the production of a pharmaceutical agent for the treatment of uterine tumors.
The invention also relates to pharmaceutical agents that contain at least one polypeptide partial sequence Seq. ID Nos. 142-528 and Seq. ID Nos. 561-575, 577-625, and 630-635.
The nucleic acid sequences found according to the invention can also be genomic or mRNA sequences.
The invention also relates to genomic genes, their exon and intron structures and their splice variants that can be obtained from cDNAs of sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555, and their use together with suitable regulatory elements, such as suitable promoters and/or enhancers.
With the nucleic acids according to the invention (cDNA sequences) Seq. ID Nos. 1-141 and Seq. ID Nos. 531-552, 554, and 555, genomic BAC, PAC and cosmid libraries are screened, and specific human clones are isolated via complementary base pairing (hybridization). The BAC, PAC and cosmid clones isolated in this way are hybridized to metaphase chromosomes by fluorescence in situ hybridization, and the chromosome sections on which the corresponding genomic genes lie are identified. The BAC, PAC and cosmid clones are sequenced in order to elucidate the complete structure of the corresponding genomic genes (promoters, enhancers, silencers, exons and introns). BAC, PAC and cosmid clones can be used as independent molecules for gene transfer (see FIG. 5).
The invention also relates to BAC, PAC and cosmid clones containing functional genes and their chromosomal localization according to sequences Seq. ID No. 1 to Seq. ID No. 141 and Seq. ID Nos. 531-552, 554, and 555, for use as vehicles for gene transfer.
Meanings of Technical Terms and Abbreviations
Explanation of the Alignment Parameters
minimal initial match=minimal initial identity area
maximum pads per read=maximum number of insertions
maximum percent mismatch=maximum deviation in %
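The three parameters above control when two reads may be joined during assembly. The assembly programs themselves are not named in the description, so their exact semantics are not known; the sketch below is one hypothetical interpretation, in which an already aligned overlap is accepted only if it contains an identical stretch of at least `minimal_initial_match` bases, neither read needs more than `maximum_pads_per_read` inserted pads (`-`), and the share of differing positions does not exceed `maximum_percent_mismatch`. All names and example values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AlignmentParams:
    minimal_initial_match: int       # minimal initial identity area (bases)
    maximum_pads_per_read: int       # maximum number of insertions ('-') per read
    maximum_percent_mismatch: float  # maximum deviation in %


def longest_identical_run(x: str, y: str) -> int:
    """Longest stretch of identical, non-pad aligned positions."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b and a != "-" else 0
        best = max(best, run)
    return best


def overlap_acceptable(x: str, y: str, p: AlignmentParams) -> bool:
    """x, y: the overlapping region of two reads, already aligned ('-' = pad)."""
    if len(x) != len(y) or not x:
        raise ValueError("aligned overlap strings must be non-empty and of equal length")
    if longest_identical_run(x, y) < p.minimal_initial_match:
        return False
    if max(x.count("-"), y.count("-")) > p.maximum_pads_per_read:
        return False
    mismatches = sum(a != b for a, b in zip(x, y))  # pads count as deviations
    return 100.0 * mismatches / len(x) <= p.maximum_percent_mismatch
```

With `AlignmentParams(5, 1, 10.0)`, the aligned pair `"ACGTACGTAC"` / `"ACGTAC-TAC"` is accepted (one pad, 10% deviation), whereas `"ACGTACGTAC"` / `"ACGTA--TAC"` is rejected because the second read needs two pads.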