The estimated 50,000-100,000 genes scattered along the human chromosomes offer tremendous promise for the understanding, diagnosis, and treatment of human diseases. In addition, probes capable of specifically hybridizing to loci distributed throughout the human genome find applications in the construction of high resolution chromosome maps and in the identification of individuals.
In the past, the characterization of even a single human gene was a painstaking process, requiring years of effort. Recent developments in the areas of cloning vectors, DNA sequencing, and computer technology have merged to greatly accelerate the rate at which human genes can be isolated, sequenced, mapped, and characterized.
Currently, two different approaches are being pursued for identifying and characterizing the genes distributed along the human genome. In one approach, large fragments of genomic DNA are isolated, cloned, and sequenced. Potential open reading frames in these genomic sequences are identified using bioinformatics software. However, this approach entails sequencing large stretches of human DNA which do not encode proteins in order to find the protein encoding sequences scattered throughout the genome. In addition to the extensive sequencing this approach requires, the bioinformatics software may mischaracterize the genomic sequences obtained, for example by labeling non-coding DNA as coding DNA and vice versa.
An alternative approach takes a more direct route to identifying and characterizing human genes. In this approach, complementary DNAs (cDNAs) are synthesized from isolated messenger RNAs (mRNAs) which encode human proteins. Using this approach, sequencing is only performed on DNA which is derived from protein coding portions of the genome. Often, only short stretches of the cDNAs are sequenced to obtain sequences called expressed sequence tags (ESTs). The ESTs may then be used to isolate or purify extended cDNAs which include sequences adjacent to the EST sequences. The extended cDNAs may contain all of the sequence of the EST which was used to obtain them or only a portion of the sequence of the EST which was used to obtain them. In addition, the extended cDNAs may contain the full coding sequence of the gene from which the EST was derived or, alternatively, the extended cDNAs may include portions of the coding sequence of the gene from which the EST was derived. It will be appreciated that there may be several extended cDNAs which include the EST sequence as a result of alternate splicing or the activity of alternative promoters. Alternatively, ESTs having partially overlapping sequences may be identified and contigs comprising the consensus sequences of the overlapping ESTs may be identified.
In the past, these short EST sequences were often obtained from oligo-dT primed cDNA libraries. Accordingly, they mainly corresponded to the 3′ untranslated region of the mRNA. In part, the prevalence of EST sequences derived from the 3′ end of the mRNA results from the fact that typical techniques for obtaining cDNAs are not well suited for isolating cDNA sequences derived from the 5′ ends of mRNAs (Adams et al., Nature 377:3-174, 1996; Hillier et al., Genome Res. 6:807-828, 1996), the entire disclosures of which are incorporated herein by reference.
In addition, in those reported instances where longer cDNA sequences have been obtained, the reported sequences typically correspond to coding sequences and do not include the full 5′ untranslated region (5′UTR) of the mRNA from which the cDNA is derived. Indeed, 5′UTRs have been shown to affect either the stability or translation of mRNAs. Thus, regulation of gene expression may be achieved through the use of alternative 5′UTRs as shown, for instance, for the translation of the tissue inhibitor of metalloprotease mRNA in mitogenically activated cells (Waterhouse et al., J. Biol. Chem. 265:5585-9, 1990), the entire disclosure of which is incorporated herein by reference. Furthermore, modification of the 5′UTR through mutation, insertion or translocation events may even be implicated in pathogenesis. For instance, the fragile X syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5′UTR of the fragile X mRNA resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995), the entire disclosure of which is incorporated herein by reference. An aberrant mutation in regions of the 5′UTR known to inhibit translation of the proto-oncogene c-myc was shown to result in upregulation of c-myc protein levels in cells derived from patients with multiple myelomas (Willis et al., Curr Top Microbiol Immunol 224:269-76, 1997), the entire disclosure of which is incorporated herein by reference. In addition, the use of oligo-dT primed cDNA libraries does not allow the isolation of complete 5′UTRs, since the incomplete sequences obtained by this process may not include the first exon of the mRNA, particularly in situations where the first exon is short. Furthermore, they may not include some exons, often short ones, which are located upstream of splicing sites.
Thus, there is a need to obtain sequences derived from the 5′ ends of mRNAs.
While many sequences derived from human chromosomes have practical applications, approaches based on the identification and characterization of those chromosomal sequences which encode a protein product are particularly relevant to diagnostic and therapeutic uses. In some instances, the sequences used in such therapeutic or diagnostic techniques may be sequences which encode proteins which are secreted from the cell in which they are synthesized. Those sequences encoding secreted proteins, as well as the secreted proteins themselves, are particularly valuable as potential therapeutic agents. Such proteins are often involved in cell to cell communication and may be responsible for producing a clinically relevant response in their target cells. In fact, several secretory proteins, including tissue plasminogen activator, G-CSF, GM-CSF, erythropoietin, human growth hormone, insulin, interferon-α, interferon-β, interferon-γ, and interleukin-2, are currently in clinical use. These proteins are used to treat a wide range of conditions, including acute myocardial infarction, acute ischemic stroke, anemia, diabetes, growth hormone deficiency, hepatitis, kidney carcinoma, chemotherapy-induced neutropenia and multiple sclerosis. For these reasons, extended cDNAs encoding secreted proteins or portions thereof represent a valuable source of therapeutic agents. Thus, there is a need for the identification and characterization of secreted proteins and the nucleic acids encoding them.
In addition to being therapeutically useful themselves, secretory proteins include short peptides, called signal peptides, at their amino termini which direct their secretion. These signal peptides are encoded by the signal sequences located at the 5′ ends of the coding sequences of genes encoding secreted proteins. These signal peptides can be used to direct the extracellular secretion of any protein to which they are operably linked. In addition, portions of the signal peptides, called membrane-translocating sequences, may also be used to direct the intracellular import of a peptide or protein of interest. This may prove beneficial in gene therapy strategies in which it is desired to deliver a particular gene product to cells other than the cells in which it is produced. Signal sequences encoding signal peptides also find application in simplifying protein purification techniques. In such applications, the extracellular secretion of the desired protein greatly facilitates purification by reducing the number of undesired proteins from which the desired protein must be selected. Thus, there exists a need to identify and characterize the 5′ portions of the genes for secretory proteins which encode signal peptides.
Sequences coding for non-secreted proteins may also find application as therapeutics or diagnostics. In particular, such sequences may be used to determine whether an individual is likely to express a detectable phenotype, such as a disease, as a consequence of a mutation in the coding sequence of a protein. In instances where the individual is at risk of suffering from a disease or other undesirable phenotype as a result of a mutation in such a coding sequence, the undesirable phenotype may be corrected by introducing a normal coding sequence using gene therapy. Alternatively, if the undesirable phenotype results from overexpression of the protein encoded by the coding sequence, expression of the protein may be reduced using antisense or triple helix based strategies.
The secreted or non-secreted human polypeptides encoded by the coding sequences may also be used as therapeutics by administering them directly to an individual having a condition, such as a disease, resulting from a mutation in the sequence encoding the polypeptide. In such an instance, the condition can be cured or ameliorated by administering the polypeptide to the individual.
In addition, the secreted or non-secreted human polypeptides or portions thereof may be used to generate antibodies useful in determining the tissue type or species of origin of a biological sample. The antibodies may also be used to determine the cellular localization of the secreted or non-secreted human polypeptides or the cellular localization of polypeptides which have been fused to the human polypeptides. In addition, the antibodies may also be used in immunoaffinity chromatography techniques to isolate, purify, or enrich the human polypeptide or a target polypeptide which has been fused to the human polypeptide.
Public information on the number of human genes for which the promoters and upstream regulatory regions have been identified and characterized is quite limited. In part, this may be due to the difficulty of isolating such regulatory sequences. Upstream regulatory sequences such as transcription factor binding sites are typically too short to be utilized as probes for isolating promoters from human genomic libraries. Recently, some approaches have been developed to isolate human promoters. One of them consists of making a CpG island library (Cross et al., Nature Genetics 6:236-244, 1994), the entire disclosure of which is incorporated herein by reference. The second consists of isolating human genomic DNA sequences containing Sp1 binding sites by the use of Sp1 binding protein (Mortlock et al., Genome Res. 6:327-335, 1996), the entire disclosure of which is incorporated herein by reference. Both of these approaches are limited by a lack of specificity and of comprehensiveness. Thus, there exists a need to identify and systematically characterize the 5′ portions of the genes.
The present 5′ ESTs may be used to efficiently identify and isolate 5′UTRs and upstream regulatory regions which control the location, developmental stage, rate, and quantity of protein synthesis, as well as the stability of the mRNA. Once identified and characterized, these regulatory regions may be utilized in gene therapy or protein purification schemes to obtain the desired amount and locations of protein synthesis or to inhibit, reduce, or prevent the synthesis of undesirable gene products.
In addition, ESTs containing the 5′ ends of protein genes may include sequences useful as probes for chromosome mapping and the identification of individuals. Thus, there is a need to identify and characterize the sequences upstream of the 5′ coding sequences of genes.
The present invention relates to purified, isolated, or enriched 5′ ESTs which include sequences derived from the authentic 5′ ends of their corresponding mRNAs. The term “corresponding mRNA” refers to the mRNA which was the template for the cDNA synthesis which produced the 5′ EST. These sequences will be referred to hereinafter as “5′ ESTs.” The present invention also includes purified, isolated or enriched nucleic acids comprising contigs assembled by determining a consensus sequence from a plurality of ESTs containing overlapping sequences. These contigs will be referred to herein as “consensus contigated 5′ ESTs.”
As used herein, the term “purified” does not require absolute purity; rather, it is intended as a relative definition. Individual 5′ EST clones isolated from a cDNA library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The cDNA clones are not naturally occurring as such, but rather are obtained via manipulation of a partially purified naturally occurring substance (messenger RNA). The conversion of mRNA into a cDNA library involves the creation of a synthetic substance (cDNA), and pure individual cDNA clones can be isolated from the synthetic library by clonal selection. Thus, creating a cDNA library from messenger RNA and subsequently isolating individual clones from that library results in an approximately 10^4-10^6 fold purification of the native message. Purification of starting material or natural material to at least one order of magnitude, preferably two or three orders, and more preferably four or five orders of magnitude is expressly contemplated.
As used herein, the term “isolated” requires that the material be removed from its original environment (e.g., the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide present in a living animal is not isolated, but the same polynucleotide, separated from some or all of the coexisting materials in the natural system, is isolated.
As used herein, the term “recombinant” means that the 5′ EST is adjacent to “backbone” nucleic acid to which it is not adjacent in its natural environment. Additionally, to be “enriched” the 5′ ESTs will represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest. Preferably, the enriched 5′ ESTs represent 15% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. More preferably, the enriched 5′ ESTs represent 50% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched 5′ ESTs represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone molecules.
“Stringent,” “moderate,” and “low” hybridization conditions are as defined below.
The term “polypeptide” refers to a polymer of amino acids without regard to the length of the polymer; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not specify or exclude post-expression modifications of polypeptides; for example, polypeptides which include the covalent attachment of glycosyl groups, acetyl groups, phosphate groups, lipid groups and the like are expressly encompassed by the term polypeptide. Also included within the definition are polypeptides which contain one or more analogs of an amino acid (including, for example, non-naturally occurring amino acids, amino acids which only occur naturally in an unrelated biological system, modified amino acids from mammalian systems, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
As used interchangeably herein, the terms “nucleic acids,” “oligonucleotides,” and “polynucleotides” include RNA, DNA, or RNA/DNA hybrid sequences of more than one nucleotide in either single chain or duplex form. The term “nucleotide” is used herein as an adjective to describe molecules comprising RNA, DNA, or RNA/DNA hybrid sequences of any length in single-stranded or duplex form. The term “nucleotide” is also used herein as a noun to refer to individual nucleotides or varieties of nucleotides, meaning a molecule, or individual unit in a larger nucleic acid molecule, comprising a purine or pyrimidine, a ribose or deoxyribose sugar moiety, and a phosphate group, or phosphodiester linkage in the case of nucleotides within an oligonucleotide or polynucleotide. The term “nucleotide” is also used herein to encompass “modified nucleotides” which comprise at least one of the following modifications: (a) an alternative linking group, (b) an analogous form of purine, (c) an analogous form of pyrimidine, or (d) an analogous sugar. For examples of analogous linking groups, purines, pyrimidines, and sugars, see for example PCT publication No. WO 95/04064. The polynucleotide sequences of the invention may be prepared by any known method, including synthetic, recombinant, ex vivo generation, or a combination thereof, as well as utilizing any purification methods known in the art.
The terms “base paired” and “Watson and Crick base paired” are used interchangeably herein to refer to nucleotides which can be hydrogen bonded to one another by virtue of their sequence identities in a manner like that found in double-helical DNA, with thymine or uracil residues linked to adenine residues by two hydrogen bonds and cytosine and guanine residues linked by three hydrogen bonds (see Stryer, L., Biochemistry, 4th edition, 1995).
The terms “complementary” or “complement thereof” are used herein to refer to the sequences of polynucleotides which are capable of forming Watson and Crick base pairing with another specified polynucleotide throughout the entirety of the complementary region. For the purpose of the present invention, a first polynucleotide is deemed to be complementary to a second polynucleotide when each base in the first polynucleotide is paired with its complementary base. Complementary bases are, generally, A and T (or A and U), or C and G. “Complement” is used herein as a synonym for “complementary polynucleotide,” “complementary nucleic acid” and “complementary nucleotide sequence.” These terms are applied to pairs of polynucleotides based solely upon their sequences and not any particular set of conditions under which the two polynucleotides would actually bind. Preferably, a “complementary” sequence is a sequence which has an A at each position where there is a T on the opposite strand, a T at each position where there is an A on the opposite strand, a G at each position where there is a C on the opposite strand and a C at each position where there is a G on the opposite strand.
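By way of illustration only, the sequence-based definition of complementarity given above may be sketched in Python as follows. The function names are hypothetical and form no part of the disclosure, and the sketch assumes that the two strands are each written 5′ to 3′ and pair in the conventional antiparallel orientation, so that the second strand is compared with the reverse complement of the first.

```python
# Watson and Crick partners for DNA. U is mapped like T so that an RNA
# strand can be compared against a DNA strand (an assumption of this sketch).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def complement(seq):
    """Return the base-by-base DNA complement, in the same reading order."""
    return "".join(PAIR[base] for base in seq.upper())


def reverse_complement(seq):
    """Return the complement read 5' to 3' on the opposite strand."""
    return complement(seq)[::-1]


def is_complementary(first, second):
    """True when every base of `first` pairs with the opposing base of
    `second` over the entire length (both strands written 5' to 3')."""
    if len(first) != len(second):
        return False
    # Normalize U to T so RNA and DNA spellings compare equal.
    return reverse_complement(first) == second.upper().replace("U", "T")
```

As in the definition above, the check depends solely on the two sequences and not on any hybridization conditions.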
Thus, 5′ ESTs in cDNA libraries in which one or more 5′ ESTs make up 5% or more of the number of nucleic acid inserts in the backbone molecules are “enriched recombinant 5′ ESTs” as defined herein. Likewise, 5′ ESTs in a population of plasmids in which one or more 5′ ESTs of the present invention have been inserted such that they represent 5% or more of the number of inserts in the plasmid backbone are “enriched recombinant 5′ ESTs” as defined herein. However, 5′ ESTs in cDNA libraries in which 5′ ESTs constitute less than 5% of the number of nucleic acid inserts in the population of backbone molecules, such as libraries in which backbone molecules having a 5′ EST insert are extremely rare, are not “enriched recombinant 5′ ESTs.”
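The percentage thresholds recited herein for enrichment (5%, 15%, 50% and 90% of the nucleic acid inserts) may be illustrated by the following Python sketch. The function name and the returned labels are hypothetical conveniences chosen for this example; only the thresholds themselves come from the definitions herein.

```python
def enrichment_level(est_inserts, total_inserts):
    """Classify a population of backbone molecules by the share of inserts
    that are 5' ESTs, using the 5/15/50/90 percent thresholds."""
    if total_inserts <= 0 or not 0 <= est_inserts <= total_inserts:
        raise ValueError("counts must satisfy 0 <= est_inserts <= total_inserts, total > 0")
    # Tiers are checked from the most to the least demanding threshold.
    tiers = [
        (90, "enriched (highly preferred)"),
        (50, "enriched (more preferred)"),
        (15, "enriched (preferred)"),
        (5, "enriched"),
    ]
    for percent, label in tiers:
        # Integer arithmetic avoids floating point issues at the boundaries.
        if est_inserts * 100 >= percent * total_inserts:
            return label
    return "not enriched"
```

For example, a library in which 5 of 100 inserts are 5′ ESTs meets the 5% threshold, whereas a library in which 4 of 100 inserts are 5′ ESTs does not.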
The term “capable of hybridizing to the polyA tail of said mRNA” refers to and embraces all primers containing stretches of thymidine residues, so-called oligo(dT) primers, that hybridize to the 3′ end of eukaryotic poly(A)+ mRNAs to prime the synthesis of a first cDNA strand. Techniques for generating said oligo(dT) primers and hybridizing them to mRNA to subsequently prime the reverse transcription of said hybridized mRNA to generate a first cDNA strand are well known to those skilled in the art and are described in Current Protocols in Molecular Biology, John Wiley and Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, 1989, the entire disclosures of which are incorporated herein by reference. Preferably, said oligo(dT) primers are present in a large excess in order to allow the hybridization of all mRNA 3′ ends to at least one oligo(dT) molecule. The priming and reverse transcription steps are preferably performed between 37° C. and 55° C. depending on the type of reverse transcriptase used.
Preferred oligo(dT) primers for priming reverse transcription of mRNAs are oligonucleotides containing a stretch of thymidine residues of sufficient length to hybridize specifically to the polyA tail of mRNAs, preferably of 12 to 18 thymidine residues in length. More preferably, such oligo(dT) primers comprise an additional sequence upstream of the poly(dT) stretch in order to allow the addition of a given sequence to the 5′ end of all first cDNA strands, which may then be used to facilitate subsequent manipulation of the cDNA. Preferably, this added sequence is 8 to 60 residues in length. For instance, the addition of a restriction site at the 5′ end of cDNAs facilitates subcloning of the obtained cDNA. Alternatively, such an added 5′ end may also be used to design PCR primers to specifically amplify cDNA clones of interest.
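The preferred primer layout described above, namely an added sequence of 8 to 60 residues located 5′ of a stretch of 12 to 18 thymidine residues, may be sketched in Python as follows. The function name is hypothetical, and the adapter used in the usage note is merely an illustrative choice (the NotI recognition site GCGGCCGC), not a sequence taken from the present disclosure.

```python
def build_oligo_dt_primer(adapter, t_count=18):
    """Return a primer written 5' to 3': the added sequence followed by
    the poly(dT) stretch that anneals to the polyA tail."""
    adapter = adapter.upper()
    if set(adapter) - set("ACGT"):
        raise ValueError("adapter must contain only A, C, G and T")
    if not 8 <= len(adapter) <= 60:
        raise ValueError("added sequence is preferably 8 to 60 residues")
    if not 12 <= t_count <= 18:
        raise ValueError("poly(dT) stretch is preferably 12 to 18 residues")
    return adapter + "T" * t_count
```

For instance, build_oligo_dt_primer("GCGGCCGC", 15) yields an oligonucleotide carrying a restriction site at its 5′ end followed by fifteen thymidine residues.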
In some embodiments, the present invention relates to 5′ ESTs which are derived from genes encoding secreted proteins. As used herein, a “secreted” protein is one which, when expressed in a suitable host cell, is transported across or through a membrane, including transport as a result of signal peptides in its amino acid sequence. “Secreted” proteins include without limitation proteins secreted wholly (e.g. soluble proteins), or partially (e.g. receptors) from the cell in which they are expressed. “Secreted” proteins also include without limitation proteins which are transported across the membrane of the endoplasmic reticulum.
Such 5′ ESTs include nucleic acid sequences, called signal sequences, which encode signal peptides which direct the extracellular secretion of the proteins encoded by the genes from which the 5′ ESTs are derived. Generally, the signal peptides are located at the amino termini of secreted proteins.
Secreted proteins are translated by ribosomes associated with the “rough” endoplasmic reticulum. Generally, secreted proteins are co-translationally transferred to the membrane of the endoplasmic reticulum. Association of the ribosome with the endoplasmic reticulum during translation of secreted proteins is mediated by the signal peptide. The signal peptide is typically cleaved following its co-translational entry into the endoplasmic reticulum. After delivery to the endoplasmic reticulum, secreted proteins may proceed through the Golgi apparatus. In the Golgi apparatus, the proteins may undergo post-translational modification before entering secretory vesicles which transport them across the cell membrane.
The 5′ ESTs of the present invention have several important applications. For example, they may be used to obtain and express cDNA clones which include the full protein coding sequences of the corresponding gene products, including the authentic translation start sites derived from the 5′ ends of the coding sequences of the mRNAs from which the 5′ ESTs are derived. These cDNAs will be referred to hereinafter as “full-length cDNAs.” These cDNAs may comprise a 3′ untranslated region and eventually a polyadenylation tail. These cDNAs may also include DNA derived from mRNA sequences upstream of the translation start site. The full-length cDNA sequences may be used to express the proteins corresponding to the 5′ ESTs. As discussed above, secreted proteins and non-secreted proteins may be therapeutically important. Thus, the proteins expressed from the cDNAs may be useful in treating and controlling a variety of human conditions. The 5′ ESTs may also be used to obtain the corresponding genomic DNA. The term “corresponding genomic DNA” refers to the genomic DNA which encodes the mRNA from which the 5′ EST was derived.
Alternatively, the 5′ ESTs may be used to obtain and express extended cDNAs encoding portions of the protein. In the case of secreted proteins, the portions may comprise the signal peptides of the secreted proteins or the mature proteins generated when the signal peptide is cleaved off.
The present invention includes isolated, purified, or enriched “EST-related nucleic acids.” The terms “isolated,” “purified” or “enriched” have the meanings provided above. As used herein, the term “EST-related nucleic acids” means the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622, extended cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622, full-length cDNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622 or genomic DNAs obtainable using the nucleic acids of SEQ ID NOs. 24-811 and 1600-1622. The present invention also includes the sequences complementary to the EST-related nucleic acids.
The present invention also includes isolated, purified, or enriched “fragments of EST-related nucleic acids.” The terms “isolated,” “purified” and “enriched” have the meanings described above. As used herein, the term “fragments of EST-related nucleic acids” means fragments comprising at least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides of the EST-related nucleic acids to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related nucleic acids being referenced. In particular, fragments of EST-related nucleic acids refer to “polynucleotides described in Table II,” “polynucleotides described in Table III,” and “polynucleotides described in Table IV.” The present invention also includes the sequences complementary to the fragments of the EST-related nucleic acids.
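As a non-limiting illustration of the “at least N consecutive nucleotides” criterion used in the definition of fragments, the following Python sketch tests whether a candidate sequence contains a run of a given minimum length taken from a reference sequence. The function and parameter names are hypothetical, and the sketch compares only the strand as written; it makes no attempt to model complementary strands or sequencing errors.

```python
def shares_consecutive_run(candidate, reference, min_len):
    """True when `candidate` contains at least `min_len` consecutive
    nucleotides that also occur consecutively in `reference`."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    candidate, reference = candidate.upper(), reference.upper()
    # Index every window of length min_len found in the reference.
    windows = {
        reference[i:i + min_len]
        for i in range(len(reference) - min_len + 1)
    }
    # Any shared window proves a common run of at least min_len nucleotides.
    return any(
        candidate[i:i + min_len] in windows
        for i in range(len(candidate) - min_len + 1)
    )
```

A reference shorter than the requested length yields no windows, so the function returns False, which mirrors the proviso that fragment lengths must be consistent with the length of the particular nucleic acid being referenced.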
The present invention also includes isolated, purified, or enriched “positional segments of EST-related nucleic acids.” As used herein, the term “positional segments of EST-related nucleic acids” includes segments comprising nucleotides 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, 201-225, 226-250, 251-300, 301-325, 326-350, 351-375, 376-400, 401-425, 426-450, 451-475, 476-500, 501-525, 526-550, 551-575, 576-600 and 601-the terminal nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referenced. The term “positional segments of EST-related nucleic acids” also includes segments comprising nucleotides 1-50, 51-100, 101-150, 151-200, 201-250, 251-300, 301-350, 351-400, 401-450, 451-500, 501-550, 551-600 or 601-the terminal nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referenced. The term “positional segments of EST-related nucleic acids” also includes segments comprising nucleotides 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, or 601-the terminal nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referenced. In addition, the term “positional segments of EST-related nucleic acids” includes segments comprising nucleotides 1-200, 201-400, 401-600, or 601-the terminal nucleotide of the EST-related nucleic acids to the extent that such nucleotide positions are consistent with the lengths of the particular EST-related nucleic acids being referenced. The present invention also includes the sequences complementary to the positional segments of EST-related nucleic acids.
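The positional segments recited above may be enumerated programmatically, as in the following Python sketch. The function names are hypothetical. The sketch assumes a regular tiling of fixed-width windows through nucleotide 600 followed by a single segment running from nucleotide 601 to the terminal nucleotide, and it reads the proviso regarding sequence length as excluding any window that would extend past the end of the sequence; both points are simplifying assumptions rather than statements of the definition itself. In particular, the 25-nucleotide series recited above contains one 251-300 segment, which a regular tiling does not reproduce.

```python
def positional_segments(seq_length, width, fixed_limit=600):
    """Return 1-based inclusive (start, end) pairs for windows of `width`
    nucleotides up to `fixed_limit`, plus a final open-ended segment."""
    segments = []
    for start in range(1, fixed_limit + 1, width):
        end = start + width - 1
        if end > seq_length:
            break  # window is not consistent with the sequence length
        segments.append((start, end))
    if seq_length > fixed_limit:
        # "601-the terminal nucleotide" for sequences longer than 600
        segments.append((fixed_limit + 1, seq_length))
    return segments


def extract(seq, segment):
    """Slice a sequence using a 1-based inclusive (start, end) pair."""
    start, end = segment
    return seq[start - 1:end]
```

For a nucleic acid of 650 nucleotides, positional_segments(650, 200) returns the segments 1-200, 201-400, 401-600 and 601-650.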
The present invention also includes isolated, purified, or enriched “fragments of positional segments of EST-related nucleic acids.” As used herein, the term “fragments of positional segments of EST-related nucleic acids” refers to fragments comprising at least 10, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 150, or 200 consecutive nucleotides of the positional segments of EST-related nucleic acids. The present invention also includes the sequences complementary to the fragments of positional segments of EST-related nucleic acids.
The present invention also includes isolated or purified “EST-related polypeptides.” As used herein, the term “EST-related polypeptides” means the polypeptides encoded by the EST-related nucleic acids, including the polypeptides of SEQ ID NOs. 812-1599.
The present invention also includes isolated or purified “fragments of EST-related polypeptides.” As used herein, the term “fragments of EST-related polypeptides” means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of an EST-related polypeptide to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides being referenced. In particular, fragments of EST-related polypeptides refer to polypeptides encoded by “polynucleotides described in Table II,” “polynucleotides described in Table III,” and “polynucleotides described in Table IV.”
The present invention also includes isolated or purified “positional segments of EST-related polypeptides.” As used herein, the term “positional segments of EST-related polypeptides” includes polypeptides comprising amino acid residues 1-25, 26-50, 51-75, 76-100, 101-125, 126-150, 151-175, 176-200, or 201-the C-terminal amino acid of the EST-related polypeptides to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referenced. The term “positional segments of EST-related polypeptides” also includes segments comprising amino acid residues 1-50, 51-100, 101-150, 151-200 or 201-the C-terminal amino acid of the EST-related polypeptides to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referenced. The term “positional segments of EST-related polypeptides” also includes segments comprising amino acid residues 1-100 or 101-200 of the EST-related polypeptides to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referenced. In addition, the term “positional segments of EST-related polypeptides” includes segments comprising amino acid residues 1-200 or 201-the C-terminal amino acid of the EST-related polypeptides to the extent that such amino acid residues are consistent with the lengths of the particular EST-related polypeptides being referenced.
The present invention also includes isolated or purified “fragments of positional segments of EST-related polypeptides.” As used herein, the term “fragments of positional segments of EST-related polypeptides” means fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of positional segments of EST-related polypeptides to the extent that fragments of these lengths are consistent with the lengths of the particular EST-related polypeptides being referenced.
The present invention also includes antibodies which specifically recognize the EST-related polypeptides, fragments of EST-related polypeptides, positional segments of EST-related polypeptides, or fragments of positional segments of EST-related polypeptides. In the case of secreted proteins, such as those of SEQ ID NOs. 1554-1580, antibodies which specifically recognize the mature protein generated when the signal peptide is cleaved may also be obtained as described below. Similarly, antibodies which specifically recognize the signal peptides of SEQ ID NOs. 812-1516 or 1554-1580 may also be obtained.
In some embodiments and in the case of secreted proteins, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids include a signal sequence. In other embodiments, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include the full coding sequence for the protein or, in the case of secreted proteins, the full coding sequence of the mature protein (i.e. the protein generated when the signal polypeptide is cleaved off). In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may include regulatory regions upstream of the translation start site or downstream of the stop codon which control the amount, location, or developmental stage of gene expression.
As discussed above, both secreted and non-secreted human proteins may be therapeutically important. Thus, the proteins expressed from the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be useful in treating or controlling a variety of human conditions.
The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may be used in forensic procedures to identify individuals or in diagnostic procedures to identify individuals having genetic diseases resulting from abnormal gene expression. In addition, the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids are useful for constructing a high resolution map of the human chromosomes.
The present invention also relates to secretion vectors capable of directing the secretion of a protein of interest. Such vectors may be used in gene therapy strategies in which it is desired to produce a gene product in one cell which is to be delivered to another location in the body. Secretion vectors may also facilitate the purification of desired proteins.
The present invention also relates to expression vectors capable of directing the expression of an inserted gene in a desired spatial or temporal manner or at a desired level. Such vectors may include sequences upstream of the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids, such as promoters or upstream regulatory sequences.
The present invention also comprises fusion vectors for making chimeric polypeptides comprising a first polypeptide and a second polypeptide. Such vectors are useful for determining the cellular localization of the chimeric polypeptides or for isolating, purifying or enriching the chimeric polypeptides.
The EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids may also be used for gene therapy to control or treat genetic diseases. In the case of secreted proteins, signal peptides may be fused to heterologous proteins to direct their extracellular secretion.
Bacterial clones containing Bluescript plasmids having inserts containing the sequence of the non-aligned 5′ ESTs, also referred to as singletons, and sequences of the 5′ ESTs which were aligned to yield consensus contigated 5′ ESTs are presently stored at 80° C. in 4% (v/v) glycerol in the inventor's laboratories under internal designations. The non-aligned 5′ ESTs are those which comprise a single EST from a single tissue in the listing of Table V. The inserts may be recovered from the stored materials by growing the appropriate clones on a suitable medium. The Bluescript DNA can then be isolated using plasmid isolation procedures familiar to those skilled in the art, such as alkaline lysis minipreps or large scale alkaline lysis plasmid isolation procedures. If desired, the plasmid DNA may be further enriched by centrifugation on a cesium chloride gradient, size exclusion chromatography, or anion exchange chromatography. The plasmid DNA obtained using these procedures may then be manipulated using standard cloning techniques familiar to those skilled in the art. Alternatively, a PCR can be performed with primers designed at both ends of the inserted EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids. The PCR product which corresponds to the EST-related nucleic acids, fragments of EST-related nucleic acids, positional segments of EST-related nucleic acids, or fragments of positional segments of nucleic acids can then be manipulated using standard cloning techniques familiar to those skilled in the art.
One embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622.
Another embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of at least 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive nucleotides, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622.
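The two notions used in this embodiment, a span of consecutive nucleotides and a complementary sequence, can be sketched as follows. This is an illustration only: it assumes sequences are plain strings over A, C, G, and T, and it reads "complementary" as the reverse complement written 5′ to 3′. The function names and the toy sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_span(candidate, reference, n):
    """True if candidate contains at least n consecutive nucleotides of
    reference or of its complement. Returns False when n exceeds the
    reference length, i.e. when that length is not consistent with it."""
    candidate = candidate.upper()
    for ref in (reference.upper(), reverse_complement(reference)):
        for i in range(len(ref) - n + 1):
            if ref[i:i + n] in candidate:
                return True
    return False


print(reverse_complement("ATGC"))
print(shares_span("TTGCATTT", "ATGC", 4))
```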
A further embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of the coding sequence of a sequence selected from the group consisting of SEQ ID NOs. 24-811.
Yet another embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of the full coding sequences of a sequence selected from the group consisting of SEQ ID NOs. 766-792 wherein the full coding sequence comprises the sequence encoding the signal peptide and the sequence encoding the mature protein.
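As a minimal sketch of this relationship, a full coding sequence can be split into the part encoding the signal peptide and the part encoding the mature protein once the cleavage position is known. The cleavage position is an input here. How it is determined is not stated in this passage, and the function name and example sequence are hypothetical.

```python
def split_coding_sequence(cds, signal_peptide_length):
    """Split a coding sequence at the signal peptide cleavage site.

    signal_peptide_length is given in amino acids, so the cut falls after
    3 * signal_peptide_length nucleotides. Returns a tuple
    (signal_encoding_part, mature_encoding_part).
    """
    cut = 3 * signal_peptide_length
    if not 0 < cut < len(cds):
        raise ValueError("cleavage site must fall inside the coding sequence")
    return cds[:cut], cds[cut:]


# Hypothetical 5-codon coding sequence with an assumed 2-residue signal peptide.
signal_part, mature_part = split_coding_sequence("ATGAAACTGGCTTAA", 2)
print(signal_part, mature_part)
```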
Still another embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 766-792 which encodes the mature protein.
Another embodiment of the present invention is a purified nucleic acid comprising, consisting essentially of, or consisting of a contiguous span of a sequence selected from the group consisting of SEQ ID NOs. 24-728 and 766-792 which encodes the signal peptide.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 812-1599.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 1554-1580.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising, consisting essentially of, or consisting of a mature protein included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 1554-1580.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide comprising, consisting essentially of, or consisting of a signal peptide included in a sequence selected from the group consisting of the sequences of SEQ ID NOs. 812-1516 and 1554-1580.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide, wherein said nucleic acid comprises, consists essentially of, or consists of
a) a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622; and
b) a polyadenylation tail.
Another embodiment of the present invention is a purified nucleic acid encoding a polypeptide wherein said nucleic acid comprises, consists essentially of, or consists of
a) a sequence encoding a polypeptide selected from the group consisting of SEQ ID NOs. 812-1599; and
b) a polyadenylation tail.
Another embodiment of the present invention is a purified nucleic acid at least 20, 25, 30, 35, 40, 50, 75, 100, 200, 300, 500 or 1000 nucleotides in length which hybridizes under stringent conditions to a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622.
Another embodiment of the present invention is a purified or isolated polypeptide comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 812-1599.
Another embodiment of the present invention is a purified or isolated polypeptide comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of SEQ ID NOs. 1554-1580.
Another embodiment of the present invention is a purified or isolated polypeptide comprising, consisting essentially of, or consisting of a mature protein of a polypeptide selected from the group consisting of SEQ ID NOs. 1554-1580.
Another embodiment of the present invention is a purified or isolated polypeptide comprising, consisting essentially of, or consisting of a signal peptide of a sequence selected from the group consisting of the polypeptides of SEQ ID NOs. 812-1516 and 1554-1580.
Another embodiment of the present invention is a purified or isolated polypeptide comprising, consisting essentially of, or consisting of at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence selected from the group consisting of the sequences of SEQ ID NOs. 812-1599.
Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, hybridizing said primer to an mRNA in said collection that encodes said protein, reverse transcribing said hybridized primer to make a first cDNA strand from said mRNA, making a second cDNA strand complementary to said first cDNA strand, and isolating the resulting cDNA encoding said protein comprising said first cDNA strand and said second cDNA strand.
Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding paragraph.
In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this embodiment, said cDNA comprises the complete coding sequence of said human polypeptide.
Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a cDNA collection with a detectable probe comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 under conditions which permit said probe to hybridize to a cDNA, identifying said cDNA which hybridizes to said detectable probe, and isolating said cDNA.
Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding paragraph.
In one aspect of this embodiment, the cDNA encodes at least a portion of a human polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this embodiment, said cDNA comprises the complete coding sequence of said human polypeptide.
Another embodiment of the present invention is a method of making a cDNA comprising the steps of contacting a collection of mRNA molecules from human cells with a first primer capable of hybridizing to the polyA tail of said mRNA, hybridizing said first primer to said polyA tail, reverse transcribing said mRNA to make a first cDNA strand, making a second cDNA strand complementary to said first cDNA strand using at least one primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and isolating the resulting cDNA comprising said first cDNA strand and said second cDNA strand.
Another embodiment of the present invention is a purified cDNA obtainable by the method of the preceding paragraph.
In one aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this embodiment, said cDNA comprises the complete coding sequence of said human polypeptide.
In another aspect of the preceding method the second cDNA strand is made by contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and a third primer which sequence is fully included within the sequence of said first primer, performing a first polymerase chain reaction with said second and third primers to generate a first PCR product, contacting said first PCR product with a fourth primer, said fourth primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of said sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and a fifth primer which sequence is fully included within the sequence of said third primer, wherein said fourth and fifth primers hybridize to sequences within said first PCR product, and performing a second polymerase chain reaction, thereby generating a second PCR product.
One aspect of this embodiment is a purified cDNA obtainable by the method of the preceding paragraph.
In another aspect of this embodiment, said cDNA encodes at least a portion of a human polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this embodiment, said cDNA comprises the complete coding sequence of said human polypeptide.
Alternatively, the second cDNA strand may be made by contacting said first cDNA strand with a second primer comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, hybridizing said second primer to said first strand cDNA, and extending said hybridized second primer to generate said second cDNA strand.
One aspect of the above embodiment is a purified cDNA obtainable by the method of the preceding paragraph.
In a further aspect of this embodiment said cDNA encodes at least a portion of a human polypeptide. Preferably, said human polypeptide comprises at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, 75, 100, 200, 300, 500, or 1000 consecutive amino acids, to the extent that fragments of these lengths are consistent with the specific sequence, of a sequence encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. More preferably, said human polypeptide comprises the polypeptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811. In one aspect of this embodiment, said cDNA comprises the complete coding sequence of said human polypeptide.
Another embodiment of the present invention is a method of making a polypeptide comprising the steps of obtaining a cDNA which encodes a polypeptide encoded by a nucleic acid comprising, consisting essentially of, or consisting of a sequence selected from the group consisting of SEQ ID NOs. 24-811 or a cDNA which encodes a polypeptide comprising at least 6, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive amino acids of a polypeptide encoded by a sequence selected from the group consisting of SEQ ID NOs. 24-811, inserting said cDNA in an expression vector such that said cDNA is operably linked to a promoter, introducing said expression vector into a host cell whereby said host cell produces the protein encoded by said cDNA, and isolating said protein.
Another aspect of this embodiment is an isolated protein obtainable by the method of the preceding paragraph.
Another embodiment of the present invention is a method of obtaining a promoter DNA comprising the steps of obtaining genomic DNA located upstream of a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, screening said genomic DNA to identify a promoter capable of directing transcription initiation, and isolating said DNA comprising said identified promoter.
In one aspect of this embodiment, said obtaining step comprises walking from genomic DNA comprising a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622. In another aspect of this embodiment, said screening step comprises inserting genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 into a promoter reporter vector. For example, said screening step may comprise identifying motifs in genomic DNA located upstream of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 which are transcription factor binding sites or transcription start sites.
Another embodiment of the present invention is an isolated promoter obtainable by the method of the paragraph above.
Another embodiment of the present invention is an array of discrete ESTs or fragments thereof of at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 nucleotides in length, said array comprising at least one sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, the sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622. In some aspects of this embodiment, the array includes at least two sequences selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, the sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622. In another aspect of this embodiment, the array includes at least one, three, five, ten, fifteen, or twenty sequences selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, the sequences complementary to the sequences of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622.
Another embodiment of the present invention is an enriched population of recombinant nucleic acids, said recombinant nucleic acids comprising an insert nucleic acid and a backbone nucleic acid, wherein at least 0.01%, 0.05%, 0.1%, 0.5%, 1%, 2%, 5%, 10%, or 20% of said insert nucleic acids in said population comprise a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622, and fragments comprising at least 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, 50, or 100 consecutive nucleotides of a sequence selected from the group consisting of SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622 and the sequences complementary to SEQ ID NOs. 24-811 and SEQ ID NOs. 1600-1622.
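The percentage thresholds in this embodiment amount to a simple proportion: the share of inserts in the population that contain a listed sequence, its complement, or a qualifying fragment. The sketch below computes that share for toy data. It assumes inserts and targets are plain strings over A, C, G, and T; all names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Percentage thresholds recited in the text.
THRESHOLDS = (0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10, 20)


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def enrichment_percent(inserts, targets):
    """Percentage of inserts containing any target or its complement."""
    if not inserts:
        return 0.0
    probes = set()
    for target in targets:
        probes.add(target.upper())
        probes.add(reverse_complement(target))
    hits = sum(1 for ins in inserts if any(p in ins.upper() for p in probes))
    return 100.0 * hits / len(inserts)


def thresholds_met(percent):
    """Return the recited thresholds that the observed percentage satisfies."""
    return [t for t in THRESHOLDS if percent >= t]


toy_inserts = ["AAATGCAA", "CCCCCCCC", "GGGCATGG", "TTTTTTTT"]  # hypothetical
print(enrichment_percent(toy_inserts, ["ATGC"]))
```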
Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs. 812-1599.
Another embodiment of the present invention is a purified or isolated antibody capable of specifically binding to a polypeptide comprising at least 6, 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 consecutive amino acids of a sequence selected from the group consisting of SEQ ID NOs. 812-1599.
Yet another embodiment of the present invention is an antibody composition capable of selectively binding to an epitope-containing fragment of a polypeptide comprising a contiguous span of at least 8, 10, 12, 15, 18, 20, 23, 25, 28, 30, 35, 40, or 50 amino acids of any of SEQ ID NOs. 812-1599, wherein said antibody is polyclonal or monoclonal.
Another embodiment of the present invention is a computer readable medium having stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-811 and 1600-1622 and a polypeptide code of SEQ ID NOs. 812-1599.
Another embodiment of the present invention is a computer system comprising a processor and a data storage device wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-811 and 1600-1622 and a polypeptide code of SEQ ID NOs. 812-1599. In one aspect of this embodiment the computer system further comprises a sequence comparer and a data storage device having reference sequences stored thereon. For example, the sequence comparer may comprise a computer program which indicates polymorphisms. In another aspect of this embodiment, the computer system further comprises an identifier which identifies features in said sequence.
Another embodiment of the present invention is a method for comparing a first sequence to a reference sequence wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-811 and 1600-1622 and a polypeptide code of SEQ ID NOs. 812-1599 comprising the steps of reading said first sequence and said reference sequence through use of a computer program which compares sequences and determining differences between said first sequence and said reference sequence with said computer program. In some aspects of this embodiment, said step of determining differences between the first sequence and the reference sequence comprises identifying polymorphisms.
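A comparison of this kind can be sketched in a few lines. The example below is deliberately simplified and is not the program contemplated by the text: it assumes the two sequences have already been aligned and have equal length, and it reports only single-position substitutions. The function name and the toy sequences are hypothetical.

```python
def find_differences(first, reference):
    """Return (position, reference_symbol, first_symbol) for each mismatch.

    Positions are 1-based. Assumes the sequences are pre-aligned and of
    equal length; insertions and deletions are not handled.
    """
    if len(first) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_sym, first_sym)
        for pos, (first_sym, ref_sym) in enumerate(zip(first, reference), start=1)
        if first_sym != ref_sym
    ]


print(find_differences("ATGCA", "ATGTA"))
```

A practical sequence comparer would first align the sequences, since a single insertion or deletion shifts every later position.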
Another embodiment of the present invention is a method for identifying a feature in a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 24-811 and 1600-1622 and a polypeptide code of SEQ ID NOs. 812-1599 comprising the steps of reading said sequence through the use of a computer program which identifies features in sequences and identifying features in said sequence with said computer program.
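This passage does not say which features such a program identifies. As one assumed example of a feature, the sketch below scans the three forward reading frames of a nucleotide sequence for open reading frames, each running from an ATG codon to the next in-frame stop codon. The function name, the `min_codons` parameter, and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end) pairs of forward-strand open reading frames.

    Coordinates are 0-based with an exclusive end, so seq[start:end] runs
    from the ATG through the stop codon. min_codons counts the codons
    before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


toy = "CCATGAAATTTTAGGG"  # hypothetical sequence for illustration only
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```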
Another embodiment of the present invention is a vector comprising a nucleic acid according to any one of the nucleic acids described above.
In one aspect of this embodiment, the vector encodes a fusion protein comprising a signal peptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811 and 1600-1622 operably linked to a second nucleic acid encoding a heterologous polypeptide.
Another embodiment of the present invention is a host cell containing any of the above vectors.
Another embodiment of the present invention is a method for directing the secretion of a polypeptide comprising the steps of culturing a host cell containing a vector encoding a fusion protein, said fusion protein comprising a signal peptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs. 24-811 and 1600-1622 operably linked to a second nucleic acid encoding a heterologous polypeptide, under conditions which allow the secretion of said fusion protein, and recovering said fusion protein. In one aspect of this embodiment, said fusion protein is secreted into the extracellular environment. In another aspect of this embodiment, said fusion protein is inserted into the membrane of said host cell.
Another embodiment of the present invention is a method for importing a polypeptide into a cell comprising the step of contacting said cell with a fusion protein comprising a signal peptide encoded by a sequence selected from the group consisting of the sequences of SEQ ID NOs: 38-270, operably linked to said polypeptide.
Another embodiment of the present invention is a method of making any of the nucleic acids described above comprising the steps of introducing said nucleic acid into a host cell such that said nucleic acid is present in multiple copies in each host cell and isolating said nucleic acid from said host cell.
Another embodiment of the present invention is a method of making a nucleic acid of any of the nucleic acids described above comprising the step of sequentially linking together the nucleotides in said nucleic acids.
Another embodiment of the present invention is a method of making any of the polypeptides described above wherein said polypeptide is 150 amino acids in length or less comprising the step of sequentially linking together the amino acids in said polypeptide.
Another embodiment of the present invention is a method of making any of the polypeptides described above wherein said polypeptide is 120 amino acids in length or less comprising the step of sequentially linking together the amino acids in said polypeptide.