Genes are the basic units of hereditary information within an organism, and numerous mechanisms exist to ensure the stability of their sequence over many generations. These stabilizing mechanisms ensure that the sequence of nucleotides that comprises a gene is maintained in a constant form over time. Because this nucleotide sequence includes both the regulatory sequences that determine gene expression and the coding sequences that determine protein structure and function, these genetic properties are also stably maintained. It follows that distinct genes can be distinguished by differences in their sequence. The gene sequence determines the expression pattern and function of the gene and its protein product, so genes with distinct sequences are expected to display distinct biological functions. The proteins that are translated from these gene sequences serve as the foundation for the construction of more complex structures and catalytic machinery that are the basis for the metabolic, developmental and physiological processes that make complex organisms and life possible. The eventual location of these structures and their corresponding functional activities are also determined by the underlying gene sequence that codes for their formation. Recent assessments estimate that the human genome contains approximately 140,000 genes. Although each of these sequences has a different role, many of these genes can be grouped into classes based on the function of the protein they encode. In turn, if a gene sequence is translated into a protein found to be involved in an important physiological pathway, it may be particularly relevant for both diagnostic and therapeutic uses. The identification and functional characterization of all of the genetic determinants of life is among the challenges of modern biology and medicine.
The examination for nucleotide identity establishes which trapped genes are novel, and the protein coding analysis identifies protein sequence motifs and domains of functional importance that the novel gene products share with proteins of known function. These similarities to known proteins suggest aspects of the function of the novel genes and point to experimental approaches for testing and extending knowledge of that function. One way to establish the function of a novel gene is to use genetic manipulations to disrupt its normal expression in cells or experimental organisms. Observation of the phenotypic consequences of such manipulations reveals the biological role and utility of a novel gene, even one that has no identifiable sequence relationship to proteins of known function.
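The protein coding analysis described above can be illustrated with a minimal sketch, which is not the procedure used in this work. It translates a cDNA open reading frame with the standard genetic code and scans the product for one well-known sequence motif, the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}. The function names `translate` and `find_motif`, the choice of motif, and the example sequence are all assumptions made for illustration.

```python
import re

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative motif only: PROSITE pattern N-{P}-[ST]-{P} as a regular expression.
# The lookahead lets overlapping occurrences be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def translate(cdna):
    """Translate a cDNA reading frame starting at position 0, stopping at the first stop codon."""
    cdna = cdna.upper()
    protein = []
    for pos in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE.get(cdna[pos:pos + 3], "X")  # "X" marks an ambiguous codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (0-based position, matched residues) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(protein)]


# Hypothetical example sequence, not taken from any gene described here.
protein = translate("ATGAACGGTACCGCTTAA")
hits = find_motif(protein)
```

A motif hit of this kind only suggests a possible function; as the text notes, it serves as a starting point for experimental tests.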
Identification of the timing and patterns of expression for the novel transcripts and the proteins they encode also reveals biological function. Transcript expression can be mapped by RT-PCR analysis of cDNA from a variety of tissue and species sources, or by in situ hybridization to tissue samples, cells, or intact model organisms (e.g. mice, zebrafish, Drosophila or Caenorhabditis; Hughes S C, Krause H M, “Double labeling with fluorescence in situ hybridization in Drosophila whole-mount embryos”, Biotechniques 1998 April; 24(4):530-2; O'Neill J W, Bier E, “Double-label in situ hybridization using biotin and digoxigenin-tagged RNA probes”, Biotechniques 1994 November; 17(5):870, 874-5; Coutinho L L, Morris J, Ivarie R, “Whole mount in situ detection of low abundance transcripts of the myogenic factor qmf1 and myosin heavy chain protein in quail embryos”, Biotechniques 1992 November; 13(5):722-4; Conlon R A, Herrmann B G, “Detection of messenger RNA by in situ hybridization to postimplantation embryo whole mounts”, Methods Enzymol 1993; 225:373-83). The expression pattern and subcellular distribution of the novel proteins can be assayed with immunohistochemical approaches using tagged or labeled affinity reagents (e.g. antibodies or phage-displayed peptides with affinity for the novel proteins and detectable tags or labels) to localize protein expression within cells and cell supernatants, tissue samples, whole model organisms or protein extracts (Paddock S W, Langeland J A, DeVries P J, Carroll S B, “Three-color immunofluorescence imaging of Drosophila embryos by laser scanning confocal microscopy”, Biotechniques 1993 January; 14(1):42-8).
Evidence for gene function can be obtained from the bioinformatic identification of sequence relationships between novel and known proteins, and the expression information can be used to refine these inferences. Molecular and classical genetics and biochemical assays are then used to establish gene function. Genetic alterations in the level or sequence of the expressed protein can be introduced into model organisms or cells in culture (U.S. patent application Ser. No. 09/276,820; Mortensen R M, “Double knockouts. Production of mutant cell lines in cardiovascular research”, Hypertension 1993 October; 22(4):646-51). Detailed characterization of the phenotype that results from this altered expression further shows how the novel protein functions. The phenotypes that can result include transcriptional changes (monitored by in situ hybridization or DNA array analysis; Eisen M B, Brown P O, “DNA arrays for analysis of gene expression”, Methods Enzymol 1999; 303:179-205; Drmanac R, Drmanac S, “cDNA screening by array hybridization”, Methods Enzymol 1999; 303:165-78), alterations in protein expression, or changes in cell or organismal organization, environmental responses, or viability.
Proteins that interact with the novel protein to participate in the implementation of its biological function can be identified by two-hybrid screens for interactions among intracellular protein domains (Niethammer M, Sheng M, “Identification of ion channel-associated proteins using the yeast two-hybrid system”, Methods Enzymol 1998; 293:104-22), and by expression cloning (Nelson N, Liu Q R, “Cloning of genes or cDNAs encoding neurotransmitter transporters and their localization by immunocytochemistry”, Methods Enzymol 1998; 296:52-64; Romero M F, Kanai Y, Gunshin H, Hediger M A, “Expression cloning using Xenopus laevis oocytes”, Methods Enzymol 1998; 296:17-52; Blackwood E M, Eisenman R N, “Identification of protein-protein interactions by lambda gt11 expression cloning”, Methods Enzymol 1995; 254:229-40; Miki T, Aaronson S A, “Isolation of oncogenes by expression cDNA cloning”, Methods Enzymol 1995; 254:196-206; Sparks A B, Adey N B, Quilliam L A, Thorn J M, Kay B K, “Screening phage-displayed random peptide libraries for SH3 ligands”, Methods Enzymol 1995; 255:498-509; Margolis B, Skolnik E Y, Schlessinger J, “Use of tyrosine-phosphorylated proteins to screen bacterial expression libraries for SH2 domains”, Methods Enzymol 1995; 255:360-9; Singh H, “Specific recognition site probes for isolating genes encoding DNA-binding proteins”, Methods Enzymol 1993; 218:551-67) or modifier genetics approaches for intracellular as well as receptor and secreted novel proteins (Dove W F, Shedlovsky A, “Method for identifying mutants and molecules”, U.S. Pat. No. 5,780,236, Jul. 14, 1998). The expression cloning approach can be used whenever a molecular interaction, activity or phenotype can be used to screen for protein expression, and genetic screens are applicable whenever modulation of gene activity detectably affects the biology or biochemistry of a cell or organism.
A related approach to the identification of gene function is to test the ability of libraries of novel genes to transcomplement the phenotypic effects of mutations in human cells or in those of a heterologous species (e.g. Norbury C, Moreno S, “Cloning cell cycle regulatory genes by transcomplementation in yeast”, Methods Enzymol 1997; 283:44-59).
In some instances, coding information present in the full-length transcript for a particular gene may not be included in the sequence of the initial complementary DNA clone that was obtained. In these cases oligonucleotide primers are designed for 5′-RACE to recover the missing exons (Matz M, Shagin D, Bogdanova E, Britanova O, Lukyanov S, Diatchenko L, Chenchik A, “Amplification of cDNA ends based on template-switching effect and step-out PCR”, Nucleic Acids Res 1999 Mar. 15; 27(6):1558-60). Alternatively, labeled probes prepared from the original cDNA clones can be used to screen cDNA and genomic libraries to recover the missing coding exons (Liu M, Subramanyam Y V, Baskaran N, “Preparation and analysis of cDNA from a small number of hematopoietic cells”, Methods Enzymol 1999; 303:45-55; Carninci P, Hayashizaki Y, “High-efficiency full-length cDNA cloning”, Methods Enzymol 1999; 303:19-44). cDNA libraries to be screened can be preselected for those that contain cDNA copies of the transcripts of interest by PCR using primers complementary to the existing cDNA clones.
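As a rough illustration of the primer design step, the sketch below picks a gene-specific antisense primer from the 5′ region of a known partial cDNA, as would be needed for 5′-RACE. It is a simplified sketch under stated assumptions, not the design procedure of the cited references: the selection rule (GC content closest to 50%), the default primer length and search window, and the function names are all hypothetical. The melting temperature estimate uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is only a coarse guide for short oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C for a short oligonucleotide."""
    seq = seq.upper()
    return 2 * (seq.count("A") + seq.count("T")) + 4 * (seq.count("G") + seq.count("C"))


def pick_gene_specific_primer(cdna, length=20, search_window=100):
    """Choose an antisense primer from the first `search_window` bases of a sense-strand cDNA.

    Assumed selection rule: the window whose GC fraction is closest to 0.5.
    The primer is returned 5'->3' and anneals to the mRNA (sense) strand,
    so extension proceeds toward the unknown 5' end.
    """
    cdna = cdna.upper()
    limit = min(search_window, len(cdna))
    if limit < length:
        raise ValueError("known cDNA sequence is shorter than the primer length")
    starts = range(0, limit - length + 1)
    best = min(starts, key=lambda s: abs(gc_fraction(cdna[s:s + length]) - 0.5))
    return reverse_complement(cdna[best:best + length])


# Hypothetical partial cDNA used only to exercise the functions.
known_cdna = "A" * 20 + "ACGT" * 5 + "A" * 20
primer = pick_gene_specific_primer(known_cdna)
```

A real design would also screen candidates for hairpins, primer dimers and off-target matches; those checks are omitted here.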
Despite the recent advances made in the field of human genetics, a large number of polynucleotides encoding receptor or other signaling proteins, as well as proteins engaged in metabolic or structural roles necessary for normal cell function and physiology, have not yet been identified. Of those that have been isolated, many have been found to play a role in disease when their gene sequence is modified. In addition, there is overwhelming evidence that polymorphism in genetic composition plays a role in determining susceptibility to disease and in modifying the severity of disease manifestations. Because of the important role that these proteins play in maintaining the human condition, it is useful to identify and characterize novel human proteins and the polynucleotides that encode them. This information will enhance our ability to diagnose medical disorders that are influenced by gene expression, and will give the medical community the opportunity to ameliorate these conditions or even prevent them entirely through the development and administration of gene, protein and small molecule therapeutics designed to treat the cause and symptoms of disease.