The present invention relates to a composition comprising a plurality of polynucleotide sequences for use in research and diagnostic applications.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array may be used to determine whether individuals are carrying mutations that predispose them to cancer. The array has over 50,000 DNA targets to analyze more than 400 distinct mutations of p53. A cytochrome p450 gene array is useful to determine whether individuals have one of a number of specific mutations that could result in increased drug metabolism, drug resistance, or drug toxicity.
DNA-based array technology is especially relevant to screen expression of a large number of genes rapidly. There is a growing awareness that gene expression is affected in a global fashion and that genetic predisposition, disease, or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as where the genes are part of the same signaling pathway. In other cases, such as when some of the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affect the coregulation and expression of a large number of genes.
It would be advantageous to prepare DNA-based arrays that can be used for monitoring the expression of a large number of membrane-associated proteins. Proteins which span or are associated with cell membranes include receptors, ion channels and symporters, cytokines and their suppressors, monomeric or heterotrimeric G- and ras-related proteins, lectins such as selectin, oncogenes and their suppressors, and the like. Receptors include G protein coupled, four transmembrane, and tyrosine kinase receptors. Some of these proteins may span a cellular membrane and some may be secreted. The secreted proteins typically include signal sequences that direct them to their final cellular or extracellular destination.
The present invention provides for a composition comprising a plurality of polynucleotide sequences for use in detecting changes in expression of a large number of genes encoding proteins which are membrane-associated proteins, receptors and ion channels. Such a composition can be employed for the diagnosis or treatment of any disease (a pancreatic disease, a cancer, an immunopathology, a neuropathology and the like) where a defect in the expression of a gene encoding membrane-associated proteins is involved.
In one aspect, the present invention provides a composition comprising a plurality of polynucleotide sequences, wherein each of said polynucleotide sequences comprises at least a fragment of a gene encoding membrane-associated proteins, receptors and ion channels.
In one preferred embodiment, the plurality of polynucleotide sequences comprises at least a fragment of one or more of the sequences, SEQ ID NOs:1-305, presented in the Sequence Listing. In a second preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a membrane-associated protein. In a third preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a receptor. In a fourth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding ion channels. In a fifth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:1-288. In a sixth preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:289-294. In a seventh preferred embodiment, the composition comprises a plurality of polynucleotide sequences comprising at least a fragment of at least one or more of the sequences of SEQ ID NOs:295-305. In one aspect, the fragment is selected from the group consisting of SEQ ID NOs:295-297, or SEQ ID NOs:298-305. In an eighth preferred embodiment, the composition is a polynucleotide probe. In one aspect, the composition is immobilized on a substrate. In a ninth preferred embodiment, the composition is a hybridizable array element.
The composition, a hybridizable array element, is useful to monitor the expression of a plurality of expressed polynucleotides. The microarray is used in the diagnosis and treatment of a pancreatic disease, a cancer, an immunopathology, a neuropathology, and the like.
In another aspect, the present invention provides an expression profile that can reflect the expression levels of a plurality of polynucleotide sequences in a sample. The expression profile comprises a microarray and a plurality of detectable complexes. Each detectable complex is formed by hybridization of at least one probe polynucleotide sequence to at least one target polynucleotide sequence and further comprises a labeling moiety for detection.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The Sequence Listing is a compilation of nucleotide sequences obtained by sequencing and assembling clone inserts (isolates) from various cDNA libraries. Each sequence is identified by a sequence identification number (SEQ ID NO:) and by clone number.
FIGS. 1A and 1B are an alignment of SEQ ID NOs:298-302 produced using GELVIEW Fragment Assembly System software (Genetics Computer Group (GCG), Madison Wis.).
FIGS. 2A and 2B are an alignment of SEQ ID NOs:303-305 produced using GELVIEW Fragment Assembly System software (GCG).
Table 1 is a list of the sequences disclosed herein. By column, the table contains: 1) SEQ ID NO: as shown in the Sequence Listing; 2) Incyte Clone NO; 3) PRINT ID, designation of the relevant PROSITE group; 4) PRINT DESCRIPTION; 5) PRINT STRENGTH, the degree of correlation to the PROSITE group, where greater than 1300 is strong and 1000 to 1300 is weak; 6) PRINT SCORE, where greater than 1300 is strong and 1000 to 1300 is suggestive; 7) TM, the presence of at least one transmembrane domain; and 8) SIGNAL PEPTIDE, the presence of a signal peptide. The table is arranged so that SEQ ID NOs:1-305 contain at least a fragment of a gene encoding a membrane-associated protein, some of which are receptors, and some, ion channels.
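The score bands used in columns 5 and 6 of Table 1 can be illustrated with a short sketch. The function name `classify_print_score` and the label returned for scores under 1000 are assumptions made for illustration only; Table 1 does not name that lower band.

```python
def classify_print_score(score: float) -> str:
    """Bin a PRINTS score using the thresholds stated for Table 1."""
    if score > 1300:
        return "strong"
    if score >= 1000:
        # Table 1 calls this band "weak" (column 5) or "suggestive" (column 6).
        return "weak/suggestive"
    # Assumption: the table gives no label for scores under 1000.
    return "below range"
```

A score of exactly 1300 falls in the lower band, because only values greater than 1300 are described as strong.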
The term "microarray" refers to an ordered arrangement of hybridizable array elements. The elements are arranged so that there are preferably at least one or more different elements, more preferably at least 100 elements, even more preferably at least 1,000 elements, and most preferably at least 10,000 elements on a one cm² substrate surface. The maximum number of array elements is unlimited, but is at least 100,000. Furthermore, the hybridization signal from each array element is individually distinguishable. In a preferred embodiment, the array elements comprise polynucleotide sequences.
A "polynucleotide" refers to a chain of nucleotides. Preferably, the chain has from about five to 10,000 nucleotides, more preferably from about 50 to 3,500 nucleotides. The term "probe" refers to a polynucleotide sequence capable of hybridizing with a target sequence to form a polynucleotide probe/target complex under hybridization conditions. A "target polynucleotide" refers to a chain of nucleotides to which a polynucleotide probe can hybridize by base pairing. In some instances, the sequences will be completely complementary (no mismatches) when aligned; in others, there may be up to a 10% mismatch.
A "plurality" refers preferably to a group of at least one or more members, more preferably to a group of at least about 100, even more preferably to a group of at least about 1,000 members, and most preferably to a group of at least about 10,000 members. The maximum number of members is unlimited, but is at least about 100,000 members.
A "fragment" means a stretch of at least about 100 consecutive nucleotides. A "fragment" can also mean a stretch of at least about 100 consecutive nucleotides that contains one or more deletions, insertions or substitutions. A "fragment" can also include the entire open reading frame of a gene. Preferred fragments are those that lack secondary structure as identified by using computer software programs such as OLIGO 4.06 Primer Analysis software (National Biosciences, Plymouth Minn.), LASERGENE software (DNASTAR, Madison Wis.), MACDNASIS (Hitachi Software Engineering Co., Ltd., San Bruno Calif.) and the like.
The term "gene" or "genes" refers to a polynucleotide sequence which may be partial or complete and may comprise regulatory, untranslated, or coding regions. The phrase "genes encoding membrane-associated proteins, receptors, or ion channels" refers to genes comprising sequences that contain conserved protein motifs or domains that were identified by BLAST (Basic Local Alignment Search Tool; Altschul (1993) J Mol Evol 36:290-300; and Altschul et al. (1990) J Mol Biol 215:403-410), PRINTS, or other analytical tools. Additionally, "genes encoding membrane-associated proteins, receptors, or ion channels" refers to genes which may produce proteins which span the cell membrane or have signal sequences which direct them to their final cellular or extracellular destination.
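The 10% mismatch allowance between probe and target can be illustrated as follows. This is a minimal sketch, assuming DNA sequences of equal length aligned without gaps; the function names and the `limit` parameter are hypothetical and are not part of the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs perfectly with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_fraction(probe: str, target: str) -> float:
    """Fraction of aligned positions at which the probe fails to pair."""
    if not probe or len(probe) != len(target):
        raise ValueError("sketch assumes non-empty sequences of equal length")
    perfect_probe = reverse_complement(target)
    mismatches = sum(1 for p, e in zip(probe.upper(), perfect_probe) if p != e)
    return mismatches / len(probe)


def within_mismatch_limit(probe: str, target: str, limit: float = 0.10) -> bool:
    """True when the mismatch fraction does not exceed the limit."""
    return mismatch_fraction(probe, target) <= limit
```

For a 10-nucleotide pair, one mismatch is therefore tolerated and two are not.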
The present invention provides a composition comprising a plurality of polynucleotide sequences comprising at least a fragment of a gene encoding a protein which is a receptor, ion channel, or associated with cell membrane. Preferably, the plurality of polynucleotide sequences comprise at least a fragment of one or more of the sequences (SEQ ID NOs:1-305) presented in the Sequence Listing. In one preferred embodiment, the composition comprises a plurality of polynucleotide sequences, wherein each sequence comprises at least a fragment of a sequence selected from the group consisting of SEQ ID NOs:1-294. In a second preferred embodiment, the composition comprises a plurality of polynucleotide sequences, wherein each sequence comprises at least a fragment of a sequence selected from the group consisting of SEQ ID NOs:295-305.
A microarray can be used for large scale genetic or gene expression analysis of a large number of polynucleotide sequences. Such an analysis can be used in the diagnosis of diseases and in the monitoring of treatments where altered expression of genes encoding receptors, ion channels, or membrane-associated proteins causes disease, such as pancreatic disease, cancer, an immunopathology, a neuropathology, and the like. Further, the microarray can be employed to investigate an individual's predisposition to a disease, such as pancreatic disease, cancer, an immunopathology, or a neuropathology. Furthermore, the microarray can be employed to investigate cellular responses to infection, drug treatment, and the like.
When the composition of the invention is employed as hybridizable array elements in a microarray, the array elements are organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (which together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular disease or condition or treatment.
The composition comprising a plurality of polynucleotide sequences can also be used to purify a subpopulation of mRNAs, cDNAs, genomic fragments, and the like, in a sample. Typically, samples will include polynucleotides of interest and additional nucleic acids which may contribute to background signal in a hybridization. Therefore, it may be advantageous to remove these additional nucleic acids before hybridization. One method for removing the additional nucleic acids is to hybridize the sample containing probe polynucleotides with immobilized polynucleotide targets. Those nucleic acids which do not hybridize to the polynucleotide targets are washed away. At a later point, the immobilized target polynucleotides can be released in the form of purified target polynucleotides.
Method for Selecting Polynucleotide Sequences
This section describes the selection of the plurality of polynucleotide sequences. In one embodiment, the sequences are selected based on the presence of shared signal sequence motifs. For example, signal sequences generally contain 15 to 60 amino acids and are located at the N-terminal end of the protein. The signal sequence consists of three regions: 1) the n-region, which is located adjacent to the N-terminus, is composed of one to five amino acids, and usually carries a positive charge; 2) the h-region, which is composed of 7 to 15 hydrophobic amino acids and creates a hydrophobic core; and 3) the c-region, which is located between the h-region and the cleavage site and is composed of three to seven polar, but mostly uncharged, amino acids. The signal sequence is removed from the protein during posttranslational processing by cleavage at the cleavage site.
A transmembrane protein is characterized by a polypeptide chain which is exposed on both sides of a membrane. The cytoplasmic and extracellular domains are separated by at least one membrane-spanning segment which traverses the hydrophobic environment of the lipid bilayer. The membrane-spanning segment is composed of amino acid residues with nonpolar side chains, usually in the form of an α helix. Segments which contain about 20-30 hydrophobic residues are long enough to span a membrane as an α helix, and they can often be identified by means of a hydropathy plot.
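A hydropathy plot of the kind mentioned above can be sketched as a sliding-window average of per-residue hydrophobicity. The sketch below uses the Kyte-Doolittle scale, which is an assumption (the disclosure does not name a scale). The default window of 20 residues follows the lower bound given above, and the threshold of 1.6 is a commonly used heuristic rather than a value from the disclosure. All function names are hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale, not named in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 20) -> List[float]:
    """Mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 20,
                         threshold: float = 1.6) -> List[int]:
    """Start positions (0-based) of windows hydrophobic enough to flag."""
    profile = hydropathy_profile(protein, window)
    return [i for i, value in enumerate(profile) if value >= threshold]
```

Windows flagged this way are only candidates for membrane-spanning segments; a sequence shorter than the window yields an empty profile.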
Receptor sequences are recognized by one or more hydrophobic transmembrane regions, cysteine disulfide bridges between extracellular loops, an extracellular N-terminus, and a cytoplasmic C-terminus. For example, in G protein-coupled receptors (GPCRs), the N-terminus interacts with ligands, the disulfide bridge interacts with agonists and antagonists, the second cytoplasmic loop has a conserved, acidic-Arg-aromatic triplet which may interact with the G proteins, and the large third intracellular loop interacts with G proteins to activate second messengers such as cyclic AMP, phospholipase C, inositol triphosphate, or ion channel proteins (Watson and Arkinstall (1994) The G-protein Linked Receptor Facts Book, Academic Press, San Diego Calif.). Other exemplary classes of receptors such as the tetraspanins (Maecker et al. (1997) FASEB J 11:428-442), calcium dependent receptors (Speiss (1990) Biochem 29:10009-18) and the single transmembrane receptors may be similarly characterized relative to their intracellular and extracellular domains, known motifs, and interactions with other molecules.
An ion channel is a transmembrane protein that forms a hydrophilic pore through which ions can cross the lipid bilayer of the membrane. An ion channel usually shows some degree of ion specificity, and up to a million ions per second may flow down their electrochemical gradients through the open pore. Ion channels are gated and allow ions to pass only under defined circumstances. Gated channels may be either voltage-gated, such as the sodium channel of neurons, or ligand-gated, such as the acetylcholine receptor of cholinergic synapses.
Membrane-associated proteins, receptors or ion channels may act directly as inhibitors or as stimulators of cell proliferation, growth, attachment, angiogenesis, and apoptosis, or indirectly by modulating the effects of transcription factors, matrix and adhesion molecules, cell cycle regulators, and other molecules in cell signaling pathways. In addition, cell signaling molecules may act as ligands or ligand cofactors for receptors which modulate cell growth, proliferation, and differentiation. These molecules may be identified by sequence homology to molecules whose function has been characterized, and by the identification of their conserved domains. Membrane-associated proteins, receptors or ion channels may be characterized using programs such as BLAST, PRINTS, or Hidden Markov Models (HMM). Fragments which include characterized, conserved regions of membrane-associated proteins, receptors, or ion channels may be used in hybridization technologies to identify similar proteins.
A large number of clones from a variety of cDNA libraries can be screened using software well known in the art to discover sequences with conserved protein domains or motifs. Such sequences may be screened using the BLOCK 2 Bioanalysis program (Incyte Pharmaceuticals, Palo Alto Calif.), a motif analysis program based on sequence information contained in the SWISS-PROT and PROSITE databases, which is useful for determining the function of uncharacterized proteins translated from genomic or cDNA sequences (Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al. (1997) J Chem Inf Comput Sci 37:417-424). PROSITE is particularly useful to identify functional or structural domains that cannot be detected using common motifs because of extreme sequence divergence. The method, which is based on weight matrices, calibrates the motifs against the SWISS-PROT database to obtain a measure of the chance distribution of the matches. Similarly, databases such as PRINTS store conserved motifs useful in the characterization of proteins (Attwood et al. (1998) Nucl Acids Res 26:304-308). These conserved motifs are used in the selection and design of probes. The PRINTS database can be searched using the BLIMPS search program. The PRINTS database of protein family "fingerprints" complements the PROSITE database and utilizes groups of conserved motifs within sequence alignments to build characteristic signatures of different polypeptide families. Alternatively, HMMs can be used to find shared motifs, specifically consensus sequences (Pearson and Lipman (1988) Proc Natl Acad Sci 85:2444-2448; Smith and Waterman (1981) J Mol Biol 147:195-197). Although HMMs were initially developed to examine speech recognition patterns, they have been used in biology to analyze protein and DNA sequences and to model protein structure. HMMs have a formal probabilistic basis and use position-specific scores for amino acids or nucleotides. The algorithms are flexible in that they incorporate information from newly identified sequences to build even more successful patterns. HMMs are useful to identify the transmembrane regions and signal peptides.
In another embodiment, the sequences disclosed in the Sequence Listing can be searched against GenBank and SWISS-PROT databases using BLAST. Then, the descriptions of those sequences with homology to the disclosed sequences may be scanned using keywords such as receptor, transmembrane, channel, oncogene, inhibitor, and the like.
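The keyword scan of homolog descriptions can be sketched as a simple case-insensitive substring match. The sketch assumes the BLAST hit descriptions have already been collected into a mapping from sequence identifier to description text; the function names, the mapping layout, and the example descriptions in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence

# Keywords taken from the text; the list may be extended as needed.
KEYWORDS = ("receptor", "transmembrane", "channel", "oncogene", "inhibitor")


def matching_keywords(description: str,
                      keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Keywords that occur anywhere in the description, ignoring case."""
    text = description.lower()
    return [word for word in keywords if word in text]


def filter_hits(hits: Dict[str, str],
                keywords: Sequence[str] = KEYWORDS) -> Dict[str, List[str]]:
    """Keep only the hits whose description contains at least one keyword."""
    selected = {}
    for seq_id, description in hits.items():
        found = matching_keywords(description, keywords)
        if found:
            selected[seq_id] = found
    return selected
```

For example, a made-up description such as "voltage-gated sodium channel" would be retained under the keyword "channel", while "ribosomal protein L3" would be dropped.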
Sequences identified by the methods described above are provided in SEQ ID NOs:1-305 in the Sequence Listing. Table 1 provides the annotation to the referenced PRINTS sequences and specifies whether they possess transmembrane and signal peptide motifs. The resulting composition can comprise polynucleotide sequences that are not redundant, i.e., there is no more than one polynucleotide sequence to represent a particular gene. Alternatively, the composition can contain polynucleotide probes or microarray elements that are redundant, i.e., a gene is represented by more than one polynucleotide sequence.
The selected polynucleotide sequences may be manipulated further to optimize their performance as hybridization probes. To optimize probe selection, the sequences are examined using computer algorithms, which are well known in the art, to identify fragments of genes without potential secondary structure. Such computer algorithms are found in OLIGO 4.06 Primer Analysis software (National Biosciences) or LASERGENE software (DNASTAR). These programs can search nucleotide sequences to identify stem loop structures and tandem repeats and to analyze G+C content of the sequence (those sequences with a G+C content greater than 60% are excluded). Alternatively, the probes can be optimized by trial and error. Experiments can be performed to determine whether the probes hybridize optimally to target sequences under experimental conditions.
Where the greatest numbers of different polynucleotide sequences are desired, the sequences are extended to assure that different polynucleotide sequences are not derived from the same gene, i.e., the polynucleotide sequences are not redundant. The probe sequences may be extended utilizing the partial nucleotide sequences derived from clone isolates by employing methods well known in the art. For example, one method which may be employed, "restriction-site" PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus (Sarkar (1993) PCR Methods Applic 2:318-322).
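The G+C exclusion rule can be illustrated with a short sketch. It covers only the G+C criterion and does not attempt the stem loop or tandem repeat analysis performed by the named software packages; the function names and the `max_gc` parameter are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq: str, max_gc: float = 0.60) -> bool:
    """Sequences with G+C content greater than max_gc are excluded."""
    return gc_fraction(seq) <= max_gc
```

A sequence at exactly 60% G+C is retained under this reading, since only content greater than 60% is excluded.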
Polynucleotide Sequences
This section describes the polynucleotide sequences. The polynucleotide sequences can be genomic DNA, cDNA, mRNA, or any RNA-like or DNA-like material, such as peptide nucleic acids, branched DNAs, and the like. The polynucleotide sequences can be sense or antisense, complementary sequences. Where target polynucleotides are double stranded, the probes may be either sense or antisense strands. Where the target polynucleotides are single stranded, the probes are complementary single strands. In one embodiment, the polynucleotide sequences are cDNAs, the size of which may vary, and are preferably from 1000 to 10,000 nucleotides, more preferably from 150 to 5000 nucleotides. In a second embodiment, the polynucleotide sequences are contained within plasmids. In this case, the size of the inserted cDNA sequence, excluding the vector DNA and its regulatory sequences, may vary from about 50 to 12,000 nucleotides, more preferably from about 150 to 5000 nucleotides.
The polynucleotide can be prepared by a variety of synthetic or enzymatic schemes which are well known in the art. Sequences can be synthesized, in whole or in part, using chemical or enzymatic methods well known in the art (Caruthers et al. (1980) Nucl Acids Symp Ser (7)215-233; Ausubel et al. (1997) Short Protocols in Molecular Biology, John Wiley and Sons, New York N.Y.).
Nucleotide analogues, which can base pair with the target nucleotide sequences, can be incorporated into the probe sequences by methods well known in the art. For example, certain guanine nucleotides can be substituted with hypoxanthine which hydrogen bonds with cytosine, but these bonds are less stable than those formed between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine which forms stronger bonds with thymidine than those between adenine and thymidine. Additionally, the polynucleotide sequences can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.
The polynucleotide sequences can be immobilized on a substrate. Preferred substrates are any suitable rigid or semi-rigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which the polynucleotide sequences are bound. Preferably, the substrates are optically transparent.
Sequences can be synthesized, in whole or in part, on the surface of a substrate using a chemical coupling procedure and a piezoelectric printing apparatus, such as that described in PCT publication WO95/251116 (Baldeschweiler et al.). Alternatively, the target can be synthesized on a substrate surface using a self-addressable electronic device that controls when reagents are added (Heller et al. U.S. Pat. No. 5,605,662).
Complementary DNA (cDNA) can be arranged and immobilized on a substrate. The sequences can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups. In another case, a cDNA target is placed on a polylysine coated surface and UV cross-linked (Shalon et al. WO95/35505). In yet another method, a DNA is actively transported from a solution to a given position on a substrate by electrical means (U.S. Pat. No. 5,605,662). Alternatively, individual DNA clones can be gridded on a filter. Cells are lysed, proteins and cellular components degraded, and the DNA coupled to the filter by UV cross-linking.
Furthermore, the sequences do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long, and they provide exposure to the attached polynucleotide sequence. Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the substrate surface react with one of the terminal portions of the linker to bind the linker to the substrate. The other terminal portion of the linker is adapted to bind the polynucleotide sequence.
The polynucleotide sequences can be attached to a substrate by dispensing reagents for target synthesis on the substrate surface or by dispensing preformed DNA fragments or clones on the substrate surface. Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions simultaneously.
Sample Preparation
In order to conduct sample analysis, a sample containing nucleic acids is provided. The samples can be obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue preparations. DNA or RNA can be isolated from the sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Tijssen (1993; Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation, Elsevier Science, New York N.Y.). In one case, total RNA is isolated using the TRIZOL reagent (Life Technologies, Gaithersburg Md.), and mRNA is isolated using oligo d(T) column chromatography or glass beads. Alternatively, when probe polynucleotides are derived from an mRNA, the probe polynucleotides can be DNA reverse transcribed from the mRNA, an RNA transcribed from that cDNA, a DNA amplified from that DNA, an RNA transcribed from the amplified DNA, and the like. When the target polynucleotide is derived from cDNA, the target polynucleotide can be DNA amplified from DNA or DNA reverse transcribed from RNA. In yet another alternative, the polynucleotide sequences are prepared by more than one method.
When polynucleotide sequences are amplified, it is desirable to amplify the nucleic acid sample and maintain the relative abundances represented in the original sample including low abundance transcripts. Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase and an RNase which assists in breaking up the DNA/RNA hybrid. After synthesis of the double stranded DNA, T7 RNA polymerase can be added, and RNA transcribed from the second DNA strand template (Van Gelder et al. U.S. Pat. No. 5,545,522). RNA can be amplified in vitro, in situ or in vivo (Eberwine U.S. Pat. No. 5,514,545).
It is also advantageous to include quantitation controls within the sample to assure that amplification and labeling procedures do not change the true distribution of probe polynucleotides in a sample. For this purpose, a sample is spiked with a known amount of a control probe polynucleotide and the composition of target polynucleotide sequences includes reference target sequences which specifically hybridize with the control probe polynucleotides. After hybridization and processing, the hybridization signals obtained should reflect accurately the amount of control probe polynucleotides added to the sample.
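One way to check that control signals track the amounts spiked into the sample is to compare the signal per unit amount across the controls. The following is a minimal sketch, assuming a linear response between the amount added and the measured signal; the function names and the data layout are hypothetical, and any acceptance tolerance would be chosen by the user.

```python
from typing import Dict, Tuple


def control_response_ratios(
    controls: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Signal per unit amount for each control: name -> (amount, signal)."""
    return {
        name: signal / amount
        for name, (amount, signal) in controls.items()
    }


def max_relative_deviation(ratios: Dict[str, float]) -> float:
    """Largest relative departure of any control from the mean response."""
    values = list(ratios.values())
    mean = sum(values) / len(values)
    return max(abs(value - mean) / mean for value in values)
```

A deviation near zero suggests that amplification and labeling preserved the relative amounts of the controls; a large deviation suggests that the distribution was distorted.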
Prior to hybridization, it may be desirable to fragment the probe polynucleotides. Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization to polynucleotides in the sample with low or no complementarity. Fragmentation can be performed by mechanical or chemical means.
The probe polynucleotides may be labeled with one or more labeling moieties to allow for detection of hybridized probe/target polynucleotide complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as 32P, 33P or 35S, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes and the like. Preferably, fluorescent markers absorb light above about 300 nm, preferably above 400 nm, and usually emit light at wavelengths at least greater than 10 nm above the wavelength of the light absorbed. Preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway N.J.).
Labeling can be carried out during an amplification reaction, such as polymerase chain and in vitro transcription reactions, or by nick translation or 5′ or 3′-end-labeling reactions. In one case, labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after or without an amplification step, the label is incorporated by using terminal transferase or by kinasing the 5′ end of the polynucleotide sequence and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.
Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target complex has formed. In one case, biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin present is attached to probe polynucleotides complexed with the target polynucleotides. An avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added. In another case, the labeling moiety is incorporated by intercalation into bound probe/target complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.
Under some circumstances it may be advantageous to immobilize the probe polynucleotides on a substrate and have the polynucleotide targets bind to the immobilized probe polynucleotides. In such cases the probe polynucleotides can be attached to a substrate as described above.
Hybridization and Detection
Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art. (See, e.g., Ausubel, supra, units 2.8-2.11, 3.18-3.19 and 4-6-4.9.) Conditions can be selected for hybridization where only completely complementary probe and target can hybridize, i.e., each base must pair with its complementary base. Alternatively, conditions can be selected where probe and target have mismatches but are still able to hybridize. Suitable conditions can be selected, for example, by varying the concentrations of salt in the prehybridization, hybridization, and wash solutions or by varying the hybridization and wash temperatures. With some membranes, the temperature can be decreased by adding formamide to the prehybridization and hybridization solutions.
Hybridization can be performed at low stringency with buffers, such as 5×SSC with 1% sodium dodecyl sulfate (SDS) at 60° C., which permits hybridization between probe and target sequences that contain some mismatches to form probe/target complexes. Subsequent washes are performed at higher stringency with buffers such as 0.2×SSC with 0.1% SDS at either 45° C. (medium stringency) or 68° C. (high stringency), to maintain hybridization of only those probe/target complexes that contain completely complementary sequences. Background signals can be reduced by the use of detergents such as SDS, Sarcosyl, or Triton X-100, or a blocking agent, such as salmon sperm DNA.
Hybridization specificity can be evaluated by comparing the hybridization of control probe sequences to control target sequences that are added to a sample in a known amount. The control probe may have one or more sequence mismatches compared with the corresponding control target. In this manner, it is possible to evaluate whether only complementary probes are hybridizing to the targets or whether mismatched hybrid duplexes are forming.
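The specificity check described above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not a method taken from this disclosure: it assumes that each control yields one signal from a fully complementary duplex and one from a mismatched duplex, and that a hypothetical ratio threshold (0.5 here) separates specific from nonspecific hybridization. The function names, the data layout, and the threshold are all assumptions.

```python
def mismatch_ratio(matched_signal, mismatched_signal):
    """Return the mismatched/matched signal ratio for one control pair."""
    if matched_signal <= 0:
        raise ValueError("matched control signal must be positive")
    return mismatched_signal / matched_signal


def hybridization_is_specific(control_pairs, max_ratio=0.5):
    """Report whether every mismatched control stays at or below max_ratio.

    control_pairs is a list of (matched_signal, mismatched_signal) tuples.
    max_ratio is a hypothetical cutoff chosen only for illustration.
    """
    return all(mismatch_ratio(m, mm) <= max_ratio for m, mm in control_pairs)
```

A low ratio for every control pair suggests that mostly complementary duplexes are forming; a ratio near 1 suggests that mismatched duplexes survive the wash conditions.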
Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, probe polynucleotides from one sample are hybridized to microarray elements, and signals detected after hybridization complexes form. Signal strength correlates with probe polynucleotide levels in a sample. In the differential hybridization format, differential expression of a set of genes in two biological samples is analyzed. Probe polynucleotides from the two samples are prepared and labeled with different labeling moieties. A mixture of the two labeled probe polynucleotides is hybridized to the microarray elements, and signals are examined under conditions in which the emissions from the two different labels are individually detectable. Targets in the microarray that are hybridized to substantially equal numbers of probes derived from both biological samples give a distinct combined fluorescence (Shalon WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog. In another embodiment Cy3/Cy5 fluorophores (Amersham Pharmacia Biotech) are employed.
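As a hedged illustration of how signals from the differential hybridization format might be compared, the following Python sketch computes a log2 ratio of the two label intensities measured at one array element. It is not drawn from this disclosure: the function names, the intensity floor, and the two-fold cutoff are assumptions chosen only for the example.

```python
import math


def log2_ratio(signal_a, signal_b, floor=1.0):
    """Return log2(signal_a / signal_b) for one array element.

    Intensities below the (assumed) floor are raised to the floor so the
    ratio stays defined for very weak or zero signals.
    """
    a = max(signal_a, floor)
    b = max(signal_b, floor)
    return math.log2(a / b)


def classify_element(signal_a, signal_b, fold_cutoff=2.0):
    """Label an element by which sample gives the stronger signal.

    fold_cutoff is a hypothetical threshold, not a value from the text.
    """
    ratio = log2_ratio(signal_a, signal_b)
    cutoff = math.log2(fold_cutoff)
    if ratio >= cutoff:
        return "higher in sample A"
    if ratio <= -cutoff:
        return "higher in sample B"
    return "similar"
```

Elements classified as "similar" correspond to targets hybridized to substantially equal numbers of probes from both samples.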
After hybridization, the microarray is washed to remove nonhybridized nucleic acids, and complex formation between the hybridizable array elements and the probe polynucleotides is examined. Methods for detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the probe polynucleotides are labeled with a fluorescent label, and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier, and the amount of emitted light is detected and quantitated. The detected signal should be proportional to the amount of probe/target polynucleotide complexes at each position of the microarray. The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of each hybridized probe polynucleotide.
Typically, microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions. In a preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
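One simple way to carry out such a normalization is sketched below in Python. This is an assumption-laden illustration rather than the procedure of this disclosure: it assumes that each microarray is rescaled so that the median intensity of its internal normalization controls equals a common reference level. The function names and the reference level of 1000.0 are hypothetical.

```python
from statistics import median


def normalization_factor(control_intensities, reference_level=1000.0):
    """Return the scale factor that maps the control median to reference_level."""
    control_median = median(control_intensities)
    if control_median <= 0:
        raise ValueError("control intensities must have a positive median")
    return reference_level / control_median


def normalize_array(element_intensities, control_intensities, reference_level=1000.0):
    """Rescale every element intensity on one microarray.

    element_intensities maps an element name to its raw intensity.
    control_intensities lists the raw intensities of the internal controls
    on the same microarray.
    """
    factor = normalization_factor(control_intensities, reference_level)
    return {name: value * factor for name, value in element_intensities.items()}
```

After each microarray is rescaled this way, intensities from different microarrays run under similar test conditions can be compared on a common scale.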
Expression Profiles
Expression profiles using the composition of this invention may be used to detect changes in the expression of genes implicated in disease. These genes include genes whose altered expression is correlated with pancreatic disease, cancer, immunopathology, neuropathology, and the like.
The expression profile comprises the polynucleotide sequences of the Sequence Listing. The expression profile also includes a plurality of detectable complexes. Each complex is formed by hybridization of one or more polynucleotide sequences or array elements to one or more complementary probe polynucleotides. At least one of the polynucleotide sequences, preferably a plurality of polynucleotide sequences, is hybridized to a complementary target polynucleotide forming at least one, and preferably a plurality, of complexes. A complex is detected by the incorporation of at least one labeling moiety, described above, in the complex. Expression profiles provide “snapshots” that reflect unique expression patterns that are characteristic of a disease or condition.
After performing hybridization experiments and interpreting the signals produced by complexes on a microarray, particular polynucleotide sequences can be identified based on their expression patterns. Such polynucleotide sequences can be used to clone a full length sequence for the gene, to produce a polypeptide, to develop a diagnostic panel for a particular disease, to choose a gene for potential therapeutic use, and the like.
Additional Utility of the Invention
Microarrays containing the sequences of the Sequence Listing can be employed in several applications including diagnostics, prognostics and treatment regimens, drug discovery and development, toxicological and carcinogenicity studies, forensics, pharmacogenomics and the like. In one situation, the microarray is used to monitor the progression of disease. Researchers can assess and catalog the differences in gene expression between healthy and diseased tissues or cells. By analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the patient is symptomatic. The invention can also be used to monitor the efficacy of treatment. For some treatments with known side effects, the microarray is employed to “fine tune” the treatment regimen. A dosage is established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or to manifest side effects, before altering the course of treatment.
Alternatively, animal models which mimic a disease can be used, rather than patients, to characterize expression profiles associated with a particular disease or condition. These gene expression data may be useful in diagnosing and monitoring the course of disease in the model, in determining genes that are candidates for intervention, and in testing novel treatment regimens. Subsequently, protocols and treatments that are successful in the model system may be used on human patients, and the resulting expression profiles monitored.
The expression of genes encoding membrane-associated proteins, receptors, and ion channels was highly associated with pancreatic tissue; ~45% of the sequences of the Sequence Listing were expressed in pancreatic tissues. In particular, the microarray and expression profile are useful to diagnose conditions of the pancreas such as diabetes, pancreatitis, pancreatic cholera, hyperlipidemia, fibrocystic disease, and cancers and tumors of the pancreas.
The expression of genes encoding membrane-associated proteins, receptors, and ion channels is closely associated with immune conditions, disorders and diseases; ~20% of the sequences of the Sequence Listing were expressed in tissues from patients with immunological conditions such as AIDS, Addison's disease, ARDS, allergies, ankylosing spondylitis, amyloidosis, anemia, asthma, atherosclerosis, autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, cholecystitis, contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis, Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, hypereosinophilia, irritable bowel syndrome, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma, Sjögren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome, complications of cancer, hemodialysis, and extracorporeal circulation, viral, bacterial, fungal, parasitic, protozoal, and helminthic infections, and trauma.
The expression of genes encoding membrane-associated proteins, receptors, and ion channels is closely associated with cancers; ~10% of the sequences of the Sequence Listing were expressed in cancerous tissues. In particular, the microarray and expression profile are useful to diagnose a cancer such as adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma and teratocarcinoma. Such cancers include, but are not limited to, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, colon, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid and uterus.
The expression of genes encoding membrane-associated proteins, receptors, and ion channels is also closely associated with the immune response. Therefore, the microarray can be used to diagnose immunopathologies including, but not limited to, AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease, ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, atrophic gastritis, glomerulonephritis, gout, Graves' disease, hypereosinophilia, irritable bowel syndrome, lupus erythematosus, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, rheumatoid arthritis, scleroderma, Sjögren's syndrome, and autoimmune thyroiditis; complications of cancer, hemodialysis, extracorporeal circulation; viral, bacterial, fungal, parasitic, and protozoal infections; and trauma.
Neuropathologies are also affected by the expression of genes encoding membrane-associated proteins, receptors, and ion channels; in fact, ~1% of the sequences of the Sequence Listing were expressed in neuronal tissues. Thus, the microarray can be used to diagnose neuropathologies including, but not limited to, akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis, bipolar disorder, catatonia, cerebral neoplasms, dementia, depression, Down's syndrome, tardive dyskinesia, dystonias, epilepsy, Huntington's disease, multiple sclerosis, neurofibromatosis, Parkinson's disease, paranoid psychoses, schizophrenia, and Tourette's disorder.
Also, researchers can use the microarray to rapidly screen large numbers of candidate drug molecules, looking for ones that produce an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects. Thus, the invention provides the means to determine the molecular mode of action of a drug. It is understood that this invention is not limited to the particular devices, machines, materials and methods described. Although preferred embodiments are described, devices, machines, materials and methods similar or equivalent to these embodiments may be used to practice the invention. The preferred embodiments are not intended to limit the scope of the invention, which is limited only by the appended claims.
The singular forms “a”, “an”, and “the” include plural reference unless the context clearly dictates otherwise. All technical and scientific terms have the meanings commonly understood by one of ordinary skill in the art. All patents mentioned herein are incorporated by reference for the purpose of describing and disclosing the devices, machines, materials and methods which are presented and which might be used in connection with the invention. Nothing in the specification is to be construed as an admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.