The present invention relates to a composition comprising a plurality of polynucleotide probes for use in research and diagnostic applications.
DNA-based arrays can provide a simple way to explore the expression of a single polymorphic gene or a large number of genes. When the expression of a single gene is explored, DNA-based arrays are employed to detect the expression of specific gene variants. For example, a p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that predispose them to cancer. The array has over 50,000 DNA probes to analyze more than 400 distinct mutations of p53. A cytochrome p450 gene array is useful to determine whether individuals have one of 18 known polymorphisms of two human cytochrome p450 genes. These polymorphisms can cause increased drug metabolism, drug resistance or drug toxicity.
DNA-based array technology is especially relevant to the rapid screening of expression of a large number of genes. There is a growing awareness that gene expression is affected in a global fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, the expression of a large number of genes. In some cases the interactions may be expected, such as where the genes are part of the same signaling pathway. In other cases, such as when the genes participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic treatment affects the expression of a large number of genes.
It would be advantageous to prepare DNA-based arrays that can be used for monitoring the expression of a large number of genes coding for signaling pathway polypeptides, including different types of receptor, transducer and effector-like polypeptides. The present invention provides for a composition that can be employed in an array-format for detecting changes in expression of a large number of genes coding for different signaling pathway polypeptides.
In one aspect, the present invention provides a composition comprising a plurality of polynucleotide probes, wherein each of said polynucleotide probes comprises at least a portion of a gene coding for a signaling pathway polypeptide. The plurality of polynucleotide probes can comprise I) first polynucleotide probes, wherein each of said first polynucleotide probes comprises at least a portion of a gene coding for a receptor-like polypeptide; II) second polynucleotide probes, wherein each of said second polynucleotide probes comprises at least a portion of a gene coding for a transducing polypeptide; III) third polynucleotide probes, wherein each of said third polynucleotide probes comprises at least a portion of a gene coding for an effector-like polypeptide; or combinations thereof.
More particularly, in one preferred embodiment the composition comprises a plurality of polynucleotide probes wherein each gene coding for a signaling pathway polypeptide is at least a portion of a sequence selected from the group consisting of SEQ ID Nos: 1-1490. In a second preferred embodiment, the composition comprises a plurality of polynucleotide probes comprising at least a portion of at least 1000 of the sequences of SEQ ID Nos: 1-1490. In a third preferred embodiment, the composition comprises a plurality of polynucleotide probes wherein said polynucleotide probes comprise at least a portion of substantially all the sequences of SEQ ID Nos: 1-1490. The polynucleotide probes can be complementary DNAs, clone DNAs and the like.
The composition is particularly useful as hybridizable array elements in a microarray for monitoring the expression of a plurality of target polynucleotides. The microarray comprises a substrate and hybridizable array elements. The microarray of this invention is particularly useful in the diagnosis and treatment of cancer, an immunopathology, a neuropathology and the like.
In another aspect, the present invention encompasses an expression profile that can reflect the levels of a plurality of target polynucleotides in a sample. The expression profile comprises the microarray and a plurality of detectable complexes. Each detectable complex is formed by having at least one of the target polynucleotides hybridizing to at least one of the hybridizable array elements and further comprises a labeling moiety for detection. The expression profile of this invention is particularly useful in the diagnosis and the treatment of cancer, an immunopathology, a neuropathology and the like.
In yet another aspect, the invention provides a method for selecting a plurality of polynucleotide probes, said method comprising (I) obtaining a plurality of query sequences; (II) screening said query sequences against one or more databases comprising annotated sequences to identify sequence hits; and (III) selecting said sequence hits with the highest homology (top hits) to said annotated sequences. The query sequences can be expressed sequence tags (ESTs) or full length gene coding sequences, which are screened electronically, preferably using the Basic Local Alignment Search Tool (BLAST) algorithm. In one embodiment, the highest homology is identified as a BLAST score equal to or above 100 at a P-value equal to or below 10⁻¹⁰ against the GenPept database. In a second embodiment, the highest homology is identified as a percent sequence identity equal to or above 80% and a BLAST score equal to or above 250 against the GenBank Primate database. In a third embodiment, the highest homology is identified as a percent identity equal to or above 75% and a BLAST score equal to or above 250 against the GenBank Rodent database. In a fourth embodiment, the highest homology is identified as the match with the lowest P-value when searches are performed against GenPept, GenBank Primate or GenBank Rodent databases.
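The homology cutoffs recited in these embodiments can be written as a simple filter. The following Python sketch is illustrative only: the `Hit` record, its field names and the database labels are assumptions introduced here, and do not correspond to any particular BLAST output format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One database match for a query sequence (hypothetical record layout)."""
    database: str            # "GenPept", "GenBank Primate" or "GenBank Rodent"
    score: float             # BLAST score
    p_value: float           # BLAST P-value
    percent_identity: float  # percent sequence identity, 0-100


def meets_threshold(hit: Hit) -> bool:
    """Apply the per-database cutoffs of the first three embodiments."""
    if hit.database == "GenPept":
        return hit.score >= 100 and hit.p_value <= 1e-10
    if hit.database == "GenBank Primate":
        return hit.percent_identity >= 80 and hit.score >= 250
    if hit.database == "GenBank Rodent":
        return hit.percent_identity >= 75 and hit.score >= 250
    return False  # unknown database label: no cutoff is defined for it


def lowest_p_value_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Fourth embodiment: the top hit is the match with the lowest P-value."""
    return min(hits, key=lambda h: h.p_value, default=None)
```

A hit that fails `meets_threshold` would simply not be selected as a top hit under the corresponding embodiment.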
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The Sequence Listing is a compilation of nucleotide sequences obtained by sequencing clone inserts (isolates) of different cDNA libraries. Each sequence is identified by a sequence identification number (SEQ ID No:), by the clone number from which it was obtained and by the cDNA library from which the sequence was obtained.
Table 1 is a list of the sequences according to their SEQ ID Nos:. For SEQ ID Nos: 1-1049 (homologous to GenBank sequences) the first column contains Incyte clone numbers. The second column contains a relevant GenBank identification number match, if any. The last column contains an annotation associated with the referenced GenBank identification number along with the genus species or source name. For SEQ ID Nos: 1050-1490 (exact matches to GenBank) the first column contains the GenBank identification number. The second column contains an annotation associated with the referenced GenBank identification number along with the genus species or source name.
Table 2 is a list of the cDNA libraries and a description of the preparation of the cDNA libraries.
Definitions
The term “microarray” refers to an ordered arrangement of hybridizable array elements. The array elements are arranged so that there are preferably at least about 10 different array elements, more preferably at least 100 array elements, and most preferably at least 1,000 array elements, on a 1 cm² substrate surface. The maximum number of array elements is unlimited, but is at least 100,000 array elements. Furthermore, the hybridization signal from each of the array elements is individually distinguishable. In a preferred embodiment, the array elements comprise polynucleotide probes.
A “polynucleotide” refers to a chain of nucleotides. Preferably, the chain has from 100 to 10,000 nucleotides, more preferably from 150 to 3,500 nucleotides. The term “probe” refers to the ability of the polynucleotide to hybridize with a target polynucleotide to form a polynucleotide probe/target complex. A “target polynucleotide” refers to a chain of nucleotides to which a polynucleotide probe can hybridize by base pairing. In some instances, the sequences will be complementary (no mismatches). In other instances, there may be a 5% mismatch.
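The notions of complementarity and percent mismatch can be made concrete with a short calculation. The Python sketch below is a simplified illustration: it assumes DNA sequences of equal length aligned without gaps in antiparallel orientation, and the function names are hypothetical.

```python
# Translation table for the Watson-Crick complement of a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that base pairs perfectly with `seq`."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_fraction(probe: str, target: str) -> float:
    """Fraction of positions at which `target` fails to base pair with `probe`.

    Assumes an ungapped, full-length antiparallel alignment.
    """
    if not probe or len(probe) != len(target):
        raise ValueError("probe and target must be non-empty and of equal length")
    expected = reverse_complement(probe)
    mismatches = sum(1 for a, b in zip(expected, target.upper()) if a != b)
    return mismatches / len(probe)
```

Under these assumptions, a fully complementary target gives a fraction of 0.0, and one mismatched position in a 20-nucleotide duplex gives 0.05, i.e., a 5% mismatch.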
A “plurality” refers preferably to a group of at least 10, more preferably to a group of at least 100, and even more preferably to a group of at least 1,000, members. The maximum number of members is unlimited, but is at least 100,000 members.
A “portion” means a stretch of at least 100 consecutive nucleotides. A “portion” can also mean a stretch of at least 100 consecutive nucleotides that contains one or more deletions, insertions or substitutions. A “portion” can also mean the whole coding sequence of a gene. Preferred portions are those that lack secondary structure as identified by using computer software programs such as OLIGO 4.06 Primer Analysis software (National Biosciences, Plymouth, Minn.), LASERGENE (DNASTAR, Madison, Wis.), MacDNASIS (Hitachi Software Engineering, South San Francisco, Calif.), and the like.
The term “gene” or “genes” refers to the partial or complete coding sequence of a gene. The phrase “genes coding for signaling pathway polypeptides” refers to genes that code for polypeptides that likely participate in signaling pathways and include those listed in Table 1.
The phrase “query sequences” refers to sequences whose identity or homology is being investigated. A “database” is a repository of information which is preferably accessible by electronic means. “Annotated sequences” are sequences whose identity has already been determined and preferably exist in a database. The phrase “percent sequence identity” refers to the percentage of identical matches found in a comparison of two or more amino acid or nucleic acid sequences.
The Invention
The present invention provides a composition comprising a plurality of polynucleotide probes, wherein each polynucleotide probe comprises at least a portion of a gene coding for a signaling pathway polypeptide (SPP). Preferably, the sequences of the polynucleotide probes are selected from those sequences presented in the Sequence Listing. In one preferred embodiment the composition comprises a plurality of polynucleotide probes wherein each gene coding for a signaling pathway polypeptide is at least a portion of a sequence selected from the group consisting of SEQ ID Nos: 1-1490. In a second preferred embodiment, the composition comprises a plurality of polynucleotide probes comprising at least a portion of at least 1000 of the sequences of SEQ ID Nos: 1-1490. In a third preferred embodiment, the composition comprises a plurality of polynucleotide probes wherein said polynucleotide probes comprise at least a portion of substantially all the sequences of SEQ ID Nos: 1-1490.
The composition is particularly useful when it is used as hybridizable array elements in a microarray. The microarray can be used for large scale genetic or gene expression analysis of a large number of target polynucleotides. The microarrays can be used in the diagnosis of diseases and in the monitoring of treatments where altered expression of SPPs causes disease, such as in cancer, an immunopathology, a neuropathology, and the like. The microarrays can also be used to investigate an individual's predisposition to a disease, such as cancer, an immunopathology, a neuropathology, and the like.
When the composition of the invention is employed as hybridizable array elements in a microarray, the array elements are organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (which together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular disease or condition or treatment.
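Because each array element sits at a known location, signals read by position can be translated into signals per gene. The Python sketch below illustrates this bookkeeping under stated assumptions: the (row, column) addressing, the gene labels and the choice to average the signals of a gene that occupies several locations are all hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[int, int]  # (row, column) of an array element on the substrate


def expression_profile(layout: Dict[Position, str],
                       intensities: Dict[Position, float]) -> Dict[str, float]:
    """Translate per-location hybridization signals into per-gene signals.

    Locations with no recorded signal count as 0.0. If a gene occupies
    several locations, its signals are averaged (an illustrative choice).
    """
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for position, gene in layout.items():
        signal = intensities.get(position, 0.0)
        totals[gene] = totals.get(gene, 0.0) + signal
        counts[gene] = counts.get(gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}
```

Profiles computed this way for different samples could then be compared gene by gene.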
The composition comprising a plurality of polynucleotide probes can also be used to purify a subpopulation of mRNAs, cDNAs, genomic fragments and the like, in a sample. Typically, samples will include the target polynucleotides of interest and other nucleic acids which may increase the hybridization background in the sample. Therefore it may be advantageous to remove these nucleic acids. One method for removing the additional nucleic acids is by hybridizing the sample containing target polynucleotides with immobilized polynucleotide probes under hybridizing conditions. Those nucleic acids that do not hybridize to the polynucleotide probes are washed away. At a later point, the target polynucleotides hybridized to the immobilized polynucleotide probes can be released in the form of purified target polynucleotides.
Polynucleotide Probes
This section describes the selection of probe sequences for the plurality of polynucleotide probes. The probe sequences are derived from genes that code for signaling pathway polypeptides (SPPs) and can include gene sequences that fit in one of three different functional sequence groups (I through III). As a result, the composition of polynucleotide probes comprises sequences derived from genes of one of these functional sequence groups, the combination of any two of these functional sequence groups or from the combination of all three functional sequence groups. In a preferred embodiment, the composition comprises polynucleotide probes comprising sequences derived from all three functional sequence groups.
The functional sequence groups are divided as follows. Functional sequence group I comprises sequences for genes coding for receptor-like polypeptides. These polypeptides are able to sense the external environment of a cell and initiate a cascade of events. Included in this functional sequence group are binding proteins, receptor tyrosine kinases, G protein receptors, seven transmembrane domain receptors, tyrosine kinase receptors and the like. Functional sequence group II comprises sequences for genes coding for transducing polypeptides. These polypeptides transmit and amplify signals received from the receptor-like polypeptides. Included in this functional group are G proteins, growth and differentiation proteins, serine/threonine phosphatases, tyrosine phosphatases, phosphodiesterases, phospholipases, ras-related proteins, serine/threonine kinases, MAP kinases, adenylyl cyclases and the like. Functional sequence group III comprises sequences for genes coding for effector-like polypeptides. The effector-like polypeptides may perform a cellular function as a result of having sensed the signals from the transducing polypeptides. Included in this functional sequence group are cell matrix adhesion proteins, cell-cell adhesion proteins, ion channels, chemokines, cyclooxygenases, cytokines, hormones, nitric oxide synthases, proteases, protease inhibitors, transcription factors, transporter proteins and the like.
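The three functional sequence groups lend themselves to a lookup table. The Python sketch below records the polypeptide classes named above and returns the group for a given class; the data structure, the function name and the exact-match lookup are assumptions made for illustration, and the lists are not exhaustive ("and the like").

```python
from typing import Optional

# Polypeptide classes named in the text, keyed by functional sequence group.
FUNCTIONAL_SEQUENCE_GROUPS = {
    "I": {  # receptor-like polypeptides
        "binding proteins", "receptor tyrosine kinases", "g protein receptors",
        "seven transmembrane domain receptors", "tyrosine kinase receptors",
    },
    "II": {  # transducing polypeptides
        "g proteins", "growth and differentiation proteins",
        "serine/threonine phosphatases", "tyrosine phosphatases",
        "phosphodiesterases", "phospholipases", "ras-related proteins",
        "serine/threonine kinases", "map kinases", "adenylyl cyclases",
    },
    "III": {  # effector-like polypeptides
        "cell matrix adhesion proteins", "cell-cell adhesion proteins",
        "ion channels", "chemokines", "cyclooxygenases", "cytokines",
        "hormones", "nitric oxide synthases", "proteases",
        "protease inhibitors", "transcription factors", "transporter proteins",
    },
}


def functional_group(polypeptide_class: str) -> Optional[str]:
    """Return "I", "II" or "III" for a listed class, or None if it is not listed."""
    key = polypeptide_class.strip().lower()
    for group, classes in FUNCTIONAL_SEQUENCE_GROUPS.items():
        if key in classes:
            return group
    return None
```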
Genes for the functional sequence groups are selected by screening a large number of cDNA libraries, such as those described in Table 2, to discover clone inserts with sequences (listed in the Lifeseq databases) which are matches to genes coding for SPPs. The matches can be exact matches (100% identity) or homologous. As used herein, “homologous” refers to sequence similarity between a reference sequence and at least a portion of a newly sequenced clone insert, and can refer to either a nucleic acid or amino acid sequence. Preferably, regions of homology are identified using BLAST (Basic Local Alignment Search Tool) (see Altschul, S. F. (1993) J. Mol. Evol. 36: 290-300; and Altschul et al. (1990) J. Mol. Biol. 215: 403-410). BLAST involves first finding similar segments between the query sequence and a database sequence, then evaluating the statistical significance of any matches that are found and finally reporting only those matches that satisfy a user-selectable threshold of significance. Alternatively, other search algorithms can be employed such as FASTA, a rapid sequence comparison algorithm described by Pearson and Lipman (1988; PNAS 85:2444-2448); ClustalW, a multiple sequence alignment program for DNA or proteins (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680); and the like.
In one preferred embodiment, full length gene coding sequences derived from the clone inserts are used as query sequences against sequences in public databases, such as the GenPept and GenBank databases (human, primate, and rodent databases). These databases contain previously identified and annotated sequences. In another embodiment, expressed sequence tags (ESTs) are used as query sequences.
Top hit annotation is then performed. When an alignment between the query sequence and a sequence in any of the databases has a statistically significant score, the query sequence is annotated with the annotation of that sequence (resulting match). Sequences with the same annotation are placed in the same protein function tree, i.e., the tyrosine kinase tree, the serine/threonine kinase tree, the G protein tree and the like. A database employing protein functions to analyze sequence data is disclosed in copending patent application entitled “Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data”, Ser. No. 08/812,290, herein incorporated by reference. Several protein function trees are then combined to form functional sequence groups.
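The annotation and grouping steps just described can be sketched in a few lines of Python. This is a minimal illustration, not the database system of the referenced application: the tree names, the tree-to-group assignments, the significance cutoff and the function names are all assumptions introduced here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

HitList = List[Tuple[str, float]]  # (annotation of matched sequence, P-value)

# Hypothetical assignment of protein function trees to functional sequence groups.
TREE_TO_GROUP = {
    "receptor tyrosine kinase": "I",
    "G protein": "II",
    "serine/threonine kinase": "II",
    "ion channel": "III",
}


def top_hit_annotation(hits: HitList, max_p_value: float = 1e-10) -> Optional[str]:
    """Annotation of the most significant hit, or None if no hit is significant.

    The default cutoff is an assumed value chosen for illustration.
    """
    significant = [hit for hit in hits if hit[1] <= max_p_value]
    if not significant:
        return None
    return min(significant, key=lambda hit: hit[1])[0]


def build_groups(query_hits: Dict[str, HitList]) -> Dict[str, Dict[str, List[str]]]:
    """Map group -> protein function tree -> identifiers of annotated queries."""
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for query_id, hits in query_hits.items():
        tree = top_hit_annotation(hits)
        if tree in TREE_TO_GROUP:  # unannotated or unassigned queries are skipped
            groups[TREE_TO_GROUP[tree]][tree].append(query_id)
    return {group: dict(trees) for group, trees in groups.items()}
```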
After identifying those sequences that have been annotated to the different protein function groups, polynucleotide probes are generated from these sequences. These sequences are provided in SEQ ID Nos: 1-1490 in the Sequence Listing. Table 1 provides the annotation for the referenced identification number for SEQ ID Nos: 1-1490.
The resulting composition can comprise polynucleotide probes that are not redundant, i.e., there is no more than one polynucleotide probe to represent a particular gene. Alternatively, the composition can comprise polynucleotide probes that are redundant, i.e., a gene is represented by more than one polynucleotide probe.
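The distinction between redundant and non-redundant compositions amounts to counting probes per gene. The Python sketch below shows one way to check and enforce non-redundancy; the (probe identifier, gene) pairing and the rule of keeping the first probe listed for each gene are hypothetical choices.

```python
from collections import Counter
from typing import List, Tuple

Probe = Tuple[str, str]  # (probe identifier, gene the probe represents)


def probes_per_gene(probes: List[Probe]) -> Counter:
    """Count how many probes represent each gene."""
    return Counter(gene for _, gene in probes)


def is_redundant(probes: List[Probe]) -> bool:
    """True if any gene is represented by more than one probe."""
    return any(count > 1 for count in probes_per_gene(probes).values())


def make_non_redundant(probes: List[Probe]) -> List[Probe]:
    """Keep only the first probe listed for each gene, preserving order."""
    seen = set()
    kept: List[Probe] = []
    for probe_id, gene in probes:
        if gene not in seen:
            seen.add(gene)
            kept.append((probe_id, gene))
    return kept
```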
The selected polynucleotide probes may be manipulated further to optimize the performance of the polynucleotide probes as hybridization probes. Some probes may not hybridize effectively under hybridization conditions due to secondary structure. To optimize probe hybridization, the probe sequences are examined using a computer algorithm to identify portions of genes without potential secondary structure. Such computer algorithms are well known in the art, such as OLIGO 4.06 Primer Analysis software (National Biosciences), LASERGENE (DNASTAR) or MacDNASIS (Hitachi). These programs can search nucleotide sequences to identify stem loop structures and tandem repeats and analyze the G+C content of the sequence (those sequences with a G+C content greater than 60% are excluded). Alternatively, the probes can be optimized by trial and error. Experiments can be performed to determine whether probes and target polynucleotides hybridize optimally under experimental conditions.
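The G+C content criterion is straightforward to compute. The Python sketch below covers only that criterion and does not attempt to reproduce the stem loop or tandem repeat analysis of the named software packages; the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C residues in a nucleotide sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq: str, max_gc_percent: float = 60.0) -> bool:
    """Sequences with a G+C content greater than the cutoff are excluded."""
    return gc_content(seq) <= max_gc_percent
```

For example, a sequence with 8 G or C residues out of 10 has a G+C content of 80% and would be excluded, whereas one at exactly 60% would be retained.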
Where the number of different polynucleotide probes is desired to be greatest, the probe sequences are extended to assure that different polynucleotide probes are not derived from the same gene, i.e., the polynucleotide probes are not redundant. The probe sequences may be extended utilizing the partial nucleotide sequences derived from EST sequencing by employing various methods known in the art. For example, one method which may be employed, “restriction-site” PCR, uses universal primers to retrieve unknown sequence adjacent to a known locus (Sarkar, G. (1993) PCR Methods Applic. 2: 318-322).
Polynucleotide Probes
This section describes the polynucleotide probes. The polynucleotide probes can be DNA or RNA, or any RNA-like or DNA-like material. The polynucleotide probes can be sense or antisense polynucleotide probes. Where target polynucleotides are double stranded, the probes may be either sense or antisense strands. Where the target polynucleotides are single stranded, the nucleotide probes are complementary single strands.
In one embodiment, the polynucleotide probes are complementary DNAs (cDNAs). The size of the DNA sequence of interest may vary, and is preferably from 100 to 10,000 nucleotides, more preferably from 150 to 3,500 nucleotides.
In a second embodiment, the polynucleotide probes are clone DNAs. In this case the size of the DNA sequence of interest, i.e., the insert sequence excluding the vector DNA, may vary from 100 to 10,000 nucleotides, more preferably from 150 to 3,500 nucleotides.
The polynucleotide probes can be prepared by a variety of synthetic or enzymatic schemes which are well known in the art. The probes can be synthesized, in whole or in part, using chemical methods well known in the art (Caruthers et al. (1980) Nucleic Acids Res. Symp. Ser. (2)). Alternatively, the probes can be generated, in whole or in part, enzymatically.
Nucleotide analogues can be incorporated into the polynucleotide probes by methods well known in the art. The only requirement is that most of the incorporated nucleotide analogues must serve to base pair with target polynucleotide sequences. For example, certain guanine nucleotides can be substituted with hypoxanthine which base pairs with cytosine residues. However, these base pairs are less stable than those between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine which can form stronger base pairs than those between adenine and thymidine.
Additionally, the polynucleotide probes can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.
The polynucleotide probes can be immobilized on a substrate. Preferred substrates are any suitable rigid or semirigid support including membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers, microparticles and capillaries. The substrate can have a variety of surface forms, such as wells, trenches, pins, channels and pores, to which the polynucleotide probes are bound. Preferably, the substrates are optically transparent.
Probes can be synthesized, in whole or in part, on the surface of a substrate by using a chemical coupling procedure and an ink jet application apparatus, such as that described in PCT publication WO95/25116 (Baldeschwieler et al.). Alternatively, the probe can be synthesized using a self-addressable electronic device that controls when reagents are added (Heller et al. U.S. Pat. No. 5,605,662) or by photolysis using imaging fibers for light delivery (Healey et al. (1995) Science 269: 1078-80).
Complementary DNA (cDNA) can be arranged and then immobilized on a substrate. The probes can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups. In another case, a cDNA probe is placed on a polylysine coated surface and then UV cross-linked (Shalon et al. PCT publication WO95/35505, herein incorporated by reference). In yet another method, a DNA is actively transported from a solution to a given position on a substrate by electrical means (Heller et al. U.S. Pat. No. 5,605,662). Alternatively, individual DNA clones can be gridded on a filter. Cells are lysed, proteins and cellular components degraded and the DNA coupled to the filter by UV cross-linking.
Furthermore, the probes do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long to provide exposure to the attached polynucleotide probe. Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the substrate surface react with one of the terminal portions of the linker to bind the linker to the substrate. The other terminal portion of the linker is then functionalized for binding the polynucleotide probe.
The polynucleotide probes can be attached to a substrate by dispensing reagents for probe synthesis on the substrate surface or by dispensing preformed DNA fragments or clones on the substrate surface. Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions simultaneously.
Sample Preparation
In order to conduct sample analysis, a sample containing target polynucleotides is provided. The samples can be any sample containing target polynucleotides and obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue preparations.
The target polynucleotides can be DNA or RNA. The DNA or RNA can be isolated from the sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes. Part I. Theory and Nucleic Acid Preparation, P. Tijssen, ed. Elsevier Science, New York, N.Y. (1993). In a preferred embodiment, total RNA is isolated using the TRIZOL total RNA isolation reagent (Life Technologies, Gaithersburg, Md.) and mRNA is isolated using oligo d(T) column chromatography or glass beads.
Alternatively, the target polynucleotides may be derived from DNA or RNA. When target polynucleotides are derived from an mRNA, the target polynucleotides can be a cDNA reverse transcribed from an mRNA, an RNA transcribed from that cDNA, a DNA amplified from that cDNA, an RNA transcribed from the amplified DNA, and the like. When the target polynucleotide is derived from DNA, the target polynucleotide can be DNA amplified from DNA or RNA transcribed from DNA. In yet another alternative, the targets are target polynucleotides prepared by more than one method.
When target polynucleotides are amplified it is desirable to amplify the nucleic acid sample and maintain the relative abundances of the original sample, including low abundance transcripts. Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second cDNA strand is polymerized using a DNA polymerase and an RNase which assists in breaking up the DNA/RNA hybrid. After synthesis of the double stranded cDNA, T7 RNA polymerase can be added and RNA transcribed from the second cDNA strand template (Van Gelder et al. U.S. Pat. No. 5,545,522). RNA can be amplified in vitro, in situ or in vivo (See Eberwine U.S. Pat. No. 5,514,545).
It is also advantageous to include quantitation controls within the sample to assure that amplification and labeling procedures do not change the true distribution of target polynucleotides in a sample. For this purpose, a sample is spiked with a known amount of a control target polynucleotide and the composition of polynucleotide probes includes reference polynucleotide probes which specifically hybridize with the control target polynucleotides. After hybridization and processing, the hybridization signals obtained should reflect accurately the amounts of control target polynucleotide added to the sample.
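One way to check whether control signals reflect the spiked amounts is to fit a line to signal versus amount on a log-log scale, where a slope near 1 indicates proportionality. The Python sketch below is a simplified illustration of that idea; the least-squares approach, the tolerance value and the function names are assumptions, not a procedure prescribed here.

```python
import math
from typing import Sequence


def log_log_slope(amounts: Sequence[float], signals: Sequence[float]) -> float:
    """Least-squares slope of log10(signal) against log10(amount)."""
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need at least two paired measurements")
    xs = [math.log10(a) for a in amounts]
    ys = [math.log10(s) for s in signals]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("control amounts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def controls_look_proportional(amounts: Sequence[float],
                               signals: Sequence[float],
                               tolerance: float = 0.2) -> bool:
    """True if the log-log slope is within `tolerance` of 1 (assumed cutoff)."""
    return abs(log_log_slope(amounts, signals) - 1.0) <= tolerance
```

A slope well below 1 would suggest that signal is being compressed at higher amounts, for example by saturation during amplification, labeling or detection.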
Prior to hybridization, it may be desirable to fragment the nucleic acid target polynucleotides. Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization to other nucleic acid target polynucleotides in the sample or noncomplementary polynucleotide probes. Fragmentation can be performed by mechanical or chemical means.
The target polynucleotides may be labeled with one or more labeling moieties to allow for detection of hybridized probe/target polynucleotide complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as 32P, 33P or 35S, chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, such as fluorescent markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes and the like. Preferably, fluorescent markers absorb light above about 300 nm, preferably above 400 nm, and usually emit light at wavelengths more than 10 nm above the wavelength of the light absorbed. Specific preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5 available from Amersham.
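The wavelength criteria for fluorescent markers reduce to two comparisons. The Python sketch below encodes them; the function name and parameter names are hypothetical, and treating the 10 nm separation as a strict inequality is an interpretation of the text.

```python
def is_candidate_marker(absorption_nm: float,
                        emission_nm: float,
                        min_absorption_nm: float = 300.0,
                        min_shift_nm: float = 10.0) -> bool:
    """Check the absorption floor and the absorption-to-emission separation.

    Pass min_absorption_nm=400.0 to apply the more preferred absorption floor.
    """
    shift = emission_nm - absorption_nm
    return absorption_nm > min_absorption_nm and shift > min_shift_nm
```

For instance, a hypothetical marker absorbing at 490 nm and emitting at 520 nm satisfies both the 300 nm and 400 nm floors and the separation criterion.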
Labeling can be carried out during an amplification reaction, such as polymerase chain reaction (PCR) and in vitro transcription reactions, or by nick translation or 5′ or 3′-end-labeling reactions. In one case, labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after or without an amplification step, the label is incorporated by using terminal transferase or by kinasing the 5′ end of the target polynucleotide and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.
Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target complex has formed. In one case, biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin remaining bound to the substrate is that attached to target polynucleotides that are hybridized to the polynucleotide probes. Then, an avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added. In another case, the labeling moiety is incorporated by intercalation into preformed target/polynucleotide probe complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.
Under some circumstances it may be advantageous to immobilize the target polynucleotides on a substrate and have the polynucleotide probes bind to the immobilized target polynucleotides. In such cases the target polynucleotides can be attached to a substrate as described above.
Hybridization and Detection
Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art (See, for example, Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier Science, New York, N.Y. (1993)). Conditions can be selected for hybridization where only exactly complementary target and polynucleotide probe can hybridize, i.e., each base must pair with its complementary base. Alternatively, conditions can be selected where target and polynucleotide probes have mismatches but are still able to hybridize. Suitable conditions can be selected, for example, by varying the concentrations of salt or formamide in the prehybridization, hybridization and wash solutions, or by varying the hybridization and wash temperatures.
Hybridization can be performed at low stringency with buffers, such as 6×SSPE with 0.005% Triton X-100 at 37° C., which permits hybridization between target and polynucleotide probes that contain some mismatches to form target polynucleotide/probe complexes. Subsequent washes are performed at higher stringency with buffers, such as 0.5×SSPE with 0.005% Triton X-100 at 50° C., to retain hybridization of only those target/probe complexes that contain exactly complementary sequences. Alternatively, hybridization can be performed with buffers, such as 5×SSC/0.2% SDS at 60° C., and washes are performed in 2×SSC/0.2% SDS and then in 0.1×SSC. Stringency can also be increased by adding agents such as formamide. Background signals can be reduced by the use of detergent, such as sodium dodecyl sulfate, Sarcosyl or Triton X-100, or a blocking agent, such as sperm DNA.
Hybridization specificity can be evaluated by comparing the hybridization of specificity-control polynucleotide probes to specificity-control target polynucleotides that are added to a sample in a known amount. The specificity-control target polynucleotides may have one or more sequence mismatches compared with the corresponding polynucleotide probes. In this manner, whether only complementary target polynucleotides are hybridizing to the polynucleotide probes or whether mismatched hybrid duplexes are forming is determined.
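The comparison described above can be sketched as a simple ratio of signals. This is a minimal illustration only: the function names, the example intensities, and the 10% cutoff are assumptions chosen for the sketch and are not taken from this description.

```python
def mismatch_signal_fraction(perfect_match_signal, mismatch_signal):
    """Return the mismatched-control signal as a fraction of the
    exactly complementary control signal."""
    if perfect_match_signal <= 0:
        raise ValueError("perfect-match control signal must be positive")
    return mismatch_signal / perfect_match_signal


def is_specific(perfect_match_signal, mismatch_signal, max_fraction=0.1):
    """Judge hybridization specific when the mismatched control gives
    at most max_fraction of the perfect-match signal.
    The default cutoff of 0.1 is a hypothetical value."""
    fraction = mismatch_signal_fraction(perfect_match_signal, mismatch_signal)
    return fraction <= max_fraction


# Hypothetical control intensities (arbitrary fluorescence units).
stringent_wash = is_specific(5000.0, 250.0)    # mismatch gives 5% of signal
relaxed_wash = is_specific(5000.0, 2000.0)     # mismatch gives 40% of signal
print(stringent_wash, relaxed_wash)
```

A high fraction suggests that mismatched duplexes are forming and that wash stringency may need to be increased.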
Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, target polynucleotides from one sample are hybridized to the probes in a microarray format and signals detected after hybridization complex formation correlate to target polynucleotide levels in a sample. In the differential hybridization format, the differential expression of a set of genes in two biological samples is analyzed. For differential hybridization, target polynucleotides from both biological samples are prepared and labeled with different labeling moieties. A mixture of the two labeled target polynucleotides is added to a microarray. The microarray is then examined under conditions in which the emissions from the two different labels are individually detectable. Probes in the microarray that are hybridized to substantially equal numbers of target polynucleotides derived from both biological samples give a distinct combined fluorescence (Shalon et al. PCT publication WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog. In another embodiment, Cy3/Cy5 fluorophores (Amersham Pharmacia Biotech, Piscataway, N.J.) are employed.
After hybridization, the microarray is washed to remove nonhybridized nucleic acids and complex formation between the hybridizable array elements and the target polynucleotides is detected.
Methods for detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the target polynucleotides are labeled with a fluorescent label and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier, and the amount of emitted light is detected and quantitated. The detected signal should be proportional to the amount of probe/target polynucleotide complex at each position of the microarray. The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of each hybridized target polynucleotide.
In a differential hybridization experiment, target polynucleotides from two or more different biological samples are labeled with two or more different fluorescent labels with different emission wavelengths. Fluorescent signals are detected separately with different photomultipliers set to detect specific wavelengths. The relative abundances/expression levels of the target polynucleotides in the two or more samples are obtained.
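One common way to express relative abundance from two separately detected channels is a per-probe log2 ratio. The following is a minimal sketch under stated assumptions: the probe identifiers, the intensity values, and the background and floor parameters are hypothetical, and the log2 ratio is an illustrative choice rather than a calculation specified here.

```python
import math


def log2_ratios(channel_a, channel_b, background=0.0, floor=1.0):
    """Compute log2(sample A / sample B) for each probe.

    channel_a and channel_b map probe identifiers to fluorescence
    intensities measured at the two emission wavelengths.
    Intensities are background-subtracted and clamped to `floor`
    so that the ratio and the logarithm are always defined.
    """
    ratios = {}
    for probe_id, a in channel_a.items():
        b = channel_b[probe_id]
        a_corrected = max(a - background, floor)
        b_corrected = max(b - background, floor)
        ratios[probe_id] = math.log2(a_corrected / b_corrected)
    return ratios


# Hypothetical intensities for two labels (for example Cy3 and Cy5).
cy3 = {"probe_1": 1200.0, "probe_2": 400.0, "probe_3": 800.0}
cy5 = {"probe_1": 300.0, "probe_2": 400.0, "probe_3": 1600.0}

ratios = log2_ratios(cy3, cy5)
for probe_id, value in ratios.items():
    print(probe_id, round(value, 2))
```

A value near zero corresponds to substantially equal target levels in both samples; positive and negative values indicate higher abundance in the first or second sample, respectively.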
Typically, microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions. In a preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
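Normalization against internal controls can be sketched as a per-array scale factor. This is only an illustration under assumptions: the control identifiers, the intensity values, and the choice to scale the mean control intensity to a common reference value are hypothetical and are not prescribed by this description.

```python
def normalize_to_controls(intensities, control_ids, reference=1000.0):
    """Scale all intensities on one microarray so that the mean
    intensity of its internal normalization controls equals `reference`.

    intensities maps element identifiers to raw fluorescence values.
    control_ids lists the identifiers of the internal controls.
    """
    controls = [intensities[c] for c in control_ids]
    control_mean = sum(controls) / len(controls)
    if control_mean <= 0:
        raise ValueError("control intensities must have a positive mean")
    scale = reference / control_mean
    return {element: value * scale for element, value in intensities.items()}


# Two hypothetical arrays hybridized under similar conditions; the second
# array is uniformly twice as bright as the first.
array_1 = {"ctrl_a": 900.0, "ctrl_b": 1100.0, "gene_x": 500.0}
array_2 = {"ctrl_a": 1800.0, "ctrl_b": 2200.0, "gene_x": 1000.0}

norm_1 = normalize_to_controls(array_1, ["ctrl_a", "ctrl_b"])
norm_2 = normalize_to_controls(array_2, ["ctrl_a", "ctrl_b"])
print(norm_1["gene_x"], norm_2["gene_x"])
```

After scaling, the same element on the two arrays can be compared directly, because the array-wide difference in brightness has been removed.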
Expression Profiles
This section describes an expression profile using the composition of this invention. The expression profile can be used to detect changes in the expression of genes coding for SPPs. These genes include genes whose altered expression is correlated with cancer, immunopathology, neuropathology and the like.
The expression profile comprises the polynucleotide probes of the invention. The expression profile also includes a plurality of detectable complexes. Each complex is formed by hybridization of one or more polynucleotide probes to one or more target polynucleotides. At least one of the polynucleotide probes, preferably a plurality of polynucleotide probes, is hybridized to a target polynucleotide, forming at least one complex, preferably a plurality of complexes. A complex is detected by incorporating at least one labeling moiety in the complex. The labeling moiety has been described above.
The expression profiles provide "snapshots" that can show unique expression patterns that are characteristic of a disease or condition.
Utility of the Invention
The composition comprising a plurality of polynucleotide probes can be used as hybridizable array elements in a microarray. Such a microarray can be employed in several applications including diagnostics and treatment regimens, drug discovery and development, toxicological and carcinogenicity studies, forensics, pharmacogenomics and the like.
In one situation, the microarray is used to monitor the progression of disease. Researchers can assess and catalog the differences in gene expression between healthy and diseased tissues or cells. By analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the patient is symptomatic.
Similarly, the invention can be used to monitor the progression of disease or the efficacy of treatment. For some treatments with known side effects, the microarray is employed to fine-tune the treatment regimen. A dosage will be established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or manifest symptoms, before altering the course of treatment.
Alternatively, animal models that mimic a disease can be used, rather than patients, to characterize expression profiles associated with a particular disease or condition. For example, a characteristic gene expression pattern for the graft versus host reaction can be generated using analogous reactions that occur when lymphocytes from one donor are mixed with lymphocytes from another donor. These gene expression data may be useful in diagnosing and monitoring the course of graft versus host reaction in a patient, in determining gene targets for intervention, and in testing novel immunosuppressants.
The microarray is particularly useful for diagnosing and monitoring the progression of diseases that may be associated with the altered expression of SPPs. The expression of SPPs is closely associated with cell proliferation. Thus, the microarray and expression profiles are particularly useful to diagnose a cancer such as adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma and teratocarcinoma. Such cancers include, but are not limited to, cancers of the adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid and uterus.
The expression of SPPs is also closely associated with an immune response. Therefore, the microarray can be used to diagnose immunopathologies including but not limited to AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease, ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema, atrophic gastritis, glomerulonephritis, gout, Graves' disease, hypereosinophilia, irritable bowel syndrome, lupus erythematosus, multiple sclerosis, myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis, rheumatoid arthritis, scleroderma, Sjögren's syndrome, and autoimmune thyroiditis; viral, bacterial, fungal, parasitic, and protozoal infections; and trauma.
Neuronal processes are also affected by the expression of SPPs. Thus, the microarray can be used to diagnose neuropathologies including but not limited to akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis, bipolar disorder, catatonia, cerebral neoplasms, dementia, depression, Down's syndrome, tardive dyskinesia, dystonias, epilepsy, Huntington's disease, multiple sclerosis, neurofibromatosis, Parkinson's disease, paranoid psychoses, schizophrenia, and Tourette's disorder.
The invention also allows researchers to develop sophisticated profiles of the effects of currently available therapeutic drugs. Tissues or cells treated with these drugs can be analyzed using the invention, and compared to untreated samples of the same tissues or cells. In this way, an expression profile of known therapeutic agents will be developed. Knowing the identity of sequences that are differentially regulated in the presence and absence of a drug will allow researchers to elucidate the molecular mechanisms of action of that drug.
Also, researchers can use the invention to rapidly screen large numbers of candidate drugs, looking for ones that have an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects. Thus, the invention provides the means to determine the molecular mode of action of a drug.
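The screening idea above, finding candidates whose expression profile resembles that of a known drug, can be sketched with a correlation score. This is a minimal sketch under assumptions: Pearson correlation is one illustrative similarity measure, not one named in this description, and the drug names and profile values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("profiles must not be constant")
    return covariance / (spread_x * spread_y)


def rank_candidates(reference_profile, candidate_profiles):
    """Return (name, correlation) pairs, most similar candidate first."""
    scored = [
        (name, pearson(reference_profile, profile))
        for name, profile in candidate_profiles.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical log-ratio profiles (treated versus untreated) over four genes.
known_drug = [2.0, -1.0, 0.5, 0.0]
candidates = {
    "candidate_a": [1.9, -0.8, 0.4, 0.1],    # similar pattern
    "candidate_b": [-2.0, 1.0, -0.5, 0.0],   # opposite pattern
}

ranking = rank_candidates(known_drug, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```

Candidates with a score near 1 change expression in the same direction as the known drug across the probed genes, while scores near -1 indicate the opposite pattern.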
It is understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary. It is also understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. The examples below are provided to illustrate the subject invention and are not included for the purpose of limiting the invention.