A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Many disease states are characterized by differences in the expression levels of various genes, either through changes in the copy number of the genetic DNA or through changes in levels of transcription (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) of particular genes. For example, losses and gains of genetic material play an important role in malignant transformation and progression. These gains and losses are thought to be “driven” by at least two kinds of genes. Oncogenes are positive regulators of tumorigenesis, while tumor suppressor genes are negative regulators of tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991)). Therefore, one mechanism of activating unregulated growth is to increase the number of genes coding for oncogene proteins or to increase the level of expression of these oncogenes (e.g., in response to cellular or environmental changes); another is to lose genetic material or to decrease the level of expression of genes that code for tumor suppressors. This model is supported by the losses and gains of genetic material associated with glioma progression (Mikkelson et al. J. Cell. Biochem. 46: 3-8 (1991)). Thus, changes in the expression (transcription) levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various cancers.
Similarly, control of the cell cycle and of cell development, as well as many diseases, are characterized by variations in the transcription levels of particular genes. A viral infection, for example, is often characterized by elevated expression of genes of the particular virus: outbreaks of herpes simplex, Epstein-Barr virus infections (e.g., infectious mononucleosis), cytomegalovirus, varicella-zoster virus infections, parvovirus infections, human papillomavirus infections, etc. are all characterized by elevated expression of various genes present in the respective virus. Detection of elevated expression levels of characteristic viral genes therefore provides an effective diagnostic of the disease state. In particular, viruses such as herpes simplex enter quiescent states for periods of time, only to erupt in brief periods of rapid replication. Detection of expression levels of characteristic viral genes allows detection of such active proliferative (and presumably infective) states.
The use of “traditional” hybridization protocols for monitoring or quantifying gene expression is problematic. For example, two or more gene products of approximately the same molecular weight will prove difficult or impossible to distinguish in a Northern blot, because they are not readily separated by electrophoretic methods. Similarly, because hybridization efficiency and cross-reactivity vary with the particular subsequence (region) of a gene being probed, it is difficult to obtain an accurate and reliable measure of gene expression with one, or even a few, probes to the target gene.
The development of VLSIPS™ technology provided methods for synthesizing arrays of many different oligonucleotide probes that occupy a very small surface area. See U.S. Pat. No. 5,143,854 and PCT patent publication No. WO 90/15070. U.S. patent application Ser. No. 082,937, filed Jun. 25, 1993, describes methods for making arrays of oligonucleotide probes that can be used to provide the complete sequence of a target nucleic acid and to detect the presence of a nucleic acid containing a specific nucleotide sequence.
Previous methods of measuring nucleic acid abundance differences or changes in the expression of various genes (e.g., differential display, SAGE, cDNA sequencing, clone spotting, etc.) require assumptions about, or prior knowledge of, the target sequences in order to design appropriate sequence-specific probes. Other methods, such as subtractive hybridization, do not require prior sequence knowledge, but they also do not directly provide sequence information regarding differentially expressed nucleic acids.
The present invention, in one embodiment, provides methods of monitoring the expression of a multiplicity of preselected genes (referred to herein as “expression monitoring”). In another embodiment, this invention provides a way of identifying differences in the compositions of two or more nucleic acid (e.g., RNA or DNA) samples. Where the nucleic acid abundances reflect expression levels in the biological samples from which the nucleic acid samples are derived, the invention provides a method for identifying differences in expression profiles between two or more samples. These “generic difference screening methods” are rapid, simple to apply, require no a priori assumptions regarding the particular sequences whose expression may differ between the two samples, and provide direct sequence information regarding the nucleic acids whose abundances differ between the samples.
In one embodiment, this invention provides a method of identifying differences in nucleic acid levels between two or more nucleic acid samples. The method involves the steps of: (a) providing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides attached to a surface; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; (c) contacting said one or more arrays with a nucleic acid ligase; and (d) determining differences in hybridization between said nucleic acid samples, wherein said differences in hybridization indicate differences in said nucleic acid levels.
In another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more oligonucleotide arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and said variable regions that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.
In yet another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more high density oligonucleotide arrays; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining the differences in hybridization between said nucleic acid samples wherein said differences in hybridization indicate differences in said nucleic acid levels.
In still yet another embodiment, the method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides, wherein said probe oligonucleotides are not chosen to hybridize to nucleic acids derived from particular preselected genes or mRNAs; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples, wherein said differences in hybridization indicate differences in said nucleic acid levels.
In another embodiment, the methods of identifying differences in nucleic acid levels between two or more nucleic acid samples involve the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides, wherein said probe oligonucleotides comprise nucleotide sequences or subsequences selected according to a process selected from the group consisting of a random selection, a haphazard selection, a nucleotide composition biased selection, and all possible oligonucleotides of a preselected length; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining differences in hybridization between said nucleic acid samples, wherein said differences in hybridization indicate differences in said nucleic acid levels.
In another embodiment, the methods of identifying differences in nucleic acid levels between two or more nucleic acid samples involve the steps of: (a) providing one or more oligonucleotide arrays each comprising probe oligonucleotides wherein said probe oligonucleotides comprise a nucleotide sequence or subsequences selected according to a process selected from the group consisting of a random selection, a haphazard selection, a nucleotide composition biased selection, and all possible oligonucleotides of a preselected length; (b) providing software describing the location and sequence of probe oligonucleotides on said array; (c) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (d) operating said software such that said hybridizing indicates differences in said nucleic acid levels.
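A minimal sketch of how such software might compare two hybridizations is given below in Python. The probe map (array location to probe sequence), the two-fold threshold, and the intensity floor are hypothetical choices made for illustration; the text does not specify how differences in hybridization are scored. Because each location maps to a known probe sequence, every flagged location directly reports a sequence whose abundance appears to differ between the samples.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical probe map of the kind the software might hold:
# (row, column) on the array -> probe oligonucleotide sequence.
ProbeMap = Dict[Tuple[int, int], str]
Intensities = Dict[Tuple[int, int], float]


def differential_probes(
    probe_map: ProbeMap,
    sample_a: Intensities,
    sample_b: Intensities,
    min_fold: float = 2.0,  # assumed threshold, not given in the text
    floor: float = 1.0,     # assumed floor so dim or missing cells never divide by zero
) -> List[Tuple[str, float]]:
    """Return (probe sequence, log2 of B/A) for every probe whose signal differs
    by at least min_fold between the two samples, largest change first."""
    cutoff = math.log2(min_fold)
    hits = []
    for location, sequence in probe_map.items():
        a = max(sample_a.get(location, 0.0), floor)
        b = max(sample_b.get(location, 0.0), floor)
        log_ratio = math.log2(b / a)
        if abs(log_ratio) >= cutoff:
            hits.append((sequence, log_ratio))
    # Sort by magnitude of change; ties keep their probe-map order.
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits
```

For example, a probe reading 50 units in one sample and 400 in the other is reported with a log2 ratio of 3, while a probe reading 100 and 110 is not reported.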
This invention also provides methods of simultaneously monitoring the expression of a multiplicity of genes. In one embodiment these methods involve (a) providing a pool of target nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from said RNA transcripts; (b) hybridizing said pool of nucleic acids to an oligonucleotide array comprising probe oligonucleotides immobilized on a surface; (c) contacting said oligonucleotide array with a ligase; and (d) quantifying the hybridization of said nucleic acids to said array wherein said quantifying provides a measure of the levels of transcription of said genes.
Still yet another method of identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing one or more arrays of oligonucleotides, each array comprising pairs of probe oligonucleotides where the members of each pair of probe oligonucleotides differ from each other in preselected nucleotides; (b) hybridizing said nucleic acid samples to said one or more arrays to form hybrid duplexes between nucleic acids in said nucleic acid samples and probe oligonucleotides in said one or more arrays that are complementary to said nucleic acids or subsequences thereof; and (c) determining the differences in hybridization between said nucleic acid samples, wherein said differences in hybridization indicate differences in said nucleic acid levels.
Another method of simultaneously monitoring the expression of a multiplicity of genes involves the steps of: (a) providing one or more oligonucleotide arrays comprising probe oligonucleotides, wherein said probe oligonucleotides comprise a constant region and a variable region; (b) providing a pool of target nucleic acids comprising RNA transcripts of one or more of said genes, or nucleic acids derived from said RNA transcripts; (c) hybridizing said pool of nucleic acids to an array of oligonucleotide probes immobilized on a surface; and (d) quantifying the hybridization of said nucleic acids to said array, wherein said quantifying provides a measure of the levels of transcription of said genes.
This invention additionally provides methods of making a nucleic acid array for identifying differences in nucleic acid levels between two or more nucleic acid samples. In one embodiment the method involves the steps of: (a) providing an oligonucleotide array comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region; (b) hybridizing one or more of said nucleic acid samples to said arrays to form hybrid duplexes of said variable region and nucleic acids in said nucleic acid samples comprising subsequences complementary to said variable region; (c) attaching the sample nucleic acids comprising said hybrid duplexes to said array of probe oligonucleotides; and (d) removing unattached nucleic acids to provide a high density oligonucleotide array bearing sample nucleic acids attached to said array.
In another embodiment, the method of making a nucleic acid array for identifying differences in nucleic acid levels between two or more nucleic acid samples involves the steps of: (a) providing a high density array; (b) contacting said array with one or more of said two or more nucleic acid samples, whereby nucleic acids of said one or more nucleic acid samples form hybrid duplexes with probe oligonucleotides in said array; (c) attaching the sample nucleic acids comprising said hybrid duplexes to said array of probe oligonucleotides; and (d) removing unattached nucleic acids to provide a high density oligonucleotide array bearing sample nucleic acids attached to said array.
This invention additionally provides kits for practice of the methods described herein. One kit comprises a container containing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides attached to a surface, and a container containing a ligase. Another kit comprises a container containing one or more oligonucleotide arrays, said arrays comprising probe oligonucleotides wherein said probe oligonucleotides comprise a constant region and a variable region. This kit optionally includes a constant oligonucleotide complementary to said constant region or a subsequence thereof.
Preferred high density oligonucleotide arrays of this invention comprise more than 100 different probe oligonucleotides wherein: each different probe oligonucleotide is localized in a predetermined region of the array; each different probe oligonucleotide is attached to a surface through a terminal covalent bond; and the density of said different probe oligonucleotides is greater than about 60 different oligonucleotides per 1 cm². The high density arrays can be used in all of the array-based methods discussed herein. High density arrays used for expression monitoring will typically include oligonucleotide probes selected to be complementary to a nucleic acid derived from one or more preselected genes. In contrast, generic difference screening arrays may contain probe oligonucleotides selected randomly, haphazardly, or arbitrarily, or may include sequences or subsequences comprising all possible nucleic acid sequences of a particular (preselected) length.
In a preferred embodiment, pools of oligonucleotides or oligonucleotide subsequences comprising all possible nucleic acids of a particular length are selected from the group consisting of all possible 6 mers, all possible 7 mers, all possible 8 mers, all possible 9 mers, all possible 10 mers, all possible 11 mers, and all possible 12 mers.
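The size of such a pool grows as 4^k for probes of length k, so it ranges from 4,096 different 6 mers up to 16,777,216 different 12 mers. The following Python sketch, which is illustrative rather than part of the described method, enumerates the pool and its size:

```python
from itertools import product
from typing import Iterator

BASES = "ACGT"


def all_kmers(k: int) -> Iterator[str]:
    """Lazily yield every DNA sequence of length k in lexicographic order."""
    for letters in product(BASES, repeat=k):
        yield "".join(letters)


def kmer_count(k: int) -> int:
    """Number of distinct k-mers over the four-letter alphabet: 4**k."""
    return len(BASES) ** k
```

Using a generator keeps memory use small even for the 12 mer pool, which would be large to hold in memory as a list of strings.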
This invention also provides methods of labeling a nucleic acid. In one embodiment, this method involves the steps of: (a) providing a nucleic acid; (b) amplifying said nucleic acid to form amplicons; (c) fragmenting said amplicons to form fragments of said amplicons; and (d) coupling a labeled moiety to at least one of said fragments.
In another embodiment, the methods involve the steps of: (a) providing a nucleic acid; (b) transcribing said nucleic acid to form a transcribed nucleic acid; (c) fragmenting said transcribed nucleic acid to form fragments of said transcribed nucleic acid; and (d) coupling a labeled moiety to at least one of said fragments.
In yet another embodiment the methods involve the steps of: (a) providing at least one nucleic acid coupled to a support; (b) providing a labeled moiety capable of being coupled with a terminal transferase to said nucleic acid; (c) providing said terminal transferase; and (d) coupling said labeled moiety to said nucleic acid using said terminal transferase.
In still another embodiment, the methods involve the steps of: (a) providing at least two nucleic acids coupled to a support; (b) increasing the number of monomer units of said nucleic acids to form a common nucleic acid tail on said at least two nucleic acids; (c) providing a labeled moiety capable of recognizing said common nucleic acid tails; and (d) contacting said common nucleic acid tails and said labeled moiety.
In still yet another embodiment, the methods involve the steps of: (a) providing at least one nucleic acid coupled to a support; (b) providing a labeled moiety capable of being coupled with a ligase to said nucleic acid; (c) providing said ligase; and (d) coupling said labeled moiety to said nucleic acid using said ligase.
This invention also provides compounds of the formulas described herein.
Definitions
An array of oligonucleotides as used herein refers to a multiplicity of different (sequence) oligonucleotides attached (preferably through a single terminal covalent bond) to one or more solid supports where, when there is a multiplicity of supports, each support bears a multiplicity of oligonucleotides. The term “array” can refer to the entire collection of oligonucleotides on the support(s) or to a subset thereof. The term “same array” when used to refer to two or more arrays is used to mean arrays that have substantially the same oligonucleotide species thereon in substantially the same abundances. The spatial distribution of the oligonucleotide species may differ between the two arrays, but, in a preferred embodiment, it is substantially the same. It is recognized that even where two arrays are designed and synthesized to be identical there are variations in the abundance, composition, and distribution of oligonucleotide probes. These variations are preferably insubstantial and/or compensated for by the use of controls as described herein.
The phrase “massively parallel screening” refers to the simultaneous screening of at least about 100, preferably about 1000, more preferably about 10,000 and most preferably about 1,000,000 different nucleic acid hybridizations.
The terms “nucleic acid” or “nucleic acid molecule” refer to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form and, unless otherwise limited, encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.
An oligonucleotide is a single-stranded nucleic acid ranging in length from 2 to about 1000 nucleotides, more typically from 2 to about 500 nucleotides in length.
As used herein, a “probe” is defined as an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing by hydrogen bond formation. As used herein, an oligonucleotide probe may include natural (i.e., A, G, C, or T) or modified bases (7-deazaguanosine, inosine, etc.). In addition, the bases in an oligonucleotide probe may be joined by a linkage other than a phosphodiester bond, so long as it does not interfere with hybridization. Thus, oligonucleotide probes may be peptide nucleic acids in which the constituent bases are joined by peptide bonds rather than phosphodiester linkages.
The term “target nucleic acid” refers to a nucleic acid (often derived from a biological sample and hence also referred to as a sample nucleic acid) to which the oligonucleotide probe specifically hybridizes. It is recognized that the target nucleic acids can be derived from essentially any source of nucleic acids (e.g., including, but not limited to, chemical syntheses, amplification reactions, forensic samples, etc.). Either the presence or absence of one or more target nucleic acids is to be detected, or the amount of one or more target nucleic acids is to be quantified. The target nucleic acid(s) that are detected preferably have nucleotide sequences that are complementary to the nucleic acid sequences of the corresponding probe(s) to which they specifically bind (hybridize). The term target nucleic acid may refer to the specific subsequence of a larger nucleic acid to which the probe specifically hybridizes, or to the overall sequence (e.g., gene or mRNA) whose abundance (concentration) and/or expression level it is desired to detect. The difference in usage will be apparent from context.
A “ligatable oligonucleotide” or “ligatable probe” or “ligatable oligonucleotide probe” refers to an oligonucleotide that is capable of being ligated to another oligonucleotide by the use of a ligase (e.g., T4 DNA ligase). The ligatable oligonucleotide is preferably a deoxyribonucleotide. The nucleotides comprising the ligatable oligonucleotide are preferably the “standard” nucleotides: A, G, C, and T or U. However, derivatized, modified, or alternative nucleotides (e.g., inosine) can be present as long as their presence does not interfere with the ligation reaction. The ligatable probe may be labeled or otherwise modified as long as the label does not interfere with the ligation reaction. Similarly, the internucleotide linkages can be modified as long as the modification does not interfere with ligation. Thus, in some instances, the ligatable oligonucleotide can be a peptide nucleic acid.
“Subsequence” refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.
A “wobble” refers to a degeneracy at a particular position in an oligonucleotide. A fully degenerate or “4 way” wobble refers to a collection of nucleic acids (e.g., oligonucleotide probes) having A, G, C, or T (for DNA) or A, G, C, or U (for RNA) at the wobble position. A wobble may be approximated by replacing the nucleotide with inosine, which will base pair with A, G, C, or T or U. Typically, an oligonucleotide containing a fully degenerate wobble is prepared during chemical synthesis by using a mixture of four different nucleotide monomers at the particular coupling step in which the wobble is to be introduced.
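A fully degenerate wobble can be expanded into the collection of sequences it represents. The Python sketch below assumes the IUPAC convention of writing a 4 way wobble as “N”; that notation is an assumption and is not used in the text above. An oligonucleotide with w such positions represents 4^w distinct sequences, which is what a synthesis using a four-monomer mixture at each of those coupling steps would yield.

```python
from itertools import product
from typing import List

# Assumed IUPAC-style notation: "N" marks a fully degenerate (4 way) wobble.
WOBBLE = {"N": "ACGT"}


def expand_wobbles(oligo: str) -> List[str]:
    """List every DNA sequence represented by an oligo with 'N' wobble positions."""
    # Fixed bases contribute a single choice; each wobble contributes four.
    choices = [WOBBLE.get(base, base) for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

For instance, "ACNT" expands to ACAT, ACCT, ACGT, and ACTT.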
The term “cross-linking” when used in reference to cross-linking nucleic acids refers to attaching nucleic acids such that they are not separated under typical conditions that are used to denature complementary nucleic acid sequences. Crosslinking preferably involves the formation of covalent linkages between the nucleic acids. Methods of cross-linking nucleic acids are described herein.
The phrase “coupled to a support” means bound directly or indirectly thereto including attachment by covalent binding, hydrogen bonding, ionic interaction, hydrophobic interaction, or otherwise.
“Amplicons” are the products of the amplification of nucleic acids by PCR or otherwise.
“Transcribing a nucleic acid” means the formation of a ribonucleic acid from a deoxyribonucleic acid and the converse (the formation of a deoxyribonucleic acid from a ribonucleic acid). A nucleic acid can be transcribed by DNA-dependent RNA polymerase, reverse transcriptase, or otherwise.
A labeled moiety means a moiety capable of being detected by the various methods discussed herein or known in the art.
The term “complexity” is used here according to the standard meaning of this term as established by Britten et al. Methods of Enzymol. 29:363 (1974). See also Cantor and Schimmel, Biophysical Chemistry: Part III, at 1228-1230, for further explanation of nucleic acid complexity.
“Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence.
The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule preferentially to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). The term “stringent conditions” refers to conditions under which a probe will hybridize preferentially to its target subsequence, and to a lesser extent to, or not at all to, other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
The term “perfect match probe” refers to a probe that has a sequence that is perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The perfect match (PM) probe can be a “test probe”, a “normalization control” probe, an expression level control probe, and the like. A perfect match control or perfect match probe is, however, distinguished from a “mismatch control” or “mismatch probe.” In the case of expression monitoring arrays, perfect match probes are typically preselected (designed) to be complementary to particular sequences or subsequences of target nucleic acids (e.g., particular genes). In contrast, in generic difference screening arrays, the particular target sequences are typically unknown. In the latter case, perfect match probes cannot be preselected. In this context, the term perfect match probe serves to distinguish that probe from a corresponding “mismatch control” that differs from the perfect match in one or more particular preselected nucleotides as described below.
The term “mismatch control” or “mismatch probe”, in expression monitoring arrays, refers to probes whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence. For each mismatch (MM) control in a high-density array there preferably exists a corresponding perfect match (PM) probe that is perfectly complementary to the same particular target sequence. In “generic” (e.g., random, arbitrary, haphazard, etc.) arrays, since the target nucleic acid(s) are unknown, perfect match and mismatch probes cannot be a priori determined, designed, or selected. In this instance, the probes are preferably provided as pairs, where each pair of probes differs in one or more preselected nucleotides. Thus, while it is not known a priori which of the probes in the pair is the perfect match, it is known that when one probe specifically hybridizes to a particular target sequence, the other probe of the pair will act as a mismatch control for that target sequence. It will be appreciated that the perfect match and mismatch probes need not be provided as pairs, but may be provided as larger collections (e.g., 3, 4, 5, or more) of probes that differ from each other in particular preselected nucleotides. While the mismatch(es) may be located anywhere in the mismatch probe, terminal mismatches are less desirable, as a terminal mismatch is less likely to prevent hybridization of the target sequence. In a particularly preferred embodiment, the mismatch is located at or near the center of the probe such that the mismatch is most likely to destabilize the duplex with the target sequence under the test hybridization conditions. In a particularly preferred embodiment, perfect matches differ from mismatch controls in a single centrally-located nucleotide.
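As a rough illustration of choosing a stringent hybridization temperature about 5° C. below Tm, the sketch below estimates Tm with the Wallace rule (2° C. per A or T plus 4° C. per G or C). That rule is a swapped-in approximation, not taken from this text; it is only meaningful for short oligonucleotides in high salt and ignores nearest-neighbor effects, probe concentration, and destabilizing agents such as formamide.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm (deg C) of a short DNA oligo by the Wallace rule:
    2 deg C per A or T plus 4 deg C per G or C. An approximation only."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo: str, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` deg C below the estimated Tm."""
    return wallace_tm(oligo) - offset
```

For a 20 mer with ten A/T and ten G/C bases, the estimate is 60° C., giving a stringent temperature of about 55° C.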
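The sketch below derives a mismatch control from a perfect match probe by altering only the central nucleotide, as in the particularly preferred embodiment. Replacing that base with its complement (so that a 25 mer is changed at position 13) is an assumed substitution rule; the text requires only that the probes differ at preselected nucleotides. For generic arrays, the same function could generate the second member of each probe pair, since either member may turn out to be the perfect match.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_control(pm_probe: str) -> str:
    """Return a mismatch (MM) probe that differs from the perfect match (PM)
    probe only at its central nucleotide, replaced here by its complement."""
    probe = pm_probe.upper()
    mid = len(probe) // 2  # 0-based index; position 13 of a 25 mer
    return probe[:mid] + COMPLEMENT[probe[mid]] + probe[mid + 1:]
```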
The terms “background” or “background signal intensity” refer to hybridization signals resulting from non-specific binding, or other interactions, between the labeled target nucleic acids and components of the oligonucleotide array (e.g., the oligonucleotide probes, control probes, the array substrate, etc.). Background signals may also be produced by intrinsic fluorescence of the array components themselves. A single background signal can be calculated for the entire array, or a different background signal may be calculated for each region of the array. In a preferred embodiment, background is calculated as the average hybridization signal intensity for the lowest 1% to 10% of the probes in the array, or region of the array. In expression monitoring arrays (i.e., where probes are preselected to hybridize to specific nucleic acids (genes)), a different background signal may be calculated for each target nucleic acid. Where a different background signal is calculated for each target gene, the background signal is calculated for the lowest 1% to 10% of the probes for each gene. Of course, one of skill in the art will appreciate that where the probes to a particular gene hybridize well and thus appear to be specifically binding to a target sequence, they should not be used in a background signal calculation. Alternatively, background may be calculated as the average hybridization signal intensity produced by hybridization to probes that are not complementary to any sequence found in the sample (e.g., probes directed to nucleic acids of the opposite sense or to genes not found in the sample, such as bacterial genes where the sample is of mammalian origin). Background can also be calculated as the average signal intensity produced by regions of the array that lack any probes at all.
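Background computed as the average intensity of the lowest-intensity probes can be sketched as follows. The default fraction of 2% is an arbitrary choice within the 1% to 10% range given above, and the caller decides whether to pass the whole array, one region, or the probes for one gene.

```python
import math
from typing import Sequence


def background(intensities: Sequence[float], fraction: float = 0.02) -> float:
    """Average signal of the dimmest `fraction` of the supplied probes."""
    if not intensities:
        raise ValueError("no intensities supplied")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(intensities)
    n = max(1, math.ceil(fraction * len(ordered)))  # always use at least one probe
    return sum(ordered[:n]) / n
```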
The term “quantifying” when used in the context of quantifying nucleic acid abundances or concentrations (e.g., transcription levels of a gene) can refer to absolute or to relative quantification. Absolute quantification may be accomplished by inclusion of known concentration(s) of one or more target nucleic acids (e.g., control nucleic acids such as BioB, or known amounts of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns to that of the known target nucleic acids (e.g., through generation of a standard curve). Alternatively, relative quantification can be accomplished by comparison of hybridization signals between two or more genes or between two or more treatments to quantify the changes in hybridization intensity and, by implication, transcription level.
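Absolute quantification through a standard curve can be sketched as below. It assumes, as a simplification not stated in the text, that log intensity is linear in log concentration over the range covered by the control nucleic acids (e.g., BioB at known concentrations), so unknowns outside that range should not be extrapolated. At least two distinct control concentrations are needed, and concentrations come back in whatever units the controls were given in.

```python
import math
from typing import List, Tuple


def fit_standard_curve(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(intensity) = slope * log10(conc) + intercept
    from (known concentration, measured intensity) control points."""
    xs = [math.log10(conc) for conc, _ in points]
    ys = [math.log10(intensity) for _, intensity in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # fails if all control concentrations are identical
    return slope, mean_y - slope * mean_x


def concentration_from_intensity(intensity: float, slope: float, intercept: float) -> float:
    """Invert the fitted standard curve to estimate an unknown's concentration."""
    return 10 ** ((math.log10(intensity) - intercept) / slope)
```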
The “percentage of sequence identity” or “sequence identity” is determined by comparing two optimally aligned sequences or subsequences over a comparison window or span, wherein the portion of the polynucleotide sequence in the comparison window may optionally comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical subunit (e.g., nucleic acid base or amino acid residue) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison, and multiplying the result by 100 to yield the percentage of sequence identity. Percentage sequence identity when calculated using the programs GAP or BESTFIT (see below) is calculated using default gap weights.
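The percentage calculation can be written out directly for two sequences that have already been aligned, with gaps shown as “-”. This Python sketch assumes that every alignment column counts toward the comparison window and that a gap column never counts as a match; it does not reproduce the default gap weights used by GAP or BESTFIT.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned sequences of equal length."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("empty alignment")
    # A column is a matched position only if both residues are identical non-gaps.
    matches = sum(
        1 for a, b in zip(aligned_query, aligned_reference) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_query)
```

For example, "ACGT-ACGT" aligned against "ACGTTACGA" has 7 matched positions in a 9-column window, or about 77.8% identity.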
Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (including, but not limited to, CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif., and GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA), or by inspection. In particular, methods for aligning sequences using the CLUSTAL program are well described by Higgins and Sharp in Gene, 73: 237-244 (1988) and in CABIOS 5: 151-153 (1989).
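For orientation, a compact global alignment in the style of Needleman and Wunsch is sketched below. The scores (match +1, mismatch -1, linear gap -1) are illustrative only and are not the defaults of any of the programs named above; production tools typically use substitution matrices and affine gap penalties.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns one optimal
    alignment (gaps as '-') and its score."""
    n, m = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```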