The field of the invention relates to methods and devices for qualitatively and quantitatively observing nucleic acids in a sample of nucleic acids, and more particularly to methods and devices that recognize the presence of a set of subsequences in each nucleic acid in the sample and identify the nucleic acid from a set of subsequences by reference to a database of sequences likely to be present in the sample.
Modern biology teaches the importance of genes and gene expression to processes of health and disease. New individual genes causing or predisposing to conditions or diseases are now reported almost daily. Additionally, it is commonly understood that observing and measuring the spatial and temporal patterns of gene expression in health and disease will contribute immensely to further understanding of these states. Therefore, any observational method that can rapidly, accurately, and economically observe and measure the presence or expression of selected individual genes or of whole genomes will be of great value. Of even more value will be methods that can directly and quantitatively be applied to the complex mixtures of genomic DNA (“gDNA”) samples or expressed DNA (“cDNA”) samples (synthesized from selected RNA pools) that are typically derived directly from biological samples.
Current observation and measurement methods suffer from one or more disadvantages that render them unnecessarily inaccurate, time consuming, labor intensive, or expensive. Such disadvantages flow from requirements for, e.g., prior knowledge of gene sequences, cloning of complex mixtures of sequences into many individual samples each of a single sequence, repetitive sequencing of sample nucleic acids, electrophoretic separations of nucleic acid fragments, and so forth.
For example, observation techniques for individual mRNA or cDNA molecules, such as Northern blot analysis, RNase protection, or selective hybridization to arrayed cDNA libraries (see Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, New York (1989)) depend on specific hybridization of a single oligonucleotide probe complementary to the known sequence of an individual molecule. Since a single human cell is estimated to express 10,000-30,000 genes (Liang et al., Science, 257:967-971 (1992)), most of which remain unknown, single probe methods to identify all sequences in a complex sample are prohibitively cumbersome and time consuming.
Similarly, traditional nucleic acid sequencing (Sanger et al., Proc. Natl. Acad. Sci. USA, 74:5463-5467 (1977)), sequencing by hybridization (“SBH”) using combinatorial probe libraries (Drmanac et al., Science 260:1649-1652 (1993); U.S. Pat. No. 5,202,231, Apr. 13, 1993 to Drmanac et al.), classification by oligomer sequence signatures (Lennon et al., Trends Genetics 7:314-317 (1991)), and positional SBH (Broude et al., Proc. Natl. Acad. Sci. USA 91:3072-3076 (1994)) all require that samples be arrayed into purified clones, making the methods inappropriate for complex mixtures.
Several approaches have been described that attempt to characterize complex mixtures of nucleic acids without cloning, all of which at least require electrophoretic separation and/or traditional sequencing. A basic approach is that of differential display (Liang et al., Science 257:967-71 (1992); Liang et al., Curr. Op. Immunol. 7:274-280 (1995)), which uses the polymerase chain reaction (“PCR”) with an oligo(dT) primer and a degenerate primer designed to hybridize within a few hundred bases of the cDNA 3′-end. The resulting DNA subsequences of varying length are electrophoretically separated to yield a pattern of, preferably, 100-250 bands. This approach, at best providing only qualitative “fingerprints” of gene expression, suffers from well-known problems, including a high false positive rate, migration of multiple nucleic acid species within a single observed band, and non-quantitative results. Further, putative gene identification depends on purification and traditional sequencing of the components in electrophoretic bands.
Additionally, approaches have been described which attempt to improve differential display, but without obviating the need for traditional sequencing and/or electrophoretic separation. For example, a method described in European Patent Application 0 534 858 A1 (published Mar. 31, 1993) is directed to applying differential display to gDNA samples by using restriction endonuclease (“RE”) digestion together with PCR employing phasing primers in order to reduce the complexity of such samples to levels electrophoretically observable. The multiple phasing primers divide the gDNA samples into multiple pools of lower complexity, which are electrophoretically separated to yield qualitative “fingerprints.”
Other methods improving on differential display include the following, all of which are similarly limited to generating electrophoretic “fingerprints.” One such improvement is described in U.S. Pat. No. 5,459,937 (Oct. 17, 1995). This method generates multiple pools of lower complexity by using sequential rounds of PCR applied to 3′-end fragments of cDNAs. The 3′-end fragments lie between a recognition site for a frequently-cutting RE and the poly(A) tail of the cDNA. Fragments in the multiple pools are finally putatively identified by electrophoretic separation and individual sequencing. Another example of such an improvement is described by Prashar et al., Proc. Natl. Acad. Sci. USA 93:659-663 (1996). Primarily, this reference describes an alternative method for generating similar 3′-end fragments, which lie between a recognition site for a frequently-cutting RE and the poly(A) tail of the cDNA.
Differing from differential display is another class of methods, which observe gene expression by sampling; that is, these methods repetitively sequence nucleic acids in a sample and count the sequence occurrences in order to statistically observe gene expression. Such methods require sequencing and are statistically limited in their ability to discover rare transcripts. An early example of such a method determined and counted expressed sequence tags (“ESTs”), and is described in Adams et al., Science, 252:1651-1656 (1991). Another example is named “serial analysis of gene expression” (Velculescu et al., Science, 270:484-487 (1995)). According to this method, cDNA molecules are converted into representative “tags,” which are short oligonucleotides generated from Type IIS RE single-stranded overhangs located at determined distances from the 3′-end of source cDNA. (Type IIS REs cleave a defined distance (up to 20 bp) away from their asymmetric recognition sites (Szybalski, Gene 40:169 (1985)).) Approximate, putative identification of the source of a tag requires sequencing the tag and using the sequence and location information to look up possible source sequences in a nucleic acid sequence database.
Other methods for gene and gene-expression measurement, although unrelated to differential display, still have certain disadvantages, such as, e.g., requiring electrophoretic separation. Such a method is described in PCT Publication WO 97/15690, which is herein incorporated by reference in its entirety. According to this method, “signals” are generated that represent the length of a nucleotide sequence between defined subsequences in a target nucleic acid. The defined subsequences are preferably restriction endonuclease sites or oligomer binding sites. These signals can then be compared to results of computer simulated signal generation experiments using computer databases of nucleic acid sequences. By this comparison, particular DNA sequences in the database can be identified as present in the sample, since they are predicted to generate signals which are also observed. The length information of the signals of this method is, disadvantageously, observed electrophoretically.
All methods previously described for the analysis of complex mixtures of nucleic acids require electrophoretic separation, possibly together with nucleic acid cloning and sequencing. These procedures can be unnecessarily labor intensive, slow, and uneconomical. Recently, new approaches have been reported that can, in some implementations, obviate the need for, at least, electrophoretic separation. Such methods depend on hybridization of probe oligonucleotides to recognize short subsequences of from 4 to 20 base pairs on target nucleic acids. The oligonucleotides can be either present in solution or arrayed on a planar surface, such as a glass chip (“chip”).
Subsequence recognition by hybridization performed in solution, however, often requires electrophoretic separation. Methods reported in Smith, PCR Methods and Applications 2:21-27 (1992) and in Unrau et al., Gene, 145:163-169 (1994) use type IIS or interrupted palindromic (“IP”) REs to create single-stranded overhangs of unknown sequence from a sample of cDNAs. These overhangs are recognized by hybridization to a plurality of degenerate adapters (called “indexers” in the latter reference), each possible overhang being recognizable by the one adapter having a complementary single-stranded terminus. The adapters also include primer sequences, and successful hybridization of an adapter is detected by electrophoretic separation of PCR amplification products.
Hybridization specificity can be improved by using a ligase, which requires exact base-pairing for ligation. See, e.g., Landegren et al., Science 241:1077-1080 (1988), in which hybridization is only recognized if two probe oligonucleotides hybridize to adjacent positions on a target DNA sequence and are ligated by T4 DNA ligase. “Strand-invasion” (Guilfoyle et al., Nucleic Acids Res. 25:1854-1858 (1997)) is an extension of the indexing approach designed to further improve hybridization specificity. In strand-invasion, the duplex adapter, or indexer, has a longer single-stranded terminus which has a few determined nucleotides terminal to the previously described degenerate subsequence. The extra, determined nucleotides “invade” into and base pair with a known terminal subsequence of the RE recognition sites, which is adjacent to the unknown single-stranded overhangs that the indexers recognize, thereby improving hybridization stringency. Again, successfully hybridized adapters are recognized by PCR amplification and electrophoretic separation.
Electrophoretic separations can be obviated by arraying the probe oligonucleotides on a chip. Such chips can be prepared by depositing already synthesized oligonucleotides on a derivatized glass surface, or by synthesizing the oligonucleotides directly on the glass surface using a combination of photolithography and oligonucleotide chemistry (McGall et al., Proc. Natl. Acad. Sci. USA 93:13555-13560 (1996)). These probe oligonucleotides are typically designed to hybridize to 10, 15, or 20 bases of a target DNA. Chips capable of recognizing in principle up to 6500 genes have been prepared. The chips are hybridized to samples of fluorescently tagged target DNAs, and are then imaged to determine to which oligonucleotides hybridization has occurred. Although some success has been reported with such chips, well-known problems remain, including those of obtaining unambiguous and reliable hybridization signals. Current methods for solving such signal-to-noise problems call for the use of significantly redundant sets of probe oligonucleotides. For example, observing one subsequence of one gene currently requires the synthesis of multiple (greater than 20 per gene) overlapping and mismatched oligonucleotide probes in order to obtain statistically significant data, including data necessary to apply corrections for mismatched hybridizations or non-specific binding. The need for such redundancy of immobilized probes poses serious throughput and cost limitations, especially in view of the 130,000 or so genes possibly expressed in human tissues.
Various hybridization alternatives include the use of arrays of peptide nucleic acids (“PNA”) (Weiler et al., Nucleic Acids Res. 25:2792-2799 (1997)). PNAs, having the bases linked via an N-(2-aminoethyl)-glycine moiety, obey Watson-Crick base-pairing rules with DNA but with greater stability than corresponding DNA:DNA hybrids. In another alternative, PCR amplified target DNAs, prepared perhaps from ESTs or cDNA libraries, are physically tethered onto a planar surface instead of arrays of probe oligonucleotides.
Therefore, these described observational methods for gene-expression are not capable of rapidly, accurately, and economically observing and measuring the presence or expression of selected individual genes or of whole genomes. These methods typically require, for example, prior knowledge of gene sequences, or cloning of complex mixtures of sequences into many individual samples of a single sequence, or repetitive sequencing of sample components, or electrophoretic separations, and so forth. Importantly, they have not been able to accurately and economically utilize the potential of arrayed oligonucleotides.
Accordingly, an observational method that overcomes these disadvantages will be of great value.
Citation of a reference herein shall not be construed as an admission that such is prior art to the present invention.
It is a broad object of this invention to provide methods and devices for observing and measuring the presence and expression of individual genes or entire genomes that overcome the previously described problems. In particular, the methods and devices of the instant invention make accurate and efficient use of arrayed oligonucleotides (called herein a universal detection array or “UDA”) to avoid any requirements for cloning of complex mixtures of sequences into many individual samples of a single sequence, repetitive sequencing of sample components, electrophoretic separations, and so forth.
The methods of this invention identify and quantify nucleic acids in a sample by observing the presence of sequence sets in nucleic acids in a sample. A sequence set, in a preferred but non-limiting embodiment, includes three subsequences: a first and a second primary subsequence and an additional subsequence. The subsequences have certain preferred positional constraints, including (i) that the additional subsequence is spaced apart from the first primary subsequence by a fixed number of nucleotides in all sequence sets, and (ii) that the second primary subsequence is spaced apart from both the first primary and the additional subsequence by a variable number of nucleotides. Where the first primary and the additional subsequence are positioned adjacently, the sequence set can be considered to include only two independent subsequences.
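The positional relationships within a sequence set can be illustrated with a minimal sketch. The class name `SequenceSet`, the function `contains`, and the assumed 5′-to-3′ order (first primary subsequence, fixed gap, additional subsequence, variable gap, second primary subsequence) are illustrative assumptions and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceSet:
    """Hypothetical container for one sequence set (names are illustrative)."""
    first_primary: str
    additional: str
    second_primary: str
    fixed_gap: int  # nucleotides between first primary and additional subsequence


def contains(seq: str, s: SequenceSet) -> bool:
    """Return True if seq carries the set in the assumed order:
    first primary, fixed gap, additional, variable gap, second primary."""
    start = seq.find(s.first_primary)
    while start != -1:
        a_start = start + len(s.first_primary) + s.fixed_gap
        a_end = a_start + len(s.additional)
        # The additional subsequence must sit at the fixed offset; the second
        # primary subsequence may lie any distance further downstream.
        if seq[a_start:a_end] == s.additional and seq.find(s.second_primary, a_end) != -1:
            return True
        start = seq.find(s.first_primary, start + 1)
    return False
```

Setting `fixed_gap` to zero corresponds to the adjacent arrangement, in which the first primary and additional subsequences behave as one longer subsequence.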
Once having observed sequence sets from a nucleic acid sample, they can be interpreted with reference to a database of nucleic acid sequences. A sequence set defines a search query which can be used to scan a database of nucleic acid sequences for those sequences having the particular sequence set. Any sequences found are sequences of nucleic acids likely to be present in the original sample of nucleic acids. If no such sequences are found, then a novel nucleic acid, which generates such sequence sets, likely exists in the sample. Preferably, the database includes sequences of nucleic acids likely to be present in the sample, perhaps produced by a pre-selection step from a more general nucleic acid sequence database.
In a preferred embodiment, the search query defined by the sequence set is represented as a regular expression, which is used by regular expression search tools to search nucleic acid sequences represented as symbol strings. In an alternate embodiment, an index of subsequences present in the database of nucleotide sequences is first constructed. Second, using this index, sequences are searched for the regular expression representing a sequence set. This alternative embodiment is preferred in the case of repetitive searches of the same sequence database because it increases search efficiency.
The lengths of the subsequences in a sequence set are chosen in order to obtain adequate resolution and separation of the gene-calling methods. Resolution, defining how precisely a sequence set identifies a nucleic acid, is therefore related to how many sequences in the sequence database have a particular sequence set. Separation defines how accurately and uniquely the observation methods observe a sequence set. In the preferred embodiment, where a UDA of this invention observes the additional sequences in a subsample in parallel, separation improves with decreasing complexity of subsamples. Both these measures are improved by longer subsequences. However, longer subsequences result in increased numbers of subsamples (see below) necessary for adequate coverage. Generally, for nucleic acids derived from expressed human genes, preferred lengths for the subsequences are between 4 and 8.
Preferred methods for observing sequence sets in a sample of nucleic acids generally proceed in two steps. In a first step, a subsample of nucleic acid fragments is formed having those nucleic acids that have selected sequences for the first and second primary subsequences. In a second step, the sequence of the additional subsequence in nucleic acid fragments of the subsample is determined. By forming a plurality of subsamples, each subsample having different selected primary subsequences, there can be a high likelihood that each nucleic acid in the sample will be represented in at least one subsample (that is, the “coverage” of the original sample is adequate). Preferably, the length and sequence of the primary subsequences are chosen to minimize the number of subsamples for adequate coverage in view of the previously described considerations of resolution and separation.
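Coverage can be estimated computationally from a sequence database before any subsamples are prepared. The following is a minimal sketch, assuming that a nucleic acid is represented in a subsample when its sequence contains the first primary subsequence followed, at any distance, by the second primary subsequence; the function name `coverage` and the data layout are illustrative assumptions.

```python
def coverage(database, primary_pairs):
    """Fraction of database sequences represented in at least one subsample.

    database: dict mapping a sequence name to its nucleotide string.
    primary_pairs: list of (first_primary, second_primary) tuples, one per subsample.
    """
    covered = set()
    for name, seq in database.items():
        for first, second in primary_pairs:
            i = seq.find(first)
            # Using the earliest first-primary site is sufficient: if any
            # occurrence is followed by the second primary, the earliest one is.
            if i != -1 and seq.find(second, i + len(first)) != -1:
                covered.add(name)
                break
    return len(covered) / len(database)
```

Candidate pairs of primary subsequences could be added one at a time until the estimated coverage reaches an acceptable level, which keeps the number of subsamples small.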
In more detail, preferred methods for the first step produce a subsample by digesting the original sample with restriction endonuclease (“RE”) enzymes that digest nucleic acids within their recognition sequence and produce single-stranded terminal overhangs. The primary subsequences recognized are therefore the recognition sequences of such REs. Complementary adapters are ligated to these terminal overhangs, either simultaneously with or sequentially to RE digestion. One such adapter preferably has a conjugated biotin (or other capture moiety) to aid in removing improperly digested or undigested fragments from the reaction products. The other adapter preferably has a subsequence which is the recognition site of a restriction endonuclease that digests nucleic acids outside of its recognition site (a Type IIS RE).
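The formation of a subsample by double digestion can be simulated on sequence strings. The sketch below is a simplification that tracks only the top strand and ignores adapter ligation and capture; the function names and the representation of an enzyme as a (recognition site, cut offset) pair are assumptions made for illustration. The example enzymes EcoRI (G^AATTC) and BamHI (G^GATCC) are chosen only because their cut positions are well known.

```python
def digest(seq, enzymes):
    """Cut the top strand of seq with every enzyme.

    enzymes: dict mapping enzyme name to (recognition_site, cut_offset), where
    cut_offset counts nucleotides from the start of the site to the cut.
    Returns a list of (left_enzyme, fragment, right_enzyme); an end that was
    not produced by an enzyme is reported as None.
    """
    cuts = []
    for name, (site, offset) in enzymes.items():
        i = seq.find(site)
        while i != -1:
            cuts.append((i + offset, name))
            i = seq.find(site, i + 1)
    cuts.sort()
    bounds = [(0, None)] + cuts + [(len(seq), None)]
    return [
        (left_name, seq[left:right], right_name)
        for (left, left_name), (right, right_name) in zip(bounds, bounds[1:])
    ]


def subsample(seq, enzymes):
    """Keep only fragments bounded by two different enzymes, one at each end."""
    return [
        frag
        for left, frag, right in digest(seq, enzymes)
        if left is not None and right is not None and left != right
    ]
```

Fragments cut at only one end, or cut by the same enzyme at both ends, are discarded in this sketch, loosely mirroring the removal of improperly digested or undigested fragments.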
Preferred methods for the second step determine the additional subsequences of all the nucleic acids in a subsample simultaneously and in parallel by hybridization of the additional subsequences to an array of probes. To facilitate such a hybridization, a further digestion of the nucleic acid fragments leaves remaining fragments having the additional subsequences as partially single-stranded terminal subsequences. In one embodiment, the additional subsequences are the single-stranded terminal subsequence; and in an alternative embodiment, the additional subsequences include both the single-stranded terminal subsequence and adjacent double-stranded portions of the remaining fragments. This second digestion is preferably with a Type IIS RE, whose recognition site is positioned on one of the previous complementary adapters in view of the length and placement of the additional sequence.
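The placement of the Type IIS recognition site determines which nucleotides are exposed as the single-stranded terminal subsequence. A minimal sketch follows, assuming an enzyme that cuts the top strand `top_cut` nucleotides and the bottom strand `bottom_cut` nucleotides downstream of its recognition site, with `top_cut` less than `bottom_cut` so that the remaining fragment carries a top-strand overhang. The function name and parameters are illustrative; the values 9 and 13 used in the usage note correspond to the well-known enzyme FokI (recognition site GGATG).

```python
def type_iis_overhang(top_strand, site, top_cut, bottom_cut):
    """Return the single-stranded overhang left on the fragment downstream of a
    Type IIS recognition site, or None if the site is absent or too close to
    the end of the sequence.

    Assumes top_cut < bottom_cut, both counted from the end of the site.
    """
    i = top_strand.find(site)
    if i == -1:
        return None
    site_end = i + len(site)
    lo, hi = site_end + top_cut, site_end + bottom_cut
    if hi > len(top_strand):
        return None
    # Nucleotides between the two staggered cuts remain unpaired on the top strand.
    return top_strand[lo:hi]
```

For example, `type_iis_overhang("GGATG" + "A" * 9 + "CGTG" + "TTTT", "GGATG", 9, 13)` selects the four nucleotides `CGTG`; shifting the recognition site within the adapter shifts the window, which is how the fixed spacing between the first primary and additional subsequences would be set in this sketch.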
The probes of the probe array have terminal subsequences for hybridizing with and recognizing the terminal additional subsequences. Where all nucleic acids in a sample are to be identified, the probe array includes probes with all possible terminal subsequences for recognizing all possible additional subsequences. In this case, for improved separation, the number of fragments in the subsample of fragments is advantageously less than the number of probes in the array, and the length of the additional subsequence can be chosen accordingly.
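The size of a complete probe array grows as a power of four with the length of the additional subsequence. The sketch below enumerates all possible additional subsequences of a given length with their complementary hybridization regions, and picks the shortest length for which the array has more probes than the subsample has fragments. The function names are illustrative assumptions, and the default bounds of 4 and 8 follow the preferred subsequence lengths stated above.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def universal_probe_regions(k):
    """Map each of the 4**k possible additional subsequences of length k to the
    hybridization region (its reverse complement) that would recognize it."""
    return {
        "".join(p): reverse_complement("".join(p))
        for p in product("ACGT", repeat=k)
    }


def shortest_length_for(n_fragments, min_k=4, max_k=8):
    """Smallest k in [min_k, max_k] with more probes than fragments, else None."""
    for k in range(min_k, max_k + 1):
        if 4 ** k > n_fragments:
            return k
    return None
```

A length of 4 gives 256 probes and a length of 8 gives 65,536, so a subsample of a few hundred fragments would call for a length of at least 5 under this criterion.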
In preferred embodiments, techniques are employed to improve the specificity and strength of probe and fragment hybridization, especially in view of the length of the additional subsequences, which can be as short as 4 nucleotides. One technique employs stacking oligomers that hybridize to the probe adjacent to the hybridized fragments. Energetic base stacking interactions between the hybridized stacking oligomer and fragment improve overall hybridized duplex stability. Another technique employs a ligase enzyme to ligate nicks only in those hybridization structures that are fully and correctly hybridized, followed by a wash step to remove mis-hybridized, and, therefore, un-ligated fragments and stacking oligomers.
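The discriminating effect of ligation followed by a wash can be idealized as an exact-match filter. The following sketch assumes, as a simplification, that a fragment is ligated and retained only when its single-stranded terminal subsequence is perfectly complementary to the hybridization region of the probe; real ligases and wash conditions are not perfectly discriminating, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(overhang, probe_region):
    """Count mismatched positions between an overhang and the sequence that
    would pair perfectly with the probe hybridization region."""
    target = reverse_complement(probe_region)
    return sum(a != b for a, b in zip(overhang, target))


def retained_after_wash(overhangs, probe_region):
    """Idealized ligation and wash: keep only perfectly matched overhangs."""
    return [
        o for o in overhangs
        if len(o) == len(probe_region) and mismatches(o, probe_region) == 0
    ]
```

Under this idealization a single mismatch is enough for a fragment to be washed away, which is the property that makes short additional subsequences usable.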
Where the additional subsequence is only single-stranded, a correctly hybridized structure of the fragment, the probe, and the stacking oligomer is a duplex with no nucleotide gaps. Where the additional subsequence includes also an adjacent double-stranded subsequence which hybridizes with the probe, the hybridized structure has one strand of the fragment partially “displaced” by the “invading” strand of the probe, forming what is called a “displacement structure.”
The nucleotide sequence of the additional sequence is determined by detecting to which probes fragments have hybridized. In various embodiments, either the fragment, the stacking oligomer, or both can be labeled, for example by fluorescent dyes, and the hybridization can be detected by optical or laser stimulation of the dyes.
Advantageously, hybridization and ligation conditions are selected so that the amount of hybridized fragment reflects the concentration of the original fragment in the subsample, and thus that of the original nucleic acid in the sample. To achieve such responsiveness, the concentration of the fragment is made less than the concentration of the probe in order to avoid probe saturation, and the time of hybridization is made less than the time for complete hybridization in order to avoid fragment depletion. Fragments are taken to be not depleted when, preferably, more than ¼ of their initial concentration remains, and more preferably, when more than ½ remains.
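The two conditions for quantitative response can be written as a simple check. This sketch only restates the stated criteria; the function name, the argument names, and the assumption that all concentrations are expressed in the same units are illustrative.

```python
def is_responsive(fragment_conc, probe_conc, remaining_conc, min_remaining_fraction=0.25):
    """Check the two stated conditions for a quantitative hybridization signal.

    fragment_conc: initial fragment concentration.
    probe_conc: probe concentration (same units).
    remaining_conc: fragment concentration left unhybridized at the end.
    min_remaining_fraction: 0.25 for the preferred criterion, 0.5 for the
    more preferred criterion.
    """
    not_saturating = fragment_conc < probe_conc
    not_depleted = remaining_conc > min_remaining_fraction * fragment_conc
    return not_saturating and not_depleted
```

A hybridization that leaves 30% of the fragment unhybridized would pass the preferred criterion but fail the more preferred one.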
In preferred embodiments of the UDA, the probes are attached to solid supports, which are preferably planar glass surfaces or glass beads. Therefore, probes have a linker region of sufficient length in order to reduce steric hindrance to hybridization due to the surface attachment, and a functional group in order to bind to corresponding groups on the solid supports. Preferably, an amino functional group binds to isothiocyanate groups on derivatized glass surfaces.
This invention is also directed to observing specific and known nucleic acids in the original sample. In this case, the sequence sets to be observed are chosen to be those present in the specific nucleic acids. Subsamples are generated only for the primary subsequences present in the chosen sequence sets, and probe arrays need only include probes for the additional subsequences present in the chosen sequence sets.
Applications of the general gene-calling methods of the invention include observing differential gene expression between pairs of tissues in defined biological states. In this case, the original sample of nucleic acids can be cDNA synthesized from mRNA harvested from the tissues according to methods known in the art. Such differential expression information has many known and developing uses. Applications of these methods directed only to specific genes include, for example, diagnostic or therapeutic tests of the presence and expression of disease-related genes.
This Summary is not limiting. Other embodiments and applications will be apparent to one of average skill in view of the following figures and description.
According to a first embodiment, the instant invention includes a method for identifying and quantifying nucleic acids in a sample of nucleic acids comprising observing subsequence sets present in said sample of nucleic acids, wherein a subsequence set comprises at least two nucleotide subsequences in a non-adjacent arrangement and said subsequence set is observed in said sample if a nucleic acid in said sample includes said two nucleotide subsequences in a non-adjacent arrangement; and searching a database of nucleic acid sequences in order to locate database sequences having said observed subsequence sets or to determine that no such database sequences exist, said database of nucleic acid sequences comprising nucleic acid sequences that might be present in said sample; thereby identifying said located database sequences as sequences of nucleic acids present in said sample.
In an aspect of the first embodiment, the step of observing includes the steps of: providing at least one subsample of first nucleic acid fragments, said first nucleic acid fragments in said subsample being derived from those nucleic acids in said sample in which said first and said second primary nucleotide subsequences have selected sequences; and determining the sequence of said additional nucleotide subsequence in each said first nucleic acid fragment of said subsample.
In another aspect of the first embodiment, the determining step includes: producing second nucleic acid fragments from said first nucleic acid fragments of said subsample, wherein said second nucleic acid fragments have a single-stranded terminal nucleotide subsequence, and wherein said additional nucleotide subsequence comprises said single-stranded terminal nucleotide subsequence; hybridizing a plurality of species of probe molecules with said second nucleic acid fragments, probe molecules of each of said species of probe molecules capable of hybridizing with said second nucleic acid fragments having a particular sequence for said additional nucleotide subsequence; and detecting which of said species of probe molecules has hybridized with said second nucleic acid fragments; whereby the sequences of said additional nucleotide sequences are determined.
In another aspect of the first embodiment, the searching step further includes examining individually and sequentially each sequence in the sequence database for the presence of a sequence set; or representing a sequence set as a regular expression in order to search sequences in the sequence database. In a further aspect, the first embodiment includes, prior to said searching step, a step of constructing an index of subsequences present in the sequences of said sequence database, and wherein said searching step consults said index of subsequences; or after said searching step, a step of storing said located sequences in a permanent computer-readable storage. In further aspects, the step of storing stores along with said located sequences additional information describing said sample of nucleic acids; or the step of observing further observes the amount of nucleic acids in said sample having said observed subsequence sets, and wherein said step of storing stores along with said located sequences said observed amount.
According to a second embodiment, the instant invention includes a computer readable storage medium produced according to the previous methods.
According to a third embodiment, the instant invention includes a method for identifying and quantifying nucleic acids in a sample of nucleic acids comprising: providing at least one subsample of first nucleic acid fragments, said first nucleic acid fragments in said subsample being derived from those nucleic acids in said sample in which a first primary nucleotide subsequence and a second primary nucleotide subsequence have selected sequences, wherein said first and said second primary nucleotide subsequences are not contiguous in said nucleic acids; producing second nucleic acid fragments having a single-stranded terminal nucleotide subsequence from said subsample of first nucleic acid fragments; determining a sequence for an additional nucleotide subsequence of said second nucleic acid fragments, said additional nucleotide subsequence comprising said single-stranded terminal nucleotide subsequence, and wherein said single-stranded nucleotide subsequence is spaced apart from said first primary nucleotide subsequence by a distance of zero or more nucleotides which is the same in all second nucleic acid fragments, said determining by: hybridizing a plurality of species of probe molecules with said second nucleic acid fragments, each of said species of probe molecules capable of hybridizing with said second nucleic acid fragments having a particular sequence for said additional nucleotide subsequence, and detecting which of said species of probe molecules has hybridized with said second nucleic acid fragments, and the amount of said second nucleic acid fragments hybridized with said species of probe molecule; searching a database of nucleic acid sequences in order to locate database sequences having said selected first primary subsequence, said selected second primary subsequence, and said determined additional subsequence or to determine that no such database sequences exist, said database of nucleic acid sequences comprising nucleic acid sequences that might be present in said sample; thereby identifying said located database sequences as sequences of nucleic acids present in said sample.
In an aspect of the third embodiment, the probe molecules comprise a nucleotide sequence, which in turn comprises a hybridization region nucleotide subsequence and a core nucleotide subsequence, the sequence of said hybridization region nucleotide subsequence being complementary to the sequence of said additional subsequence hybridizable to said species of probe molecules, said core nucleotide subsequence being adjacent to said hybridization region nucleotide subsequence, and wherein said step of hybridizing comprises: hybridizing a plurality of species of probe molecules with said second nucleic acid fragments and with stacking oligomers to form a hybridization structure, the sequence of said stacking oligomers being complementary to a hybridizable portion of the sequence of said core nucleotide subsequence of said probe molecules, said hybridizable portion being adjacent to said hybridization region nucleotide subsequence; and ligating nicks in said hybridization structure.
According to a fourth embodiment, the instant invention includes a detection array for recognizing terminal subsequences of target nucleic acids, said array comprising: one or more surfaces; a plurality of discrete observational cells arranged on said surfaces in which are bound probe molecules, each probe molecule being a member of one of a plurality of species of probe molecules, wherein each discrete observational cell has bound probe molecules of at most one species, and wherein said probe molecules comprise: a hybridization region, wherein said hybridization region of said probe molecules of one species of probe molecule are capable of hybridizing with said terminal subsequences of said target nucleic acids having a single nucleotide sequence, a core region adjacent to and conjugated with said hybridization region, and an attachment means for binding said hybridization region and said core region to said surfaces; and a plurality of discrete error-checking cells to which are bound probe molecules, wherein to each discrete error-checking cell are bound probe molecules of a plurality of species, such that each species of probe molecule is bound to one discrete observational cell and to at least one discrete error-checking cell.
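The arrangement of observational and error-checking cells can be sketched as a data structure with a validity check. The pooling scheme (consecutive groups of species), the function names, and the consistency rule (a pooled cell is expected to give a signal exactly when at least one of its member species gives a signal in its observational cell) are assumptions made for illustration; the description above specifies only which species are bound to which cells.

```python
def build_layout(species, pool_size):
    """Assign each species to its own observational cell and to one pooled
    error-checking cell formed from consecutive groups of pool_size species."""
    observational = {cell: sp for cell, sp in enumerate(species)}
    pools = [species[i:i + pool_size] for i in range(0, len(species), pool_size)]
    return observational, pools


def layout_is_valid(species, observational, pools):
    """Each species must occupy exactly one observational cell and at least one
    error-checking cell."""
    observed = list(observational.values())
    return all(
        observed.count(sp) == 1 and any(sp in pool for pool in pools)
        for sp in species
    )


def inconsistent_pools(observational_hits, pool_hits, pools):
    """Return indices of error-checking cells whose signal disagrees with the
    observational cells of their member species (assumed consistency rule)."""
    return [
        i for i, pool in enumerate(pools)
        if pool_hits[i] != any(sp in observational_hits for sp in pool)
    ]
```

An error-checking cell that lights up while none of its members do, or stays dark while a member lights up, would flag that region of the array for re-examination under this assumed rule.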
According to a fifth embodiment, the instant invention includes a method for detecting a terminal subsequence in a target nucleic acid, comprising: hybridizing said target nucleic acid and a stacking oligonucleotide to probe molecules of a universal array of the fourth embodiment, wherein said target nucleic acid hybridizes to a hybridization region of said probe molecules, wherein said stacking oligonucleotide hybridizes to at least a portion of a core region of said probes, said portion being adjacent to said hybridization region of said probe molecules, and wherein said hybridizing occurs in the presence of a nucleic acid ligase under ligating conditions; washing the hybridized detection array in denaturing conditions; and detecting which probe molecules have hybridized with said target nucleic acid.
In an aspect, in the fifth embodiment the terminal subsequence of said target nucleic acid is single-stranded, and wherein said hybridization region of said probe molecules hybridizes to said single-stranded end subsequence. In another aspect, in the fifth embodiment the terminal subsequence of said target nucleic acid comprises a single-stranded end subsequence and an adjacent double-stranded subsequence, and wherein said hybridization region of said probe molecules hybridizes to said single-stranded end subsequence and to a strand of said adjacent double-stranded subsequence, whereby a strand of said target nucleic acid is displaced from said double-stranded region.
According to a sixth embodiment, the instant invention includes a method for binding probe molecules on a glass surface comprising: preparing said glass surface by washing with an acid having a pH of no more than 1; amino-reactive-derivatizing said prepared surface with amino-reactive groups; contacting said derivatized surface with a solution of probe molecules in order to deposit said probe molecules, wherein said solution has a concentration of probe molecules of less than 200 micro-moles per liter, and wherein said probe molecules comprise an amino functional group and a subsequence of at least 16 oligonucleotides; and passivating amino-reactive groups on said contacted surface.
In an aspect, in the sixth embodiment the acid comprises nitric acid of a concentration of at least 65%. In another aspect, the step of amino-reactive-derivatization comprises: amino-derivatizing said prepared surface with amino groups by immersion in an amino-containing silane; and conjugating amino-reactive groups to said amino groups on said surface by immersion in phenylene diisothiocyanate.
According to a seventh embodiment, the instant invention includes a method for differential gene expression analysis comprising: applying the method of the first embodiment to a nucleic acid sample derived from a first tissue; applying the method of the first embodiment to a nucleic acid sample derived from a second tissue; and comparing the nucleic acids identified in these two steps. In an aspect, in the seventh embodiment, the first tissue comprises a particular tissue in a first state, and wherein said second tissue comprises said particular tissue in a second state.
According to an eighth embodiment, the instant invention includes a detection array according to the fourth embodiment wherein probe molecules are bound to a surface according to the method of the sixth embodiment.
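The comparing step can be sketched, by way of example only, as a set comparison of the nucleic acids identified in the two samples. The sketch assumes each sample has already been reduced to a collection of sequence identifiers; the function name and the identifiers are hypothetical, and quantitative comparison of expression levels is not shown.

```python
from typing import Dict, Iterable, List


def differential_presence(first: Iterable[str], second: Iterable[str]) -> Dict[str, List[str]]:
    """Partition identified sequences by the sample(s) in which they were found."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),   # identified in the first tissue only
        "only_second": sorted(b - a),  # identified in the second tissue only
        "both": sorted(a & b),         # identified in both tissues
    }


# Hypothetical identifiers for a tissue in a first state and in a second state.
result = differential_presence(
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneC", "geneD"],
)
```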
According to a ninth embodiment, the instant invention includes a kit comprising in separate containers: first reagents for providing a subsample of first nucleic acid fragments from an original sample of nucleic acids, said first nucleic acid fragments in said subsample being derived from those nucleic acids in said original sample having selected sequences for a first and a second primary nucleotide subsequence; second reagents for providing second nucleic acid fragments from said subsample of first nucleic acid fragments, wherein said second nucleic acid fragments have an additional subsequence comprising a terminal single-stranded subsequence of said second nucleic acid fragments, and wherein said additional subsequence is at a fixed distance from said first primary subsequence; and a detection array according to the fourth embodiment for recognizing said additional subsequences of said second nucleic acid fragments.
In an aspect, the ninth embodiment includes a computer readable medium containing instructions for causing a computer to search a database of nucleic acid sequences for those sequences having said first primary nucleotide subsequence, second primary nucleotide subsequence, and said additional nucleotide subsequence.
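A minimal sketch of such a database search is given below, under stated assumptions: the database is held in memory as a mapping from a hypothetical identifier to a sequence string, the fixed distance is measured in nucleotides from the end of the first primary subsequence to the start of the additional subsequence, and the second primary subsequence may occur anywhere downstream of the first. None of these conventions is fixed by the embodiment above, and the example sequences are invented for illustration.

```python
from typing import Dict, List


def find_candidates(
    database: Dict[str, str],
    first: str,
    second: str,
    additional: str,
    distance: int,
) -> List[str]:
    """Return identifiers of database sequences matching the observed subsequence set."""
    hits = []
    for name, seq in database.items():
        start = seq.find(first)
        while start != -1:
            after_first = start + len(first)
            # Assumed convention: the additional subsequence begins `distance`
            # nucleotides after the end of the first primary subsequence.
            add_pos = after_first + distance
            if seq.startswith(additional, add_pos) and seq.find(second, after_first) != -1:
                hits.append(name)
                break
            start = seq.find(first, start + 1)
    return hits


# Hypothetical database and observed subsequences.
database = {
    "geneA": "TTGAATTCACGTACCGGATCCAA",
    "geneB": "TTGAATTCACTTACCGGATCCAA",
    "geneC": "GGGGGGGG",
}
candidates = find_candidates(database, "GAATTC", "GGATCC", "GTAC", 2)
```

Only "geneA" carries the additional subsequence at the assumed fixed distance; "geneB" has both primary subsequences but a different subsequence at that position, and is therefore excluded.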
According to a tenth embodiment, the instant invention includes a computer-based system for processing gene-expression information comprising: input/output means for input of user requests and output of processing responses; storage means for storing nucleic acid sequences identified in samples of nucleic acids according to the method of the first embodiment; and processing means for, according to said user requests, either searching a database of nucleic acid sequences in order to locate database sequences having said observed subsequence sets or to determine that no such database sequences exist, said database of nucleic acid sequences comprising nucleic acid sequences that might be present in said sample, and storing said located database sequences in said storage means, or for comparing two or more sequences retrieved from said storage means, said sequences having been identified in two or more samples of nucleic acids, in order to determine differential presence of said identified database sequences in said samples, and generating processing responses of said searching or of said comparing.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of cell biology, cell culture, molecular biology, microbiology, recombinant DNA, immunology, transgenic animal technology, and pharmacology. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press (2nd ed., 1989); Glover ed., DNA Cloning, Vols. 1 and 2 (1985); Gait ed., Oligonucleotide Synthesis (1984); Hames et al. eds., Transcription and Translation (1984); Freshney, Culture of Animal Cells, Alan R. Liss, Inc. (1997); Immobilized Cells and Enzymes, IRL Press (1986); Perbal, A Practical Guide to Molecular Cloning Methods in Enzymology, Academic Press (1984); Miller et al. eds., Gene Transfer Vectors for Mammalian Cells, Cold Spring Harbor Laboratory (1987); Wu et al. eds., Methods in Enzymology, Vols. 154 and 155; Mayer et al. eds., Immunochemical Methods in Cell and Molecular Biology, Academic Press (1987); Weir et al. eds., Handbook of Experimental Immunology, Vols. 1-4 (1986). All of these references are incorporated herein by reference in their entirety.