For purposes of organization, this background has been divided into eight parts as follows:
(1) Reactive Groups of Labeling Reagents
(2) Linker Arms for Connecting Labels to Targets
(3) Porphyrin Fluorescent Dyes as Labels
(4) Alterations in Fluorescent Properties
(5) Fluorescent Intercalators
(6) Chemiluminescence
(7) Real Time Detection through Fluorescence
(8) Primer Binding Sequences in Analytes
(1) Reactive Groups of Labeling Reagents
The use of non-radioactive labels in biochemistry and molecular biology has grown exponentially in recent years. Among the various compounds used as non-radioactive labels, aromatic dyes that produce fluorescent or luminescent signals are especially useful. Notable examples of such compounds include fluorescein, rhodamine, coumarin and cyanine dyes such as Cy3 and Cy5. Composite dyes have also been synthesized by fusing two different dyes together (Lee et al., (1992) Nucl. Acids Res. 20; 2471-2488; U.S. Pat. Nos. 5,945,526 and 6,008,373).
Non-radioactive labeling methods were initially developed to attach signal-generating groups onto proteins. This was achieved by modifying labels with chemical groups such that they would be capable of reacting with the amine, thiol, and hydroxyl groups that are naturally present on proteins. Examples of reactive groups that were used for this purpose included activated esters such as N-hydroxysuccinimide esters, isothiocyanates and other compounds. Consequently, when it became desirable to label nucleotides and nucleic acids by non-radioactive means, methods were developed to convert nucleotides and polynucleotides into a form that made them functionally similar to proteins. For instance, U.S. Pat. No. 4,711,955 disclosed the addition of amines to the 8-position of a purine, the 5-position of a pyrimidine and the 7-position of a deazapurine. The same methods that could add a label to the amine group of a protein could now be applied towards these modified nucleotides.
Among the compounds used as fluorescent labels, the cyanine-based dyes have become widely used since they have high extinction coefficients and narrow emission bands. Furthermore, modifications can be made in their structure that alter the particular wavelengths at which these compounds absorb and emit light. The cyanine dyes have a general structure comprising two indolenine-based rings connected by a series of conjugated double bonds. The dyes are classified by the number (n) of central double bonds connecting the two ring structures: monocarbocyanine or trimethinecarbocyanine when n=1; dicarbocyanine or pentamethinecarbocyanine when n=2; and tricarbocyanine or heptamethinecarbocyanine when n=3. The spectral characteristics of the cyanine dyes have been observed to follow specific empirical rules. For example, each additional conjugated double bond between the rings raises the absorption and emission maxima by about 100 nm. Thus, when a compound with n=1 has a maximum absorption of approximately 550 nm, equivalent compounds with n=2 and n=3 will have maximum absorptions of approximately 650 nm and 750 nm, respectively. Addition of aromatic groups to the sides of the molecules can shift the absorption by 15 nm to a longer wavelength. The groups comprising the indolenine ring can also contribute to the absorption and emission characteristics. Using the values obtained with the gem-dimethyl group as a reference point, oxygen substituted in the ring for the gem-dimethyl group decreases the absorption and emission maxima by approximately 50 nm. In contrast, substitution of sulfur increases the absorption and emission maxima by about 25 nm. R groups on the aromatic rings such as alkyl, alkyl-sulfonate and alkyl-carboxylate have little effect on the absorption and emission maxima of the cyanine dyes (U.S. Pat. No. 6,110,630).
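The empirical rules quoted above can be collected into a short calculation. The following Python sketch is illustrative only: the function and constant names are hypothetical, the shifts are assumed to be simply additive, and the 15 nm aromatic shift is assumed to apply once per dye. None of these assumptions is stated in the cited references.

```python
# Rough estimate of a cyanine dye's absorption maximum from the approximate
# empirical rules quoted in the text. All names here are hypothetical.

BASE_NM = 550.0                # approximate absorption maximum for n=1 (gem-dimethyl reference)
SHIFT_PER_DOUBLE_BOND = 100.0  # each additional conjugated double bond adds about 100 nm
AROMATIC_SHIFT_NM = 15.0       # added aromatic groups: about 15 nm toward longer wavelength
RING_SHIFT_NM = {
    "gem-dimethyl": 0.0,       # reference point
    "oxygen": -50.0,           # oxygen in place of gem-dimethyl: about 50 nm lower
    "sulfur": 25.0,            # sulfur in place of gem-dimethyl: about 25 nm higher
}


def estimate_absorption_max(n, ring_group="gem-dimethyl", extra_aromatic=False):
    """Return an approximate absorption maximum in nm (assumes additive shifts)."""
    if n not in (1, 2, 3):
        raise ValueError("n must be 1, 2 or 3")
    estimate = BASE_NM + SHIFT_PER_DOUBLE_BOND * (n - 1) + RING_SHIFT_NM[ring_group]
    if extra_aromatic:
        estimate += AROMATIC_SHIFT_NM
    return estimate


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, estimate_absorption_max(n))
```

Such an estimate is only as good as the rules themselves, which are described as approximate; measured spectra should take precedence.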
Cyanine dyes synthesized with arms containing functional groups have been prepared with iodoacetamide, isothiocyanate and succinimidyl esters that react with sulfhydryl groups on proteins (Ernst, et al., (1989), Cytometry 10, 3-10; Mujumdar, et al., (1989), Cytometry 10, 11-19; Southwick, et al., (1990) Cytometry 11, 418-430). A new series of modified dyes was prepared which contained a sulfonate group on the phenyl portion of the indolenine ring that increased the water solubility of the dyes (Mujumdar et al., (1993) Bioconjugate Chemistry 4; 105-111). These dyes were activated by treatment with disuccinimidyl carbonate to form succinimidyl esters that were then used to label proteins by substitution at the amine groups. Other activating groups have since been placed on the cyanine dyes. In U.S. Pat. No. 5,627,027 and U.S. Pat. No. 5,268,486, cyanine dyes were prepared which comprise isothiocyanate, isocyanate, monochlorotriazine, dichlorotriazine, mono or di-halogen substituted pyridine, mono or di-halogen substituted diazine, aziridine, sulfonyl halide, acid halide, hydroxy-succinimide ester, hydroxy-sulfosuccinimide ester, imido esters, glyoxal groups and aldehydes and other groups, all of which can form a covalent bond with an amine, thiol or hydroxyl group on a target molecule.
In U.S. Pat. No. 6,110,630, cyanine dyes were prepared with a series of reactive groups derived from N-hydroxynaphthalimide. These groups included hydroxysuccinimide, para-nitrophenol, N-hydroxyphthalimide and N-hydroxynaphthalimide, all of which can react with nucleotides modified with primary amines. The same chemical reactions that have been described above have also been used in U.S. Pat. No. 6,114,350 but with the constituents reversed. In this disclosure, the cyanine dyes were modified with amine, sulfhydryl or hydroxyl groups and the target molecules were modified to comprise the appropriate reactive groups.
Cyanine dyes containing arms that comprise reactive functional groups have been prepared by the general scheme in which the entire heterocyclic compound comprising the two indolenine structures and the intervening unsaturated chain was synthesized first; the terminal reactive groups or any other functionality necessary to link the dyes to proteins or nucleic acids were then added after the completion of the whole dimeric dye unit.
(2) Linker Arms for Connecting Labels to Targets
Labeled nucleotides have been used for the synthesis of DNA and RNA probes in many enzymatic methods including terminal transferase labeling, nick translation, random priming, reverse transcription, RNA transcription and primer extension. Labeled phosphoramidite versions of these nucleotides have also been used with automated synthesizers to prepare labeled oligonucleotides. The resulting labeled probes are widely used in such standard procedures as northern blotting, Southern blotting, in situ hybridization, RNase protection assays, DNA sequencing reactions, DNA and RNA microarray analysis and chromosome painting.
There is an extensive literature on chemical modification of nucleic acids by means of which a signal moiety is directly or indirectly attached to a nucleic acid. Primary concerns of this art have been: which site in a nucleic acid is used for attachment, i.e. sugar, base or phosphate analogues, and whether these sites are disruptive or non-disruptive (see for instance U.S. Pat. Nos. 4,711,955 and 5,241,060); the chemistry at the site of attachment that allows linkage to a reactive group or signaling moiety; a spacer group, usually consisting of a single aromatic group (U.S. Pat. Nos. 4,952,685 and 5,013,831) or a carbon/carbon aliphatic chain, to provide distance between the nucleic acid and a reactive group or signaling moiety; a reactive group at the end of the spacer, such as an OH, NH, SH or some other group that can allow coupling to a signaling moiety; and the nature of the signaling moiety.
Although the foregoing have all been descriptions of the various aspects that are concerned with the synthesis of modified nucleotides and polynucleotides, they have also been shown to be significant factors with regard to the properties of the resultant nucleotides and polynucleotides. Indeed, there have been numerous demonstrations that the modified nucleotides described in the present art have shortcomings compared to unmodified nucleotides.
For instance, these factors can have a major impact on the ability of these modified nucleotides to be incorporated by polymerases. A consequence of this is that when using a modified base as the sole source of that particular nucleotide, there may be a loss in the amount of nucleic acid synthesis compared to a reaction with unmodified nucleotides. As a result, modified nucleotides are usually employed as part of a mixture of modified and unmodified versions of a given nucleotide. Although this restores synthesis to levels comparable to reactions without any modified nucleotides, a bias is often seen against the use of the modified version of the nucleotide. As such, the final proportion of modified to unmodified nucleotide in the product may be much lower than the ratio of the reagents. Users must then choose between nucleic acids that are minimally labeled and decreased yields. When comparable modified nucleotides are used that only comprise a linker arm attached to a base (such as allylamine dUTP), difficulties with incorporation are seldom seen. As such, the foregoing problem is likely to be due to the interactions of the label with either the polymerase or the active site where synthesis is taking place.
Difficulties in the use of polymerases can be bypassed by the use of oligonucleotide synthesizers where an ordered chemical joining of phosphoramidite derivatives of nucleotides can be used to produce labeled nucleic acids of interest. However, the presence of signal agents on modified nucleotides can even be problematic in this system. For instance, a phosphoramidite of a modified nucleotide may display a loss of coupling efficiency as the chain is extended. Although this may be problematic in itself, multiple and especially successive use of modified nucleotides in a sequence for a synthetic oligonucleotide can result in a drastic cumulative loss of product. Additionally, chemical synthesis is in itself not always an appropriate solution. There may be circumstances where labeled nucleic acids need to be of larger lengths than is practical for a synthesizer. Also, an intrinsic part of synthetic approaches is a necessity for a discrete sequence for the nucleic acid. For many purposes, a pool or library of nucleic acids would require an impractically large number of different species for synthetic approaches.
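The cumulative loss described above follows from the fact that the fraction of full-length product is the product of the individual coupling efficiencies. The Python sketch below illustrates this arithmetic; the function name and the efficiency values (0.99 for a standard phosphoramidite, 0.90 for a modified one) are hypothetical figures chosen for illustration and are not taken from any cited reference.

```python
# Illustrative arithmetic: the fraction of full-length oligonucleotide is the
# product of the per-step coupling efficiencies. Values below are hypothetical.

def full_length_fraction(coupling_efficiencies):
    """Return the fraction of chains that survive every coupling step."""
    fraction = 1.0
    for efficiency in coupling_efficiencies:
        if not 0.0 <= efficiency <= 1.0:
            raise ValueError("coupling efficiency must lie between 0 and 1")
        fraction *= efficiency
    return fraction


STANDARD = 0.99  # assumed efficiency for an unmodified phosphoramidite
MODIFIED = 0.90  # assumed efficiency for a labeled phosphoramidite

# A 20-mer requires 19 coupling steps after the first, support-bound residue.
unmodified_oligo = [STANDARD] * 19
labeled_oligo = [STANDARD] * 14 + [MODIFIED] * 5  # five labeled positions

if __name__ == "__main__":
    print("unmodified:", round(full_length_fraction(unmodified_oligo), 3))
    print("five labels:", round(full_length_fraction(labeled_oligo), 3))
```

Under these assumed values, each additional modified position multiplies the yield by a further factor below one, which is why successive modified couplings are especially costly.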
An example of a method to increase the yield of labeled oligonucleotides or polynucleotides is to use a non-interfering group such as an allylamine modified analogue during synthesis by either a polymerase or an oligonucleotide synthesizer. Labeling is then carried out post-synthetically by attachment of the desired group through the chemically reactive allylamine moieties. However, in this case, although incorporation or coupling efficiency may be restored, there may still be problems with the efficiency of attaching the desired group to the allylamine. For instance, coupling of labels to allylamine moieties in a nucleic acid is dramatically less efficient for double-stranded DNA compared to single-stranded targets. In addition to potential yield problems, the functionality of the modification may be affected by how it is attached to a base. For instance, if a hapten is attached to a base, the nature of the arm separating the hapten from the base may affect its accessibility to a potential binding partner. When a signal generating moiety is attached through a base, the nature of the arm may also affect interactions between the signal generating moiety and the nucleotide and polynucleotide.
Attempts to limit these deleterious interactions have been carried out in several ways. For instance, attachment of the arm to the base has been carried out with either a double bond alkene group (U.S. Pat. No. 4,711,955) or a triple bond alkyne group (U.S. Pat. No. 5,047,519) thereby inducing a directionality of the linker away from the nucleotide or polynucleotide. However, this approach is of limited utility since this rigidity is limited to only the vicinity of the attachment of the linker to the base. In addition, attempts at limiting interactions have been carried out by having the arm displace the active or signal group away from the nucleotide or polynucleotide by lengthening the spacer group. For instance, a commercially available modified nucleotide included a seven carbon aliphatic chain (Cat. No. 42724, ENZO Biochem, Inc. New York, N.Y.) between the base and a biotin moiety used for signal generation. This product was further improved by the substitution of linkers with 11 or even 16 carbon lengths (Cat. Nos. 42722 and 42723, also available from ENZO Biochem, Inc. New York, N.Y.). A comparison was also carried out using different length linker arms and a cyanine dye labeled nucleotide (Zhu et al., 1994 Nucl. Acids Res. 22; 3418-3422). A direct improvement in efficiency was noted as the length was increased from 10 to 17 and from 17 to 24. However, even with the longest linker, there was incomplete compensation for the presence of the fluorescent marker in terms of efficiency. This may be because, owing to the flexibility of the aliphatic carbon chain used for this spacer segment, the reporter groups will seldom be found in a conformation where they are completely extended away from the nucleotide itself. Thus, although this approach changed the length of the linker, it did not change the flexible nature of the spacer.
In an attempt to circumvent this problem, in U.S. Pat. No. 5,948,648, Khan et al. have disclosed the use of multiple alkyne or aromatic groups connecting a marker to a nucleotide. However, this method employs highly non-polar groups in the linker that may induce interaction between the linker and the marker, thereby limiting its effectiveness by decreasing coupling efficiencies or by increasing non-specific binding by labeled compounds that include these groups. In addition, these groups may decrease the water solubility of either the labeled compound or various intermediates used to make the labeled compound.
The continued difficulties in using activated or labeled nucleotides which have incorporated the foregoing features demonstrate that there are still deleterious interactions occurring between the base, oligonucleotide or polynucleotide and the moiety at the end of the arm in methods of the previous art. Although the foregoing has been described with respect to attachment to nucleic acids, these problems are shared with other groups for which it may be useful to attach a marker or label.
(3) Porphyrin Fluorescent Dyes as Labels
Assays that employ fluorescently labeled probes depend upon illumination at one particular wavelength and detection of the emission at another wavelength (the Stokes shift). There exists an extensive literature on the variety of compounds that have various excitation/emission spectral characteristics suitable for such assays. When fluorescent compounds are used for comparative expression analysis, the ability to carry out signal detection simultaneously for each label depends upon how marked the difference between the labels is. Thus, fluorophores such as Cy 3 and Cy 5 are commonly used in expression analysis since they have emission peaks at 570 nm and 667 nm, respectively. One class of compounds that has not been effectively exploited for this analysis is the porphyrins.
The ability of porphyrins to absorb light energy and efficiently release it has been used in a number of other systems. For example, light induced cleavage of nucleic acids can be carried out by a number of metallo-porphyrins that are either free in solution or attached to a sequence specific oligonucleotide (Doan et al., (1986) Biochemistry 26; 6736-6739). One application of this system has been the targeting and killing of cancer cells through light induced DNA damage after absorption of metallo-porphyrins (Moan et al., (1986) Photochemistry and Photobiology 43; 681-690). Another example of the high energetic ability of metallo-porphyrins can be seen with their use as catalytic agents (Forgione et al., U.S. Pat. No. 4,375,972) for non-enzymatic chemiluminescence. Furthermore, there are cases where porphyrins have been used as labeling reagents, for example U.S. Pat. Nos. 6,001,573 and 5,464,741 where Pd octaethylporphyrins were converted to the isothiocyanate and used as labeling reagents particularly for use in immunoassays. However, in these cases metallic porphyrins were exclusively used.
The drawback of the use of metallo-porphyrins is that the destructive abilities of these compounds are counter-productive when used in array analysis or other assay systems which require the maintenance of the integrity of the nucleic acid strands of analytes or probes. Therefore, it would be highly advantageous to be able to utilize porphyrins for their fluorescent and chemiluminescent properties while eliminating their nucleic acid destructive properties.
(4) Alterations in Fluorescent Properties
In previous art, it has been shown that the addition of phenylacetylene groups to anthracene increases the emission maximum by 72 nm (Maulding and Roberts, 1968 J Org Chem). Furthermore, the Stokes shift, the difference between the absorption and emission maxima, was also increased by the addition of the phenylacetylene group to the anthracene dye. Specifically, the difference of 6 nm was increased to 31 nm following the addition of two phenylacetylene groups. When the phenylacetylene group was added to naphthacene, the difference between the absorption and emission maxima increased from 7 nm to 32 nm. Furthermore, the quantum yields of anthracene and naphthacene were significantly increased by the addition of the phenylacetylene groups to them.
The application of this effect was limited to these compounds because the chemistries and reactions used for the addition of these substituents required ketone or aldehyde groups. Also, addition of unsaturated groups to dyes has the undesired effect of potentially decreasing their solubility in aqueous solutions. In addition, the modified anthracene dyes described by Maulding and Roberts lacked any reactive groups that could be used for attachment.
(5) Fluorescent Intercalators
Intercalating dyes have been used for the detection and visualization of DNA in many techniques including the detection of DNA in electrophoresis gels, in situ hybridization, flow cytometry and real time detection of amplification. An intercalating dye with a long history of popular usage is ethidium bromide. Ethidium bromide has the useful properties of high affinity for nucleic acids and an increased fluorescence after binding. This enhancement of fluorescence takes place with both single-stranded and double-stranded nucleic acids, with double-stranded DNA showing a much more marked effect, generally around thirty-fold. Other dyes which exhibit an increased fluorescence signal upon binding to nucleic acid have been developed in recent years, including such compounds as acridine orange, SYBR Green and PicoGreen. There is a continuing need, however, for increased signal generation after binding to or intercalation with nucleic acids, especially for use in techniques such as real time amplification.
(6) Chemiluminescence
The use of chemiluminescent reagents for signal detection has gained wider use in recent years. There are several different classes of compounds that can produce luminescent signals including 1,2-dioxetanes and luminols. 1,2-Dioxetanes are four-membered rings which contain two adjacent oxygens. Some forms of these compounds are very unstable and emit light as they decompose. On the other hand, the presence of an adamantyl group can lead to a highly stable form with a half-life of several years (Wieringa et al. (1972) Tetrahedron Letters 169-172). Use can be made of this property by using a stable form of a 1,2-dioxetane as a substrate in an enzyme linked assay where the presence of the enzyme will transform the substrate into an unstable form, thereby using chemiluminescence for signal generation. Enzymatic induction of a chemiluminescent signal has been described where an adamantyl dioxetane derivative was synthesized with an additional group that was a substrate for enzymatic cleavage (U.S. Pat. No. 5,707,559, Schaap et al. (1987) Tetrahedron Letters, 28; 935-938; Schaap et al. (1987) Tetrahedron Letters, 28; 1159-1163). In the presence of the appropriate enzyme, cleavage would take place and an unstable compound would be formed that emitted light as it decomposed.
A common design of dioxetane derivatives for this method is attachment of an aryl group that has hydroxyl substituents which contain protecting groups. The removal of the protecting group by the appropriate enzyme results in a negatively charged oxygen. This intermediate is unstable and leads to the decomposition of the compound and the emission of light. Various 1,2-dioxetane derivatives have been developed that can be activated by different enzymes depending upon the nature of the protecting group. Enzymes that have been described as potentially useful for this purpose have included alkaline phosphatase, galactosidase, glucosidase, esterase, trypsin, lipase, and phospholipase among others (for instance, see U.S. Pat. No. 4,978,614).
Variations of this basic method have also been disclosed. For example, Urdea (U.S. Pat. No. 5,132,204) has disclosed stable 1,2-dioxetane derivatives which require the activity of two enzymes in order to produce a signal. Haces has disclosed a method where the decomposition of the 1,2-dioxetane is triggered by an enzymatic or chemical reaction which releases a terminal nucleophile (U.S. Pat. No. 5,248,618). The released nucleophile can then undergo an intramolecular substitution reaction, thereby liberating a phenoxy group which triggers the decomposition of the 1,2-dioxetane. The chain where the intramolecular reaction takes place is made up of single bonds, thus allowing complete rotational freedom around all the bonds and relying on a random interaction between the groups participating in the intramolecular reaction.
Despite improvements within the field of chemiluminescent signaling, there still exists the need for new substrates and reagents. Many of the substrates that are currently available produce a high level of background due to enzyme independent triggering of the decomposition of the substrate and release of chemiluminescent signal. Therefore, a new type of 1,2-dioxetane which is more stable in the absence of an enzyme would be a desirable reagent.
(7) Real Time Detection through Fluorescence
Amplification of nucleic acids from clinical samples has become a widely used technique. The first methodology for this process, the Polymerase Chain Reaction (PCR), was described by Mullis et al. in U.S. Pat. No. 4,683,202. Since that time, other methodologies such as Ligation Chain Reaction (LCR) (U.S. Pat. No. 5,494,810), GAP-LCR (U.S. Pat. No. 6,004,286), Nucleic Acid Sequence Based Amplification (NASBA) (U.S. Pat. No. 5,130,238), Strand Displacement Amplification (SDA) (U.S. Pat. No. 5,270,184 and U.S. Pat. No. 5,455,166) and Loop Mediated Amplification (U.S. patent application Ser. No. 09/104,067; European Patent Application Publication No. EP 0 971 039 A) have been described. Detection of an amplified product derived from the appropriate target has been carried out in a number of ways. In the initial method described by Mullis et al., gel analysis was used to detect the presence of a discrete nucleic acid species. Identification of this species as being indicative of the presence of the intended target was determined by size assessment and the use of negative controls lacking the target sequence. The placement of the primers used for amplification dictated a specific size for the product from the appropriate target sequence. Spurious amplification products made from non-target sequences were unlikely to have the same size as the target derived product. Alternatively, more elaborate methods have been used to examine the particular nature of the sequences that are present in the amplification product. For instance, restriction enzyme digestion has been used to determine the presence, absence or spatial location of specific sequences. The presence of the appropriate sequences has also been established by hybridization experiments. In this method, the amplification product can be used as either the target or as a probe.
The foregoing detection methods have historically been used after the amplification reaction was completed. More recently, methods have been described for measuring the extent of synthesis during the course of amplification, i.e. “real-time” detection. For instance, in the simplest system, an intercalating agent is present during the amplification reaction (U.S. Pat. Nos. 5,994,056 and 6,174,670). This method takes advantage of an enhancement of fluorescence exhibited by the binding of an intercalator to double-stranded nucleic acids. Measurement of the amount of fluorescence can take place post-synthetically in a fluorometer after the reaction is over, or real time measurements can be carried out during the course of the reaction by using a special PCR cycler machine that is equipped with a fluorescence detection system and uses capillary tubes for the reactions (U.S. Pat. Nos. 5,455,175 and 6,174,670). As the amount of double-stranded material rises during the course of amplification, the amount of signal also increases. The sensitivity of this system depends upon a sufficient amount of double-stranded nucleic acid being produced that generates a signal that is distinguishable from the fluorescence of a) unbound intercalator and b) intercalator molecules bound to single-stranded primers in the reaction mix. Specificity is derived from the nature of the amplification reaction itself or by looking at a Tm profile of the reaction products. Although the initial work was done with ethidium bromide, SYBR Green™ is more commonly used at the present time. A variation of this system has been described in U.S. Pat. No. 6,323,337 B1, where the primers used in PCR reactions were modified with quenchers, thereby reducing signal generation of a fluorescent intercalator that was bound to a primer dimer molecule. Signal generation from target derived amplicons could still take place since amplicons derived from target sequences comprised intercalators bound to segments that were sufficiently distant from the quenchers.
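The dependence of sensitivity on signal rising above background can be illustrated with a deliberately simplified model. The Python sketch below is not drawn from any of the cited patents: it assumes ideal exponential amplification with no plateau, a fixed background (unbound plus primer-bound intercalator) in arbitrary fluorescence units, and a hypothetical detection criterion of signal reaching a chosen multiple of that background.

```python
# Simplified, hypothetical model of intercalator-based real-time detection.
# Assumes ideal exponential amplification and a constant background.

def first_detectable_cycle(initial_ds_signal, background_signal,
                           efficiency=1.0, signal_to_background=2.0,
                           max_cycles=40):
    """Return the first cycle at which the double-stranded signal reaches
    signal_to_background times the background, or None if it never does."""
    ds_signal = initial_ds_signal
    for cycle in range(1, max_cycles + 1):
        # efficiency=1.0 means the double-stranded product doubles each cycle
        ds_signal *= (1.0 + efficiency)
        if ds_signal >= signal_to_background * background_signal:
            return cycle
    return None


if __name__ == "__main__":
    # Arbitrary fluorescence units; a lower starting amount is detected later.
    print(first_detectable_cycle(1.0, 512.0))
    print(first_detectable_cycle(0.1, 512.0))
```

In this model, anything that lowers the background term, such as quenching of intercalator bound to primer dimers, moves detection to an earlier cycle for the same amount of target.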
Another method of analysis that depends upon incorporation has been described by Nazarenko (U.S. Pat. No. 5,866,336). In this system, signal generation is dependent upon the incorporation of primers into double-stranded amplification products. The primers are designed such that they have extra sequences added onto their 5′ ends. In the absence of amplification, stem-loop structures are formed through intramolecular hybridization that consequently bring a quencher into proximity with an energy donor thereby preventing fluorescence. However, when a primer becomes incorporated into double-stranded amplicons, the quencher and donor become physically separated and the donor is now able to produce a fluorescent signal. The specificity of this system depends upon the specificity of the amplification reaction itself. Since the stem-loop sequences are derived from extra sequences, the Tm profile of signal generation is the same whether the amplicons were derived from the appropriate target molecules or from non-target sequences.
In addition to incorporation based assays, probe based systems have also been used for real-time analysis. For instance, a dual probe system can be used in a homogeneous assay to detect the presence of appropriate target sequences. In this method, one probe comprises an energy donor and the other probe comprises an energy acceptor (European Patent Application Publication No. 0 070 685). Thus, when the target sequence is present, the two probes can bind to adjacent sequences and allow energy transfer to take place. In the absence of target sequences, the probes remain unbound and no energy transfer takes place. Even if by chance, there are non-target sequences in a sample that are sufficiently homologous that binding of one or both probes takes place, no signal is generated since energy transfer would require that both probes bind and that they be in a particular proximity to each other. Advantage of this system has been taken by Wittwer et al., in U.S. Pat. No. 6,174,670 for real time detection of PCR amplification using the capillary tube equipped PCR machine described previously. The primer annealing step during each individual cycle can also allow the simultaneous binding of each probe to target sequences providing an assessment of the presence and amount of the target sequences. In a further refinement of this method, one of the primers comprises an energy transfer element and a single energy transfer probe is used. Labeled probes have also been used in conjunction with fluorescent intercalators to allow the specificity of the probe methodology to be combined with the enhancement of fluorescence derived from binding to nucleic acids. This was first described in U.S. Pat. No. 4,868,103 and later applied to amplification reactions in PCT Int. Appl. WO 99/28500.
Probes have also been used that comprise an energy donor and an energy acceptor in the same nucleic acid. In these assays, the energy acceptor “quenches” fluorescent energy emission in the absence of appropriate complementary targets. In one system described in U.S. Pat. No. 5,118,801, “molecular beacons” are used where the energy donor and the quencher are kept in proximity by secondary structures with internal base pairing. When the target sequences are present, complementary sequences in the molecular beacons allow hybridization events that destroy this secondary structure, thereby allowing energy emission. In another system that has been termed TaqMan, use is made of the double-stranded selectivity of the exonuclease activity of Taq polymerase (U.S. Pat. No. 5,210,015). When target molecules are present, hybridization of the probe to complementary sequences converts the single-stranded probe into a substrate for the exonuclease. Degradation of the probe separates the donor from the quencher, thereby releasing light.
(8) Primer Binding Sequences in Analytes
One of the characteristics of eukaryotic mRNA is the presence of poly A tails at their 3′ ends. This particular feature has provided a major advantage in working with mRNA since the poly A segment can be used as a universal primer binding site for synthesis of cDNA copies of any eukaryotic mRNA. However, this has also led to a certain bias in RNA studies, since the 3′ ends of mRNA are easily obtained and thoroughly studied but the 5′ ends lack such consensus sequences. Thus, a large number of systems have been described whose major purpose has been to generate clones that have complete representation of the original 5′ end sequences. This has also been carried over in array analysis for comparative transcription studies. Since substantially all systems used for this purpose are initiated by oligo T priming at the 3′ end of mRNA, sequences downstream are dependent upon the continuation of synthesis away from the 3′ starting point. However, it is well known that there is an attenuation effect of polymerization as polymerases frequently fall off of templates after synthesis of a particular number of bases. Another effect is generated by the presence of RNase H that is a component of most reverse transcriptases. Paused DNA strands may allow digestion of the RNA near the 3′ end of the DNA thereby separating the uncopied portion of the RNA template from the growing DNA strand. This effect may also occur randomly during the course of cDNA synthesis. As such, representation of sequences is inversely proportional to their distance from the 3′ poly A primer site.
Although prior art has capitalized extensively on poly A segments of RNA, it should be recognized that poly A mRNA represents only a portion of nucleic acids in biological systems. Another constraint in prior art is that the use of poly A tails is only available in eukaryotic mRNA. Two areas of especial interest are unable to enjoy this benefit. One area is bacterial mRNA, since it intrinsically lacks poly A additions. The second area is heterologous RNA in eukaryotic systems. For any particular eukaryotic gene, there is a considerable amount of genetic information present in heterologous RNA that is lost by the use of polyadenylated mature forms of transcripts that comprise only exon information.
The lack of a primer consensus sequence in these systems has necessitated the use of alternatives to oligo T priming. In prior art, bacterial expression studies have been carried out by random priming with octamers (Sellinger et al., 2000 Nature Biotechnology 18; 1262-1268), a selected set of 37 7-mers and 8-mers (Talaat et al., 2000 Nature Biotechnology 18; 679-682) and a set of 4,290 gene specific primers (Tao et al., 1999 J. Bact. 181; 6425-6490). The use of large sets of primers, as represented by random primers and sets of gene specific primers, requires high amounts of primers to drive the reaction and should exhibit poor kinetics due to the sequence complexity of the primers and targets. That is, for any given sequence in an analyte, only a very minute portion of the primers is complementary to that sequence. Large sets of random primers also have the capacity to use each other as primers and templates, thereby generating nonsense nucleic acids and decreasing the effective amounts of primers available. The scope for improving the kinetics of priming by increasing the amounts of random oligonucleotides is very limited. First, there are physical constraints on the amount of oligonucleotides that are soluble in a reaction mixture. Second, increases in the amount of primers are self-limiting, since increased primer concentrations result in increased self-priming, thereby generating more nonsense sequences and consuming reagents that would otherwise be used for analyte dependent synthesis. Lower concentrations can theoretically be used by decreasing the complexity (i.e. sequence length) of the primers, but restraints are then imposed upon the stability of hybrid formation. On the other hand, the discrete sub-set of 7-mers and 8-mers described above requires knowledge of the complete genome of the intended target organism.
As such, these can only be used with completely sequenced organisms, and a unique set has to be individually developed for each target organism, thus limiting their application. Consensus sequences can be added enzymatically by RNA ligation or poly A polymerase, but both of these are slow, inefficient processes. Thus, there exists a need for methods and compositions that can efficiently provide stable priming of a large number of non-polyadenylated templates of variable or even unknown sequence while maintaining a low level of complexity.
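The complexity argument made above for random primers can be made concrete with a short calculation. The sketch below assumes a fully degenerate pool in which all four bases are equally likely at every position, so that a pool of length n contains 4^n distinct sequences in equal amounts. The function names and the 10 µM total concentration are hypothetical values chosen only for illustration.

```python
def pool_complexity(primer_length: int) -> int:
    """Number of distinct sequences in a fully random primer pool."""
    if primer_length < 1:
        raise ValueError("primer_length must be at least 1")
    return 4 ** primer_length


def complementary_fraction(primer_length: int) -> float:
    """Fraction of the pool perfectly complementary to one given site,
    assuming all sequences are present in equal amounts."""
    return 1.0 / pool_complexity(primer_length)


def per_sequence_concentration(total_conc: float, primer_length: int) -> float:
    """Concentration of any single primer sequence, in the same units
    as `total_conc`."""
    return total_conc * complementary_fraction(primer_length)


if __name__ == "__main__":
    total_um = 10.0  # hypothetical total primer concentration, in µM
    for n in (6, 7, 8, 9):
        print(n, pool_complexity(n), per_sequence_concentration(total_um, n))
```

For random octamers, the pool contains 65,536 sequences, so only about 0.0015% of the primers present can pair perfectly with any one site. Shortening the primers raises that fraction fourfold per base removed, at the cost of less stable hybrids.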
Methods have also been described for the introduction of sequences into analytes for the purpose of amplification. For instance, oligonucleotides have been described that comprise a segment complementary to a target sequence and a segment comprising a promoter sequence, where the target is either a selected discrete sequence or a natural poly A sequence (U.S. Pat. Nos. 5,554,516 and 6,338,954). After hybridization to a target mRNA, RNase H is used to cleave a segment of the analyte hybridized to the complementary segment, and the 3′ end of the analyte is then extended using the promoter segment as a template. Since the oligonucleotide that is used for these methods has a homogeneous nature, this particular method relies upon the extension reaction being initiated before the endonuclease reaction completes digestion of the complementary segment of the analyte.