Carbon fixed during photosynthesis is either retained in the chloroplast and converted to a storage carbohydrate, for example, starch, or it is transferred to the cytosol in the form of triose phosphates and converted to sucrose. The newly synthesized sucrose in source tissues is a major transported form of reduced carbon in higher plants and can be either metabolized into other carbohydrates, stored in the vacuole or exported to other plant tissues. Plant tissues where sucrose is synthesized, such as leaves, are often referred to as ‘source’ tissues. Translocated sucrose is retained in ‘sink’ tissues (such as expanding leaves, growing seeds, flowers, roots or tubers, and fruit) and may be assimilated, or further metabolized to sustain cell maintenance or fuel growth, or be converted to alternative storage compounds (e.g., starch, fats). The relative type and size of these carbohydrate pools vary during tissue development, between different plant species, and within the same species subject to different environmental conditions. Such differences are reported to affect the yield and quality of agricultural produce.
Sucrose synthesis and catabolism are reported to be highly coordinated and regulated processes that may also be coordinately regulated with other dedicated metabolic pathways in a particular plant, plant organ or cell type. Sucrose synthesis is reported to be coordinately regulated with starch metabolism and photosynthesis in green ‘source’ plant tissues. Sucrose supply by transport mechanisms to actively growing ‘sink’ tissues is reported to be coordinated with plant development. In growing sink tissues, the supply of carbohydrate is reported to be important to other metabolic pathways and physiological processes including respiration, starch biosynthesis, cell wall biogenesis, lipid and protein biosynthesis. Sucrose synthesis and/or transport is also reported to play a role in the carbohydrate capacity that is available to growing fruits and seeds. Sucrose resynthesis during seed germination is reported to play a role in seedling vigor and agronomic stand establishment in many plant species during early plant development.
In many plant species, enzymes of pathways involved in sucrose metabolism can play a role in plant physiology and plant growth and development. Compartmentation and temporal regulation of genes and enzymes of sucrose metabolic pathways can allow multiple pathways to utilize sucrose as a common metabolite. Flux through a particular sucrose metabolic pathway can define the utilization of sucrose in any tissue or developmental stage. Sucrose and its metabolite products have been reported to play a role in gene regulation and expression of the sucrose pathway and other metabolic pathways in plants.
Reviews on sucrose metabolism in plants include Avigad, In: Encyclopedia of Plant Physiology, Vol 13A, Loewus and Tanner, eds., Springer Verlag, Heidelberg, 217-347 (1982); Hawker, In: Biochemistry of Storage Carbohydrates in Green Plants, Dey and Dixon, eds., Academic Press, London, 1-51 (1985); Huber et al., In: Carbon Partitioning Within and Between Organisms, Pollock et al., eds., Bios Scientific, Oxford, 1-26 (1992); Stitt et al., In: Biochemistry of Plants, Vol 10, Hatch and Boardman, eds., Academic Press, New York, 327-407 (1987); and Quick and Schaffer, In: Photoassimilate Distribution in Plants and Crops, Zamski and Schaffer, eds., Marcel Dekker Inc., New York, 115-156 (1996), all of which are herein incorporated by reference in their entirety.
The synthesis of sucrose precursors (triose and hexose phosphates) is derived from either photosynthetic CO2 fixation or degradation of previously deposited storage reserves. One substrate for sucrose synthesis in photosynthetic tissues is three-carbon sugar phosphates. These are exported from the chloroplast during photosynthesis, predominantly in the form of triose phosphates. The pool of triose phosphates, dihydroxyacetone phosphate (“DHAP”) and glyceraldehyde-3-phosphate (“GAP”), is maintained at equilibrium within the cytoplasm by triose phosphate isomerase (EC 5.3.1.1). A subsequent reaction involves an aldol condensation of DHAP and GAP, catalyzed by the enzyme fructose 1,6-bisphosphate aldolase (often called aldolase) (EC 4.1.2.13) to form fructose 1,6-bisphosphate (“F1,6BP”). Fructose-1,6-bisphosphatase (“FBPase”) (EC 3.1.3.11) catalyzes the cleavage of phosphate from the C1 carbon of fructose-1,6-bisphosphate to form fructose-6-phosphate (“F6P”). This reaction is essentially irreversible and has been reported to represent the first committed step within the pathway of sucrose synthesis. The cytosolic FBPase has been reported to be subject to allosteric regulation and may serve to coordinate the rate of sucrose synthesis with that of photosynthesis. Fructose 2,6-bisphosphate (“F2,6BP”) is reported to be a regulator of FBPase (Black et al., In: Regulation of Carbohydrate Partitioning In Photosynthetic Tissue, Heath and Preiss, eds., Waverly, Baltimore, 109-126 (1985); Stitt et al., In: Biochemistry Of Plants, Vol. 10, Hatch and Boardman, eds., Academic Press, New York, 327-407 (1987), both of which are herein incorporated by reference in their entirety). The concentration of F2,6BP is reported to be controlled in plants by two enzymes, fructose-2,6-bisphosphatase (F2,6BPase) (EC 3.1.3.46) and fructose-6-phosphate,2-kinase (F6P,2K) (EC 2.7.1.105) (Stitt, Annu. Rev. Plant Physiol. Plant Mol. Biol. 41: 153-181 (1990), the entirety of which is herein incorporated by reference).
Glucose-6-phosphate (“G6P”) and glucose-1-phosphate (“G1P”) are reported to be maintained in equilibrium with the F6P pool by the action of phosphoglucoisomerase (“PGI”) (EC 5.3.1.9) and phosphoglucomutase (“PGM”) (EC 5.4.2.2), respectively. Uridine diphosphate glucose (“UDPG”) and pyrophosphate (“PPi”) are formed from uridine triphosphate (“UTP”) and G1P catalyzed by the enzyme UDPG-pyrophosphorylase (“UDPGase”) (EC 2.7.7.9). This reaction is reversible and net flux in the direction of sucrose synthesis is reported to require removal of its products, particularly PPi. A pyrophosphate-dependent proton pump, vacuolar H+-translocating-pyrophosphatase (EC 3.6.1.1), has been identified within the vacuolar membrane and has been reported to utilize pyrophosphate to sustain a proton gradient formed between these two compartments (Rea et al., Trends in Biol. Sci. 17: 348-353 (1992), the entirety of which is herein incorporated by reference).
A pyrophosphate-dependent fructose-6-phosphate phosphotransferase (“PFP”) (EC 2.7.1.90) is also present in the cytoplasm and catalyzes the reversible production of F1,6BP and Pi from F6P and PPi. One reported function of PFP is to operate in a futile cycle with the cytosolic FBPase, and function as a “pseudopyrophosphatase” recycling PPi. Uridine diphosphate glucose is then combined with F6P to form sucrose-6-phosphate (“S6P”). This reaction is catalyzed by sucrose phosphate synthase (“SPS”) (EC 2.4.1.14). Attachment of UDP to the glucose moiety activates the C1 carbon atom of UDPG, which is necessary for the subsequent formation of a glycosidic bond in sucrose. In certain organisms, SPS is capable of using adenosine diphosphate glucose (“ADPG”), instead of UDPG, as a substrate. The use of nucleotide diphosphate sugars is a feature of metabolic pathways leading to the production of disaccharides and polysaccharides. SPS is reported to be subject to allosteric and covalent regulation and, in conjunction with the cytosolic FBPase, reportedly serves to coordinate the rate of sucrose synthesis with the rate of photosynthesis. The reported final reaction in the pathway is catalyzed by sucrose-6-phosphate phosphatase (“SPPase” or “SPP”) (EC 3.1.3.24), which catalyzes the hydrolysis of S6P to sucrose. It has been reported that SPS and SPPase may associate to form a multienzyme complex, that the rate of sucrose-6-phosphate synthesis by SPS is enhanced in the presence of SPP, and that the rate of sucrose-6-phosphate hydrolysis by SPP is increased in the presence of SPS (Echeverria et al., Plant Physiol. 115: 223-227 (1997), herein incorporated by reference in its entirety).
I. Sucrose Synthesis
Reviews describing fructose-1,6-bisphosphatase (“FBPase”, EC 3.1.3.11) include those by Hers and Van Schaftingen, Biochem J. 206:1-12 (1982), the entirety of which is herein incorporated by reference, and Stitt, Annu. Rev. Plant Physiol. Plant Mol. Biol. 41:153-181 (1990). Two isoforms of FBPase are reported to exist in plants. The first isoform is associated with the plastid and occurs largely in photosynthetic plastids. The second isoform, located in the cytoplasm, is reported to be involved in both gluconeogenesis and sucrose synthesis (Zimmerman et al., J. Biol. Chem. 253: 5952-5956 (1978); Stitt and Heldt, Planta 164: 179-188 (1985), both of which are hereby incorporated by reference in their entirety). FBPase catalyzes an irreversible reaction in the direction of F6P synthesis in vivo and has been reported to represent the first committed step in the pathway of sucrose synthesis. The properties of the enzyme are reported to involve the action of several regulatory metabolites (Stitt et al., In: Biochemistry Of Plants, Vol. 10, Hatch and Boardman, eds., Academic Press, New York, 327-407 (1987)). The enzyme reportedly has a high affinity for its substrate F1,6BP, a requirement for Mg2+, a requirement for a neutral pH, is weakly inhibited (Km 2-4 μM) by adenosine monophosphate (AMP), and is strongly inhibited by the regulatory metabolite F2,6BP (Hers and Van Schaftingen, Biochem J. 206: 1-12 (1982); Black et al., In: Regulation of Carbohydrate Partitioning In Photosynthetic Tissue, Heath and Preiss, eds., Waverly, Baltimore, 109-126 (1985); Huber, Annu. Rev. Plant Physiol. 37: 233-246 (1986); Stitt et al., In: Biochemistry Of Plants, Vol. 10, Hatch and Boardman, eds., Academic Press, New York, 327-407 (1987), all of which are herein incorporated by reference in their entirety). F2,6BP is also an activator of PFP and reportedly plays a role in the regulation of gluconeogenic and respiratory metabolism.
The concentration of F2,6BP is reportedly determined in plants by two enzymes, fructose-2,6-bisphosphatase (“F2,6BPase”) (EC 3.1.3.46) and fructose-6-phosphate,2-kinase (“F6P,2K”) (EC 2.7.1.105). A review of these enzymes is provided by Stitt, Annu. Rev. Plant Physiol. Plant Mol. Biol. 41: 153-181 (1990). Regulation of the activity of FBPase and of the rate of sucrose synthesis is reported to be brought about, at least in part, by changes in the concentration of F2,6BP.
Sucrose phosphate synthase (SPS (EC 2.4.1.14)) catalyzes a reaction that is displaced from equilibrium in vivo in the direction of S6P synthesis and is reported as an essentially irreversible reaction in vivo (Stitt et al., In: Biochemistry Of Plants, Vol. 10, Hatch and Boardman, eds., Academic Press, New York, 327-407 (1987); Lunn and Rees, Biochem. J. 267: 739-743 (1990), the entirety of which is herein incorporated by reference; U.S. Pat. No. 5,665,892, the entirety of which is herein incorporated by reference). SPS has been purified from spinach and Zea mays, and the amino acid and cDNA sequences have been published (Worrell et al., Plant Cell 3:1121-1130 (1991); Klein et al., Planta 190: 498-510 (1993); Sonnewald et al., Planta 189: 174-181 (1993), all of which are herein incorporated by reference in their entirety). The enzyme has a subunit molecular weight of 117 kDa from spinach (Klein et al., Planta 190: 498-510 (1993); Sonnewald et al., Planta 189: 174-181 (1993), both of which are herein incorporated by reference) and pea (Lunn and Rees, Phytochem. 29: 1057-1063 (1990), the entirety of which is herein incorporated by reference) and 135 kDa from Zea mays (Worrell et al., Plant Cell 3:1121-1130 (1991)). The native enzyme reportedly exists as a tetramer (Walker and Huber, Plant Physiol. 89: 518-524 (1988); Lunn and Rees, Phytochem. 29: 1057-1063 (1990); Worrell et al., Plant Cell 3:1121-1130 (1991)), although dimeric molecular weights have been reported (Klein et al., Planta 190: 498-510 (1993), the entirety of which is herein incorporated by reference). Activity has been observed for SPS at both dimeric and tetrameric molecular weights (Sonnewald et al., Planta 189:174-181 (1993), the entirety of which is herein incorporated by reference).
SPS is located in the cytosol, has a neutral pH optimum, and has been detected in all plant tissues which undertake active sucrose synthesis. An increase in abundance of the enzyme has been reported during the development of leaves, germination of seeds and ripening of fruit. The enzyme has been reported to be subject to regulation by metabolites; it is activated by G6P and inhibited by Pi. Pi and G6P are reported to act competitively at an allosteric site of the enzyme. In the presence of high Pi concentrations, the enzyme is phosphorylated, which reduces its activity. It has also been reported that light-induced photosynthesis increases the activity of SPS in crude extracts (Sicher and Kremer, Plant Physiol. 79: 910-912 (1984); Sicher and Kremer, Plant Physiol. 79: 695-698 (1985); Pollock and Housley, Ann. Bot. 55: 593-596 (1985), all of which are herein incorporated by reference in their entirety). In addition, it has been reported that compounds altering the phosphate status of the leaf can simulate the effects of light. Feeding leaves mannose, which sequesters phosphate by its conversion to the non-metabolized mannose-6-P, has been reported to cause activation of SPS (Stitt et al., Planta 174: 217-230 (1988), the entirety of which is herein incorporated by reference).
The phosphorylation and dephosphorylation of SPS are catalyzed by SPS-kinase and SPS-phosphatase, respectively (Huber et al., Plant Physiol. 99: 1275-1278 (1992)). Hydrolysis of sucrose-6-P to sucrose is catalyzed by sucrose-6-phosphatase (SPPase or SPP) (EC 3.1.3.24). The activity of both SPS and SPP is reported to be affected by a multienzyme complex between SPS and SPP (Echeverria et al., Plant Physiol. 115: 223-227 (1997)).
Regulatory properties of SPS and FBPase are reported to coordinate the rate of sucrose synthesis with that of photosynthesis (Stitt, In: Plant Physiology, Biochemistry and Molecular Biology, Dennis and Turpin, eds., Singapore, London, 319-340 (1990), the entirety of which is herein incorporated by reference). When photosynthesis produces triose phosphate in excess of the rate of sucrose synthesis, a feed-forward activation of sucrose synthesis occurs. Triose phosphate crosses the chloroplast membrane in exchange for cytosolic Pi. Under these conditions, F6P,2-kinase activity is reduced and the inhibition of F2,6BPase is decreased.
As cytosolic F2,6BP falls, F2,6BPase activity increases, and F6P levels increase. Hexose phosphate levels are reported to increase due to PGM and PGI, and with low Pi, activate SPS and F1,6BPase. A reduction in the rate of photosynthesis must result in a deactivation of sucrose synthesis, which occurs through decreased cytosolic triose-P, increased Pi and ultimately increased F2,6BP concentration and reduced SPS activity (Stitt, Phil. Trans. R Soc. Lond. B 342: 225-233 (1993); Huber et al., Plant Physiol. 99: 1275-1278 (1992); Neuhaus et al., Planta 181: 583-592 (1990), all of which are herein incorporated by reference).
II. Metabolic Pathways of Sucrose Catabolism
Sucrose can initially be cleaved by invertases (EC 3.2.1.26) or by sucrose synthases (EC 2.4.1.13). Invertases, which are classified as acid or alkaline in pH preference (Karuppiah et al., Plant Physiol. 91: 993-998 (1989); Fahrendorf and Beck, Planta 180: 237-244 (1990); Iwatsubo et al., Biosc. Biotech. Biochem. 56: 1959-1962 (1992); Unger et al., Plant Physiol. 104: 1351-1357 (1994); Avigad, In: Encyclopedia of Plant Physiology, Vol 13A, Loewus and Tanner, eds., Springer Verlag, Heidelberg, 217-347 (1982), all of which are herein incorporated by reference in their entirety), irreversibly cleave sucrose into glucose and fructose, both of which are usually phosphorylated for further metabolism. The invertase pathway is usually associated with rapidly growing sink tissues such as expanding leaves, expanding internodes, flower petals, and early fruit development (Avigad, In: Encyclopedia of Plant Physiology, Vol 13A, Loewus and Tanner, eds., Springer Verlag, Heidelberg, 217-347 (1982); Huber, Plant Physiol. 91: 656-662 (1989); Morris and Arthur, Phytochem. 23: 2163-2167 (1984); Hawker et al., Phytochem. 15: 1441-1443 (1976); Schaffer et al., Plant Physiol. 69: 151-155 (1987), all of which are herein incorporated by reference in their entirety).
Sucrose synthase carries out the kinetically reversible transglycosylation of sucrose and UDP into fructose and UDPG, requiring only the phosphorylation of fructose for additional metabolism. Polysaccharide biosynthesis in sink tissues may utilize a sucrose synthase mediated sucrose catabolism (Avigad, In: Encyclopedia of Plant Physiology, Vol 13A, Loewus and Tanner, eds., Springer Verlag, Heidelberg, 217-347 (1982); Doehlert et al., Plant Physiol. 86: 1013-1019 (1988); Dale and Housley Plant Physiol. 82: 7-10 (1986), all of which are herein incorporated by reference). Respiring tissues reportedly utilize either sucrose synthase or invertase metabolic pathways (Echeverria and Humphreys, Phytochem. 23: 2173-2178 (1984); Uritani and Asahi, In: The Biochemistry of Plants Vol. 2, Davies, ed., Academic Press, New York, 463-487 (1980), all of which are herein incorporated by reference in their entirety). Tissues that are undergoing respiration, starch biosynthesis, amino acid and fatty acid synthesis, rapid expansion or growth, and other cellular metabolism, can utilize several sucrose metabolic pathways which may be temporally or compartmentally regulated (Doehlert et al., Plant Physiol. 86: 1013-1019 (1988); Doehlert, Plant Physiol. 78: 560-567 (1990); Doehlert and Choury, In: Recent Advances in Phloem Transport and Assimilate Compartmentation, Bonnemain et al., eds., Ouest editions, Nantes, France, 187-195 (1991); Delmer and Stone, In: The Biochemistry of Plants, Vol. 14, Preiss, ed., Academic Press, San Diego, 373-420 (1988); Maas et al., EMBO J. 9: 3447-3452 (1990), all of which are herein incorporated by reference in their entirety).
Hexose kinases are a class of enzymes responsible for the phosphorylation of hexoses, and are classified into two groups. Hexokinase (EC 2.7.1.1) can phosphorylate either glucose or fructose, with different isoforms often unique to different tissues or plant species. Different isoforms can have affinities for different hexoses (Turner and Copeland, Plant Physiol. 68: 1123-1127 (1981), the entirety of which is herein incorporated by reference; Copeland and Turner, In: The Biochemistry of Plants, Vol. 11, Stumpf and Conn, eds., Academic Press, New York, 107-128 (1987), the entirety of which is herein incorporated by reference). Hexose kinases also include fructokinases (EC 2.7.1.4), which typically have specific affinities for fructose (Doehlert, Plant Physiol. 89: 1042-1048 (1989); Renz and Stitt, Planta 190: 166-175 (1993), both of which are herein incorporated by reference). Fructokinases can also be specific in their affinity for nucleotides. The extent to which a fructokinase utilizes UTP may play a physiological role in how efficiently UDP can be recycled for sucrose synthase activity in a particular tissue (Huber and Akazawa, Plant Physiol. 81: 1008-1013 (1986); Xu et al., Plant Physiol. 90: 635-642 (1989), both of which are herein incorporated by reference). UDP levels for the sucrose synthase reaction may be maintained, even in the case of an ATP-specific fructokinase, by the enzyme NDP-kinase (EC 2.7.4.6).
NDP-kinase has been reported in several plant tissues (Kirkland and Turner, J. Biochem. 72: 716-720 (1959); Bryce and Nelson, Plant Physiol. 63: 312-317 (1979); Dancer et al., Plant Physiol. 92: 637-641 (1990); Yano et al., Plant Molec. Biol. 23: 1087-1090 (1993), all of which are herein incorporated by reference in their entirety). Fructokinase can be substrate inhibited by fructose. In addition, sucrose synthase can be inhibited by fructose (Doehlert, Plant Sci. 52: 153-157 (1987); Morell and Copeland, Plant Physiol. 78: 140-154 (1985), Ross and Davies, Plant Physiol. 100: 1008-1013 (1992), all of which are herein incorporated by reference in their entirety). Whereas plant tissues where sucrose is catabolized by sucrose synthase predominantly contain fructokinases (Xu et al., Plant Physiol. 90: 635-642 (1989); Kursanov et al., Soviet Plant Physiol. 37: 507-515 (1990); Ross et al., Plant Physiol. 90: 748-756 (1994)), plant tissues where sucrose is catabolized by invertase often contain hexokinases (Nakamura et al., Plant Physiol. 81: 215-220 (1991)). Tissues which have both invertase and sucrose synthase activity may contain both hexose kinases (Nakamura et al., Plant Physiol. 81: 215-220 (1991), the entirety of which is herein incorporated by reference). F6P resulting from hexose kinase activity can be further metabolized in glycolysis or used in resynthesis of sucrose by SPS. G6P resulting from hexose kinase activity can enter the pentose phosphate pathway, via G6P dehydrogenase (EC 1.1.1.49), or be converted to F6P by phosphoglucoisomerase (“PGI”) (EC 5.3.1.9) or G1P by phosphoglucomutase (“PGM”) (EC 5.4.2.2) (Rees, In: Encyclopedia of Plant Physiology Vol 18, Douce and Day, eds., Springer Verlag, Berlin, 391-417 (1985); Copeland and Turner, In: The Biochemistry of Plants Vol. 11, Stumpf and Conn, eds., Academic Press, New York, 107-128 (1987); Foster and Smith, Planta 180: 237-244 (1993), all of which are herein incorporated by reference in their entirety).
PGI and PGM are reported to be ubiquitous and reversible with commitments of G6P to either F6P or G1P resulting from fluxes in metabolites further along each pathway, i.e., depending on the cell needs for glycolysis (F6P) or starch biosynthesis (G1P) (Edwards and Rees, Phytochem. 25: 2033-2039 (1986); Kursanov et al., Soviet Plant Physiol. 37: 507-515 (1990); Tobias et al., Plant Physiol. 99: 140-145 (1992), all of which are herein incorporated by reference in their entirety). UDPG formed by sucrose synthase may be utilized directly for cellulose or callose biosynthesis via UDP-glucose dehydrogenase (EC 1.1.1.22) (Robertson et al., Phytochem. 39: 21-28 (1995), the entirety of which is herein incorporated by reference), can be used for sucrose synthesis by SPS or sucrose synthase, or for glycolysis or starch metabolism dependent on further metabolism by UDP-glucose pyrophosphorylase (EC 2.7.7.9). UDP-glucose pyrophosphorylase has been reported to be a largely reversible enzyme (Kleczkowski, Phytochem. 37: 1507-1515 (1994), the entirety of which is herein incorporated by reference). Flux through UDP-glucose pyrophosphorylase is reported to be influenced by metabolite levels and utilization of reaction products further along in the pathways (Doehlert et al., Plant Physiol. 86: 1013-1019 (1988); Huber and Akazawa, Plant Physiol. 81: 1008-1013 (1986); Zrenner et al., Planta 190: 247-252 (1993), all of which are herein incorporated by reference in their entirety). The reversibility of PGI, PGM and UDP-glucose pyrophosphorylase has been reported to provide for metabolic variability and networking in metabolism, independent of which initial enzyme cleaved sucrose.
The fate of F6P reportedly plays a role in carbohydrate metabolism. NTP-phosphofructokinase (PFK) (EC 2.7.1.11) (Copeland and Turner, In: The Biochemistry of Plants Vol. 11, Stumpf and Conn, eds., Academic Press, New York, 107-128 (1987); Dennis and Greyson, Plant Physiol. 69: 395-404 (1987); Rees, In: The Biochemistry of Plants Vol. 14, Preiss, ed., Academic Press, San Diego, 1-33 (1988), all of which are herein incorporated by reference in their entirety) is reported to irreversibly convert F6P to F1,6BP and is associated with glycolysis. The reverse reaction of F1,6BP to F6P, associated with gluconeogenesis, is essentially irreversible, and is catalyzed by FBPase (EC 3.1.3.11) (Black et al., Plant Physiol. 69: 387-394 (1987)). Both reactions may be carried out in a reversible manner by a PPi-dependent fructose-6-phosphate phosphotransferase or PPi-phosphofructokinase (PFP; EC 2.7.1.90) (Black et al., Plant Physiol. 69: 387-394 (1987)).
PPi-dependent fructose-6-phosphate phosphotransferase or PPi-phosphofructokinase is reported to play a role in the generation of biosynthetic intermediates (Dennis and Greyson, Plant Physiol. 69: 395-404 (1987); Tobias et al., Plant Physiol. 99: 146-152 (1992), the entirety of which is herein incorporated by reference) in addition to the cycling of PPi for UDPGPPase and ultimately UDP for sucrose synthase (Huber and Akazawa, Plant Physiol. 81: 1008-1013 (1986); Black et al., Plant Physiol. 69: 387-394 (1987); Rees, In: The Biochemistry of Plants Vol. 14, Preiss, ed., Academic Press, San Diego, 1-33 (1988), all of which are herein incorporated by reference in their entirety).
II. Expressed Sequence Tag Nucleic Acid Molecules
Expressed sequence tags, or ESTs, are randomly sequenced members of a cDNA (complementary DNA) library (McCombie et al., Nature Genetics 1:124-130 (1992); Kurata et al., Nature Genetics 8:365-372 (1994); Okubo et al., Nature Genetics 2:173-179 (1992), all of which references are incorporated herein in their entirety). The randomly selected clones comprise inserts that can represent a copy of up to the full length of an mRNA transcript.
Using conventional methodologies, cDNA libraries can be constructed from the mRNA (messenger RNA) of a given tissue or organism using poly dT primers and reverse transcriptase (Efstratiadis et al., Cell 7:279-288 (1976), the entirety of which is herein incorporated by reference; Higuchi et al., Proc. Natl. Acad. Sci. (U.S.A.) 73:3146-3150 (1976), the entirety of which is herein incorporated by reference; Maniatis et al., Cell 8:163-182 (1976), the entirety of which is herein incorporated by reference; Land et al., Nucleic Acids Res. 9:2251-2266 (1981), the entirety of which is herein incorporated by reference; Okayama et al., Mol. Cell. Biol. 2:161-170 (1982), the entirety of which is herein incorporated by reference; Gubler et al., Gene 25:263-269 (1983), the entirety of which is herein incorporated by reference).
Several methods may be employed to obtain full-length cDNA constructs. For example, terminal transferase can be used to add homopolymeric tails of dC residues to the free 3′ hydroxyl groups (Land et al., Nucleic Acids Res. 9:2251-2266 (1981), the entirety of which is herein incorporated by reference). This tail can then be hybridized by a poly dG oligo which can act as a primer for the synthesis of full length second strand cDNA. Okayama and Berg, Mol. Cell. Biol. 2:161-170 (1982), the entirety of which is herein incorporated by reference, report a method for obtaining full length cDNA constructs. This method has been simplified by using synthetic primer-adapters that have both homopolymeric tails for priming the synthesis of the first and second strands and restriction sites for cloning into plasmids (Coleclough et al., Gene 34:305-314 (1985), the entirety of which is herein incorporated by reference) and bacteriophage vectors (Krawinkel et al., Nucleic Acids Res. 14:1913 (1986), the entirety of which is herein incorporated by reference; Han et al., Nucleic Acids Res. 15:6304 (1987), the entirety of which is herein incorporated by reference).
These strategies have been coupled with additional strategies for isolating rare mRNA populations. For example, a typical mammalian cell contains between 10,000 and 30,000 different mRNA sequences (Davidson, Gene Activity in Early Development, 2nd ed., Academic Press, New York (1976), the entirety of which is herein incorporated by reference). The number of clones required to achieve a given probability that a low-abundance mRNA will be present in a cDNA library is N = ln(1 − P)/ln(1 − 1/n), where N is the number of clones required, P is the probability desired and 1/n is the fractional proportion of the total mRNA that is represented by a single rare mRNA (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press (1989), the entirety of which is herein incorporated by reference).
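As a worked illustration of the library-size relationship given above, the following Python sketch evaluates N for a chosen probability P and fractional abundance 1/n. The function name and the example values (a transcript representing 1 part in 100,000 of the total mRNA, sought with 99% probability) are illustrative assumptions and are not taken from the cited reference.

```python
import math


def clones_required(p: float, n: float) -> int:
    """Return the number of clones N needed so that an mRNA making up
    1/n of the total mRNA is present in the library with probability p.

    Evaluates N = ln(1 - P) / ln(1 - 1/n) and rounds up to a whole clone.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n <= 1:
        raise ValueError("n must be greater than 1")
    return math.ceil(math.log(1.0 - p) / math.log(1.0 - 1.0 / n))


if __name__ == "__main__":
    # Hypothetical example: 1 transcript in 100,000, 99% probability.
    print(clones_required(0.99, 100_000))
```

For rare transcripts (large n), ln(1 − 1/n) is approximately −1/n, so N grows roughly in proportion to n; this is one reason enrichment and normalization strategies are used for low-abundance mRNAs.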
A method to enrich preparations of mRNA for sequences of interest is to fractionate by size. One such method is to fractionate by electrophoresis through an agarose gel (Pennica et al., Nature 301:214-221 (1983), the entirety of which is herein incorporated by reference). Another such method employs sucrose gradient centrifugation in the presence of an agent, such as methylmercuric hydroxide, that denatures secondary structure in RNA (Schweinfest et al., Proc. Natl. Acad. Sci. (U.S.A.) 79:4997-5000 (1982), the entirety of which is herein incorporated by reference).
A frequently adopted method is to construct equalized or normalized cDNA libraries (Ko, Nucleic Acids Res. 18:5705-5711 (1990), the entirety of which is herein incorporated by reference; Patanjali et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1943-1947 (1991), the entirety of which is herein incorporated by reference). Typically, the cDNA population is normalized by subtractive hybridization (Schmid et al., J. Neurochem. 48:307-312 (1987), the entirety of which is herein incorporated by reference; Fargnoli et al., Anal. Biochem. 187:364-373 (1990), the entirety of which is herein incorporated by reference; Travis et al., Proc. Natl. Acad. Sci. (U.S.A.) 85:1696-1700 (1988), the entirety of which is herein incorporated by reference; Kato, Eur. J. Neurosci. 2:704-711 (1990); and Schweinfest et al., Genet. Anal. Tech. Appl. 7:64-70 (1990), the entirety of which is herein incorporated by reference). Subtraction represents another method for reducing the population of certain sequences in the cDNA library (Swaroop et al., Nucleic Acids Res. 19:1954 (1991), the entirety of which is herein incorporated by reference).
ESTs can be sequenced by a number of methods. Two basic methods may be used for DNA sequencing, the chain termination method of Sanger et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:5463-5467 (1977), the entirety of which is herein incorporated by reference and the chemical degradation method of Maxam and Gilbert, Proc. Nat. Acad. Sci. (U.S.A.) 74:560-564 (1977), the entirety of which is herein incorporated by reference. Automation and advances in technology such as the replacement of radioisotopes with fluorescence-based sequencing have reduced the effort required to sequence DNA (Craxton, Methods 2:20-26 (1991), the entirety of which is herein incorporated by reference; Ju et al., Proc. Natl. Acad. Sci. (U.S.A.) 92:4347-4351 (1995), the entirety of which is herein incorporated by reference; Tabor and Richardson, Proc. Natl. Acad. Sci. (U.S.A.) 92:6339-6343 (1995), the entirety of which is herein incorporated by reference). Automated sequencers are available from, for example, Pharmacia Biotech, Inc., Piscataway, N.J. (Pharmacia ALF), LI-COR, Inc., Lincoln, Neb. (LI-COR 4,000) and Millipore, Bedford, Mass. (Millipore BaseStation).
In addition, advances in capillary gel electrophoresis have also reduced the effort required to sequence DNA and such advances provide a rapid high resolution approach for sequencing DNA samples (Swerdlow and Gesteland, Nucleic Acids Res. 18:1415-1419 (1990); Smith, Nature 349:812-813 (1991); Luckey et al., Methods Enzymol. 218:154-172 (1993); Lu et al., J. Chromatog. A. 680:497-501 (1994); Carson et al., Anal. Chem. 65:3219-3226 (1993); Huang et al., Anal. Chem. 64:2149-2154 (1992); Kheterpal et al., Electrophoresis 17:1852-1859 (1996); Quesada and Zhang, Electrophoresis 17:1841-1851 (1996); Baba, Yakugaku Zasshi 117:265-281 (1997), all of which are herein incorporated by reference in their entirety).
ESTs longer than 150 nucleotides have been found to be useful for similarity searches and mapping (Adams et al., Science 252:1651-1656 (1991), herein incorporated by reference). ESTs, which can represent copies of up to the full length transcript, may be partially or completely sequenced. Between 150 and 450 nucleotides of sequence information is usually generated, as this is the length of sequence information that is routinely and reliably produced using single run sequence data. Typically, only single run sequence data is obtained from the cDNA library (Adams et al., Science 252:1651-1656 (1991)). Automated single run sequencing typically results in an approximately 2-3% error or base ambiguity rate (Boguski et al., Nature Genetics 4:332-333 (1993), the entirety of which is herein incorporated by reference).
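As a rough illustration of what a 2-3% error or ambiguity rate means for reads of 150 to 450 nucleotides, the following minimal Python sketch multiplies read length by error rate. The function name is hypothetical, and the calculation assumes errors are spread evenly along the read, which the cited references do not state.

```python
def expected_ambiguous_bases(read_length, error_rate):
    """Expected count of erroneous or ambiguous bases in one single run read.

    Assumes a uniform per-base error rate (an illustrative simplification).
    """
    return read_length * error_rate


# Tabulate the range implied by the figures quoted in the text.
for length in (150, 450):
    for rate in (0.02, 0.03):
        errors = expected_ambiguous_bases(length, rate)
        print(f"{length} nt at {rate:.0%}: about {errors:.1f} bases")
```

Under these assumptions a single run read would be expected to carry on the order of a few to roughly a dozen uncertain bases, which is one reason error-tolerant search programs are of interest for EST data.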
EST databases have been constructed or partially constructed from, for example, C. elegans (McCombie et al., Nature Genetics 1:124-131 (1992)), human liver cell line HepG2 (Okubo et al., Nature Genetics 2:173-179 (1992)), human brain RNA (Adams et al., Science 252:1651-1656 (1991); Adams et al., Nature 355:632-635 (1992)), Arabidopsis (Newman et al., Plant Physiol. 106:1241-1255 (1994)), and rice (Kurata et al., Nature Genetics 8:365-372 (1994)).
III. Sequence Comparisons
A characteristic feature of a DNA sequence is that it can be compared with other DNA sequences. Sequence comparisons can be undertaken by determining the similarity of the test or query sequence with sequences in publicly available or proprietary databases (“similarity analysis”) or by searching for certain motifs (“intrinsic sequence analysis”) (e.g., cis elements) (Coulson, Trends in Biotechnology 12:76-80 (1994), the entirety of which is herein incorporated by reference; Birren et al., Genome Analysis 1: Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 543-559 (1997), the entirety of which is herein incorporated by reference).
Similarity analysis includes database search and alignment. Examples of public databases include the DNA Database of Japan (DDBJ) (available on the worldwide web at ddbj.nig.ac.jp); GenBank (available on the worldwide web at the ncbi website at: /Web/Search/Index.html); and the European Molecular Biology Laboratory Nucleic Acid Sequence Database (EMBL) (available on the worldwide web at ebi.ac.uk/ebi_docs/embl_db/embl-db.html). Other appropriate databases include dbEST (available on the worldwide web at the ncbi website at: /dbEST/index.html), SwissProt (available on the worldwide web at ebi.ac.uk/ebi_docs/swisprot_db/swisshome.html), PIR (available on the worldwide web at nbrt.georgetown.edu/pir), and The Institute for Genomic Research (available on the worldwide web at tigr.org/tdb/tdb.html).
A number of different search algorithms have been developed, one example of which is the suite of programs referred to as BLAST programs. There are five implementations of BLAST, three designed for nucleotide sequence queries (BLASTN, BLASTX and TBLASTX) and two designed for protein sequence queries (BLASTP and TBLASTN) (Coulson, Trends in Biotechnology 12:76-80 (1994); Birren et al., Genome Analysis 1, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 543-559 (1997)).
BLASTN takes a nucleotide sequence (the query sequence) and its reverse complement and searches them against a nucleotide sequence database. BLASTN was designed for speed, not maximum sensitivity and may not find distantly related coding sequences. BLASTX takes a nucleotide sequence, translates it in three forward reading frames and three reverse complement reading frames and then compares the six translations against a protein sequence database. BLASTX is useful for sensitive analysis of preliminary (single-pass) sequence data and is tolerant of sequencing errors (Gish and States, Nature Genetics 3:266-272 (1993), the entirety of which is herein incorporated by reference). BLASTN and BLASTX may be used in concert for analyzing EST data (Coulson, Trends in Biotechnology 12:76-80 (1994); Birren et al., Genome Analysis 1:543-559 (1997)).
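The six-frame conceptual translation that BLASTX applies to a nucleotide query before the protein database comparison can be sketched in Python as follows. This is a minimal illustration, not the BLASTX implementation: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G or T is rendered as "X".

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, offset):
    """Translate complete codons of seq starting at offset (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to conceptual translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translations would then be compared against the protein database. Because a single-base insertion or deletion shifts the reading frame rather than destroying the match, examining all frames is part of what makes this style of search tolerant of single-pass sequencing errors.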
Given a coding nucleotide sequence and the protein it encodes, it is often preferable to use the protein as the query sequence to search a database because of the greatly increased sensitivity to detect more subtle relationships. This is due to the larger alphabet of proteins (20 amino acids) compared with the alphabet of nucleic acid sequences (4 bases); with only 4 bases, it is far easier to obtain a match by chance. In addition, with nucleotide alignments, only a match (positive score) or a mismatch (negative score) is obtained, but with proteins, the presence of conservative amino acid substitutions can be taken into account. Here, a mismatch may yield a positive score if the non-identical residue has physical/chemical properties similar to the one it replaced. Various scoring matrices are used to supply the substitution scores of all possible amino acid pairs. A general purpose scoring system is the BLOSUM62 matrix (Henikoff and Henikoff, Proteins 17:49-61 (1993), the entirety of which is herein incorporated by reference), which is currently the default choice for BLAST programs. BLOSUM62 is tailored for alignments of moderately diverged sequences and thus may not yield the best results under all conditions. Altschul, J. Mol. Evol. 36:290-300 (1993), the entirety of which is herein incorporated by reference, describes a combination of three matrices to cover all contingencies. This may improve sensitivity, but at the expense of slower searches. In practice, a single BLOSUM62 matrix is often used but others (PAM40 and PAM250) may be attempted when additional analysis is necessary. Low PAM matrices are directed at detecting very strong but localized sequence similarities, whereas high PAM matrices are directed at detecting long but weak alignments between very distantly related sequences.
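The difference between match/mismatch scoring of nucleotides and substitution-matrix scoring of proteins can be illustrated with a short Python sketch. The function names are hypothetical, the nucleotide scores of +1 and -1 are arbitrary illustrative values rather than the defaults of any program, and only a handful of BLOSUM62 entries are included, so the sketch handles only the residue pairs listed.

```python
# A few entries excerpted from the BLOSUM62 matrix; keys are sorted residue pairs.
BLOSUM62_EXCERPT = {
    ("L", "L"): 4, ("I", "I"): 4, ("I", "L"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("D", "L"): -4,
}


def score_nucleotides(a, b, match=1, mismatch=-1):
    """Score an ungapped nucleotide alignment: identities only."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def score_proteins(a, b, matrix=BLOSUM62_EXCERPT):
    """Score an ungapped protein alignment using a substitution matrix."""
    return sum(matrix[tuple(sorted((x, y)))] for x, y in zip(a, b))


if __name__ == "__main__":
    # One mismatch among four bases simply subtracts from the score.
    print(score_nucleotides("ACGT", "ACGA"))
    # No identical residues, yet every substitution is conservative.
    print(score_proteins("LKD", "IRE"))
    # A non-conservative substitution (L for D) is penalized.
    print(score_proteins("D", "L"))
```

In this sketch the peptides LKD and IRE share no identical residue but still receive a positive score, because each substitution pairs residues with similar properties, whereas a nucleotide comparison has no way to credit such partial similarity.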
Homologues in other organisms are available that can be used for comparative sequence analysis. Multiple alignments are performed to study similarities and differences in a group of related sequences. CLUSTAL W is a multiple sequence alignment package that performs progressive multiple sequence alignments based on the method of Feng and Doolittle, J. Mol. Evol. 25:351-360 (1987), the entirety of which is herein incorporated by reference. Each pair of sequences is aligned and the distance between each pair is calculated; from this distance matrix, a guide tree is calculated and all of the sequences are progressively aligned based on this tree. A feature of the program is its sensitivity to the effect of gaps on the alignment; gap penalties are varied to encourage the insertion of gaps in probable loop regions instead of in the middle of structured regions. Users can specify gap penalties, choose between a number of scoring matrices, or supply their own scoring matrix for both pairwise alignments and multiple alignments. CLUSTAL W for UNIX and VMS systems is available by anonymous ftp at: ebi.ac.uk. Another program is MACAW (Schuler et al., Proteins Struct. Func. Genet. 9:180-190 (1991), the entirety of which is herein incorporated by reference), for which both Macintosh and Microsoft Windows versions are available. MACAW uses a graphical interface, provides a choice of several alignment algorithms and is available by anonymous ftp at the ncbi website at: nlm.nih.gov (directory /pub/macaw).
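The step from pairwise distances to a guide tree can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CLUSTAL W procedure: the input sequences are assumed to be of equal length so that a fraction-of-differences distance can stand in for the pairwise alignment step, and a simple average-linkage clustering is swapped in for the tree-building method. All function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Build a join order (nested tuples) from a dict of name -> sequence."""
    # Distance matrix over all pairs of input sequences.
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    # Each cluster is keyed by its subtree and lists its member names.
    clusters = {name: [name] for name in seqs}

    def average_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    # Repeatedly merge the two closest clusters until one tree remains.
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
    return next(iter(clusters))


if __name__ == "__main__":
    example = {
        "a": "ACGTACGT",
        "b": "ACGTACGA",
        "c": "TTGTACCA",
        "d": "TTGTACCT",
    }
    print(guide_tree(example))
```

A progressive aligner would then follow the nested tuples from the innermost pairs outward, aligning the most similar sequences first and adding more distant sequences or groups later.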
Sequence motifs are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable based on comparisons of primary sequences alone. Currently, the largest collection of sequence motifs in the world is PROSITE (Bairoch and Bucher, Nucleic Acids Research 22:3583-3589 (1994), the entirety of which is herein incorporated by reference). PROSITE may be accessed via either the ExPASy server on the World Wide Web or an anonymous ftp site. Many commercial sequence analysis packages also provide search programs that use PROSITE data.
A resource for searching protein motifs is the BLOCKS E-mail server (Henikoff, Trends Biochem Sci. 18:267-268 (1993), the entirety of which is herein incorporated by reference; Henikoff and Henikoff, Nucleic Acids Research 19:6565-6572 (1991), the entirety of which is herein incorporated by reference; Henikoff and Henikoff, Proteins 17:49-61 (1993)). BLOCKS searches a protein or nucleotide sequence against a database of protein motifs or “blocks.” Blocks are defined as short, ungapped multiple alignments that represent highly conserved protein patterns. The blocks themselves are derived from entries in PROSITE as well as other sources. Either a protein query or a nucleotide query can be submitted to the BLOCKS server; if a nucleotide sequence is submitted, the sequence is translated in all six reading frames and motifs are sought in these conceptual translations. Once the search is completed, the server will return a ranked list of significant matches, along with an alignment of the query sequence to the matched BLOCKS entries.
Conserved protein domains can be represented by two-dimensional matrices, which measure either the frequency or probability of the occurrences of each amino acid residue and deletions or insertions in each position of the domain. This type of model, when used to search against protein databases, is sensitive and usually yields more accurate results than simple motif searches. Two popular implementations of this approach are profile searches, such as the GCG program ProfileSearch, and Hidden Markov Models (HMMs) (Krogh et al., J. Mol. Biol. 235:1501-1531 (1994); Eddy, Current Opinion in Structural Biology 6:361-365 (1996), both of which are herein incorporated by reference in their entirety). In both cases, a large number of common protein domains have been converted into profiles, as present in the PROSITE library, or HMM models, as in the Pfam protein domain library (Sonnhammer et al., Proteins 28:405-420 (1997), the entirety of which is herein incorporated by reference). Pfam contains more than 500 HMM models for enzymes, transcription factors, signal transduction molecules and structural proteins. Protein databases can be queried with these profiles or HMM models, which will identify proteins containing the domain of interest. For example, HMMSW or HMMFS, two programs in a public domain package called HMMER (Sonnhammer et al., Proteins 28:405-420 (1997)), can be used.
PROSITE and BLOCKS represent collected families of protein motifs. Thus, searching these databases entails submitting a single sequence to determine whether or not that sequence is similar to the members of an established family. Programs working in the opposite direction compare a collection of sequences with individual entries in the protein databases. An example of such a program is the Motif Search Tool, or MoST (Tatusov et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:12091-12095 (1994), the entirety of which is herein incorporated by reference). On the basis of an aligned set of input sequences, a weight matrix is calculated by using one of four methods (selected by the user). A weight matrix is simply a representation, position by position, of how likely a particular amino acid is to appear. The calculated weight matrix is then used to search the databases. To increase sensitivity, newly found sequences are added to the original data set, the weight matrix is recalculated and the search is performed again. This procedure continues until no new sequences are found.
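The iterative procedure described above (build a weight matrix, search, add new hits, rebuild, repeat until nothing new is found) can be sketched in Python as follows. This is a minimal illustration and not the MoST program: the log-odds weighting with a pseudocount and a uniform background is an assumed stand-in for the four weighting methods mentioned, the score threshold is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def weight_matrix(aligned, pseudocount=1.0):
    """Per-position log-odds weights versus a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_window(matrix, sequence):
    """Return (score, segment) for the best ungapped window, or None."""
    width = len(matrix)
    best = None
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        # Residues outside the 20-letter alphabet contribute a neutral 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(matrix, segment))
        if best is None or score > best[0]:
            best = (score, segment)
    return best


def iterative_search(seed_alignment, database, threshold):
    """Search, add new hits to the aligned set, and repeat until no new hits."""
    aligned = list(seed_alignment)
    while True:
        matrix = weight_matrix(aligned)
        new_segments = []
        for sequence in database:
            hit = best_window(matrix, sequence)
            if (hit and hit[0] >= threshold
                    and hit[1] not in aligned and hit[1] not in new_segments):
                new_segments.append(hit[1])
        if not new_segments:
            return aligned
        aligned.extend(new_segments)


if __name__ == "__main__":
    seed = ["GDSGGP", "GDSGGA"]
    database = ["MKTGDSGGPLLV", "AAAAAAAAAAAA", "QQGDSGGSWW"]
    print(iterative_search(seed, database, threshold=4.0))
```

In this toy run the first pass picks up one additional segment from the third database entry, the matrix is recalculated with that segment included, and the second pass finds nothing new, so the loop stops. The choice of threshold governs the trade-off between sensitivity and the risk of admitting unrelated sequences into the growing alignment.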