This invention is in the field of bacterial gene expression. More specifically, this invention describes a method to monitor transcriptional changes on a genome-wide scale using a genome-registered gene fusion collection.
DNA array analysis is a powerful method for comprehensive genome analysis of gene expression. Currently, this approach is the only available method for massively parallel analysis that allows the expression of each gene of a bacterial genome to be characterized simultaneously (Richmond et al., (1999) Nucleic Acids Res. 27:3821-3835; Tao et al., (1999) J. Bacteriol. 181:6425-6440; Wilson et al., (1999) Proc. Natl. Acad. Sci. U.S.A. 96:12833-12838).
Richmond et al. ((1999) Nucleic Acids Research, 27:3821-3835) have recently reported genome-wide expression profiling of E. coli at the single ORF level of resolution. Changes in RNA levels after exposure to heat shock or IPTG were analyzed using comprehensive low density blots of individual ORFs on a nylon matrix and comprehensive high density arrays of individual ORFs spotted on glass slides. The results of the two methods were compared. Richmond et al. state that radioactive probe/spot blots are inferior to fluorescent probe/micro-arrays. Moreover, the comparison of heat shock treatment between the two methods is fundamentally flawed, since the RNA analyzed with spot blots was derived from broth-grown cultures while the RNA analyzed with micro-arrays was derived from cells grown in defined media. Despite the power of this new methodology, there are several problems that limit the reliability of results. For example, artifacts may arise during the isolation of microbial RNA (Tao et al., (1999) J. Bacteriol. 181:6425-6440) or from cross hybridization to paralogous genes (Richmond et al., (1999) Nucleic Acids Res. 27:3821-3835).
Another limitation of DNA array methodology is that RNA must be isolated and then converted into DNA by reverse transcriptase with concomitant incorporation of fluorescent labels. These steps make it unlikely that facile high throughput screens could be developed based on DNA array technology. Thus, there exists a need for a method that adapts results from DNA array technology into high throughput screens. For the reasons mentioned above and others, alternative genome-wide expression profiling methods, as well as rapid methods to independently verify results from DNA array experiments, are needed.
Gene fusion technology is an established method for gene expression monitoring. For example, the initial discovery of the SOS (DNA damage responsive) regulon of E. coli was done by Kenyon and Walker ((1980) Proc. Natl. Acad. Sci. U.S.A. 77:2819-2823) by comparing the transcriptional responses of Escherichia coli to mitomycin C (MMC), a DNA damaging agent that intercalates into and forms a covalent attachment with double-stranded DNA. While these early experiments attempted to scan the bulk of the E. coli genome by using a transposon that put the lacZYA operon under the control of many promoter regions, it was not known if the entire genome had been surveyed because of the random nature of transposition and unknown location of the majority of transposition events. Accordingly, additional SOS regulon genes have been identified since these early experiments (Lomba et al., (1997) Microbiol Lett 156:119-122; Walker, (1996) In Escherichia coli and Salmonella: Cellular and Molecular Biology. ASM Press, pp 1400-1416).
LaRossa et al. (U.S. Pat. No. 5,683,868) have transformed E. coli with a construct comprising luxCDABE operably linked to a variety of stress promoters. They have used these microorganisms to detect a variety of environmental insults such as ethanol, CdCl2 and toluene. The presence of a sublethal concentration of an insult is indicated by an increase in bioluminescence. However, in order to generate the transformed host, the stress promoters have to be identified and characterized. Furthermore, this method is limited to stress responses only.
Ashby and Rine (U.S. Pat. No. 5,569,588) reported a method to measure the transcriptional responsiveness of an organism to a candidate drug by detecting reporter gene product signals from separately isolated cells of a target organism on a genome-wide basis. Each cell contains a recombinant construct with the reporter gene operatively linked to a different endogenous transcriptional regulatory element of the target organism. When cells were treated with a candidate drug, the transcriptional responsiveness of the organism to the candidate drug was measured by detecting the reporter signal from each cell. However, this method is useful with an organism only after the majority of transcriptional regulatory elements of the target organism are known and mapped. Furthermore, the reporter signals are measured only after the cells have reached homeostasis in the presence of the drug. The initial transcriptional responses to chemicals are not considered.
The Lux-A Collection of random E. coli genomic DNA fragments fused to luxCDABE had been used to screen for gene fusions whose expression was induced by treatment with the herbicide sulfometuron methyl. The DNA sequences of 19 of these sulfometuron methyl inducible gene fusions (smi-lux) were determined and used to identify the promoter controlling expression of the luxCDABE reporter (Van Dyk et al., (1998) J. Bacteriol. 180:785-792); the other 8047 gene fusions remained unidentified.
LaRossa and Van Dyk (U.S. Pat. No. 6,025,131) developed a method for identifying gene regulatory regions responsive to a particular cellular stress, such as that produced by herbicides or crop protection chemicals. Regulatory regions are randomly fused to a bacterial luminescent gene complex; when the fusion, in a suitable host, is contacted with a cellular insult, the resulting cellular stress is detected as an increase in cellular luminescence. However, this method was limited to perturbations in liquid media; luminescent responses were not detected on solid medium following overnight growth in the presence of a chemical stress. Furthermore, it did not allow analysis of regulatory region activity on a genome-wide scale.
The problem to be solved, therefore, is to provide a way to measure and follow changes in gene expression using a genome-registered collection of reporter gene fusions in a manner that allows detection of initial transcriptional responses; to provide a way to cross-validate results from other methods (e.g., microarrays) and to determine the promoter and operon structure of genes; and further to provide a way to test cellular responses to various environmental and genetic changes in a high throughput manner.
A new method for the use of a genome-registered collection of reporter gene fusions is disclosed. Fragments of genomic DNA of a host organism were fused to a promoterless reporter gene. The reporter gene fusions were generated using restriction enzyme digestion, physical shearing of the genomic DNA, PCR, and transposition techniques. The reporter gene complexes were genome registered against the host genome on the basis of homology. Gene expression of each reporter gene complex is measured as reporter gene activity. The present invention provides a means to measure changes in gene expression profiles on a genome-wide scale under various conditions in a high throughput manner. In addition to being a stand-alone high throughput method, the present invention also provides a way to validate other genome-wide assays such as DNA microarrays. The present invention also provides a method to confirm the response of several promoters to a particular insult (a condition or chemical of interest) as well as to identify a number of previously unknown operons responsive to that insult. Comparison of the gene expression patterns of two samples differing in one variable is also possible using this method. The present invention also provides a method to use an array of arrays by generating gene expression profiles that yield information relevant to understanding gene function and modes of chemical action. Such information can be gained by analysis of genetic alterations resulting in loss of function, reduced levels of gene products, or over-expression of gene products. Thus, an array of arrays can be used to enhance both mode of action studies and functional genomics.
In this invention, the sequencing and genome-registering of the majority of the Lux-A Collection members were completed. Lux fusions in the Lux-A Collection were fragments of E. coli genomic DNA fused to a promoterless luxCDABE reporter. The genome-registered collection of lux fusions was examined for biological responses measured by changes in bioluminescence. The present invention provides a means to measure changes in gene expression profiles under various conditions.
In addition to being a stand-alone high throughput method, the present invention also provides a way to validate results, or detect false-positive results, from other genome-wide assays such as microarrays.
The present invention provides a method to confirm the response of several promoters to a particular insult as well as to identify a number of previously unknown operons responsive to that insult.
The present invention provides a method for comparing the gene expression patterns of two samples differing in one variable. The variables may include, but are not limited to, genotype, media, temperature, depletion or addition of a nutrient, addition of an inhibitor, physical assault, biological assault, irradiation, heat, cold, elevated or lowered pressure, desiccation, low or high ionic strength, and growth phase.
The present invention also provides a method to use an "array of arrays" by generating gene expression profiles that yield information relevant to understanding gene function and modes of chemical action. Such information can be gained by analysis of genetic alterations resulting in loss of function, reduced levels of gene products, or over-expression of gene products. Thus, an array of arrays can be used to enhance both mode of action studies and functional genomics.
Thus the invention provides a method for identifying altered gene expression between at least two genome-registered collections comprising:
(a) assembling at least two genome-wide scale, genome-registered collections;
(b) perturbing each collection from (a) with at least one perturbation;
(c) measuring the response of each collection to each perturbation of (b);
(d) analyzing the results of the at least one perturbation to identify genetic differences between the at least two genome-registered collections.
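Steps (c) and (d) above can be illustrated with a minimal Python sketch. It assumes each collection is summarized as reporter signal with and without the perturbation for the same fusions; the fusion identifiers, signal values, and the log2 threshold are hypothetical and chosen only for illustration.

```python
from math import log2

def response_ratio(treated, untreated):
    """Ratio of reporter signal with and without the perturbation."""
    return treated / untreated

def differing_fusions(collection_a, collection_b, min_log2_diff=1.0):
    """Return fusion IDs whose response to the perturbation differs between collections.

    Each collection maps fusion ID -> (untreated signal, treated signal).
    Only fusions present in both collections are compared.
    """
    hits = []
    for fusion_id in sorted(collection_a.keys() & collection_b.keys()):
        untreated_a, treated_a = collection_a[fusion_id]
        untreated_b, treated_b = collection_b[fusion_id]
        diff = (log2(response_ratio(treated_a, untreated_a))
                - log2(response_ratio(treated_b, untreated_b)))
        if abs(diff) >= min_log2_diff:
            hits.append(fusion_id)
    return hits

# Hypothetical signals (arbitrary light units) for the same fusions in two hosts.
parental = {"lux-001": (100.0, 800.0), "lux-002": (50.0, 55.0)}
mutant = {"lux-001": (100.0, 110.0), "lux-002": (50.0, 52.0)}
altered = differing_fusions(parental, mutant)
```

Comparing log ratios rather than raw signals is a design choice of this sketch; it keeps fusions with very different baseline outputs on a common scale.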
Additionally the invention provides a method for generating a genome-registered collection of reporter gene fusions comprising the steps of:
(a) generating a set of gene fusions comprising:
1) a reporter gene or reporter gene complex operably linked to
2) a genomic fragment from an organism of which at least 15% of the genomic nucleotide sequence is known;
(b) introducing in vitro the reporter gene fusions from step (a) into a host organism;
(c) registering the reporter gene fusions on the basis of sequence homology to the genomic sequence of the organism;
(d) repeating (a), (b), and/or (c) until reporter gene fusions have been made to at least 15% of the known genomic nucleotide sequence of said organism.
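The stopping criterion in step (d) can be checked by merging the registered intervals and computing the covered fraction of the known sequence. The following Python sketch assumes registration has already yielded 0-based, half-open genome coordinates for each fusion; the coordinates and genome length are hypothetical.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of the genome covered by at least one registered fragment.

    intervals: list of (start, end) pairs, 0-based and half-open.
    Overlapping fragments are merged so that no base is counted twice.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Hypothetical registered fragments on a 1000-base toy genome.
registered = [(0, 100), (50, 150), (400, 500)]
fraction = covered_fraction(registered, genome_length=1000)
meets_threshold = fraction >= 0.15
```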
Similarly the invention provides a method for generating a genome-registered collection of reporter gene fusions comprising:
(a) generating random nucleic acid fragments from the DNA of an organism of which at least 15% of the nucleotide sequence is known;
(b) operably linking the random nucleic acid fragments generated in (a) to a vector containing a promoterless reporter gene or reporter gene complex;
(c) introducing the vector (b) containing the gene fusions into a host organism;
(d) determining the nucleic acid sequence of the distal and the proximal ends of the random nucleic fragments relative to the reporter gene or reporter gene complex;
(e) registering the sequenced fusions of step (d) on the basis of sequence homology to the genomic sequence of the host organism;
(f) repeating (a), (b), and/or (c) until reporter gene fusions have been made to at least 15% of the known genomic nucleotide sequence of said organism.
Generation of the random nucleic acid fragments of step (a) may incorporate restriction enzyme digestion, physical shearing of the genome, or polymerase chain reaction.
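The registration of a fragment from its two end sequences can be sketched as follows. This is a simplified illustration only: it substitutes exact string matching for a real homology search, treats the genome as linear, and uses a made-up 30-base sequence. Both reads are assumed to be given on the same strand, in the orientation that reads toward the reporter.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))

def register_fragment(genome, distal_read, proximal_read):
    """Locate a cloned fragment on the genome from its two end reads.

    Returns (start, end, strand) in 0-based, half-open forward-strand
    coordinates, or None when the reads cannot be placed.
    """
    for strand, template in (("+", genome), ("-", reverse_complement(genome))):
        start = template.find(distal_read)
        if start == -1:
            continue
        end = template.find(proximal_read, start)
        if end == -1:
            continue
        end += len(proximal_read)
        if strand == "-":
            # Convert reverse-strand positions back to forward coordinates.
            start, end = len(genome) - end, len(genome) - start
        return start, end, strand
    return None

# Hypothetical toy genome; a real genome would be read from a sequence file.
genome = "ATGCGTACGTTAGCCGATAGGCTTAACGGA"
location = register_fragment(genome, "GTACG", "GATAG")
```

The reported strand indicates which way the reporter is oriented relative to the genome, which determines which promoters could drive it.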
In another embodiment the invention provides a method for generating a genome-registered collection of reporter gene fusions comprising steps of:
(a) introducing one or more transposons into the genome of an organism of which at least 15% of the nucleotide sequence is known, each transposon containing a promoterless reporter gene or reporter gene complex;
(b) determining the nucleic acid sequence of the junction between the proximal end of the genomic DNA and the transposon containing the reporter gene or reporter gene complex and registering the reporter gene fusions relative to the genomic sequence of the organism; and
(c) repeating (a) and (b) until reporter gene fusions have been made to at least 15% of the known genomic nucleotide sequence of said organism.
Alternatively the invention provides a method for identifying a profile of inducing conditions for a reporter gene fusion comprising:
(a) obtaining a gene expression profile of an organism under induced and non-induced conditions wherein induced genes are identified;
(b) providing a genome-registered collection of reporter gene fusions, said fusions registered to the genome of the organism of (a);
(c) selecting the reporter gene fusions of (b) that correspond to the induced genes of (a) to create a subset of the genome-registered collection;
(d) contacting the subset of the genome-registered collection of (c) with the inducing conditions of (a) to identify at least one representative reporter gene fusion whose expression was altered in a similar manner as in (a);
(e) contacting the at least one representative reporter gene fusion of (d) in a high throughput manner with a multiplicity of different inducing conditions to identify a profile of inducing conditions for that reporter gene fusion.
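Steps (c) and (d) amount to a lookup followed by a ranking, which the following Python sketch illustrates. The fusion identifiers, the gene assignments, the induction ratios, and the twofold cutoff are all hypothetical, and choosing the most strongly induced fusion as the representative is an assumption of this sketch.

```python
def select_subset(registered_fusions, induced_genes):
    """Step (c): keep the fusions registered to any of the induced genes."""
    induced = set(induced_genes)
    return {fid: gene for fid, gene in registered_fusions.items() if gene in induced}

def representative(subset, induction_ratio, min_ratio=2.0):
    """Step (d): pick the fusion with the largest induction ratio at or above min_ratio."""
    candidates = [fid for fid in subset if induction_ratio.get(fid, 0.0) >= min_ratio]
    return max(candidates, key=lambda fid: induction_ratio[fid], default=None)

# Hypothetical registration table: fusion ID -> gene whose promoter drives the reporter.
registered_fusions = {
    "lux-001": "recA",
    "lux-002": "lacZ",
    "lux-003": "recA",
    "lux-004": "sulA",
}
induced_genes = ["recA", "sulA"]  # e.g. taken from an expression profile
subset = select_subset(registered_fusions, induced_genes)

# Hypothetical treated/untreated reporter ratios measured for the subset.
ratios = {"lux-001": 6.0, "lux-003": 1.5, "lux-004": 3.2}
best = representative(subset, ratios)
```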
In another embodiment the invention provides a method for generating a genome-registered collection of reporter gene fusions comprising:
(a) providing a genome from an organism wherein at least 15% of the nucleotide sequence is known;
(b) providing a series of amplification primers having homology to specific known regions of the genome of (a);
(c) amplifying portions of the genome of (a) with the primers of (b) to create a collection of nucleic acid amplification products;
(d) operably linking the amplification products of (c) to a vector containing a promoterless reporter gene or reporter gene complex;
(e) introducing the reporter gene fusions into said organism;
(f) repeating (a)-(e) until reporter gene fusions have been made to at least 15% of the known genomic nucleotide sequence of said organism.
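One way to choose the known regions targeted by the primers of step (b) is to take a fixed window around the start of each annotated gene. The Python sketch below is an assumption-laden illustration: the window sizes (300 bases upstream, 50 downstream), the gene coordinates, and the treatment of the genome as linear are all hypothetical and are not specified by the method.

```python
def promoter_window(gene_start, gene_end, strand, genome_length,
                    upstream=300, downstream=50):
    """Return (start, end) half-open coordinates of the region to amplify.

    gene_start and gene_end are 0-based, half-open, on the forward strand.
    For a gene on the reverse strand the 5' end of the gene is gene_end.
    """
    if strand == "+":
        start = gene_start - upstream
        end = gene_start + downstream
    elif strand == "-":
        start = gene_end - downstream
        end = gene_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip to the ends of the (linear) sequence.
    return max(0, start), min(genome_length, end)

# Hypothetical gene coordinates on a 5000-base toy genome.
forward_gene = promoter_window(1000, 2000, "+", genome_length=5000)
reverse_gene = promoter_window(1000, 2000, "-", genome_length=5000)
near_origin = promoter_window(100, 900, "+", genome_length=5000)
```

Primers would then be designed against the two ends of each window; that design step is outside the scope of this sketch.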
In another embodiment the invention provides a method for identifying a profile of inducing conditions for a reporter gene fusion comprising:
(a) obtaining a gene expression profile for each of a mutant strain and a parental strain of an organism under induced and non-induced conditions wherein induced genes are identified;
(b) providing a genome-registered collection of reporter gene fusions, said fusions registered to the genome of the organism of (a);
(c) selecting the reporter gene fusions of (b) that correspond to the induced genes of (a) to create a subset of the genome-registered collection;
(d) contacting the subset of the genome-registered collection of (c) with the inducing conditions of (a) to identify at least one representative reporter gene fusion whose expression was altered in a similar manner as in (a);
(e) contacting the at least one representative reporter gene fusion of (d) in a high throughput manner with a multiplicity of different inducing conditions to identify a profile of inducing conditions for that reporter gene fusion.
Similarly it is an object of the invention to provide a method to validate results from comprehensive genome analysis comprising the steps of:
(a) analyzing a genome-wide, gene expression assay of an organism treated with a condition or chemical of interest to identify genes with altered expression;
(b) selecting from a genome-registered collection of reporter gene fusions those reporter gene fusions containing promoter regions operably linked to genes corresponding to the altered genes from (a) or genes co-regulated with genes corresponding to the altered genes from (a);
(c) testing expression of the reporter gene fusions selected from (b) with the conditions or chemicals of interest used in (a); and
(d) comparing the gene expression results from (c) to the gene expression result of (a).
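The comparison in step (d) can be sketched as a direction-of-change check between the two assays. In the Python example below, the gene names, the expression ratios, and the twofold threshold are hypothetical; a real comparison might use a statistical criterion instead of a fixed cutoff.

```python
def direction(ratio, threshold=2.0):
    """Classify a treated/untreated ratio as 'up', 'down', or 'unchanged'."""
    if ratio >= threshold:
        return "up"
    if ratio <= 1.0 / threshold:
        return "down"
    return "unchanged"

def validate(array_ratios, fusion_ratios, threshold=2.0):
    """Split genes called altered by the array into confirmed and unconfirmed lists.

    Genes the array calls unchanged, or that have no fusion, are skipped.
    """
    confirmed, unconfirmed = [], []
    for gene, array_ratio in sorted(array_ratios.items()):
        call = direction(array_ratio, threshold)
        if call == "unchanged" or gene not in fusion_ratios:
            continue
        if direction(fusion_ratios[gene], threshold) == call:
            confirmed.append(gene)
        else:
            unconfirmed.append(gene)
    return confirmed, unconfirmed

# Hypothetical ratios from a genome-wide assay and from the selected fusions.
array_ratios = {"geneA": 4.0, "geneB": 3.0, "geneC": 0.25, "geneD": 1.1}
fusion_ratios = {"geneA": 5.5, "geneB": 1.2, "geneC": 0.4}
confirmed, unconfirmed = validate(array_ratios, fusion_ratios)
```

Genes in the unconfirmed list are candidates for false positives in the genome-wide assay and would warrant a closer look.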
The invention additionally provides a method to determine operon structure comprising steps of:
(a) selecting a subset of reporter gene fusions from a genome-registered collection of reporter gene fusions that map to the region of a possible operon;
(b) assaying the subset for the reporter gene function; and
(c) determining a putative operon structure based on the quantities of reporter gene function.
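A minimal Python sketch of step (c) follows. It rests on an assumed interpretation: a fusion carrying the region immediately upstream of a gene shows reporter activity well above background only if that region contains a promoter, so genes without such activity are grouped with the preceding gene. The gene names, activities, background, and fold cutoff are hypothetical, and the genes are assumed to be co-directional and listed in transcription order.

```python
def putative_operons(genes_in_order, upstream_activity, background=10.0, fold=3.0):
    """Group genes into putative operons from reporter activity.

    A gene starts a new operon when the fusion carrying the region just
    upstream of it shows activity above fold * background. Genes with no
    measured fusion are treated as lacking an upstream promoter.
    """
    operons = []
    for gene in genes_in_order:
        has_promoter = upstream_activity.get(gene, 0.0) > fold * background
        if has_promoter or not operons:
            operons.append([gene])
        else:
            operons[-1].append(gene)
    return operons

# Hypothetical reporter activities (arbitrary units) for four adjacent genes.
genes = ["g1", "g2", "g3", "g4"]
activity = {"g1": 500.0, "g2": 12.0, "g3": 8.0, "g4": 300.0}
operons = putative_operons(genes, activity)
```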
Alternatively the invention provides a method for constructing a cellular array containing reporter gene fusions comprising:
(a) generating a set of gene fusions comprising:
1) a reporter gene or reporter gene complex operably linked to
2) a genomic fragment from an organism of which at least 15% of the genomic nucleotide sequence is known;
(b) selecting a non-redundant subset of reporter gene fusions from the set of (a) representative of at least 15% of known or suspected promoter regions from a genome-registered collection of reporter gene fusions, each containing a known or suspected promoter region operably linked to a reporter gene or reporter gene complex; and
(c) fixing the non-redundant subset of reporter gene fusions of (b) in an array format.
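The selection in step (b) can be sketched as keeping one fusion per promoter region and then checking the 15% representation criterion. In the Python example below, preferring the shortest insert is an arbitrary tie-breaking rule assumed for illustration, and the fusion identifiers, promoter names, insert lengths, and promoter count are hypothetical.

```python
def non_redundant_subset(fusions):
    """Keep one fusion per promoter region, preferring the shortest insert.

    fusions: iterable of (fusion_id, promoter, insert_length) tuples.
    Returns a dict mapping promoter -> chosen fusion_id.
    """
    best = {}
    for fusion_id, promoter, insert_length in fusions:
        kept = best.get(promoter)
        if kept is None or insert_length < kept[1]:
            best[promoter] = (fusion_id, insert_length)
    return {promoter: fusion_id for promoter, (fusion_id, _) in best.items()}

# Hypothetical registered fusions; two of them carry the same promoter.
candidates = [
    ("lux-001", "pA", 1800),
    ("lux-002", "pA", 1200),
    ("lux-003", "pB", 2100),
]
subset = non_redundant_subset(candidates)

# Assumes 10 known or suspected promoter regions in this toy example.
fraction_represented = len(subset) / 10
```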
In a preferred embodiment the invention provides a method for measuring gene expression responses to perturbation comprising:
(a) constructing at least 2 identical cellular arrays, each cellular array comprising a reporter gene fusion comprising:
1) a reporter gene or reporter gene complex operably linked to
2) a genomic fragment from an organism of which at least 15% of the genomic nucleotide sequence is known;
wherein at least one cellular array is a control array and at least one cellular array is an experimental array;
(b) contacting the experimental array of (a) with a perturbing condition;
(c) comparing the differences between the gene expression activity of the control and the experimental array wherein gene expression response to a perturbing condition is determined.
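The comparison in step (c) can be illustrated with a short Python sketch that also captures the initial response by working on time courses. It assumes each array position is read repeatedly after the perturbation is applied; the well names, readings, sampling times, and twofold threshold are hypothetical.

```python
def first_response_time(times, control, treated, threshold=2.0):
    """Earliest time at which the treated/control signal ratio reaches threshold, else None."""
    for t, c, x in zip(times, control, treated):
        if c > 0 and x / c >= threshold:
            return t
    return None

def responders(times, control_array, experimental_array, threshold=2.0):
    """Map each responding array position to the time its ratio first reached threshold."""
    result = {}
    for well in sorted(control_array.keys() & experimental_array.keys()):
        t = first_response_time(times, control_array[well],
                                experimental_array[well], threshold)
        if t is not None:
            result[well] = t
    return result

# Hypothetical reporter readings (arbitrary units) at 0, 10, 20 and 30 minutes.
times_min = [0, 10, 20, 30]
control_array = {"A1": [100, 105, 110, 115], "A2": [80, 82, 85, 88]}
experimental_array = {"A1": [100, 150, 260, 400], "A2": [80, 84, 86, 90]}
early = responders(times_min, control_array, experimental_array)
```

Recording the time of first response, rather than a single endpoint, is what lets this kind of comparison distinguish early responders from fusions that change only later.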
Organisms amenable to the present method include prokaryotes and fungi, and particularly enteric bacteria.
Reporters useful in the present method include luxCDABE, lacZ, gfp, cat, galK, inaZ, luc, luxAB, bgaB, nptII, phoA, uidA and xylE.