The invention relates to essential bacterial genes and their use in identifying antibacterial agents.
Bacterial infections may be cutaneous, subcutaneous, or systemic. Opportunistic bacterial infections proliferate, especially in patients afflicted with AIDS or other diseases that compromise the immune system. Most bacteria that are pathogenic to humans are gram positive bacteria. The bacterium Streptococcus pneumoniae, for example, typically infects the respiratory tract and can cause lobar pneumonia, as well as meningitis, sinusitis, and other infections.
The invention is based on the discovery of two genes in the gram positive bacterium Streptococcus pneumoniae that are essential for the survival of this and other bacteria. For convenience, these genes, yphC and yqjK, are collectively referred to herein as “essential” genes and the polypeptides that these genes encode are referred to as “essential” polypeptides since Streptococcus pneumoniae cells lacking functional yphC or yqjK genes are unable to survive.
The yphC and yqjK genes are useful molecular tools for identifying similar genes in pathogenic microorganisms. The essential polypeptides that these genes encode are useful targets for identifying compounds that are inhibitors of the pathogens in which the essential polypeptides are expressed. Such compounds diminish bacterial growth by inhibiting the activity of an essential protein, or by inhibiting transcription of an essential gene or translation of the mRNA transcribed from the essential gene.
The invention, therefore, features an isolated yphC polypeptide having the amino acid sequence set forth in SEQ ID NO:2, as depicted in FIGS. 1A-B, or conservative variations thereof. An isolated nucleic acid encoding yphC also is included within the invention. In addition, the invention includes (a) an isolated nucleic acid having the sequence of SEQ ID NO:1, as depicted in FIGS. 1A-B, or degenerate variants thereof; (b) an isolated nucleic acid having the sequence of SEQ ID NO:1, or degenerate variants thereof, wherein T is replaced by U; (c) nucleic acids complementary to (a) and (b); and (d) fragments of (a), (b), and (c) that are at least 15 base pairs in length and that hybridize under stringent conditions, as described below, to genomic DNA encoding the polypeptide of SEQ ID NO:2. The yphC polypeptide depicted in FIGS. 1A-B is a partial sequence of the full-length polypeptide, which is depicted in FIGS. 2A-2B. The invention also features an isolated yphC polypeptide having the amino acid sequence set forth in SEQ ID NO:5, as depicted in FIGS. 2A-2B, or conservative variations thereof. An isolated nucleic acid encoding full-length yphC also is included within the invention. In addition, the invention includes (a) an isolated nucleic acid having the sequence of SEQ ID NO:4, as depicted in FIGS. 2A-2B, or degenerate variants thereof; (b) an isolated nucleic acid having the sequence of SEQ ID NO:4, or degenerate variants thereof, wherein T is replaced by U; and (c) nucleic acids complementary to (a) and (b).
As described above for yphC, the invention includes an isolated nucleic acid encoding yqjK. In addition, the invention includes (a) an isolated nucleic acid having the sequence of SEQ ID NO:7, as depicted in FIGS. 3A-B, or degenerate variants thereof; (b) an isolated nucleic acid having the sequence of SEQ ID NO:7, or degenerate variants thereof, wherein T is replaced by U; (c) nucleic acids complementary to (a) and (b); and (d) fragments of (a), (b), and (c) that are at least 15 base pairs in length and that hybridize under stringent conditions, as described below, to genomic DNA encoding the polypeptide of SEQ ID NO:8. These sequences are summarized in Table 1.
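Items (b) through (d) above describe simple transformations of a nucleotide sequence. As a minimal sketch only, they could be expressed as follows. The 20-base sequence is invented for illustration and is not taken from any SEQ ID NO, and the function names are hypothetical. Whether a given fragment actually hybridizes under stringent conditions must still be determined experimentally.

```python
# Illustrative sketch; the example sequence is made up, not from SEQ ID NO:1 or 7.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_rna(dna: str) -> str:
    """Return the sequence with every T replaced by U, as in item (b)."""
    return dna.upper().replace("T", "U")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3', as in item (c)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def fragments(dna: str, min_len: int = 15):
    """Yield every contiguous fragment of at least min_len bases, as in item (d)."""
    n = len(dna)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            yield dna[start:end]


example = "ATGGCTAAAGGTCCGTTAGC"  # hypothetical 20-mer
print(to_rna(example))
print(reverse_complement(example))
print(sum(1 for _ in fragments(example)), "fragments of 15 or more bases")
```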
Identification of these essential genes allows homologs of the essential genes to be found in other strains within the species, and it allows orthologs of the essential genes to be found in other organisms (e.g., Bacillus sp., H. influenzae, H. pylori, and E. coli). While “homologs” are structurally similar genes contained within the Streptococcus species, “orthologs” are functionally equivalent genes from other species, as determined, for example, in a standard complementation assay. Thus, the essential polypeptides can be used not only as a model for identifying similar genes in other Streptococcus strains, but also to identify homologs and orthologs of essential genes in other species (e.g., other gram positive bacteria, particularly those bacteria that are pathogenic to humans, and other bacteria generally). Such orthologs can be identified, for example, in a conventional complementation assay. In addition, or alternatively, such orthologs can be expected to exist in bacteria in the same branch of the phylogenetic tree, as set forth, for example, at ftp://ftp.cme.msu.edu/pub/RDP/SSU_rRNA/SSU/Prok.phylo. For example, B. subtilis is in the B. subtilis subgroup of the B. subtilis group in the Bacillus-Lactobacillus-Streptococcus subdivision of the Gram positive phylum. Likewise, S. pneumoniae belongs to the Stc. pneumonia subgroup of Streptococci, which also are in the Bacillus-Lactobacillus-Streptococcus subdivision of the Gram positive phylum. E. coli belongs to the Escherichia Salmonella group of the Enterics and relatives within the Gamma subdivision of the Purple bacteria. Other bacteria within the same phylum (particularly, bacteria within the same subdivision, group, or subgroup) can be expected to contain an ortholog of the yphC and/or yqjK genes described herein.
Examples of orthologs of the Streptococcus yphC and yqjK genes are summarized in Table 2. As shown in Table 2, the Streptococcus gene yphC has an ortholog in B. subtilis, termed “B-yphC,” and an ortholog in E. coli, termed “yfgK,” which is also known as “f503.” The Streptococcus gene yqjK also has an ortholog in B. subtilis, termed “B-yqjK,” and an ortholog in E. coli, termed “elaC,” which is also known as “o311.” As discussed below, orthologs of essential genes may themselves be essential or non-essential in the organism in which they are found.
As determined by the experiments described below, the B-yphC, yfgK, and B-yqjK orthologs are essential for survival of the bacteria in which they are found. Thus, these essential orthologous genes and the polypeptides encoded by these orthologs can be used to identify compounds that inhibit the growth of the host organism (e.g., compounds that inhibit the activity of an essential protein, or inhibit transcription of an essential gene).
The yphC polypeptides and genes described herein include the polypeptides and genes set forth in FIGS. 1A-B and 2A-2B herein, as well as isozymes, variants, and conservative variations of the sequences set forth in FIGS. 1A-B and 2A-2B. The invention includes various isozymes of yphC and yqjK. For example, the invention includes a gene that encodes an essential polypeptide but which gene includes one or more point mutations, deletions, or promoter variants, provided that the resulting essential polypeptide retains a biological function of an essential polypeptide.
The yphC polypeptide has structural characteristics of known GTPases. Using BLAST analysis, the yphC polypeptide has been shown to contain two domains that are predicted to be GTPase domains, and yphC displays GTPase activity in vitro. This GTPase activity is linked to the essentiality of the yphC polypeptide. When point mutations are made in each GTPase domain of yphC such that the mutants are unable to bind GTP, such mutants no longer are able to complement a bacterial strain that lacks yphC. The yqjK polypeptide has structural characteristics of known sulfatases. Thus, the various isozymes, variants, and conservative variations of the yphC and yqjK sequences set forth in FIGS. 1A-B and 2A-2B retain a biological function of yphC or yqjK as determined, for example, in an assay of GTPase or sulfatase activity, or in a conventional complementation assay. Suitable GTPase and sulfatase activity assays are well known in the art (see, e.g., Bollag, et al., Meth. Enzymol. 255:161 (1995) and Barbeyron, et al., Microbiol. 141:2897 (1995), incorporated herein by reference). The GTPase activity of yphC can also be assayed using a conventional Malachite Green phosphorelease assay (see, e.g., Lanzetta et al., 1979, Analytical Biochemistry 100:95-97). The inclusion of KCl in such an assay leads to an approximately 70-fold stimulation of GTPase activity, and thus provides a sensitive assay for detection of GTPase activity.
Also encompassed by the term yphC gene are degenerate variants of the nucleic acid sequences set forth in FIGS. 1A-B and 2A-2B (SEQ ID NOs:1 and 4). Degenerate variants of a nucleic acid sequence exist because of the degeneracy of the genetic code; thus, those sequences that vary from the sequence represented by SEQ ID NOs:1 and 4, but which nonetheless encode a yphC polypeptide, are included within the invention.
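The notion of a degenerate variant can be illustrated with a short sketch that translates two different nucleotide sequences and checks that they encode the same polypeptide. The sketch assumes the standard genetic code. The two 12-base sequences are invented for illustration and are not portions of SEQ ID NO:1 or 4, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ at the nucleotide level but encode the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)


reference = "ATGGCTAAAGGT"  # hypothetical; encodes Met-Ala-Lys-Gly
variant = "ATGGCCAAGGGA"    # hypothetical; differs in three third-codon positions
print(translate(reference), translate(variant))
print(is_degenerate_variant(reference, variant))
```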
Likewise, because of the similarity in the structures of amino acids, conservative variations (as described herein) can be made in the amino acid sequence of the yphC polypeptide while retaining the function of the polypeptide (e.g., as determined in a conventional complementation assay). Other yphC polypeptides and genes identified in additional bacterial strains may be such conservative variations or degenerate variants of the particular yphC polypeptide and nucleic acid set forth in FIGS. 1A-B and 2A-2B (SEQ ID NOs:1-6). The yphC polypeptide and gene share at least 80%, e.g., 90%, sequence identity with SEQ ID NOs:2 and 1, respectively, or SEQ ID NOs: 5 and 4, respectively. Regardless of the percent sequence identity between the yphC sequence and the sequences represented by SEQ ID NOs:1, 2, 4, and 5, the yphC genes and polypeptides encompassed by the invention preferably are able to complement for the lack of yphC function (e.g., in a temperature-sensitive mutant) in a standard complementation assay.
Additional yphC genes that are identified and cloned from additional bacterial strains, and pathogenic, gram-positive strains in particular, can be used to produce yphC polypeptides for use in the various methods described herein, e.g., for identifying antibacterial agents. Likewise, the term yqjK encompasses isozymes, variants, and conservative variations of the sequences depicted in FIGS. 3A-B.
In various embodiments, the essential polypeptide used in the assays described herein is derived from a non-pathogenic or pathogenic gram positive bacterium. For example, the polypeptide can be derived from a Streptococcus strain, such as Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus agalactiae, Streptococcus endocarditis, Streptococcus faecium, Streptococcus sanguis, Streptococcus viridans, and Streptococcus hemolyticus. Orthologs of the yphC and yqjK genes can be derived from a wide spectrum of bacteria, such as E. coli and Bacillus subtilis.
Having identified the yphC and yqjK genes described herein as being essential for survival, these essential genes and the polypeptides encoded by these essential genes and their essential homologs and orthologs can be used to identify antibacterial agents. Such antibacterial agents can readily be identified with high throughput assays to detect inhibition of the metabolic pathway in which the essential polypeptide participates. This inhibition can be caused by small molecules interacting with (e.g., binding directly or indirectly to) the essential polypeptide or other essential polypeptides in that pathway.
An exemplary method for identifying antibacterial compounds involves screening for small molecules that specifically interact with (i.e., bind directly or indirectly to) the essential polypeptide. A variety of suitable interaction and binding assays are known in the art as described, for example, in U.S. Pat. Nos. 5,585,277 and 5,679,582, incorporated herein by reference. For example, in various conventional assays, test compounds can be assayed for their ability to interact with an essential polypeptide by measuring the ability of the small molecule to stabilize the essential polypeptide in its folded, rather than unfolded, state. More specifically, the degree of protection from unfolding that is afforded by the test compound can be measured. Test compounds that bind the essential polypeptide with high affinity cause, for example, a large shift in the temperature at which the polypeptide is denatured. Test compounds that stabilize the essential polypeptide in a folded state can be further tested for antibacterial activity in a standard susceptibility assay.
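As a rough sketch of how the shift in denaturation temperature described above might be quantified, the following estimates a melting temperature (Tm) for each unfolding curve as the temperature at which the normalized signal crosses its midpoint, and reports the difference. The midpoint-interpolation approach, the function name, and all data values are assumptions made for illustration; they are not taken from the cited patents or from any measurement.

```python
def melting_temperature(temps, signal):
    """Estimate Tm by linear interpolation at the midpoint of the normalized unfolding signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal is flat; no unfolding transition detected")
    frac = [(s - lo) / (hi - lo) for s in signal]
    for i in range(1, len(frac)):
        if frac[i - 1] < 0.5 <= frac[i]:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (0.5 - frac[i - 1]) * (t1 - t0) / (frac[i] - frac[i - 1])
    raise ValueError("signal never crosses the midpoint")


# Hypothetical fraction-unfolded readings at each temperature (degrees C).
temps = [40, 45, 50, 55, 60, 65, 70]
without_compound = [0.0, 0.1, 0.4, 0.8, 1.0, 1.0, 1.0]
with_compound = [0.0, 0.0, 0.1, 0.3, 0.7, 0.9, 1.0]

tm_free = melting_temperature(temps, without_compound)
tm_bound = melting_temperature(temps, with_compound)
print("Tm shift (degrees C):", tm_bound - tm_free)
```

A larger positive shift would suggest stronger stabilization of the folded state; the size of shift that counts as a hit is a screening decision that this sketch does not address.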
Another exemplary method for identifying antibacterial agents involves measuring the ability of a test compound to bind to one of the essential polypeptides described herein. Binding can be assayed in a conventional capillary electrophoresis assay in which binding of the test compound to the essential polypeptide changes the electrophoretic mobility of the essential polypeptide.
Another suitable method for identifying inhibitors of the essential polypeptides involves identifying a biochemical activity of the essential polypeptide and then screening for small molecule inhibitors of the activity using, for example, a high throughput screening method. The yphC polypeptide has structural characteristics of known GTPases and displays GTPase activity in vitro. Inhibitors of this polypeptide therefore can be identified by their ability to inhibit the GTPase activity of yphC in a conventional assay of GTPase activity. Suitable assays have been described (e.g., Bollag et al., Meth. Enzymol. 255: 161-170, 1995, which is incorporated herein by reference). A detailed example of a suitable assay is set forth below.
The yqjK polypeptide has structural characteristics of sulfatases and is expected to function as a sulfatase. Accordingly, inhibitors of the yqjK polypeptide can be identified by assaying for the ability of the test compound to inhibit the sulfatase activity of yqjK. An example of a suitable assay is described by Barbeyron et al., Microbiol. 141:2897-2904, 1995, which is incorporated herein by reference.
The invention also includes a method for identifying an antibacterial agent which method entails: (a) contacting an essential polypeptide, or homolog or ortholog thereof, with a test compound; (b) detecting binding of the test compound to the polypeptide or homolog or ortholog; and, optionally, (c) determining whether a test compound that binds to the polypeptide or homolog or ortholog inhibits growth of bacteria, relative to growth of bacteria cultured in the absence of the test compound that binds to the polypeptide or homolog or ortholog, as an indication that the test compound is an antibacterial agent.
In another suitable assay, a promoter that responds to depletion of the essential polypeptide by upregulation or downregulation is linked to a reporter gene. To identify a promoter that is up- or down-regulated by the depletion of an essential protein, the gene encoding the essential protein is deleted from the genome and replaced with a version of the gene in which the sequence encoding the essential protein is operably linked to a regulatable promoter. The cells containing this regulatable genetic construct are kept alive by the essential polypeptide produced from the genetic construct containing the regulatable promoter. However, the regulatable promoter allows the expression of the essential polypeptide to be reduced to a level that causes growth inhibition. Total RNA prepared from bacteria under such growth-limiting conditions is compared with RNA from wild-type cells. Standard methods of transcriptional profiling can be used to identify mRNA species that are either more or less abundant (i.e., up- or down-regulated) when expressed under the limiting conditions. Genomic sequence information, e.g., from GenBank, can be used to identify the promoter that drives expression of the identified RNA species. Such promoters are up- or down-regulated by depletion of the essential polypeptide.
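The comparison of RNA from depleted and wild-type cells can be sketched as a simple fold-change calculation. The following is a minimal illustration only: the transcript names and abundance values are invented, and the two-fold threshold and the pseudocount are arbitrary assumptions rather than parameters taken from this description.

```python
import math


def regulated_transcripts(depleted, wild_type, min_log2_fold=1.0, pseudocount=1.0):
    """Label transcripts as 'up' or 'down' when their abundance under depletion
    differs from wild type by at least min_log2_fold (in log2 units)."""
    calls = {}
    for gene in depleted.keys() & wild_type.keys():
        ratio = (depleted[gene] + pseudocount) / (wild_type[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if log2_fc >= min_log2_fold:
            calls[gene] = "up"
        elif log2_fc <= -min_log2_fold:
            calls[gene] = "down"
    return calls


# Hypothetical transcript abundances (arbitrary units).
wild_type = {"geneA": 99, "geneB": 399, "geneC": 199}
depleted = {"geneA": 799, "geneB": 49, "geneC": 219}

for gene, call in sorted(regulated_transcripts(depleted, wild_type).items()):
    print(gene, call)
```

The promoters driving transcripts flagged in this way would be the candidates for linkage to a reporter gene.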
Having identified a promoter(s) that is up- or down-regulated by depletion of the essential polypeptide, the promoter(s) is operably linked to a reporter gene (e.g., β-galactosidase, gus, or green fluorescent protein (GFP)). A bacterial strain containing this reporter gene construct is then exposed to test compounds. Compounds that inhibit the essential polypeptide (or other polypeptides in the essential pathway in which the essential polypeptide participates) cause a functional depletion of the essential polypeptide and therefore lead to an upregulation or downregulation of expression of the reporter gene. Compounds that inhibit the essential polypeptides in such an assay are expected to be antibacterial and can be further tested, if desired, in standard susceptibility assays.
In a related method for identifying antibacterial compounds, the essential polypeptides are used to isolate peptide or nucleic acid ligands that specifically bind the essential polypeptides. These peptide or nucleic acid ligands are then used in a displacement screen to identify small molecules that interact with the essential polypeptide. Such assays can be carried out essentially as described above.
In still another method, interaction of a test compound with an essential polypeptide (i.e., direct or indirect binding) can be detected in a conventional two-hybrid system for detecting protein/protein interactions (e.g., in yeast or mammalian cells). A test compound found to interact with the essential polypeptide can be further tested for antibacterial activity in a conventional susceptibility assay. Generally, in such two-hybrid methods, (a) the essential polypeptide is provided as a fusion protein that includes the polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor; (b) the test polypeptide is provided as a fusion protein that includes the test polypeptide fused to (i) a transcription activation domain of a transcription factor or (ii) a DNA-binding domain of a transcription factor; and (c) binding of the test polypeptide to the polypeptide is detected as a reconstitution of a transcription factor. Homologs and orthologs of the essential polypeptides can be used in similar methods. Reconstitution of the transcription factor can be detected, for example, by detecting transcription of a gene that is operably linked to a DNA sequence bound by the DNA-binding domain of the reconstituted transcription factor (see, for example, White, 1996, Proc. Natl. Acad. Sci. 93:10001-10003 and references cited therein and Vidal et al., 1996, Proc. Natl. Acad. Sci. 93:10315-10320).
In an alternative method, an isolated nucleic acid molecule encoding an essential polypeptide is used to identify a compound that decreases the expression of an essential polypeptide in vivo. Such compounds can be used as antibacterial agents. To identify such compounds, cells that express an essential polypeptide are cultured, exposed to a test compound (or a mixture of test compounds), and the level of expression or activity is compared with the level of essential polypeptide expression or activity in cells that are otherwise identical but that have not been exposed to the test compound(s). Many standard quantitative assays of gene expression can be utilized in this aspect of the invention.
To identify compounds that modulate expression of an essential polypeptide (or homologous or orthologous sequence), the test compound(s) can be added at varying concentrations to the culture medium of cells that express an essential polypeptide (or homolog or ortholog), as described herein. Such test compounds can include small molecules (typically, non-protein, non-polysaccharide chemical entities), polypeptides, and nucleic acids. The expression of the essential polypeptide is then measured, for example, by Northern blot, PCR analysis, or RNAse protection analyses using a nucleic acid molecule of the invention as a probe. The level of expression in the presence of the test molecule, compared with the level of expression in its absence, will indicate whether or not the test molecule alters the expression of the essential polypeptide. Because the yphC and yqjK polypeptides are essential for survival, test compounds that inhibit the expression and/or function of the essential polypeptide, or of an essential homolog or ortholog thereof, will inhibit growth of, or kill, the cells that express such polypeptides.
The polypeptides encoded by essential genes also can be used, separately or together, in assays to identify test compounds that interact with these polypeptides. Test compounds that interact with these polypeptides then can readily be tested, in conventional assays, for their ability to inhibit bacterial growth. Test compounds that interact with the essential polypeptides are candidate antibacterial agents, in contrast to compounds that do not interact with the essential polypeptides. As described herein, any of a variety of art-known methods can be used to assay for the interaction of test compounds with the essential polypeptides.
Typically, the test compound will be a small organic molecule. Alternatively, the test compound can be a test polypeptide (e.g., a polypeptide having a random or predetermined amino acid sequence; or a naturally-occurring or synthetic polypeptide) or a nucleic acid, such as a DNA or RNA molecule. The test compound can be a naturally-occurring compound or it can be synthetically produced, if desired. Synthetic libraries, chemical libraries, and the like can be screened to identify compounds that bind the essential polypeptide. More generally, binding of a test compound to the polypeptide, homolog, or ortholog can be detected either in vitro or in vivo. If desired, the above-described methods for identifying compounds that modulate the expression of the polypeptides of the invention can be combined with measuring the levels of the essential polypeptides expressed in the cells, e.g., by performing a Western blot analysis using antibodies that bind an essential polypeptide.
Regardless of the source of the test compound, the essential polypeptides described herein can be used to identify compounds that inhibit the activity of an essential protein or transcription of an essential gene or translation of the mRNA transcribed from the essential gene. These antibacterial agents can be used to inhibit a wide spectrum of pathogenic or non-pathogenic bacterial strains.
In other embodiments, the invention includes pharmaceutical formulations that include a pharmaceutically acceptable excipient and an antibacterial agent identified using the methods described herein. In particular, the invention includes pharmaceutical formulations that contain antibacterial agents that inhibit the growth of, or kill, pathogenic bacterial strains (e.g., pathogenic gram positive bacterial strains such as pathogenic Streptococcus strains). Such pharmaceutical formulations can be used in a method of treating a bacterial infection in an organism (e.g., a Streptococcus infection). Such a method entails administering to the organism a therapeutically effective amount of the pharmaceutical formulation, i.e., an amount sufficient to ameliorate signs and/or symptoms of the bacterial infection. In particular, such pharmaceutical formulations can be used to treat bacterial infections in mammals such as humans and domesticated mammals (e.g., cows, pigs, dogs, and cats), and in plants. The efficacy of such antibacterial agents in humans can be estimated in an animal model system well known to those of skill in the art (e.g., mouse and rabbit model systems of, for example, streptococcal pneumonia).
Various affinity reagents that are permeable to the microbial membrane (i.e., antibodies and antibody fragments) are useful in practicing the methods of the invention. For example, polyclonal and monoclonal antibodies that specifically bind to the yphC polypeptide or yqjK polypeptide can facilitate detection of essential polypeptides in various bacterial strains (or extracts thereof). These antibodies also are useful for detecting binding of a test compound to essential polypeptides (e.g., using the assays described herein). In addition, monoclonal antibodies that bind essential polypeptides can themselves be used as antibacterial agents.
The invention further features methods of identifying from a large group of mutants those strains that have conditional lethal mutations. In general, the gene and corresponding gene product are subsequently identified, although the strains themselves can be used in screening or diagnostic assays. The mechanism(s) of action for the identified genes and gene products provide a rational basis for the design of antibacterial therapeutic agents. These antibacterial agents reduce the action of the gene product in a wild type strain, and therefore are useful in treating a subject with that type, or a similarly susceptible type, of infection by administering the agent to the subject in a pharmaceutically effective amount. Reduction in the action of the gene product includes competitive inhibition of the gene product for the active site of an enzyme or receptor; non-competitive inhibition; disrupting an intracellular cascade path which requires the gene product; binding to the gene product itself, before or after post-translational processing; and acting as a gene product mimetic, thereby down-regulating the activity. Therapeutic agents include monoclonal antibodies raised against the gene product.
Furthermore, the presence of the gene sequence in certain cells (e.g., a pathogenic bacterium of the same genus or similar species), and the absence or divergence of the sequence in host cells can be determined, if desired. Therapeutic agents directed toward genes or gene products that are not present in the host have several advantages, including fewer side effects, and a lower overall required dosage.
Nucleic acids include both RNA and DNA, including genomic DNA and synthetic (e.g., chemically synthesized) DNA. Nucleic acids can be double-stranded or single-stranded. Where single-stranded, the nucleic acid may be a sense strand or an antisense strand. Nucleic acids can be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
An isolated nucleic acid is a DNA or RNA that is not immediately contiguous with both of the coding sequences with which it is immediately contiguous (one on the 5′ end and one on the 3′ end) in the naturally occurring genome of the organism from which it is derived. Thus, in one embodiment, an isolated nucleic acid includes some or all of the 5′ non-coding (e.g., promoter) sequences that are immediately contiguous to the coding sequence. The term therefore includes, for example, a recombinant DNA that is incorporated into a vector, into an autonomously replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a genomic DNA fragment produced by PCR or restriction endonuclease treatment) independent of other sequences. It also includes a recombinant DNA that is part of a hybrid gene encoding an additional polypeptide sequence. The term “isolated” can refer to a nucleic acid or polypeptide that is substantially free of cellular material, viral material, or culture medium (when produced by recombinant DNA techniques), or chemical precursors or other chemicals (when chemically synthesized). Moreover, an isolated nucleic acid fragment is a nucleic acid fragment that is not naturally occurring as a fragment and would not be found in the natural state.
A nucleic acid sequence that is substantially identical to an essential nucleotide sequence is at least 80% (e.g., at least 85%) identical to the nucleotide sequence of yphC or yqjK as represented by the SEQ ID NOs listed in Table 1, as depicted in FIGS. 1A-3B. For purposes of comparison of nucleic acids, the length of the reference nucleic acid sequence will generally be at least 40 nucleotides, e.g., at least 60 nucleotides or more.
To determine the percent identity of two amino acid sequences or of two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, the molecules are identical at that position. The percent identity between the two sequences is a function of the number of identical positions shared by the sequences (i.e., % identity = # of identical positions / total # of overlapping positions × 100). Preferably, the two sequences are the same length.
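The percent identity calculation described above can be sketched for two sequences that have already been aligned. In this illustration, an overlapping position is taken to be any alignment column in which neither sequence has a gap; that reading, the gap character, the function name, and the example sequences are all assumptions made for the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity = identical positions / overlapping positions x 100.

    Both inputs must be pre-aligned and of equal length. Columns containing a
    gap in either sequence are not counted as overlapping positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    overlapping = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        overlapping += 1
        if a == b:  # only exact matches are counted
            identical += 1
    if overlapping == 0:
        raise ValueError("no overlapping positions")
    return 100.0 * identical / overlapping


# Hypothetical aligned sequences.
print(percent_identity("ATGGCTAAAGGT", "ATGGCCAAGGGA"))
print(percent_identity("ATG-CT", "ATGGCT"))
```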
The determination of percent identity or homology between two sequences can be accomplished using a mathematical algorithm. A suitable mathematical algorithm utilized for the comparison of two sequences is the algorithm of Karlin and Altschul (1990) Proc. Nat'l Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul (1993) Proc. Nat'l Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the NBLAST and XBLAST programs of Altschul, et al. (1990) J. Mol. Biol. 215:403-410. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to yphC or yqjK nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to yphC or yqjK protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used. See http://www.ncbi.nlm.nih.gov. Another example of a mathematical algorithm utilized for the comparison of sequences is the algorithm of Myers and Miller, CABIOS (1989). Such an algorithm is incorporated into the ALIGN program (version 2.0) which is part of the GCG sequence alignment software package. When utilizing the ALIGN program for comparing amino acid sequences, a PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used.
The percent identity between two sequences can be determined using the techniques described above, with or without allowing gaps. In calculating percent identity, only exact matches are counted.
The essential polypeptides useful in practicing the invention include, but are not limited to, recombinant polypeptides and natural polypeptides. Also useful in the invention are nucleic acid sequences that encode forms of essential polypeptides in which naturally occurring amino acid sequences are altered or deleted. Preferred nucleic acids encode polypeptides that are soluble under normal physiological conditions. Also within the invention are nucleic acids encoding fusion proteins in which a portion of an essential polypeptide is fused to an unrelated polypeptide (e.g., a marker polypeptide or a fusion partner) to create a fusion protein. For example, the polypeptide can be fused to a hexa-histidine tag to facilitate purification of bacterially expressed polypeptides, or to a hemagglutinin tag to facilitate purification of polypeptides expressed in eukaryotic cells. The invention also includes, for example, isolated polypeptides (and the nucleic acids that encode these polypeptides) that include a first portion and a second portion; the first portion includes an essential polypeptide, and the second portion includes an immunoglobulin constant (Fc) region or a detectable marker.
The fusion partner can be, for example, a polypeptide which facilitates secretion, e.g., a secretory sequence. Such a fused polypeptide is typically referred to as a preprotein. The secretory sequence can be cleaved by the host cell to form the mature protein. Also within the invention are nucleic acids that encode an essential polypeptide fused to a polypeptide sequence to produce an inactive preprotein. Preproteins can be converted into the active form of the protein by removal of the inactivating sequence.
The invention also includes nucleic acids that hybridize, e.g., under stringent hybridization conditions (as defined herein) to all or a portion of the nucleotide sequences represented by SEQ ID NO:1 or 7, or their complements. The hybridizing portion of the hybridizing nucleic acids is typically at least 15 (e.g., 20, 25, 30, or 50) nucleotides in length. The hybridizing portion of the hybridizing nucleic acid is at least 80%, e.g., at least 95%, or at least 98%, identical to the sequence of a portion or all of a nucleic acid encoding an essential polypeptide or its complement. Hybridizing nucleic acids of the type described herein can be used, for example, as a cloning probe, a primer (e.g., a PCR primer), or a diagnostic probe. Nucleic acids that hybridize to the nucleotide sequences represented by SEQ ID NOs: 1 and 7 are considered “antisense oligonucleotides.”
Also part of the invention are various engineered cells, e.g., transformed host cells, that contain an essential nucleic acid described herein. A transformed cell is a cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a nucleic acid encoding an essential polypeptide. Both prokaryotic and eukaryotic cells are included, e.g., bacteria, such as Streptococcus, Bacillus, and the like.
Also within the invention are genetic constructs (e.g., vectors and plasmids) that include a nucleic acid of the invention that is operably linked to a transcription and/or translation sequence to enable expression, e.g., expression vectors. A selected nucleic acid, e.g., a DNA molecule encoding an essential polypeptide, is “operably linked” to a transcription and/or translation sequence when it is positioned adjacent to one or more sequence elements, e.g., a promoter, which direct transcription and/or translation of the sequence such that the sequence elements can control transcription and/or translation of the selected nucleic acid.
The invention also features purified or isolated polypeptides encoded by the essential genes yphC and yqjK. The terms “protein” and “polypeptide” both refer to any chain of amino acids, regardless of length or post-translational modification (e.g., glycosylation or phosphorylation). Thus, the terms yphC polypeptide and yqjK polypeptide include full-length, naturally occurring, isolated yphC and yqjK proteins, respectively, as well as recombinantly or synthetically produced polypeptides that correspond to the full-length, naturally occurring proteins, or to a portion of the naturally occurring or synthetic polypeptide (provided that a portion of the yphC polypeptide includes a portion of the sequence depicted in FIGS. 1A-B).
A purified or isolated compound is a composition that is at least 60% by weight the compound of interest, e.g., an essential polypeptide or antibody. Preferably the preparation is at least 75% (e.g., at least 90%, 95%, or even 99%) by weight the compound of interest. Purity can be measured by any appropriate standard method, e.g., column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis.
Preferred essential polypeptides include a sequence substantially identical to all or a portion of a naturally occurring essential polypeptide, e.g., including all or a portion of the sequences shown in FIGS. 1A-B, 2A-2B, and 3A-B (provided that a portion of the yphC polypeptide includes a portion of the sequence depicted in FIGS. 1A-B). Polypeptides “substantially identical” to the essential polypeptide sequences described herein have an amino acid sequence that is at least 80% identical to the amino acid sequence of the essential polypeptides represented by the SEQ ID NOs listed in Table 1 (measured as described herein). The new polypeptides can also have a greater percentage identity, e.g., 85%, 90%, 95%, or even higher. For purposes of comparison, the length of the reference essential polypeptide sequence will generally be at least 16 amino acids, e.g., at least 20 or 25 amino acids.
In the case of polypeptide sequences that are less than 100% identical to a reference sequence, the non-identical positions are preferably, but not necessarily, conservative substitutions for the reference sequence. Conservative substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine and glutamine; serine and threonine; lysine and arginine; and phenylalanine and tyrosine.
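For illustration, the substitution groups listed above can be encoded and used to classify each position of a variant against a reference. This is a minimal sketch, assuming ungapped sequences of equal length written in one-letter amino acid codes; the names CONSERVATIVE_GROUPS, is_conservative, and classify_positions are hypothetical and chosen only for this example.

```python
# Conservative substitution groups as listed in the text, in one-letter codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(ref: str, alt: str) -> bool:
    """True if replacing residue ref with alt stays within one listed group."""
    if ref == alt:
        return False  # identical residues are not substitutions
    return any(ref in group and alt in group for group in CONSERVATIVE_GROUPS)


def classify_positions(reference: str, variant: str) -> tuple[int, int, int]:
    """Return counts of (identical, conservative, other) positions.

    Assumes both sequences are ungapped and of equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    identical = conservative = other = 0
    for ref, alt in zip(reference, variant):
        if ref == alt:
            identical += 1
        elif is_conservative(ref, alt):
            conservative += 1
        else:
            other += 1
    return identical, conservative, other


if __name__ == "__main__":
    # Hypothetical five-residue peptides, not taken from any SEQ ID NO.
    print(classify_positions("GAVKD", "AAIRW"))
```

Because the text says "typically include", the table above is one common grouping rather than an exhaustive definition; other groupings could be substituted by editing CONSERVATIVE_GROUPS.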
Where a particular polypeptide is said to have a specific percent identity to a reference polypeptide of a defined length, the percent identity is relative to the reference polypeptide. Thus, a polypeptide that is 50% identical to a reference polypeptide that is 100 amino acids long can be a 50 amino acid polypeptide that is completely identical to a 50 amino acid long portion of the reference polypeptide. Alternatively, it can be a 100 amino acid long polypeptide that is 50% identical to the reference polypeptide over its entire length. Of course, other polypeptides also will meet the same criteria.
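The two cases just described can be checked numerically. The following sketch assumes an ungapped placement of the candidate polypeptide at a known offset within the reference, and divides the number of exact matches by the length of the reference, as the paragraph above requires. The function name identity_vs_reference and the toy sequences are hypothetical and are not taken from any sequence of the invention.

```python
def identity_vs_reference(reference: str, candidate: str, offset: int = 0) -> float:
    """Percent identity of candidate relative to the full reference length.

    The candidate is laid over the reference starting at `offset` (no gaps).
    Exact matches are counted and divided by len(reference), so a short but
    perfect fragment scores in proportion to how much of the reference it covers.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    matches = 0
    for i, residue in enumerate(candidate):
        pos = offset + i
        if 0 <= pos < len(reference) and reference[pos] == residue:
            matches += 1
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # Toy 100-residue reference built from a repeating motif.
    reference = "ACDEFGHIKL" * 10

    # Case 1: a 50-residue fragment identical to residues 25..74 of the reference.
    fragment = reference[25:75]
    print(identity_vs_reference(reference, fragment, offset=25))

    # Case 2: a 100-residue polypeptide matching the reference at every other position.
    half_match = "".join(r if i % 2 == 0 else "W" for i, r in enumerate(reference))
    print(identity_vs_reference(reference, half_match))
```

Both cases yield the same value relative to the 100-residue reference, which is the point made in the text: the percentage alone does not distinguish a short exact fragment from a full-length, partially matching polypeptide.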
The invention also features purified or isolated antibodies that specifically bind to an essential polypeptide. An antibody “specifically binds” to a particular antigen, e.g., an essential polypeptide, when it binds to that antigen, but does not substantially recognize and bind to other molecules in a sample, e.g., a biological sample, that naturally includes an essential polypeptide.
In another aspect, the invention features a method for detecting an essential polypeptide in a sample. This method includes: obtaining a sample suspected of containing an essential polypeptide; contacting the sample with an antibody that specifically binds to an essential polypeptide under conditions that allow the formation of complexes of an antibody and the essential polypeptide; and detecting the complexes, if any, as an indication of the presence of an essential polypeptide in the sample.
Also encompassed by the invention is a method of obtaining a gene related to an essential gene. Such a method entails obtaining a labeled probe that includes an isolated nucleic acid which encodes all or a portion of an essential nucleic acid, or a homolog thereof; screening a nucleic acid fragment library with the labeled probe under conditions that allow hybridization of the probe to nucleic acid fragments in the library, thereby forming nucleic acid duplexes; isolating labeled duplexes, if any; and preparing a full-length gene sequence from the nucleic acid fragments in any labeled duplex to obtain a gene related to the essential gene. Alternatively, such related genes can be identified by carrying out a BLAST search of various sequenced bacterial genomes, as described above.
The invention offers several advantages. For example, the methods for identifying antibacterial agents can be configured for high throughput screening of numerous candidate antibacterial agents. Because the essential genes disclosed herein are thought to be highly conserved, antibacterial drugs targeted to these genes or their gene products are expected to have antibacterial activity against a wide range of bacteria.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described herein. All publications, patent applications, patents, and other references mentioned herein are incorporated herein by reference in their entirety. In the case of a conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative and are not intended to limit the scope of the invention, which is defined by the claims.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.